1
Didelot X, Helekal D, Roberts I. Ancestral process for infectious disease outbreaks with superspreading. J Theor Biol 2025; 607:112109. [PMID: 40233604 DOI: 10.1016/j.jtbi.2025.112109] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2025] [Revised: 03/25/2025] [Accepted: 03/31/2025] [Indexed: 04/17/2025]
Abstract
When an infectious disease outbreak is of a relatively small size, describing the ancestry of a sample of infected individuals is difficult because most ancestral models assume large population sizes. Given a set of infected individuals, we show that it is possible to express exactly the probability that they have the same infector, either inclusively (so that other individuals may have the same infector too) or exclusively (so that they may not). To compute these probabilities requires knowledge of the offspring distribution, which determines how many infections each infected individual causes. We consider transmission both without and with superspreading, in the form of a Poisson and a Negative-Binomial offspring distribution, respectively. We show how our results can be incorporated into a new Lambda-coalescent model which allows multiple lineages to coalesce together. We call this new model the Omega-coalescent, we compare it with previously proposed alternatives, and advocate its use in future studies of infectious disease outbreaks.
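The effect of the offspring distribution on shared ancestry can be illustrated with a small Monte Carlo sketch. This is not the exact expression derived in the paper: it assumes a single generation of `n_infectors` infectors with independent offspring counts, samples two infectees uniformly at random, and estimates the probability that they share an infector. All function names and parameter values are hypothetical and chosen only for illustration.

```python
import math
import random

def sample_poisson(rng, lam):
    """Knuth's multiplication method; adequate for the small rates used here."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def sample_negbin(rng, mean, dispersion):
    """Gamma-Poisson mixture with mean `mean` and variance mean + mean**2 / dispersion."""
    lam = rng.gammavariate(dispersion, mean / dispersion)  # shape, scale
    return sample_poisson(rng, lam)

def same_infector_prob(rng, draw_offspring, n_infectors, reps):
    """Monte Carlo estimate of P(two randomly sampled infectees share an infector)."""
    total, used = 0.0, 0
    for _ in range(reps):
        counts = [draw_offspring() for _ in range(n_infectors)]
        k = sum(counts)
        if k < 2:
            continue  # at least two infectees are needed to form a pair
        # Fraction of ordered pairs of distinct infectees with a common infector.
        total += sum(c * (c - 1) for c in counts) / (k * (k - 1))
        used += 1
    return total / used if used else float("nan")

rng = random.Random(0)
# Assumed illustrative values: 20 infectors, mean of 2 offspring, dispersion 0.5.
print("Poisson:", same_infector_prob(rng, lambda: sample_poisson(rng, 2.0), 20, 2000))
print("Negative-Binomial:", same_infector_prob(rng, lambda: sample_negbin(rng, 2.0, 0.5), 20, 2000))
```

Under the Poisson model the estimate should sit near 1/`n_infectors`, while the overdispersed Negative-Binomial model gives a noticeably larger value, which is the sense in which superspreading makes lineages more likely to coalesce together.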
Affiliation(s)
- Xavier Didelot
- School of Life Sciences, University of Warwick, Coventry, United Kingdom; Department of Statistics, University of Warwick, Coventry, United Kingdom.
- David Helekal
- Department of Immunology and Infectious Diseases, Harvard T. H. Chan School of Public Health, Boston, Massachusetts, USA
- Ian Roberts
- Department of Statistics, University of Warwick, Coventry, United Kingdom; Pandemic Sciences Institute, University of Oxford, Oxford, United Kingdom
2
Roberts I, Everitt RG, Koskela J, Didelot X. Bayesian Inference of Pathogen Phylogeography using the Structured Coalescent Model. PLoS Comput Biol 2025; 21:e1012995. [PMID: 40258093 PMCID: PMC12040344 DOI: 10.1371/journal.pcbi.1012995] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2024] [Revised: 04/29/2025] [Accepted: 03/25/2025] [Indexed: 04/23/2025] Open
Abstract
Over the past decade, pathogen genome sequencing has become well established as a powerful approach to study infectious disease epidemiology. In particular, when multiple genomes are available from several geographical locations, comparing them is informative about the relative size of the local pathogen populations as well as past migration rates and events between locations. The structured coalescent model has a long history of being used as the underlying process for such phylogeographic analysis. However, the computational cost of using this model does not scale well to the large number of genomes frequently analysed in pathogen genomic epidemiology studies. Several approximations of the structured coalescent model have been proposed, but their effects are difficult to predict. Here we show how the exact structured coalescent model can be used to analyse a precomputed dated phylogeny, in order to perform Bayesian inference on the past migration history, the effective population sizes in each location, and the directed migration rates from any location to another. We describe an efficient reversible jump Markov Chain Monte Carlo scheme which is implemented in a new R package StructCoalescent. We use simulations to demonstrate the scalability and correctness of our method and to compare it with existing software. We also applied our new method to several state-of-the-art datasets on the population structure of real pathogens to showcase the relevance of our method to current data scales and research questions.
Affiliation(s)
- Ian Roberts
- Department of Statistics, University of Warwick, Coventry, United Kingdom
- Richard G. Everitt
- Department of Statistics, University of Warwick, Coventry, United Kingdom
- Zeeman Institute for Systems Biology and Infectious Disease Epidemiology Research (SBIDER), University of Warwick, Coventry, United Kingdom
- Jere Koskela
- Department of Statistics, University of Warwick, Coventry, United Kingdom
- School of Mathematics, Statistics and Physics, Newcastle University, Newcastle, United Kingdom
- Xavier Didelot
- Department of Statistics, University of Warwick, Coventry, United Kingdom
- Zeeman Institute for Systems Biology and Infectious Disease Epidemiology Research (SBIDER), University of Warwick, Coventry, United Kingdom
- School of Life Sciences, University of Warwick, Coventry, United Kingdom
3
Koelle K, Rasmussen DA. Phylodynamics beyond neutrality: the impact of incomplete purifying selection on viral phylogenies and inference. Philos Trans R Soc Lond B Biol Sci 2025; 380:20230314. [PMID: 39976414 PMCID: PMC11867112 DOI: 10.1098/rstb.2023.0314] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/01/2024] [Revised: 10/07/2024] [Accepted: 11/04/2024] [Indexed: 02/21/2025] Open
Abstract
Viral phylodynamics focuses on using sequence data to make inferences about the population dynamics of viral diseases. These inferences commonly include estimation of growth rates, reproduction numbers and times of most recent common ancestor. With few exceptions, existing phylodynamic inference approaches assume that all observed and ancestral viral genetic variation is fitness-neutral. This assumption is commonly violated, with a large body of analyses indicating that fitness varies substantially among genotypes circulating in viral populations. Here, we focus on fitness variation arising from deleterious mutations, asking whether incomplete purifying selection of deleterious mutations has the potential to bias phylodynamic inference. We use simulations of an exponentially growing population to explore how incomplete purifying selection distorts tree shape and shifts the distribution of mutations over trees. We find that incomplete purifying selection strongly shapes the distribution of mutations while only weakly impacting tree shape. Despite incomplete purifying selection shifting the distribution of deleterious mutations, we find little discernible bias in estimates of viral growth rates and times of the most recent common ancestor. Our results reassuringly indicate that existing phylodynamic inference approaches that assume neutrality may nevertheless yield accurate epidemiological estimates in the face of incomplete purifying selection. More work is needed to assess the robustness of these findings to alternative epidemiological parametrizations. This article is part of the theme issue '"A mathematical theory of evolution": phylogenetic models dating back 100 years'.
Affiliation(s)
- Katia Koelle
- Department of Biology, Emory University, Atlanta, GA 30322, USA
- Emory Center of Excellence for Influenza Research and Response (CEIRR), Atlanta, GA 30322, USA
- David A. Rasmussen
- Department of Entomology and Plant Pathology, North Carolina State University, Raleigh, NC 27607, USA
- Bioinformatics Research Center, North Carolina State University, Raleigh, NC 27607, USA
4
King AA, Lin Q, Ionides EL. Exact phylodynamic likelihood via structured Markov genealogy processes. arXiv 2025:arXiv:2405.17032v2. [PMID: 38855555 PMCID: PMC11160859] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2024]
Abstract
We consider genealogies arising from a Markov population process in which individuals are categorized into a discrete collection of compartments, with the requirement that individuals within the same compartment are statistically exchangeable. When equipped with a sampling process, each such population process induces a time-evolving tree-valued process defined as the genealogy of all sampled individuals. We provide a construction of this genealogy process and derive exact expressions for the likelihood of an observed genealogy in terms of filter equations. These filter equations can be numerically solved using standard Monte Carlo integration methods. Thus, we obtain statistically efficient likelihood-based inference for essentially arbitrary compartment models based on an observed genealogy of individuals sampled from the population.
5
Seidel S, Stadler T, Vaughan TG. Estimating pathogen spread using structured coalescent and birth-death models: A quantitative comparison. Epidemics 2024; 49:100795. [PMID: 39461051 DOI: 10.1016/j.epidem.2024.100795] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2023] [Revised: 09/09/2024] [Accepted: 09/19/2024] [Indexed: 10/29/2024] Open
Abstract
Elucidating disease spread between subpopulations is crucial in guiding effective disease control efforts. Genomic epidemiology and phylodynamics have emerged as key principles to estimate such spread from pathogen phylogenies derived from molecular data. Two well-established structured phylodynamic methodologies - based on the coalescent and the birth-death model - are frequently employed to estimate viral spread between populations. Nonetheless, these methodologies operate under distinct assumptions whose impact on the accuracy of migration rate inference is yet to be thoroughly investigated. In this manuscript, we present a simulation study, contrasting the inferential outcomes of the structured coalescent model with constant population size and the multitype birth-death model with a constant rate. We explore this comparison across a range of migration rates in endemic diseases and epidemic outbreaks. The results of the epidemic outbreak analysis revealed that the birth-death model exhibits a superior ability to retrieve accurate migration rates compared to the coalescent model, regardless of the actual migration rate. Thus, to estimate accurate migration rates, the population dynamics have to be accounted for. On the other hand, for the endemic disease scenario, our investigation demonstrates that both models produce comparable coverage and accuracy of the migration rates, with the coalescent model generating more precise estimates. Regardless of the specific scenario, both models similarly estimated the source location of the disease. This research offers tangible modelling advice for infectious disease analysts, suggesting the use of either model for endemic diseases. For epidemic outbreaks, or scenarios with varying population size, structured phylodynamic models relying on the Kingman coalescent with constant population size should be avoided as they can lead to inaccurate estimates of the migration rate. 
Instead, coalescent models accounting for varying population size or birth-death models should be favoured. Importantly, our study emphasises the value of directly capturing exponential growth dynamics which could be a useful enhancement for structured coalescent models.
Affiliation(s)
- Sophie Seidel
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland; Swiss Institute of Bioinformatics (SIB), Basel, Switzerland.
- Tanja Stadler
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland; Swiss Institute of Bioinformatics (SIB), Basel, Switzerland
- Timothy G Vaughan
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland; Swiss Institute of Bioinformatics (SIB), Basel, Switzerland.
6
Kingston H, Nduva G, Chohan BH, Mbogo L, Monroe-Wise A, Sambai B, Guthrie BL, Wilkinson E, Giandhari J, Masyuko S, Sinkele W, de Oliveria T, Bukusi D, Scott J, Farquhar C, Herbeck JT. A phylogenetic assessment of HIV-1 transmission trends among people who inject drugs from Coastal and Nairobi, Kenya. Virus Evol 2024; 10:veae092. [PMID: 39678353 PMCID: PMC11640816 DOI: 10.1093/ve/veae092] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2024] [Revised: 10/19/2024] [Accepted: 11/10/2024] [Indexed: 12/17/2024] Open
Abstract
Although recent modeling suggests that needle-syringe programs (NSPs) have reduced parenteral HIV transmission among people who inject drugs (PWID) in Kenya, the prevalence in this population remains high (∼14-20%, compared to ∼4% in the larger population). Reducing transmission or acquisition requires understanding historic and modern transmission trends, but the relationship between the PWID HIV-1 sub-epidemic and the general epidemic in Kenya is not well understood. We incorporated 303 new (2018-21) HIV-1 pol sequences from PWID and their sexual and injecting partners with 2666 previously published Kenyan HIV-1 sequences to quantify relative rates and direction of HIV-1 transmissions involving PWID from the coast and Nairobi regions of Kenya. We used genetic similarity cluster analysis (thresholds: patristic distance <0.045 and <0.015) and maximum likelihood and Bayesian ancestral state reconstruction to estimate transmission histories at the population group (female sex workers, men who have sex with men, PWID, or general population) and regional (coast or Nairobi) levels. Of 1081 participants living with HIV-1, 274 (25%) were not virally suppressed and 303 (28%) had sequences available. Of new sequences from PWID, 58% were in phylogenetic clusters at distance threshold <0.045. Only 21% of clusters containing sequences from PWID included a second PWID sequence. Sequences from PWID were similarly likely to cluster with sequences from female sex workers, men who have sex with men, and the general population. Ancestral state reconstruction suggested that transmission to PWID from other populations was more common than from PWID to other populations. This study expands our understanding of the HIV-1 sub-epidemic among PWID in Kenya by incorporating four times more HIV-1 sequences from this population than prior studies. Despite recruiting many PWID from local sexual and injecting networks, we found low levels of linked transmission in this population. 
This may suggest lower relative levels of parenteral transmission in recent years and supports maintaining NSPs among PWID, while also strengthening interventions to reduce HIV-1 sexual acquisition and transmission for this population.
Affiliation(s)
- Hanley Kingston
- Institute for Public Health Genetics, University of Washington, 1410 NE Campus Parkway, Seattle, WA 98195, United States
- George Nduva
- Department of Translational Medicine, Lund University, Box 117, Lund SE-221 00, Sweden
- Bhavna H Chohan
- Centre for Virus Research, Kenya Medical Research Institute, Mbagathi Rd, Nairobi P.O. Box 54628-00200, Kenya
- Department of Global Health, University of Washington, 3980 15th Avenue, Seattle, WA 98195, United States
- Loice Mbogo
- University of Washington Global Assistance Program-Kenya, Nairobi, Kenya
- Kenyatta National Hospital, Hospital Rd, Nairobi, Kenya
- Aliza Monroe-Wise
- Department of Global Health, University of Washington, 3980 15th Avenue, Seattle, WA 98195, United States
- Betsy Sambai
- Population Council-Kenya, Avenue 5, Rose Ave, Nairobi, Kenya
- Brandon L Guthrie
- Department of Global Health, University of Washington, 3980 15th Avenue, Seattle, WA 98195, United States
- Department of Epidemiology, University of Washington, 3980 15th Avenue, Seattle, WA 98195, United States
- Eduan Wilkinson
- KwaZulu-Natal Research Innovation and Sequencing Platform, Nelson R Mandela School of Medicine, University of KwaZulu-Natal, 719 Umbilo Rd, Berea, Durban, KwaZulu-Natal 4001, South Africa
- Centre for Epidemic Response and Innovation (CERI), Stellenbosch University, Hammanshand Rd, Stellenbosch Central, Stellenbosch 7600, South Africa
- Jennifer Giandhari
- KwaZulu-Natal Research Innovation and Sequencing Platform, Nelson R Mandela School of Medicine, University of KwaZulu-Natal, 719 Umbilo Rd, Berea, Durban, KwaZulu-Natal 4001, South Africa
- Sarah Masyuko
- Department of Global Health, University of Washington, 3980 15th Avenue, Seattle, WA 98195, United States
- Ministry of Health, Cathedral Rd, Kilimani, Nairobi, Kenya
- William Sinkele
- Support for Addiction Prevention and Treatment in Africa, Corner House, Nairobi, Kenya
- Tulio de Oliveria
- Department of Global Health, University of Washington, 3980 15th Avenue, Seattle, WA 98195, United States
- KwaZulu-Natal Research Innovation and Sequencing Platform, Nelson R Mandela School of Medicine, University of KwaZulu-Natal, 719 Umbilo Rd, Berea, Durban, KwaZulu-Natal 4001, South Africa
- Centre for Epidemic Response and Innovation (CERI), Stellenbosch University, Hammanshand Rd, Stellenbosch Central, Stellenbosch 7600, South Africa
- David Bukusi
- Kenyatta National Hospital, Hospital Rd, Nairobi, Kenya
- John Scott
- Department of Medicine, University of Washington, 1410 NE Campus Parkway, Seattle, WA 98195, United States
- Carey Farquhar
- Department of Global Health, University of Washington, 3980 15th Avenue, Seattle, WA 98195, United States
- Department of Epidemiology, University of Washington, 3980 15th Avenue, Seattle, WA 98195, United States
- Department of Medicine, University of Washington, 1410 NE Campus Parkway, Seattle, WA 98195, United States
- Joshua T Herbeck
- Institute for Disease Modeling, Bill & Melinda Gates Foundation, 500 5th Ave N, Seattle, WA 98109, United States
7
Tay JH, Kocher A, Duchene S. Assessing the effect of model specification and prior sensitivity on Bayesian tests of temporal signal. PLoS Comput Biol 2024; 20:e1012371. [PMID: 39504312 PMCID: PMC11573219 DOI: 10.1371/journal.pcbi.1012371] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2024] [Revised: 11/18/2024] [Accepted: 10/23/2024] [Indexed: 11/08/2024] Open
Abstract
Our understanding of the evolution of many microbes has been revolutionised by the molecular clock, a statistical tool to infer evolutionary rates and timescales from analyses of biomolecular sequences. In all molecular clock models, evolutionary rates and times are jointly unidentifiable and 'calibration' information must therefore be used. For many organisms, sequences sampled at different time points can be employed for such calibration. Before attempting to do so, it is recommended to verify that the data carry sufficient information for molecular dating, a practice referred to as evaluation of temporal signal. Recently, a fully Bayesian approach, BETS (Bayesian Evaluation of Temporal Signal), was proposed to overcome known limitations of other commonly used techniques such as root-to-tip regression or date randomisation tests. BETS requires the specification of a full Bayesian phylogenetic model, posing several considerations for untangling the impact of model choice on the detection of temporal signal. Here, we aimed to (i) explore the effect of molecular clock model and tree prior specification on the results of BETS and (ii) provide guidelines for improving our confidence in molecular clock estimates. Using microbial molecular sequence data sets and simulation experiments, we assess the impact of the tree prior and its hyperparameters on the accuracy of temporal signal detection. In particular, highly informative priors that are inconsistent with the data can result in the incorrect detection of temporal signal. In consequence, we recommend: (i) using prior predictive simulations to determine whether the prior generates a reasonable expectation of parameters of interest, such as the evolutionary rate and age of the root node, (ii) conducting prior sensitivity analyses to assess the robustness of the posterior to the choice of prior, and (iii) selecting a molecular clock model that reasonably describes the evolutionary process.
Affiliation(s)
- John H. Tay
- Peter Doherty Institute for Infection and Immunity, Department of Microbiology and Immunology, University of Melbourne, Melbourne, Australia
- Arthur Kocher
- Transmission, Infection, Diversification and Evolution Group, Max Planck Institute of Geoanthropology, Jena, Germany
- Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Sebastian Duchene
- Peter Doherty Institute for Infection and Immunity, Department of Microbiology and Immunology, University of Melbourne, Melbourne, Australia
- DEMI unit, Department of Computational Biology, Institut Pasteur, Paris, France
8
Thompson A, Liebeskind BJ, Scully EJ, Landis MJ. Deep Learning and Likelihood Approaches for Viral Phylogeography Converge on the Same Answers Whether the Inference Model Is Right or Wrong. Syst Biol 2024; 73:183-206. [PMID: 38189575 PMCID: PMC11249978 DOI: 10.1093/sysbio/syad074] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 02/15/2023] [Revised: 11/22/2023] [Accepted: 01/05/2024] [Indexed: 01/09/2024] Open
Abstract
Analysis of phylogenetic trees has become an essential tool in epidemiology. Likelihood-based methods fit models to phylogenies to draw inferences about the phylodynamics and history of viral transmission. However, these methods are often computationally expensive, which limits the complexity and realism of phylodynamic models and makes them ill-suited for informing policy decisions in real-time during rapidly developing outbreaks. Likelihood-free methods using deep learning are pushing the boundaries of inference beyond these constraints. In this paper, we extend, compare, and contrast a recently developed deep learning method for likelihood-free inference from trees. We trained multiple deep neural networks using phylogenies from simulated outbreaks that spread among 5 locations and found they achieve close to the same levels of accuracy as Bayesian inference under the true simulation model. We compared robustness to model misspecification of a trained neural network to that of a Bayesian method. We found that both models had comparable performance, converging on similar biases. We also implemented a method of uncertainty quantification called conformalized quantile regression, which we demonstrate has similar patterns of sensitivity to model misspecification as Bayesian highest posterior density (HPD) intervals and greatly overlaps with HPDs, but has lower precision (is more conservative). Finally, we trained and tested a neural network against phylogeographic data from a recent study of the SARS-CoV-2 pandemic in Europe and obtained similar estimates of region-specific epidemiological parameters and the location of the common ancestor in Europe. Along with being as accurate and robust as likelihood-based methods, our trained neural networks are on average over 3 orders of magnitude faster after training. 
Our results support the notion that neural networks can be trained with simulated data to accurately mimic the good and bad statistical properties of the likelihood functions of generative phylogenetic models.
Affiliation(s)
- Ammon Thompson
- Participant in an Education Program Sponsored by U.S. Department of Defense (DOD) at the National Geospatial-Intelligence Agency, Springfield, VA 22150, USA
- Erik J Scully
- National Geospatial-Intelligence Agency, Springfield, VA 22150, USA
- Michael J Landis
- Department of Biology, Washington University in St. Louis, Rebstock Hall, St. Louis, MO 63130, USA
9
Yu Q, Ascensao JA, Okada T, The COVID-19 Genomics UK (COG-UK) Consortium, Boyd O, Volz E, Hallatschek O. Lineage frequency time series reveal elevated levels of genetic drift in SARS-CoV-2 transmission in England. PLoS Pathog 2024; 20:e1012090. [PMID: 38620033 PMCID: PMC11045146 DOI: 10.1371/journal.ppat.1012090] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2023] [Revised: 04/25/2024] [Accepted: 03/03/2024] [Indexed: 04/17/2024] Open
Abstract
Genetic drift in infectious disease transmission results from randomness of transmission and host recovery or death. The strength of genetic drift for SARS-CoV-2 transmission is expected to be high due to high levels of superspreading, and this is expected to substantially impact disease epidemiology and evolution. However, we don't yet have an understanding of how genetic drift changes over time or across locations. Furthermore, noise that results from data collection can potentially confound estimates of genetic drift. To address this challenge, we develop and validate a method to jointly infer genetic drift and measurement noise from time-series lineage frequency data. Our method is highly scalable to increasingly large genomic datasets, which overcomes a limitation in commonly used phylogenetic methods. We apply this method to over 490,000 SARS-CoV-2 genomic sequences from England collected between March 2020 and December 2021 by the COVID-19 Genomics UK (COG-UK) consortium and separately infer the strength of genetic drift for pre-B.1.177, B.1.177, Alpha, and Delta. We find that even after correcting for measurement noise, the strength of genetic drift is consistently, throughout time, higher than that expected from the observed number of COVID-19 positive individuals in England by 1 to 3 orders of magnitude, which cannot be explained by literature values of superspreading. Our estimates of genetic drift suggest low and time-varying establishment probabilities for new mutations, inform the parametrization of SARS-CoV-2 evolutionary models, and motivate future studies of the potential mechanisms for increased stochasticity in this system.
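The link between lineage frequency fluctuations and the strength of genetic drift can be sketched with a neutral Wright-Fisher toy model. This is not the authors' inference method, which also accounts for measurement noise and time variation; it only assumes that the variance of the one-generation frequency change is approximately f(1 - f) / Ne, and inverts that relation. Function names and parameter values are hypothetical.

```python
import random

def wright_fisher_step(rng, freq, pop_size):
    """One generation of neutral resampling for a lineage at frequency `freq`."""
    copies = sum(1 for _ in range(pop_size) if rng.random() < freq)
    return copies / pop_size

def estimate_effective_size(rng, freq, pop_size, reps):
    """Moment estimate: Var(delta f) ~= f(1 - f) / Ne, so Ne ~= f(1 - f) / Var."""
    deltas = [wright_fisher_step(rng, freq, pop_size) - freq for _ in range(reps)]
    mean = sum(deltas) / reps
    var = sum((d - mean) ** 2 for d in deltas) / (reps - 1)
    return freq * (1 - freq) / var

rng = random.Random(0)
# Assumed illustrative values: lineage at 30% frequency in a population of 500.
print(estimate_effective_size(rng, freq=0.3, pop_size=500, reps=2000))
```

In this idealised setting the estimate recovers roughly the simulated population size; stronger drift than expected from the number of infected individuals would appear as a smaller estimated effective size.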
Affiliation(s)
- QinQin Yu
- Department of Physics, University of California, Berkeley, California, United States of America
- Joao A. Ascensao
- Department of Bioengineering, University of California, Berkeley, California, United States of America
- Takashi Okada
- Department of Physics, University of California, Berkeley, California, United States of America
- Department of Integrative Biology, University of California, Berkeley, California, United States of America
- Institute for Life and Medical Sciences, Kyoto University, Kyoto, Japan
- RIKEN iTHEMS, Wako, Saitama, Japan
- Olivia Boyd
- MRC Centre for Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, Imperial College London, London, United Kingdom
- Erik Volz
- MRC Centre for Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, Imperial College London, London, United Kingdom
- Oskar Hallatschek
- Department of Physics, University of California, Berkeley, California, United States of America
- Department of Integrative Biology, University of California, Berkeley, California, United States of America
- Peter Debye Institute for Soft Matter Physics, Leipzig University, Leipzig, Germany
10
Wilinski M, Castro L, Keithley J, Manore C, Campos J, Romero-Severson E, Domman D, Lokhov AY. Congruity of genomic and epidemiological data in modelling of local cholera outbreaks. Proc Biol Sci 2024; 291:20232805. [PMID: 38503333 PMCID: PMC10950457 DOI: 10.1098/rspb.2023.2805] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/12/2023] [Accepted: 02/19/2024] [Indexed: 03/21/2024] Open
Abstract
Cholera continues to be a global health threat. Understanding how cholera spreads between locations is fundamental to the rational, evidence-based design of intervention and control efforts. Traditionally, cholera transmission models have used cholera case-count data. More recently, whole-genome sequence data have qualitatively described cholera transmission. Integrating these data streams may provide much more accurate models of cholera spread; however, no systematic analyses have been performed so far to compare traditional case-count models to the phylodynamic models from genomic data for cholera transmission. Here, we use high-fidelity case-count and whole-genome sequencing data from the 1991 to 1998 cholera epidemic in Argentina to directly compare the epidemiological model parameters estimated from these two data sources. We find that phylodynamic methods applied to cholera genomics data provide comparable estimates that are in line with established methods. Our methodology represents a critical step in building a framework for integrating case-count and genomic data sources for cholera epidemiology and other bacterial pathogens.
Affiliation(s)
- Mateusz Wilinski
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM, USA
- Lauren Castro
- Analytics, Intelligence and Technology Division, Los Alamos National Laboratory, Los Alamos, NM, USA
- Jeffrey Keithley
- Analytics, Intelligence and Technology Division, Los Alamos National Laboratory, Los Alamos, NM, USA
- Department of Computer Science, University of Iowa, Iowa City, IA, USA
- Carrie Manore
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM, USA
- Josefina Campos
- UO Centro Nacional de Genómica y Bioinformática, ANLIS 'Dr. Carlos G. Malbrán', Buenos Aires, Argentina
- Daryl Domman
- Center for Global Health, Department of Internal Medicine, University of New Mexico Health Sciences Center, Albuquerque, NM, USA
- Andrey Y. Lokhov
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM, USA
11
Müller NF, Bouckaert RR, Wu CH, Bedford T. MASCOT-Skyline integrates population and migration dynamics to enhance phylogeographic reconstructions. bioRxiv 2024:2024.03.06.583734. [PMID: 38496513 PMCID: PMC10942421 DOI: 10.1101/2024.03.06.583734] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/19/2024]
Abstract
The spread of infectious diseases is shaped by spatial and temporal aspects, such as host population structure or changes in the transmission rate or number of infected individuals over time. These spatiotemporal dynamics are imprinted in the genome of pathogens and can be recovered from those genomes using phylodynamic methods. However, phylodynamic methods typically quantify either the temporal or spatial transmission dynamics, which leads to unclear biases, as one can potentially not be inferred without the other. Here, we address this challenge by introducing a structured coalescent skyline approach, MASCOT-Skyline, that allows us to jointly infer spatial and temporal transmission dynamics of infectious diseases using Markov chain Monte Carlo inference. To do so, we model the effective population size dynamics in different locations using a non-parametric function, allowing us to approximate a range of population size dynamics. We show, using a range of different viral outbreak datasets, potential issues with phylogeographic methods. We then use these viral datasets to motivate simulations of outbreaks that illuminate the nature of biases present in the different phylogeographic methods. We show that spatial and temporal dynamics should be modeled jointly even if one seeks to recover just one of the two. Further, we showcase conditions under which we can expect phylogeographic analyses to be biased, particularly different subsampling approaches, as well as provide recommendations of when we can expect them to perform well. We implemented MASCOT-Skyline as part of the open-source software package MASCOT for the Bayesian phylodynamics platform BEAST2.
Affiliation(s)
- Nicola F. Müller
- Division of HIV, ID and Global Medicine, University of California San Francisco, San Francisco, USA
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, USA
| | - Remco R. Bouckaert
- Centre for Computational Evolution, The University of Auckland, New Zealand
| | - Chieh-Hsi Wu
- School of Mathematical Sciences, University of Southampton, UK
| | - Trevor Bedford
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, USA
- Howard Hughes Medical Institute, Seattle, USA
|
12
|
Vaughan TG, Scire J, Nadeau SA, Stadler T. Estimates of early outbreak-specific SARS-CoV-2 epidemiological parameters from genomic data. Proc Natl Acad Sci U S A 2024; 121:e2308125121. [PMID: 38175864 PMCID: PMC10786264 DOI: 10.1073/pnas.2308125121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2023] [Accepted: 12/02/2023] [Indexed: 01/06/2024]
Abstract
We estimate the basic reproductive number and case counts for 15 distinct Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) outbreaks, distributed across 11 populations (10 countries and one cruise ship), based solely on phylodynamic analyses of genomic data. Our results indicate that, prior to significant public health interventions, the reproductive numbers for 10 (out of 15) of these outbreaks are similar, with median posterior estimates ranging between 1.4 and 2.8. These estimates provide a view which is complementary to that provided by those based on traditional line listing data. The genomic-based view is arguably less susceptible to biases resulting from differences in testing protocols, testing intensity, and import of cases into the community of interest. In the analyses reported here, the genomic data primarily provide information regarding which samples belong to a particular outbreak. We observe that once these outbreaks are identified, the sampling dates carry the majority of the information regarding the reproductive number. Finally, we provide genome-based estimates of the cumulative number of infections for each outbreak. For 7 out of 11 of the populations studied, the number of confirmed cases is much bigger than the cumulative number of infections estimated from the sequence data, a possible explanation being the presence of unsequenced outbreaks in these populations.
Affiliation(s)
- Timothy G. Vaughan
- Department of Biosystems Science and Engineering, Eidgenössische Technische Hochschule Zurich, Basel 4058, Switzerland
- Computational Evolution Group, Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland
| | - Jérémie Scire
- Department of Biosystems Science and Engineering, Eidgenössische Technische Hochschule Zurich, Basel 4058, Switzerland
- Computational Evolution Group, Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland
| | - Sarah A. Nadeau
- Department of Biosystems Science and Engineering, Eidgenössische Technische Hochschule Zurich, Basel 4058, Switzerland
- Computational Evolution Group, Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland
| | - Tanja Stadler
- Department of Biosystems Science and Engineering, Eidgenössische Technische Hochschule Zurich, Basel 4058, Switzerland
- Computational Evolution Group, Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland
|
13
|
Vaughan TG. ReMASTER: improved phylodynamic simulation for BEAST 2.7. Bioinformatics 2024; 40:btae015. [PMID: 38195927 PMCID: PMC10796175 DOI: 10.1093/bioinformatics/btae015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2023] [Revised: 12/30/2023] [Accepted: 01/08/2024] [Indexed: 01/11/2024]
Abstract
SUMMARY: Phylodynamic models link phylogenetic trees to biologically relevant parameters such as speciation and extinction rates (macroevolution), effective population sizes and migration rates (ecology and phylogeography), and transmission and removal/recovery rates (epidemiology), to name a few. Being able to simulate phylogenetic trees and population dynamics under these models is the basis for (i) developing and testing phylodynamic inference algorithms, (ii) performing simulation studies which quantify the biases stemming from model misspecification, and (iii) performing so-called model adequacy assessments by simulating samples from the posterior predictive distribution. Here I introduce ReMASTER, a package for the phylogenetic inference platform BEAST 2 that provides a simple and efficient approach to specifying and simulating the phylogenetic trees and population dynamics arising from phylodynamic models. Being a component of BEAST 2 allows ReMASTER to also form the basis of joint simulation and inference analyses. ReMASTER is a complete rewrite of an earlier package, MASTER, and boasts improved efficiency, ease of use, flexibility of model specification, and deeper integration with BEAST 2.
AVAILABILITY AND IMPLEMENTATION: ReMASTER can be installed directly from the BEAST 2 package manager, and its documentation is available online at https://tgvaughan.github.io/remaster. ReMASTER is free software, and is distributed under version 3 of the GNU General Public License. The Java source code for ReMASTER is available from https://github.com/tgvaughan/remaster.
Affiliation(s)
- Timothy G Vaughan
- Department of Biosystems Science and Engineering, ETH Zurich, Basel 4056, Switzerland
- Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland
|
14
|
Hayati M, Sobkowiak B, Stockdale JE, Colijn C. Phylogenetic identification of influenza virus candidates for seasonal vaccines. Sci Adv 2023; 9:eabp9185. [PMID: 37922357 PMCID: PMC10624341 DOI: 10.1126/sciadv.abp9185] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/06/2022] [Accepted: 10/05/2023] [Indexed: 11/05/2023]
Abstract
The seasonal influenza (flu) vaccine is designed to protect against those influenza viruses predicted to circulate during the upcoming flu season, but identifying which viruses are likely to circulate is challenging. We use features from phylogenetic trees reconstructed from hemagglutinin (HA) and neuraminidase (NA) sequences, together with a support vector machine, to predict future circulation. We obtain accuracies of 0.75 to 0.89 (AUC 0.83 to 0.91) over 2016-2020. We explore ways to select potential candidates for a seasonal vaccine and find that the machine learning model has a moderate ability to select strains that are close to future populations. However, consensus sequences among the most recent 3 years also do well at this task. We identify similar candidate strains to those proposed by the World Health Organization, suggesting that this approach can help inform vaccine strain selection.
Affiliation(s)
- Maryam Hayati
- School of Computing Science, Simon Fraser University, Burnaby, BC V5A 1S6, Canada
| | - Benjamin Sobkowiak
- Department of Mathematics, Simon Fraser University, Burnaby, BC V5A 1S6, Canada
| | | | - Caroline Colijn
- Department of Mathematics, Simon Fraser University, Burnaby, BC V5A 1S6, Canada
|
15
|
Helekal D, Keeling M, Grad YH, Didelot X. Estimating the fitness cost and benefit of antimicrobial resistance from pathogen genomic data. J R Soc Interface 2023; 20:20230074. [PMID: 37312496 PMCID: PMC10265023 DOI: 10.1098/rsif.2023.0074] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2023] [Accepted: 05/22/2023] [Indexed: 06/15/2023]
Abstract
Increasing levels of antibiotic resistance in many bacterial pathogen populations are a major threat to public health. Resistance to an antibiotic provides a fitness benefit when the bacteria are exposed to this antibiotic, but resistance also often comes at a cost to the resistant pathogen relative to susceptible counterparts. We lack a good understanding of these benefits and costs of resistance for many bacterial pathogens and antibiotics, but estimating them could lead to better use of antibiotics in a way that reduces or prevents the spread of resistance. Here, we propose a new model for the joint epidemiology of susceptible and resistant variants, which includes explicit parameters for the cost and benefit of resistance. We show how Bayesian inference can be performed under this model using phylogenetic data from susceptible and resistant lineages and that by combining data from both we are able to disentangle and estimate the resistance cost and benefit parameters separately. We applied our inferential methodology to several simulated datasets to demonstrate good scalability and accuracy. We analysed a dataset of Neisseria gonorrhoeae genomes collected between 2000 and 2013 in the USA. We found that two unrelated lineages resistant to fluoroquinolones shared similar epidemic dynamics and resistance parameters. Fluoroquinolones were abandoned for the treatment of gonorrhoea due to increasing levels of resistance, but our results suggest that they could be used to treat a minority of around 10% of cases without causing resistance to grow again.
Affiliation(s)
- David Helekal
- Centre for Doctoral Training in Mathematics for Real-World Systems, University of Warwick, Coventry, UK
| | - Matt Keeling
- Mathematics Institute and School of Life Sciences, University of Warwick, Coventry, UK
| | - Yonatan H. Grad
- Department of Immunology and Infectious Diseases, TH Chan School of Public Health, Harvard University, Boston, MA, USA
| | - Xavier Didelot
- School of Life Sciences and Department of Statistics, University of Warwick, Coventry, UK
|
16
|
Park Y, Martin MA, Koelle K. Epidemiological inference for emerging viruses using segregating sites. Nat Commun 2023; 14:3105. [PMID: 37248255 PMCID: PMC10226718 DOI: 10.1038/s41467-023-38809-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2022] [Accepted: 05/16/2023] [Indexed: 05/31/2023]
Abstract
Epidemiological models are commonly fit to case and pathogen sequence data to estimate parameters and to infer unobserved disease dynamics. Here, we present an inference approach based on sequence data that is well suited for model fitting early on during the expansion of a viral lineage. Our approach relies on a trajectory of segregating sites to infer epidemiological parameters within a Sequential Monte Carlo framework. Using simulated data, we first show that our approach accurately recovers key epidemiological quantities under a single-introduction scenario. We then apply our approach to SARS-CoV-2 sequence data from France, estimating a basic reproduction number of approximately 2.3-2.7 under an epidemiological model that allows for multiple introductions. Our approach presented here indicates that inference approaches that rely on simple population genetic summary statistics can be informative of epidemiological parameters and can be used for reconstructing infectious disease dynamics during the early expansion of a viral lineage.
Affiliation(s)
- Yeongseon Park
- Graduate Program in Population Biology, Ecology, and Evolution, Emory University, Atlanta, GA, 30322, USA
| | - Michael A Martin
- Graduate Program in Population Biology, Ecology, and Evolution, Emory University, Atlanta, GA, 30322, USA
- Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, MD, USA
| | - Katia Koelle
- Department of Biology, Emory University, Atlanta, GA, 30322, USA.
- Emory Center of Excellence for Influenza Research and Response (CEIRR), Atlanta, GA, USA.
|
17
|
Tang M, Dudas G, Bedford T, Minin VN. Fitting stochastic epidemic models to gene genealogies using linear noise approximation. Ann Appl Stat 2023. [DOI: 10.1214/21-aoas1583] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/26/2023]
Affiliation(s)
- Mingwei Tang
- Department of Statistics, University of Washington, Seattle
| | - Gytis Dudas
- Gothenburg Global Biodiversity Centre (GGBC)
| | - Trevor Bedford
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center
|
18
|
Danesh G, Saulnier E, Gascuel O, Choisy M, Alizon S. TiPS: Rapidly simulating trajectories and phylogenies from compartmental models. Methods Ecol Evol 2022. [DOI: 10.1111/2041-210x.14038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/14/2022]
Affiliation(s)
- Gonché Danesh
- MIVEGEC, CNRS, IRD, Université de Montpellier, Montpellier, France
| | - Emma Saulnier
- MIVEGEC, CNRS, IRD, Université de Montpellier, Montpellier, France
| | | | - Marc Choisy
- Centre for Tropical Medicine and Global Health, Nuffield Department of Medicine, University of Oxford, Oxford, UK
- Oxford University Clinical Research Unit, Ho Chi Minh City, Vietnam
| | - Samuel Alizon
- MIVEGEC, CNRS, IRD, Université de Montpellier, Montpellier, France
- Center for Interdisciplinary Research in Biology (CIRB), Collège de France, CNRS, INSERM, Université PSL, Paris, France
|
19
|
Chao E, Chato C, Vender R, Olabode AS, Ferreira RC, Poon AFY. Molecular source attribution. PLoS Comput Biol 2022; 18:e1010649. [PMID: 36395093 PMCID: PMC9671344 DOI: 10.1371/journal.pcbi.1010649] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/19/2022]
Affiliation(s)
- Elisa Chao
- Department of Pathology and Laboratory Medicine, Western University, London, Ontario, Canada
| | - Connor Chato
- Department of Pathology and Laboratory Medicine, Western University, London, Ontario, Canada
| | - Reid Vender
- Department of Pathology and Laboratory Medicine, Western University, London, Ontario, Canada
- School of Medicine, Queen’s University, Kingston, Ontario, Canada
| | - Abayomi S. Olabode
- Department of Pathology and Laboratory Medicine, Western University, London, Ontario, Canada
| | - Roux-Cil Ferreira
- Department of Pathology and Laboratory Medicine, Western University, London, Ontario, Canada
| | - Art F. Y. Poon
- Department of Pathology and Laboratory Medicine, Western University, London, Ontario, Canada
|
20
|
Didelot X, Parkhill J. A scalable analytical approach from bacterial genomes to epidemiology. Philos Trans R Soc Lond B Biol Sci 2022; 377:20210246. [PMID: 35989600 PMCID: PMC9393561 DOI: 10.1098/rstb.2021.0246] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 10/04/2021] [Accepted: 02/17/2022] [Indexed: 12/21/2022]
Abstract
Recent years have seen a remarkable increase in the practicality of sequencing whole genomes from large numbers of bacterial isolates. The availability of this data has huge potential to deliver new insights into the evolution and epidemiology of bacterial pathogens, but the scalability of the analytical methodology has been lagging behind that of the sequencing technology. Here we present a step-by-step approach for such large-scale genomic epidemiology analyses, from bacterial genomes to epidemiological interpretations. A central component of this approach is the dated phylogeny, which is a phylogenetic tree with branch lengths measured in units of time. The construction of dated phylogenies from bacterial genomic data needs to account for the disruptive effect of recombination on phylogenetic relationships, and we describe how this can be achieved. Dated phylogenies can then be used to perform fine-scale or large-scale epidemiological analyses, depending on the proportion of cases for which genomes are available. A key feature of this approach is computational scalability and in particular the ability to process hundreds or thousands of genomes within a matter of hours. This is a clear advantage of the step-by-step approach described here. We discuss other advantages and disadvantages of the approach, as well as potential improvements and avenues for future research. This article is part of a discussion meeting issue 'Genomic population structures of microbial pathogens'.
Affiliation(s)
- Xavier Didelot
- School of Life Sciences and Department of Statistics, University of Warwick, Coventry CV4 7AL, UK
| | - Julian Parkhill
- Department of Veterinary Medicine, University of Cambridge, Cambridge CB3 0ES, UK
|
21
|
Hassler GW, Magee A, Zhang Z, Baele G, Lemey P, Ji X, Fourment M, Suchard MA. Data integration in Bayesian phylogenetics. Annu Rev Stat Appl 2022; 10:353-377. [PMID: 38774036 PMCID: PMC11108065 DOI: 10.1146/annurev-statistics-033021-112532] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 05/24/2024]
Abstract
Researchers studying the evolution of viral pathogens and other organisms increasingly encounter and use large and complex data sets from multiple different sources. Statistical research in Bayesian phylogenetics has risen to this challenge. Researchers use phylogenetics not only to reconstruct the evolutionary history of a group of organisms, but also to understand the processes that guide its evolution and spread through space and time. To this end, it is now the norm to integrate numerous sources of data. For example, epidemiologists studying the spread of a virus through a region incorporate data including genetic sequences (e.g. DNA), time, location (both continuous and discrete) and environmental covariates (e.g. social connectivity between regions) into a coherent statistical model. Evolutionary biologists routinely do the same with genetic sequences, location, time, fossil and modern phenotypes, and ecological covariates. These complex, hierarchical models readily accommodate both discrete and continuous data and have enormous combined discrete/continuous parameter spaces including, at a minimum, phylogenetic tree topologies and branch lengths. The increased size and complexity of these statistical models have spurred advances in computational methods to make them tractable. We discuss both the modeling and computational advances below, as well as unsolved problems and areas of active research.
Affiliation(s)
- Gabriel W Hassler
- Department of Computational Medicine, University of California, Los Angeles, USA, 90095
| | - Andrew Magee
- Department of Biostatistics, University of California, Los Angeles, USA, 90095
| | - Zhenyu Zhang
- Department of Biostatistics, University of California, Los Angeles, USA, 90095
| | - Guy Baele
- Department of Microbiology and Immunology, Rega Institute, KU Leuven, Leuven, Belgium, 3000
| | - Philippe Lemey
- Department of Microbiology and Immunology, Rega Institute, KU Leuven, Leuven, Belgium, 3000
| | - Xiang Ji
- Department of Mathematics, Tulane University, New Orleans, USA, 70118
| | - Mathieu Fourment
- Australian Institute for Microbiology and Infection, University of Technology Sydney, Ultimo NSW, Australia, 2007
| | - Marc A Suchard
- Department of Computational Medicine, University of California, Los Angeles, USA, 90095
- Department of Biostatistics, University of California, Los Angeles, USA, 90095
- Department of Human Genetics, University of California, Los Angeles, USA, 90095
|
22
|
Guo F, Carbone I, Rasmussen DA. Recombination-aware phylogeographic inference using the structured coalescent with ancestral recombination. PLoS Comput Biol 2022; 18:e1010422. [PMID: 35984849 PMCID: PMC9447913 DOI: 10.1371/journal.pcbi.1010422] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2022] [Revised: 09/06/2022] [Accepted: 07/21/2022] [Indexed: 11/19/2022]
Abstract
Movement of individuals between populations or demes is often restricted, especially between geographically isolated populations. The structured coalescent provides an elegant theoretical framework for describing how movement between populations shapes the genealogical history of sampled individuals and thereby structures genetic variation within and between populations. However, in the presence of recombination an individual may inherit different regions of their genome from different parents, resulting in a mosaic of genealogical histories across the genome, which can be represented by an Ancestral Recombination Graph (ARG). In this case, different genomic regions may have different ancestral histories and so different histories of movement between populations. Recombination therefore poses an additional challenge to phylogeographic methods that aim to reconstruct the movement of individuals from genealogies, although also a potential benefit in that different loci may contain additional information about movement. Here, we introduce the Structured Coalescent with Ancestral Recombination (SCAR) model, which builds on recent approximations to the structured coalescent by incorporating recombination into the ancestry of sampled individuals. The SCAR model allows us to infer how the migration history of sampled individuals varies across the genome from ARGs, and improves estimation of key population genetic parameters such as population sizes, recombination rates and migration rates. Using the SCAR model, we explore the potential and limitations of phylogeographic inference using full ARGs. We then apply the SCAR to lineages of the recombining fungus Aspergillus flavus sampled across the United States to explore patterns of recombination and migration across the genome.
Affiliation(s)
- Fangfang Guo
- Department of Entomology and Plant Pathology, North Carolina State University, Raleigh, North Carolina, United States of America
| | - Ignazio Carbone
- Department of Entomology and Plant Pathology, North Carolina State University, Raleigh, North Carolina, United States of America
- Center for Integrated Fungal Research, North Carolina State University, Raleigh, North Carolina, United States of America
| | - David A. Rasmussen
- Department of Entomology and Plant Pathology, North Carolina State University, Raleigh, North Carolina, United States of America
- Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, United States of America
|
23
|
Carson J, Ledda A, Ferretti L, Keeling M, Didelot X. The bounded coalescent model: Conditioning a genealogy on a minimum root date. J Theor Biol 2022; 548:111186. [PMID: 35697144 DOI: 10.1016/j.jtbi.2022.111186] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/04/2022] [Revised: 05/05/2022] [Accepted: 06/02/2022] [Indexed: 01/27/2023]
Abstract
The coalescent model represents how individuals sampled from a population may have originated from a last common ancestor. The bounded coalescent model is obtained by conditioning the coalescent model such that the last common ancestor must have existed after a certain date. This conditioned model arises in a variety of applications, such as speciation, horizontal gene transfer or transmission analysis, and yet the bounded coalescent model has not been previously analysed in detail. Here we describe a new algorithm to simulate from this model directly, without resorting to rejection sampling. We show that this direct simulation algorithm is more computationally efficient than the rejection sampling approach. We also show how to calculate the probability of the last common ancestor occurring after a given date, which is required to compute the probability density of realisations under the bounded coalescent model. Our results are applicable in both the isochronous (when all samples have the same date) and heterochronous (where samples can have different dates) settings. We explore the effect of setting a bound on the date of the last common ancestor, and show that it affects a number of properties of the resulting phylogenies. All our methods are implemented in a new R package called BoundedCoalescent which is freely available online.
Affiliation(s)
- Jake Carson
- Mathematics Institute, University of Warwick, United Kingdom
| | - Alice Ledda
- HCAI, Fungal, AMR, AMU & Sepsis Division, UK Health Security Agency, United Kingdom
| | - Luca Ferretti
- Big Data Institute, University of Oxford, United Kingdom
| | - Matt Keeling
- Mathematics Institute, University of Warwick, United Kingdom
| | - Xavier Didelot
- Department of Statistics and School of Life Sciences, University of Warwick, United Kingdom
|
24
|
Featherstone LA, Zhang JM, Vaughan TG, Duchene S. Epidemiological inference from pathogen genomes: A review of phylodynamic models and applications. Virus Evol 2022; 8:veac045. [PMID: 35775026 PMCID: PMC9241095 DOI: 10.1093/ve/veac045] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 12/15/2021] [Revised: 05/23/2022] [Accepted: 06/02/2022] [Indexed: 11/24/2022]
Abstract
Phylodynamics requires an interdisciplinary understanding of phylogenetics, epidemiology, and statistical inference. It has also experienced more intense application than ever before amid the SARS-CoV-2 pandemic. In light of this, we present a review of phylodynamic models beginning with foundational models and assumptions. Our target audience is public health researchers, epidemiologists, and biologists seeking a working knowledge of the links between epidemiology, evolutionary models, and resulting epidemiological inference. We discuss the assumptions linking evolutionary models of pathogen population size to epidemiological models of the infected population size. We then describe statistical inference for phylodynamic models and list how output parameters can be rearranged for epidemiological interpretation. We go on to cover more sophisticated models and finish by highlighting future directions.
Affiliation(s)
- Leo A Featherstone
- Peter Doherty Institute for Infection and Immunity, University of Melbourne, Melbourne, VIC 3000, Australia
| | - Joshua M Zhang
- Peter Doherty Institute for Infection and Immunity, University of Melbourne, Melbourne, VIC 3000, Australia
| | - Timothy G Vaughan
- Department of Biosystems Science and Engineering, ETH Zurich, Basel 4058, Switzerland
- Swiss Institute of Bioinformatics, Geneva 1015, Switzerland
| | - Sebastian Duchene
- Peter Doherty Institute for Infection and Immunity, University of Melbourne, Melbourne, VIC 3000, Australia
|
25
|
Cappello L, Kim J, Liu S, Palacios JA. Statistical Challenges in Tracking the Evolution of SARS-CoV-2. Stat Sci 2022; 37:162-182. [PMID: 36034090 PMCID: PMC9409356 DOI: 10.1214/22-sts853] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/19/2022]
Abstract
Genomic surveillance of SARS-CoV-2 has been instrumental in tracking the spread and evolution of the virus during the pandemic. The availability of SARS-CoV-2 molecular sequences isolated from infected individuals, coupled with phylodynamic methods, has provided insights into the origin of the virus, its evolutionary rate, the timing of introductions, the patterns of transmission, and the rise of novel variants that have spread through populations. Despite enormous global efforts of governments, laboratories, and researchers to collect and sequence molecular data, many challenges remain in analyzing and interpreting the data collected. Here, we describe the models and methods currently used to monitor the spread of SARS-CoV-2, discuss long-standing and new statistical challenges, and propose a method for tracking the rise of novel variants during the epidemic.
Affiliation(s)
- Lorenzo Cappello
- Departments of Economics and Business, Universitat Pompeu Fabra, 08005, Spain
| | - Jaehee Kim
- Department of Computational Biology, Cornell University, Ithaca, New York 14853, USA
| | - Sifan Liu
- Department of Statistics, Stanford University, Stanford, California 94305, USA
| | - Julia A Palacios
- Departments of Statistics and Biomedical Data Sciences, Stanford University, Stanford, California 94305, USA
|
26
|
Nduva GM, Otieno F, Kimani J, Wahome E, McKinnon LR, Cholette F, Majiwa M, Masika M, Mutua G, Anzala O, Graham SM, Gelmon L, Price MA, Smith AD, Bailey RC, Baele G, Lemey P, Hassan AS, Sanders EJ, Esbjörnsson J. Quantifying rates of HIV-1 flow between risk groups and geographic locations in Kenya: A country-wide phylogenetic study. Virus Evol 2022; 8:veac016. [PMID: 35356640 PMCID: PMC8962731 DOI: 10.1093/ve/veac016] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/26/2021] [Revised: 02/23/2022] [Accepted: 03/01/2022] [Indexed: 12/14/2022]
Abstract
In Kenya, HIV-1 key populations including men having sex with men (MSM), people who inject drugs (PWID) and female sex workers (FSW) are thought to significantly contribute to HIV-1 transmission in the wider, mostly heterosexual (HET) HIV-1 transmission network. However, clear data on HIV-1 transmission dynamics within and between these groups are limited. We aimed to empirically quantify rates of HIV-1 flow between key populations and the HET population, as well as between different geographic regions to determine HIV-1 'hotspots' and their contribution to HIV-1 transmission in Kenya. We used maximum-likelihood phylogenetic and Bayesian inference to analyse 4058 HIV-1 pol sequences (representing 0.3 per cent of the epidemic in Kenya) sampled 1986-2019 from individuals of different risk groups and regions in Kenya. We found 89 per cent within-risk group transmission and 11 per cent mixing between risk groups, cyclic HIV-1 exchange between adjoining geographic provinces and strong evidence of HIV-1 dissemination from (i) West-to-East (i.e. higher-to-lower HIV-1 prevalence regions), and (ii) heterosexual-to-key populations. Low HIV-1 prevalence regions and key populations are sinks rather than major sources of HIV-1 transmission in Kenya. Targeting key populations in Kenya needs to occur concurrently with strengthening interventions in the general epidemic.
Affiliation(s)
- George M Nduva
- Department of Translational Medicine, Lund University, Faculty of Medicine, Lund University, Box 117 SE-221 00 Lund, Sweden
- Kenya Medical Research Institute-Wellcome Trust Research Programme, KEMRI-Center For Geographic Medicine Research, P.O. Box 230-80108, Kilifi, Kenya
| | - Frederick Otieno
- Nyanza Reproductive Health Society, United Mall, P.O. Box 1764, Kisumu, Kenya
| | - Joshua Kimani
- Department of Medical Microbiology, University of Nairobi, P.O. Box 30197-00100, Nairobi, Kenya
- Department of Medical Microbiology and Infectious Diseases, University of Manitoba, Max Rady College of Medicine, Room 543-745 Bannatyne Avenue, University of Manitoba (Bannatyne campus), Winnipeg MB R3E 0J9, Canada
| | - Elizabeth Wahome
- Kenya Medical Research Institute-Wellcome Trust Research Programme, KEMRI-Center For Geographic Medicine Research, P.O. Box 230-80108, Kilifi, Kenya
| | - Lyle R McKinnon
- Department of Medical Microbiology, University of Nairobi, P.O. Box 30197-00100, Nairobi, Kenya
- Department of Medical Microbiology and Infectious Diseases, University of Manitoba, Max Rady College of Medicine, Room 543-745 Bannatyne Avenue, University of Manitoba (Bannatyne campus), Winnipeg MB R3E 0J9, Canada
- Centre for the AIDS Programme of Research in South Africa (CAPRISA), Doris Duke Medical Research Institute, Nelson R Mandela School of Medicine, University of KwaZulu-Natal, Private Bag X7, Congella 4013, South Africa
| | - Francois Cholette
- Department of Medical Microbiology and Infectious Diseases, University of Manitoba, Max Rady College of Medicine, Room 543-745 Bannatyne Avenue, University of Manitoba (Bannatyne campus), Winnipeg MB R3E 0J9, Canada
- National Microbiology Laboratory at the JC Wilt Infectious Diseases Research Centre, Public Health Agency of Canada, 745 Logan Avenue, Winnipeg, Canada
| | - Maxwell Majiwa
- Kenya Medical Research Institute/Center for Global Health Research, KEMRI-CGHR, P.O. Box 20778-00202, Kisumu, Kenya
| | - Moses Masika
- Faculty of Health Sciences 3RD Floor Wing B, KAVI Institute of Clinical Research, University of Nairobi, P.O. Box 19676-00202, Nairobi, Kenya
| | - Gaudensia Mutua
- Faculty of Health Sciences 3RD Floor Wing B, KAVI Institute of Clinical Research, University of Nairobi, P.O. Box 19676-00202, Nairobi, Kenya
| | - Omu Anzala
- Faculty of Health Sciences 3RD Floor Wing B, KAVI Institute of Clinical Research, University of Nairobi, P.O. Box 19676-00202, Nairobi, Kenya
| | - Susan M Graham
- Kenya Medical Research Institute-Wellcome Trust Research Programme, KEMRI-Center For Geographic Medicine Research, P.O. Box 230-80108, Kilifi, Kenya
- Department of Epidemiology, University of Washington, Office of the Chair, UW Box # 351619, Seattle, WA, USA
| | - Larry Gelmon
- Department of Medical Microbiology, University of Nairobi, P.O. Box 30197-00100, Nairobi, Kenya
- Department of Medical Microbiology and Infectious Diseases, University of Manitoba, Max Rady College of Medicine, Room 543-745 Bannatyne Avenue, University of Manitoba (Bannatyne campus), Winnipeg MB R3E 0J9, Canada
| | - Matt A Price
- IAVI Global Headquarters, 125 Broad Street, 9th Floor, New York, NY 10004, USA
- Department of Epidemiology and Biostatistics, University of California, Mission Hall: Global Health & Clinical Sciences Building, 550 16th Street, 2nd Floor, San Francisco, CA 94158-2549, USA
| | - Adrian D Smith
- Nuffield Department of Medicine, The University of Oxford, Old Road Campus, Headington, Oxford OX3 7BN, UK
| | - Robert C Bailey
- Nyanza Reproductive Health Society, United Mall, P.O. Box 1764, Kisumu, Kenya
- Division of Epidemiology and Biostatistics, University of Illinois at Chicago, 1603 W Taylor St, Chicago, IL 60612, USA
| | - Guy Baele
- KU Leuven Department of Microbiology, Immunology and Transplantation, Rega Institute, Laboratory of Clinical and Evolutionary and Computational Virology, Rega-Herestraat 49-box 1040, Leuven 3000, Belgium
| | - Philippe Lemey
- KU Leuven Department of Microbiology, Immunology and Transplantation, Rega Institute, Laboratory of Clinical and Evolutionary and Computational Virology, Rega-Herestraat 49-box 1040, Leuven 3000, Belgium
| | - Amin S Hassan
- Department of Translational Medicine, Lund University, Faculty of Medicine, Lund University, Box 117 SE-221 00 Lund, Sweden
- Kenya Medical Research Institute-Wellcome Trust Research Programme, KEMRI-Center For Geographic Medicine Research, P.O. Box 230-80108, Kilifi, Kenya
| | - Eduard J Sanders
- Kenya Medical Research Institute-Wellcome Trust Research Programme, KEMRI-Center For Geographic Medicine Research, P.O. Box 230-80108, Kilifi, Kenya
- Nuffield Department of Medicine, The University of Oxford, Old Road Campus, Headington, Oxford OX3 7BN, UK
| | - Joakim Esbjörnsson
- Department of Translational Medicine, Lund University, Faculty of Medicine, Lund University, Box 117 SE-221 00 Lund, Sweden
- Nuffield Department of Medicine, The University of Oxford, Old Road Campus, Headington, Oxford OX3 7BN, UK
|
27
|
Long JE, Tordoff DM, Reisner SL, Dasgupta S, Mayer KH, Mullins JI, Lama JR, Herbeck JT, Duerr A. HIV transmission patterns among transgender women, their cisgender male partners, and cisgender MSM in Lima, Peru: A molecular epidemiologic and phylodynamic analysis. Lancet Reg Health Am 2022; 6:100121. [PMID: 35178526 PMCID: PMC8849555 DOI: 10.1016/j.lana.2021.100121] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 05/18/2023]
Abstract
BACKGROUND Transgender women (TW) in Peru are disproportionately affected by HIV. The role that cisgender men who have sex with TW (MSTW) and their sexual networks play in TW's risk of acquiring HIV is not well understood. We used HIV sequences from TW, MSTW, and cisgender men who have sex with men (MSM) to examine transmission dynamics between these groups. METHODS We used HIV-1 pol sequences and epidemiologic data collected through three Lima-based studies from 2013 to 2018 (n = 139 TW, n = 25 MSTW, n = 303 MSM). We identified molecular clusters based on pairwise genetic distance and used structured coalescent phylodynamic modeling to estimate transmission patterns between groups. FINDINGS Among 200 participants (43%) found in 62 clusters, the probability of clustering did not differ by group. Both MSM and TW were more likely to cluster with members of their own group than would be expected based on random mixing. Phylodynamic modeling estimated that there was frequent transmission from MSTW to TW (67·9% of transmission from MSTW; 95%CI = 52·8-83·2%) and from TW to MSTW (76·5% of transmissions from TW; 95%CI = 65·5-90·3%). HIV transmission between MSM and TW was estimated to comprise a small proportion of overall transmissions (4·9% of transmissions from MSM, and 11·8% of transmissions from TW), as were transmissions between MSM and MSTW (7·2% of transmissions from MSM, and 32·0% of transmissions from MSTW). INTERPRETATION These results provide quantitative evidence that MSTW play an important role in TW's HIV vulnerability and that MSTW have an HIV transmission network that is largely distinct from MSM.
Affiliation(s)
- Jessica E. Long
- Department of Epidemiology, University of Washington, UW Box # 351619, Seattle, WA 98195, United States
| | - Diana M. Tordoff
- Department of Epidemiology, University of Washington, UW Box # 351619, Seattle, WA 98195, United States
- Department of Global Health, International Clinical Research Center, University of Washington, Seattle, WA, United States
| | - Sari L. Reisner
- Department of Medicine, Harvard Medical School, Boston, MA, United States
- Division of Endocrinology, Diabetes and Hypertension, Brigham and Women's Hospital, Boston, MA, United States
- Department of Epidemiology, Harvard T. H. Chan School of Public Health, Boston, MA, United States
- The Fenway Institute, Fenway Health, Boston, MA, United States
| | - Sayan Dasgupta
- Fred Hutchinson Cancer Research Center, Seattle, Washington
| | - Kenneth H. Mayer
- Department of Medicine, Harvard Medical School, Boston, MA, United States
- Division of Endocrinology, Diabetes and Hypertension, Brigham and Women's Hospital, Boston, MA, United States
| | - James I. Mullins
- Department of Medicine, University of Washington, Seattle, WA, United States
- Department of Microbiology, University of Washington, Seattle, WA, United States
- Department of Global Health, University of Washington, Seattle, WA, United States
| | | | - Joshua T. Herbeck
- Department of Global Health, International Clinical Research Center, University of Washington, Seattle, WA, United States
| | - Ann Duerr
- Fred Hutchinson Cancer Research Center, Seattle, Washington
|
28
|
Kayondo HW, Ssekagiri A, Nabakooza G, Bbosa N, Ssemwanga D, Kaleebu P, Mwalili S, Mango JM, Leigh Brown AJ, Saenz RA, Galiwango R, Kitayimbwa JM. Employing phylogenetic tree shape statistics to resolve the underlying host population structure. BMC Bioinformatics 2021; 22:546. [PMID: 34758743 PMCID: PMC8579572 DOI: 10.1186/s12859-021-04465-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/14/2020] [Accepted: 10/29/2021] [Indexed: 12/24/2022] Open
Abstract
Background Host population structure is a key determinant of pathogen and infectious disease transmission patterns. Pathogen phylogenetic trees are useful tools to reveal the population structure underlying an epidemic. Determining whether a population is structured or not is useful in informing the type of phylogenetic methods to be used in a given study. We employ tree statistics derived from phylogenetic trees and machine learning classification techniques to reveal an underlying population structure. Results In this paper, we simulate phylogenetic trees from both structured and non-structured host populations. We compute eight statistics for the simulated trees, which are: the number of cherries; Sackin, Colless and total cophenetic indices; ladder length; maximum depth; maximum width, and width-to-depth ratio. Based on the estimated tree statistics, we classify the simulated trees as from either a non-structured or a structured population using the decision tree (DT), K-nearest neighbor (KNN) and support vector machine (SVM). We incorporate the basic reproductive number (R0) in our tree simulation procedure. Sensitivity analysis is done to investigate whether the classifiers are robust to different choice of model parameters and to size of trees. Cross-validated results for area under the curve (AUC) for receiver operating characteristic (ROC) curves yield mean values of over 0.9 for most of the classification models. Conclusions Our classification procedure distinguishes well between trees from structured and non-structured populations using the classifiers, the two-sample Kolmogorov-Smirnov, Cucconi and Podgor-Gastwirth tests and the box plots. SVM models were more robust to changes in model parameters and tree size compared to KNN and DT classifiers. Our classification procedure was applied to real-world data and the structured population was revealed with high accuracy of 92.3% using SVM-polynomial classifier.
Affiliation(s)
- Hassan W Kayondo
- Institute of Basic Sciences, Technology and Innovation (PAUSTI), Pan African University, Nairobi, Kenya; Department of Mathematics, Makerere University, Kampala, Uganda
| | - Alfred Ssekagiri
- Uganda Virus Research Institute (UVRI), Entebbe, Uganda; Department of Immunology and Molecular Biology, Makerere University, Kampala, Uganda
| | - Grace Nabakooza
- Department of Immunology and Molecular Biology, Makerere University, Kampala, Uganda; UVRI Centre of Excellence in Infection and Immunity Research and Training (MUII-Plus), Makerere University, Entebbe, Uganda; Centre for Computational Biology, Uganda Christian University, Mukono, Uganda
| | - Nicholas Bbosa
- Medical Research Council (MRC)/Uganda Virus Research Institute (UVRI) and London School of Hygiene and Tropical Medicine (LSHTM) Uganda Research Unit, Entebbe, Uganda
| | - Deogratius Ssemwanga
- Uganda Virus Research Institute (UVRI), Entebbe, Uganda; Medical Research Council (MRC)/Uganda Virus Research Institute (UVRI) and London School of Hygiene and Tropical Medicine (LSHTM) Uganda Research Unit, Entebbe, Uganda
| | - Pontiano Kaleebu
- Uganda Virus Research Institute (UVRI), Entebbe, Uganda; Medical Research Council (MRC)/Uganda Virus Research Institute (UVRI) and London School of Hygiene and Tropical Medicine (LSHTM) Uganda Research Unit, Entebbe, Uganda
| | - Samuel Mwalili
- Department of Statistics and Actuarial Sciences, Jomo Kenyatta University of Agriculture and Technology, Nairobi, Kenya
| | - John M Mango
- Department of Mathematics, Makerere University, Kampala, Uganda
| | | | | | - Ronald Galiwango
- Centre for Computational Biology, Uganda Christian University, Mukono, Uganda
| | - John M Kitayimbwa
- Centre for Computational Biology, Uganda Christian University, Mukono, Uganda
|
29
|
Marquioni VM, de Aguiar MAM. Modeling neutral viral mutations in the spread of SARS-CoV-2 epidemics. PLoS One 2021; 16:e0255438. [PMID: 34324605 PMCID: PMC8321105 DOI: 10.1371/journal.pone.0255438] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 05/05/2021] [Accepted: 07/16/2021] [Indexed: 11/18/2022] Open
Abstract
Although traditional models of epidemic spreading focus on the number of infected, susceptible and recovered individuals, a lot of attention has been devoted to integrate epidemic models with population genetics. Here we develop an individual-based model for epidemic spreading on networks in which viruses are explicitly represented by finite chains of nucleotides that can mutate inside the host. Under the hypothesis of neutral evolution we compute analytically the average pairwise genetic distance between all infecting viruses over time. We also derive a mean-field version of this equation that can be added directly to compartmental models such as SIR or SEIR to estimate the genetic evolution. We compare our results with the inferred genetic evolution of SARS-CoV-2 at the beginning of the epidemic in China and found good agreement with the analytical solution of our model. Finally, using genetic distance as a proxy for different strains, we use numerical simulations to show that the lower the connectivity between communities, e.g., cities, the higher the probability of reinfection.
Affiliation(s)
- Vitor M. Marquioni
- Instituto de Física “Gleb Wataghin”, Universidade Estadual de Campinas - UNICAMP, Campinas, SP, Brazil
| | - Marcus A. M. de Aguiar
- Instituto de Física “Gleb Wataghin”, Universidade Estadual de Campinas - UNICAMP, Campinas, SP, Brazil
|
30
|
MacPherson A, Louca S, McLaughlin A, Joy JB, Pennell MW. Unifying Phylogenetic Birth-Death Models in Epidemiology and Macroevolution. Syst Biol 2021; 71:172-189. [PMID: 34165577 PMCID: PMC8972974 DOI: 10.1093/sysbio/syab049] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 03/08/2021] [Revised: 06/09/2021] [Accepted: 06/21/2021] [Indexed: 11/13/2022] Open
Abstract
Birth–death stochastic processes are the foundations of many phylogenetic models and are widely used to make inferences about epidemiological and macroevolutionary dynamics. There are a large number of birth–death model variants that have been developed; these impose different assumptions about the temporal dynamics of the parameters and about the sampling process. As each of these variants was individually derived, it has been difficult to understand the relationships between them as well as their precise biological and mathematical assumptions. Without a common mathematical foundation, deriving new models is nontrivial. Here, we unify these models into a single framework, prove that many previously developed epidemiological and macroevolutionary models are all special cases of a more general model, and illustrate the connections between these variants. This unification includes both models where the process is the same for all lineages and those in which it varies across types. We also outline a straightforward procedure for deriving likelihood functions for arbitrarily complex birth–death(-sampling) models that will hopefully allow researchers to explore a wider array of scenarios than was previously possible. By rederiving existing single-type birth–death sampling models, we clarify and synthesize the range of explicit and implicit assumptions made by these models. [Birth–death processes; epidemiology; macroevolution; phylogenetics; statistical inference.]
Affiliation(s)
- Ailene MacPherson
- Department of Zoology and Biodiversity Research Centre, University of British Columbia, Vancouver, Canada; Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, Canada
| | - Stilianos Louca
- Department of Biology, University of Oregon, USA; Institute of Ecology and Evolution, University of Oregon, USA
| | - Angela McLaughlin
- British Columbia Centre for Excellence in HIV/AIDS, Vancouver, Canada; Bioinformatics, University of British Columbia, Vancouver, Canada
| | - Jeffrey B Joy
- British Columbia Centre for Excellence in HIV/AIDS, Vancouver, Canada; Bioinformatics, University of British Columbia, Vancouver, Canada; Department of Medicine, University of British Columbia, Vancouver, Canada
| | - Matthew W Pennell
- Department of Zoology and Biodiversity Research Centre, University of British Columbia, Vancouver, Canada
|
31
|
Guinat C, Vergne T, Kocher A, Chakraborty D, Paul MC, Ducatez M, Stadler T. What can phylodynamics bring to animal health research? Trends Ecol Evol 2021; 36:837-847. [PMID: 34034912 DOI: 10.1016/j.tree.2021.04.013] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 02/17/2021] [Revised: 04/22/2021] [Accepted: 04/29/2021] [Indexed: 11/18/2022]
Abstract
Infectious diseases are a major burden to global economies, and public and animal health. To date, quantifying the spread of infectious diseases to inform policy making has traditionally relied on epidemiological data collected during epidemics. However, interest has grown in recent phylodynamic techniques to infer pathogen transmission dynamics from genetic data. Here, we provide examples of where this new discipline has enhanced disease management in public health and illustrate how it could be further applied in animal health. In particular, we describe how phylodynamics can address fundamental epidemiological questions, such as inferring key transmission parameters in animal populations and quantifying spillover events at the wildlife-livestock interface, and generate important insights for the design of more effective control strategies.
Affiliation(s)
- Claire Guinat
- Department of Biosystems Science and Engineering, ETH Zürich, Mattenstrasse 26, 4058 Basel, Switzerland; Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland.
| | - Timothee Vergne
- IHAP, Université de Toulouse, INRAE, ENVT, 23 Chemin des Capelles, 31300 Toulouse, France
| | - Arthur Kocher
- Transmission, Infection, Diversification & Evolution (tide) group, Max Planck Institute for the Science of Human History, Kahlaische str. 10, 07745 Jena, Germany
| | - Debapryio Chakraborty
- IHAP, Université de Toulouse, INRAE, ENVT, 23 Chemin des Capelles, 31300 Toulouse, France
| | - Mathilde C Paul
- IHAP, Université de Toulouse, INRAE, ENVT, 23 Chemin des Capelles, 31300 Toulouse, France
| | - Mariette Ducatez
- IHAP, Université de Toulouse, INRAE, ENVT, 23 Chemin des Capelles, 31300 Toulouse, France
| | - Tanja Stadler
- Department of Biosystems Science and Engineering, ETH Zürich, Mattenstrasse 26, 4058 Basel, Switzerland; Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
|
32
|
Moncla LH, Black A, DeBolt C, Lang M, Graff NR, Pérez-Osorio AC, Müller NF, Haselow D, Lindquist S, Bedford T. Repeated introductions and intensive community transmission fueled a mumps virus outbreak in Washington State. eLife 2021; 10:e66448. [PMID: 33871357 PMCID: PMC8079146 DOI: 10.7554/elife.66448] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 01/12/2021] [Accepted: 04/15/2021] [Indexed: 12/20/2022] Open
Abstract
In 2016/2017, Washington State experienced a mumps outbreak despite high childhood vaccination rates, with cases more frequently detected among school-aged children and members of the Marshallese community. We sequenced 166 mumps virus genomes collected in Washington and other US states, and traced mumps introductions and transmission within Washington. We uncover that mumps was introduced into Washington approximately 13 times, primarily from Arkansas, sparking multiple co-circulating transmission chains. Although age and vaccination status may have impacted transmission, our data set could not quantify their precise effects. Instead, the outbreak in Washington was overwhelmingly sustained by transmission within the Marshallese community. Our findings underscore the utility of genomic data to clarify epidemiologic factors driving transmission and pinpoint contact networks as critical for mumps transmission. These results imply that contact structures and historic disparities may leave populations at increased risk for respiratory virus disease even when a vaccine is effective and widely used.
Affiliation(s)
- Louise H Moncla
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, United States
| | - Allison Black
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, United States
- Department of Epidemiology, University of Washington, Seattle, United States
| | - Chas DeBolt
- Office of Communicable Disease Epidemiology, Washington State Department of Health, Shoreline, United States
| | - Misty Lang
- Office of Communicable Disease Epidemiology, Washington State Department of Health, Shoreline, United States
| | - Nicholas R Graff
- Office of Communicable Disease Epidemiology, Washington State Department of Health, Shoreline, United States
| | - Ailyn C Pérez-Osorio
- Office of Communicable Disease Epidemiology, Washington State Department of Health, Shoreline, United States
| | - Nicola F Müller
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, United States
| | - Dirk Haselow
- Arkansas Department of Health, Little Rock, United States
| | - Scott Lindquist
- Office of Communicable Disease Epidemiology, Washington State Department of Health, Shoreline, United States
| | - Trevor Bedford
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, United States
- Department of Epidemiology, University of Washington, Seattle, United States
|
33
|
Pacioni C, Vaughan TG, Strive T, Campbell S, Ramsey DSL. Field validation of phylodynamic analytical methods for inference on epidemiological processes in wildlife. Transbound Emerg Dis 2021; 69:1020-1029. [PMID: 33683829 DOI: 10.1111/tbed.14058] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/16/2020] [Revised: 02/24/2021] [Accepted: 03/02/2021] [Indexed: 11/30/2022]
Abstract
Amongst newly developed approaches to analyse molecular data, phylodynamic models are receiving much attention because of their potential to reveal changes to viral populations over short periods. This knowledge can be very important for understanding disease impacts. However, their accuracy needs to be fully understood, especially in relation to wildlife disease epidemiology, where sampling and prior knowledge may be limited. The release of the rabbit haemorrhagic disease virus (RHDV) as biological control in naïve rabbit populations in Australia in 1996 provides a unique data set with which to validate phylodynamic models. By comparing results obtained from RHDV sequence data with our current understanding of RHDV epidemiology in Australia, we evaluated the performances of these recently developed models. In line with our expectations, coalescent analyses detected a sharp increase in the virus population size in the first few months after release, followed by a more gradual increase. Phylodynamic analyses using a birth-death model generated effective reproductive number estimates (the average number of secondary infections per infectious case, Re) larger than one for most of the epochs considered. However, the possible range of the initial Re included estimates lower than one despite the known rapid spread of RHDV in Australia. Furthermore, the analyses that accounted for geographical structuring failed to converge. We argue that the difficulties that we encountered most likely stem from the fact that the samples available from 1996 to 2014 were too sparse with respect to both geographic and within outbreak coverage to adequately infer some of the model parameters. In general, while these phylodynamic analyses proved to be greatly informative in some regards, we caution that their interpretation may not be straightforward. We recommend further research to evaluate the robustness of these models to assumption violations and sensitivity to sampling regimes.
Affiliation(s)
- Carlo Pacioni
- Arthur Rylah Institute for Environmental Research, Department of Environment, Land, Water and Planning, Heidelberg, VIC, Australia; School of Veterinary and Life Sciences, Murdoch University, Murdoch, WA, Australia; Centre for Invasive Species Solutions, Bruce, ACT, Australia
| | - Timothy G Vaughan
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
| | - Tanja Strive
- Centre for Invasive Species Solutions, Bruce, ACT, Australia; Commonwealth Scientific and Industrial Research Organisation, Canberra, ACT, Australia
| | - Susan Campbell
- Centre for Invasive Species Solutions, Bruce, ACT, Australia; Department of Primary Industries and Regional Development, Albany, WA, Australia
| | - David S L Ramsey
- Arthur Rylah Institute for Environmental Research, Department of Environment, Land, Water and Planning, Heidelberg, VIC, Australia; Centre for Invasive Species Solutions, Bruce, ACT, Australia
|
34
|
Stadler T, Pybus OG, Stumpf MPH. Phylodynamics for cell biologists. Science 2021; 371:eaah6266. [PMID: 33446527 DOI: 10.1126/science.aah6266] [Citation(s) in RCA: 45] [Impact Index Per Article: 11.3] [Received: 04/12/2019] [Accepted: 08/13/2020] [Indexed: 12/12/2022]
Abstract
Multicellular organisms are composed of cells connected by ancestry and descent from progenitor cells. The dynamics of cell birth, death, and inheritance within an organism give rise to the fundamental processes of development, differentiation, and cancer. Technical advances in molecular biology now allow us to study cellular composition, ancestry, and evolution at the resolution of individual cells within an organism or tissue. Here, we take a phylogenetic and phylodynamic approach to single-cell biology. We explain how "tree thinking" is important to the interpretation of the growing body of cell-level data and how ecological null models can benefit statistical hypothesis testing. Experimental progress in cell biology should be accompanied by theoretical developments if we are to exploit fully the dynamical information in single-cell data.
Affiliation(s)
- T Stadler
- Department of Biosystems Science and Engineering, ETH Zürich, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - O G Pybus
- Department of Zoology, University of Oxford, Oxford, UK
| | - M P H Stumpf
- Melbourne Integrative Genomics, School of BioSciences and School of Mathematics and Statistics, University of Melbourne, Melbourne, Australia
|
35
|
Rasmussen DA, Grünwald NJ. Phylogeographic Approaches to Characterize the Emergence of Plant Pathogens. Phytopathology 2021; 111:68-77. [PMID: 33021879 DOI: 10.1094/phyto-07-20-0319-fi] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/11/2023]
Abstract
Phylogeography combines geographic information with phylogenetic and population genomic approaches to infer the evolutionary history of a species or population in a geographic context. This approach has been instrumental in understanding the emergence, spread, and evolution of a range of plant pathogens. In particular, phylogeography can address questions about where a pathogen originated, whether it is native or introduced, and when and how often introductions occurred. We review the theory, methods, and approaches underpinning phylogeographic inference and highlight applications providing novel insights into the emergence and spread of select pathogens. We hope that this review will be useful in assessing the power, pitfalls, and opportunities presented by various phylogeographic approaches.
Affiliation(s)
- David A Rasmussen
- Department of Entomology and Plant Pathology, North Carolina State University, Raleigh, NC
- Bioinformatics Research Center, North Carolina State University, Raleigh, NC
| | - Niklaus J Grünwald
- Horticultural Crops Research Unit, U.S. Department of Agriculture-Agricultural Research Service, Corvallis, OR
|
36
|
Fountain-Jones NM, Appaw RC, Carver S, Didelot X, Volz E, Charleston M. Emerging phylogenetic structure of the SARS-CoV-2 pandemic. Virus Evol 2020; 6:veaa082. [PMID: 33335743 PMCID: PMC7717445 DOI: 10.1093/ve/veaa082] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Indexed: 12/20/2022] Open
Abstract
Since spilling over into humans, SARS-CoV-2 has rapidly spread across the globe, accumulating significant genetic diversity. The structure of this genetic diversity and whether it reveals epidemiological insights are fundamental questions for understanding the evolutionary trajectory of this virus. Here, we use a recently developed phylodynamic approach to uncover phylogenetic structures underlying the SARS-CoV-2 pandemic. We find support for three SARS-CoV-2 lineages co-circulating, each with significantly different demographic dynamics concordant with known epidemiological factors. For example, Lineage C emerged in Europe with a high growth rate in late February, just prior to the exponential increase in cases in several European countries. Non-synonymous mutations that characterize Lineage C occur in functionally important gene regions responsible for viral replication and cell entry. Even though Lineages A and B had distinct demographic patterns, they were much more difficult to distinguish. Continuous application of phylogenetic approaches to track the evolutionary epidemiology of SARS-CoV-2 lineages will be increasingly important to validate the efficacy of control efforts and monitor significant evolutionary events in the future.
Affiliation(s)
| | - Raima Carol Appaw
- School of Natural Sciences, University of Tasmania, Hobart, 7001, Australia
| | - Scott Carver
- School of Natural Sciences, University of Tasmania, Hobart, 7001, Australia
| | - Xavier Didelot
- School of Life Sciences and Department of Statistics, University of Warwick, Coventry CV47AL, UK
| | - Erik Volz
- Department of Infectious Disease Epidemiology, MRC Centre for Global Infectious Disease Analysis, Imperial College London, London W2 1PG, UK
| | - Michael Charleston
- School of Natural Sciences, University of Tasmania, Hobart, 7001, Australia
| |
|
37
|
Miller D, Martin MA, Harel N, Tirosh O, Kustin T, Meir M, Sorek N, Gefen-Halevi S, Amit S, Vorontsov O, Shaag A, Wolf D, Peretz A, Shemer-Avni Y, Roif-Kaminsky D, Kopelman NM, Huppert A, Koelle K, Stern A. Full genome viral sequences inform patterns of SARS-CoV-2 spread into and within Israel. Nat Commun 2020; 11:5518. [PMID: 33139704 DOI: 10.1101/2020.05.21.20104521] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 06/12/2020] [Accepted: 10/02/2020] [Indexed: 05/22/2023]
Abstract
Full genome sequences are increasingly used to track the geographic spread and transmission dynamics of viral pathogens. Here, with a focus on Israel, we sequence 212 SARS-CoV-2 sequences and use them to perform a comprehensive analysis to trace the origins and spread of the virus. We find that travelers returning from the United States of America significantly contributed to viral spread in Israel, more than their proportion in incoming infected travelers. Using phylodynamic analysis, we estimate that the basic reproduction number of the virus was initially around 2.5, dropping by more than two-thirds following the implementation of social distancing measures. We further report high levels of transmission heterogeneity in SARS-CoV-2 spread, with between 2-10% of infected individuals resulting in 80% of secondary infections. Overall, our findings demonstrate the effectiveness of social distancing measures for reducing viral spread.
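A statement such as "2-10% of infected individuals result in 80% of secondary infections" can be related to an offspring distribution. The sketch below is only an illustration and not the authors' phylodynamic analysis: it assumes a Negative-Binomial offspring distribution with mean R and dispersion k (both parameter names, and the function names, are hypothetical) and computes the smallest fraction of infected individuals that accounts for a given share of secondary infections.

```python
from math import exp, lgamma, log


def nb_pmf(n, mean, k):
    """Negative-Binomial probability of n secondary infections (mean `mean`, dispersion `k`)."""
    p = k / (k + mean)
    return exp(lgamma(n + k) - lgamma(k) - lgamma(n + 1)
               + k * log(p) + n * log(1.0 - p))


def fraction_causing(share, mean, k, n_max=2000):
    """Smallest fraction of infected individuals responsible for `share` of infections.

    Individuals are ranked from most to fewest secondary infections; the
    offspring distribution is truncated at n_max (an assumption of this sketch).
    """
    pmf = [nb_pmf(n, mean, k) for n in range(n_max + 1)]
    total = sum(n * q for n, q in enumerate(pmf))
    target = share * total
    people = 0.0      # fraction of individuals counted so far
    infections = 0.0  # expected infections caused by those individuals
    for n in range(n_max, 0, -1):
        class_infections = n * pmf[n]
        if infections + class_infections >= target:
            # Only part of this class is needed to reach the target share.
            return people + (target - infections) / n
        people += pmf[n]
        infections += class_infections
    return people


# Strong heterogeneity (small k) versus a nearly Poisson distribution (large k).
print(fraction_causing(0.8, mean=2.5, k=0.1))
print(fraction_causing(0.8, mean=2.5, k=10.0))
```

Smaller values of k concentrate transmission in fewer individuals, so the returned fraction shrinks as k decreases.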
MESH Headings
- Adolescent
- Adult
- Aged
- Aged, 80 and over
- Base Sequence
- Basic Reproduction Number/statistics & numerical data
- Betacoronavirus/genetics
- COVID-19
- Child
- Child, Preschool
- Communicable Diseases, Imported/epidemiology
- Communicable Diseases, Imported/virology
- Coronavirus Infections/epidemiology
- Coronavirus Infections/prevention & control
- Coronavirus Infections/transmission
- Female
- Genome, Viral/genetics
- Humans
- Infant
- Infant, Newborn
- Israel/epidemiology
- Male
- Middle Aged
- Pandemics/prevention & control
- Phylogeny
- Pneumonia, Viral/epidemiology
- Pneumonia, Viral/prevention & control
- Pneumonia, Viral/transmission
- Psychological Distance
- RNA, Viral/genetics
- SARS-CoV-2
- Sequence Analysis, RNA
- United States
- Young Adult
Affiliation(s)
- Danielle Miller
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
| | - Michael A Martin
- Department of Biology, Emory University, Atlanta, GA, USA
- Population Biology, Ecology, and Evolution Graduate Program, Laney Graduate School, Emory University, Atlanta, GA, USA
| | - Noam Harel
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
| | - Omer Tirosh
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
| | - Talia Kustin
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
| | - Moran Meir
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
| | - Nadav Sorek
- Microbiology Laboratory, Assuta Ashdod University-Affiliated Hospital, Ashdod, Israel
| | | | - Sharon Amit
- Clinical Microbiology Laboratory, Sheba Medical Center, Ramat-Gan, Israel
| | - Olesya Vorontsov
- Clinical Virology Unit, Hadassah Hebrew University Medical Center, Jerusalem, Israel
| | - Avraham Shaag
- Clinical Virology Unit, Hadassah Hebrew University Medical Center, Jerusalem, Israel
| | - Dana Wolf
- Clinical Virology Unit, Hadassah Hebrew University Medical Center, Jerusalem, Israel
| | - Avi Peretz
- The Azrieli Faculty of Medicine, Bar-Ilan University, Safed, Israel
- Clinical Microbiology Laboratory, The Baruch Padeh Medical Center, Poriya, Tiberias, Israel
| | - Yonat Shemer-Avni
- Clinical Virology Laboratory, Soroka Medical Center and the Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer-Sheva, Israel
| | | | - Naama M Kopelman
- Department of Computer Science, Holon Institute of Technology, Holon, Israel
| | - Amit Huppert
- Bio-statistical and Bio-mathematical Unit, The Gertner Institute for Epidemiology and Health Policy Research, Chaim Sheba Medical Center, 52621, Tel Hashomer, Israel
- School of Public Health, The Sackler Faculty of Medicine, Tel-Aviv University, 69978, Tel Aviv, Israel
| | - Katia Koelle
- Department of Biology, Emory University, Atlanta, GA, USA
- Emory-UGA Center of Excellence of Influenza Research and Surveillance (CEIRS), Atlanta, GA, USA
| | - Adi Stern
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel.
- Edmond J. Safra Center for Bioinformatics, Tel Aviv University, Tel Aviv, Israel.
| |
|
38
|
Miller D, Martin MA, Harel N, Tirosh O, Kustin T, Meir M, Sorek N, Gefen-Halevi S, Amit S, Vorontsov O, Shaag A, Wolf D, Peretz A, Shemer-Avni Y, Roif-Kaminsky D, Kopelman NM, Huppert A, Koelle K, Stern A. Full genome viral sequences inform patterns of SARS-CoV-2 spread into and within Israel. Nat Commun 2020; 11:5518. [PMID: 33139704 PMCID: PMC7606475 DOI: 10.1038/s41467-020-19248-0] [Citation(s) in RCA: 92] [Impact Index Per Article: 18.4] [Received: 06/12/2020] [Accepted: 10/02/2020] [Indexed: 12/18/2022]
|
39
|
What Should Health Departments Do with HIV Sequence Data? Viruses 2020; 12:v12091018. [PMID: 32932642 PMCID: PMC7551807 DOI: 10.3390/v12091018] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 08/04/2020] [Revised: 09/09/2020] [Accepted: 09/11/2020] [Indexed: 11/27/2022]
Abstract
Many countries and US states have mandatory statutes that require reporting of HIV clinical data, including genetic sequencing results, to public health departments. Because genetic sequencing is a part of routine care for HIV-infected persons, health departments have extensive sequence collections spanning years and even decades of the HIV epidemic. How should these data be used (or not) in public health practice? This is a complex, multi-faceted question that weighs personal risks against public health benefit. The answer is neither straightforward nor universal. However, to make that judgement of how genetic sequence data should be used in describing and combating the HIV epidemic, we need a clear image of what a phylogenetically enhanced HIV surveillance system can do and what benefit it might provide. In this paper, we present a positive case for how up-to-date analysis of HIV sequence databases managed by health departments can provide unique and actionable information on how HIV is spreading in local communities. We discuss this question broadly, with examples from the US, as it is globally relevant for all health authorities that collect HIV genetic data.
|
40
|
Hayati M, Biller P, Colijn C. Predicting the short-term success of human influenza virus variants with machine learning. Proc Biol Sci 2020; 287:20200319. [PMID: 32259469 PMCID: PMC7209065 DOI: 10.1098/rspb.2020.0319] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 02/13/2020] [Accepted: 03/16/2020] [Indexed: 12/13/2022]
Abstract
Seasonal influenza viruses are constantly changing and produce a different set of circulating strains each season. Small genetic changes can accumulate over time and result in antigenically different viruses; this may prevent the body's immune system from recognizing those viruses. Due to rapid mutations, in particular, in the haemagglutinin (HA) gene, seasonal influenza vaccines must be updated frequently. This requires choosing strains to include in the updates to maximize the vaccines' benefits, according to estimates of which strains will be circulating in upcoming seasons. This is a challenging prediction task. In this paper, we use longitudinally sampled phylogenetic trees based on HA sequences from human influenza viruses, together with counts of epitope site polymorphisms in HA, to predict which influenza virus strains are likely to be successful. We extract small groups of taxa (subtrees) and use a suite of features of these subtrees as key inputs to the machine learning tools. Using a range of training and testing strategies, including training on H3N2 and testing on H1N1, we find that successful prediction of future expansion of small subtrees is possible from these data, with accuracies of 0.71-0.85 and a classifier 'area under the curve' 0.75-0.9.
Affiliation(s)
- Maryam Hayati
- Department of Computing Science, Simon Fraser University, Burnaby, British Columbia, Canada V5A 1S6
| | - Priscila Biller
- Department of Mathematics, Simon Fraser University, Burnaby, British Columbia, Canada V5A 1S6
| | - Caroline Colijn
- Department of Mathematics, Simon Fraser University, Burnaby, British Columbia, Canada V5A 1S6
- Department of Mathematics, Imperial College London, London SW7 2BU, UK
| |
|
41
|
Vaughan TG, Leventhal GE, Rasmussen DA, Drummond AJ, Welch D, Stadler T. Estimating Epidemic Incidence and Prevalence from Genomic Data. Mol Biol Evol 2020; 36:1804-1816. [PMID: 31058982 PMCID: PMC6681632 DOI: 10.1093/molbev/msz106] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Indexed: 11/14/2022]
Abstract
Modern phylodynamic methods interpret an inferred phylogenetic tree as a partial transmission chain providing information about the dynamic process of transmission and removal (where removal may be due to recovery, death, or behavior change). Birth–death and coalescent processes have been introduced to model the stochastic dynamics of epidemic spread under common epidemiological models such as the SIS and SIR models and are successfully used to infer phylogenetic trees together with transmission (birth) and removal (death) rates. These methods either integrate analytically over past incidence and prevalence to infer rate parameters, and thus cannot explicitly infer past incidence or prevalence, or allow such inference only in the coalescent limit of large population size. Here, we introduce a particle filtering framework to explicitly infer prevalence and incidence trajectories along with phylogenies and epidemiological model parameters from genomic sequences and case count data in a manner consistent with the underlying birth–death model. After demonstrating the accuracy of this method on simulated data, we use it to assess the prevalence through time of the early 2014 Ebola outbreak in Sierra Leone.
Affiliation(s)
- Timothy G Vaughan
- Centre for Computational Evolution, University of Auckland, Auckland, New Zealand; Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland; Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
| | - Gabriel E Leventhal
- Institute of Integrative Biology, ETH Zürich, Zurich, Switzerland; Department of Civil and Environmental Engineering, Massachusetts Institute of Technology (MIT), Cambridge, MA
| | - David A Rasmussen
- Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland; Department of Entomology and Plant Pathology, North Carolina State University, Raleigh, NC; Bioinformatics Research Center, North Carolina State University, Raleigh, NC
| | - Alexei J Drummond
- Centre for Computational Evolution, University of Auckland, Auckland, New Zealand; School of Computer Science, University of Auckland, Auckland, New Zealand
| | - David Welch
- Centre for Computational Evolution, University of Auckland, Auckland, New Zealand; School of Computer Science, University of Auckland, Auckland, New Zealand
| | - Tanja Stadler
- Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland; Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
| |
|
42
|
Nascimento FF, Baral S, Geidelberg L, Mukandavire C, Schwartz SR, Turpin G, Turpin N, Diouf D, Diouf NL, Coly K, Kane CT, Ndour C, Vickerman P, Boily MC, Volz EM. Phylodynamic analysis of HIV-1 subtypes B, C and CRF 02_AG in Senegal. Epidemics 2019; 30:100376. [PMID: 31767497 PMCID: PMC10066795 DOI: 10.1016/j.epidem.2019.100376] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 05/05/2019] [Revised: 10/28/2019] [Accepted: 11/04/2019] [Indexed: 01/12/2023]
Abstract
Surveillance of HIV epidemics in key populations and in developing countries is often challenging due to sparse, incomplete, or low-quality data. Analysis of HIV sequence data can provide an alternative source of information about epidemic history, population structure, and transmission patterns. To understand HIV-1 dynamics and transmission patterns in Senegal, we carried out model-based phylodynamic analyses using the structured-coalescent approach using HIV-1 sequence data from three different subgroups: reproductive aged males and females from the adult Senegalese population and men who have sex with other men (MSM). We fitted these phylodynamic analyses to time-scaled phylogenetic trees individually for subtypes C and CRF 02_AG, and for the combined data for subtypes B, C and CRF 02_AG. In general, the combined analysis showed a decreasing proportion of effective number of infections among all reproductive aged adults relative to MSM. However, we observed a nearly time-invariant distribution for subtype CRF 02_AG and an increasing trend for subtype C on the proportion of effective number of infections. The population attributable fraction also differed between analyses: subtype CRF 02_AG showed little contribution from MSM, while for subtype C and combined analyses this contribution was much higher. Despite observed differences, results suggested that the combination of high assortativity among MSM and the unmet HIV prevention and treatment needs represent a significant component of the HIV epidemic in Senegal.
Affiliation(s)
- Fabrícia F Nascimento
- Department of Infectious Disease Epidemiology, Imperial College London, Norfolk Place W2 1PG, UK
| | - Stefan Baral
- Department of Epidemiology, Johns Hopkins School of Public Health, Baltimore, MD, USA
| | - Lily Geidelberg
- Department of Infectious Disease Epidemiology, Imperial College London, Norfolk Place W2 1PG, UK
| | - Christinah Mukandavire
- Department of Infectious Disease Epidemiology, Imperial College London, Norfolk Place W2 1PG, UK
| | - Sheree R Schwartz
- Department of Epidemiology, Johns Hopkins School of Public Health, Baltimore, MD, USA
| | - Gnilane Turpin
- Department of Epidemiology, Johns Hopkins School of Public Health, Baltimore, MD, USA
| | | | | | - Nafissatou Leye Diouf
- Institut de Recherche en Santé, de Surveillance Epidemiologique et de Formations, Dakar, Senegal
| | - Karleen Coly
- Department of Epidemiology, Johns Hopkins School of Public Health, Baltimore, MD, USA
| | - Coumba Toure Kane
- Institut de Recherche en Santé, de Surveillance Epidemiologique et de Formations, Dakar, Senegal
| | - Cheikh Ndour
- Division de La Lutte Contre Le Sida et Les IST, Ministry of Health, Dakar, Senegal
| | - Peter Vickerman
- Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, UK
| | - Marie-Claude Boily
- Department of Infectious Disease Epidemiology, Imperial College London, Norfolk Place W2 1PG, UK
| | - Erik M Volz
- Department of Infectious Disease Epidemiology, Imperial College London, Norfolk Place W2 1PG, UK; MRC Centre for Global Infectious Disease Analysis, Imperial College London, UK.
| |
|
43
|
Müller NF, Rasmussen D, Stadler T. MASCOT: parameter and state inference under the marginal structured coalescent approximation. Bioinformatics 2019; 34:3843-3848. [PMID: 29790921 PMCID: PMC6223361 DOI: 10.1093/bioinformatics/bty406] [Citation(s) in RCA: 61] [Impact Index Per Article: 10.2] [Received: 09/15/2017] [Accepted: 05/16/2018] [Indexed: 11/16/2022]
Abstract
Motivation: The structured coalescent is widely applied to study demography within and migration between sub-populations from genetic sequence data. Current methods are either exact but too computationally inefficient to analyse large datasets with many sub-populations, or make strong approximations leading to severe biases in inference. We recently introduced an approximation based on weaker assumptions to the structured coalescent enabling the analysis of larger datasets with many different states. We showed that our approximation provides unbiased migration rate and population size estimates across a wide parameter range. Results: We extend this approach by providing a new algorithm to calculate the probability of the state of internal nodes that includes the information from the full phylogenetic tree. We show that this algorithm is able to increase the probability attributed to the true sub-population of a node. Furthermore, we use improved integration techniques, such that our method is now able to analyse larger datasets, including an H3N2 dataset with 433 sequences sampled from five different locations. Availability and implementation: The presented methods are part of the BEAST2 package MASCOT, the Marginal Approximation of the Structured COalescenT. This package can be downloaded via the BEAUti package manager. The source code is available at https://github.com/nicfel/Mascot.git. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Nicola F Müller
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland; Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
| | - David Rasmussen
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland; Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland; Department of Entomology and Plant Pathology, North Carolina State University, Raleigh, NC, USA; Bioinformatics Research Center, North Carolina State University, Raleigh, NC, USA
| | - Tanja Stadler
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland; Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
| |
|
44
|
Volz EM, Le Vu S, Ratmann O, Tostevin A, Dunn D, Orkin C, O'Shea S, Delpech V, Brown A, Gill N, Fraser C. Molecular Epidemiology of HIV-1 Subtype B Reveals Heterogeneous Transmission Risk: Implications for Intervention and Control. J Infect Dis 2019; 217:1522-1529. [PMID: 29506269 PMCID: PMC5913615 DOI: 10.1093/infdis/jiy044] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Received: 08/15/2017] [Accepted: 01/22/2018] [Indexed: 11/25/2022]
Abstract
Background: The impact of HIV pre-exposure prophylaxis (PrEP) depends on infections averted by protecting vulnerable individuals as well as infections averted by preventing transmission by those who would have been infected if not receiving PrEP. Analysis of HIV phylogenies reveals risk factors for transmission, which we examine as potential criteria for allocating PrEP. Methods: We analyzed 6912 HIV-1 partial pol sequences from men who have sex with men (MSM) in the United Kingdom combined with global reference sequences and patient-level metadata. Population genetic models were developed that adjust for stage of infection, global migration of HIV lineages, and changing incidence of infection through time. Models were extended to simulate the effects of providing susceptible MSM with PrEP. Results: We found that young age <25 years confers higher risk of HIV transmission (relative risk = 2.52 [95% confidence interval, 2.32–2.73]) and that young MSM are more likely to transmit to one another than expected by chance. Simulated interventions indicate that 4-fold more infections can be averted over 5 years by focusing PrEP on young MSM. Conclusions: Concentrating PrEP doses on young individuals can avert more infections than random allocation.
Affiliation(s)
- Erik M Volz
- Department of Infectious Disease Epidemiology and the National Institute for Health Research Health Protection Research Unit on Modeling Methodology, Imperial College London
| | - Stephane Le Vu
- Department of Infectious Disease Epidemiology and the National Institute for Health Research Health Protection Research Unit on Modeling Methodology, Imperial College London
| | - Oliver Ratmann
- Department of Infectious Disease Epidemiology and the National Institute for Health Research Health Protection Research Unit on Modeling Methodology, Imperial College London
| | - Anna Tostevin
- Institute for Global Health, University College London
| | - David Dunn
- Institute for Global Health, University College London
| | | | - Siobhan O'Shea
- Infection Sciences, Viapath Analytics, Guy's and St Thomas' NHS Foundation Trust, London
| | | | | | | | - Christophe Fraser
- Li Ka Shing Centre for Health Information and Discovery, Oxford University, United Kingdom
| | | |
|
45
|
Müller NF, Dudas G, Stadler T. Inferring time-dependent migration and coalescence patterns from genetic sequence and predictor data in structured populations. Virus Evol 2019; 5:vez030. [PMID: 31428459 PMCID: PMC6693038 DOI: 10.1093/ve/vez030] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Indexed: 01/27/2023]
Abstract
Population dynamics can be inferred from genetic sequence data by using phylodynamic methods. These methods typically quantify the dynamics in unstructured populations or assume migration rates and effective population sizes to be constant through time in structured populations. When considering rates to vary through time in structured populations, the number of parameters to infer increases rapidly and the available data might not be sufficient to inform these. Additionally, it is often of interest to know what predicts these parameters rather than knowing the parameters themselves. Here, we introduce a method to infer the predictors for time-varying migration rates and effective population sizes by using a generalized linear model (GLM) approach under the marginal approximation of the structured coalescent. Using simulations, we show that our approach is able to reliably infer the model parameters and their predictors from phylogenetic trees. Furthermore, when simulating trees under the structured coalescent, we show that our new approach outperforms the discrete trait GLM model. We then apply our framework to a previously described Ebola virus dataset, where we infer the parameters and their predictors from genome sequences while accounting for phylogenetic uncertainty. We infer weekly cases to be the strongest predictor for effective population size and geographic distance the strongest predictor for migration. This approach is implemented as part of the BEAST2 package MASCOT, which allows us to jointly infer population dynamics, i.e. the parameters and predictors, within structured populations, the phylogenetic tree, and evolutionary parameters.
Affiliation(s)
- Nicola F Müller
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland; Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
| | - Gytis Dudas
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA; Gothenburg Global Biodiversity Centre, Gothenburg, Sweden
| | - Tanja Stadler
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland; Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
| |
|
46
|
Abstract
A variety of methods based on coalescent theory have been developed to infer demographic history from gene sequences sampled from natural populations. The 'skyline plot' and related approaches are commonly employed as flexible prior distributions for phylogenetic trees in the Bayesian analysis of pathogen gene sequences. In this work we extend the classic and generalized skyline plot methods to phylogenies that contain one or more multifurcations (i.e. hard polytomies). We use the theory of Λ-coalescents (specifically, Beta(2 − α, α)-coalescents) to develop the 'multifurcating skyline plot', which estimates a piecewise constant function of effective population size through time, conditional on a time-scaled multifurcating phylogeny. We implement a smoothing procedure and extend the method to serially sampled (heterochronous) data, but we do not address here the problem of estimating trees with multifurcations from gene sequence alignments. We validate our estimator on simulated data using maximum likelihood and find that parameters of the Beta(2 − α, α)-coalescent process can be estimated accurately. Furthermore, we apply the multifurcating skyline plot to simulated trees generated by tracking transmissions in an individual-based model of epidemic superspreading. We find that high levels of superspreading are consistent with the high-variance assumptions underlying Λ-coalescents and that the estimated parameters of the Λ-coalescent model contain information about the degree of superspreading.
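For readers unfamiliar with the Beta-coalescent mentioned in this abstract, the following minimal sketch evaluates its merger rates. It uses the standard Λ-coalescent result that, with b lineages present, any particular set of k lineages merges at rate B(k − α, b − k + α) / B(2 − α, α), where B is the Beta function. The function names are hypothetical, and this is not the estimator or code of the cited paper.

```python
from math import comb, exp, lgamma


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_coalescent_rate(b, k, alpha):
    """Rate at which one particular set of k out of b lineages merges.

    Assumes a Beta(2 - alpha, alpha)-coalescent with 0 < alpha < 2.
    """
    if not 2 <= k <= b:
        raise ValueError("need 2 <= k <= b")
    if not 0.0 < alpha < 2.0:
        raise ValueError("need 0 < alpha < 2")
    return exp(log_beta(k - alpha, b - k + alpha) - log_beta(2.0 - alpha, alpha))


def total_merger_rate(b, alpha):
    """Total rate of any merger event among b lineages."""
    return sum(comb(b, k) * beta_coalescent_rate(b, k, alpha)
               for k in range(2, b + 1))


# Rates for every merger size among 5 lineages at alpha = 1.5.
for k in range(2, 6):
    print(k, beta_coalescent_rate(5, k, 1.5))
```

As α approaches 2, the rates for k > 2 vanish and only pairwise mergers remain, which recovers the Kingman coalescent; smaller α gives more weight to multiple mergers.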
Affiliation(s)
- Patrick Hoscheit
- MaIAGE, INRA, Université Paris-Saclay, Domaine de Vilvert, Jouy-en-Josas 78350, France
| | - Oliver G Pybus
- Department of Zoology, University of Oxford, Peter Medawar Building, South Parks Road, Oxford OX1 3SY, UK
| |
|
47
|
Bouckaert R, Vaughan TG, Barido-Sottani J, Duchêne S, Fourment M, Gavryushkina A, Heled J, Jones G, Kühnert D, De Maio N, Matschiner M, Mendes FK, Müller NF, Ogilvie HA, du Plessis L, Popinga A, Rambaut A, Rasmussen D, Siveroni I, Suchard MA, Wu CH, Xie D, Zhang C, Stadler T, Drummond AJ. BEAST 2.5: An advanced software platform for Bayesian evolutionary analysis. PLoS Comput Biol 2019; 15:e1006650. [PMID: 30958812 PMCID: PMC6472827 DOI: 10.1371/journal.pcbi.1006650] [Citation(s) in RCA: 1994] [Impact Index Per Article: 332.3] [Received: 11/14/2018] [Revised: 04/18/2019] [Accepted: 02/04/2019] [Indexed: 11/18/2022]
Abstract
Elaboration of Bayesian phylogenetic inference methods has continued at pace in recent years with major new advances in nearly all aspects of the joint modelling of evolutionary data. It is increasingly appreciated that some evolutionary questions can only be adequately answered by combining evidence from multiple independent sources of data, including genome sequences, sampling dates, phenotypic data, radiocarbon dates, fossil occurrences, and biogeographic range information among others. Including all relevant data into a single joint model is very challenging both conceptually and computationally. Advanced computational software packages that allow robust development of compatible (sub-)models which can be composed into a full model hierarchy have played a key role in these developments. Developing such software frameworks is increasingly a major scientific activity in its own right, and comes with specific challenges, from practical software design, development and engineering challenges to statistical and conceptual modelling challenges. BEAST 2 is one such computational software platform, and was first announced over 4 years ago. Here we describe a series of major new developments in the BEAST 2 core platform and model hierarchy that have occurred since the first release of the software, culminating in the recent 2.5 release.
Affiliation(s)
- Remco Bouckaert
- Centre of Computational Evolution, University of Auckland, Auckland, New Zealand
- Max Planck Institute for the Science of Human History, Jena, Germany
| | - Timothy G. Vaughan
- ETH Zürich, Department of Biosystems Science and Engineering, 4058 Basel, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Joëlle Barido-Sottani
- ETH Zürich, Department of Biosystems Science and Engineering, 4058 Basel, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Sebastián Duchêne
- Department of Biochemistry and Molecular Biology, University of Melbourne, Melbourne, Victoria, Australia
| | - Mathieu Fourment
- ithree institute, University of Technology Sydney, Sydney, Australia
| | | | | | - Graham Jones
- Department of Biological and Environmental Sciences, University of Gothenburg, Box 461, SE 405 30 Göteborg, Sweden
| | - Denise Kühnert
- Max Planck Institute for the Science of Human History, Jena, Germany
| | - Nicola De Maio
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridgeshire, UK
| | - Michael Matschiner
- Department of Environmental Sciences, University of Basel, 4051 Basel, Switzerland
| | - Fábio K. Mendes
- Centre of Computational Evolution, University of Auckland, Auckland, New Zealand
| | - Nicola F. Müller
- ETH Zürich, Department of Biosystems Science and Engineering, 4058 Basel, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Huw A. Ogilvie
- Department of Computer Science, Rice University, Houston, TX 77005-1892, USA
| | - Louis du Plessis
- Department of Zoology, University of Oxford, Oxford, OX1 3PS, UK
| | - Alex Popinga
- Centre of Computational Evolution, University of Auckland, Auckland, New Zealand
| | - Andrew Rambaut
- Institute of Evolutionary Biology, University of Edinburgh, Ashworth Laboratories, Edinburgh, EH9 3FL UK
| | - David Rasmussen
- Department of Entomology and Plant Pathology, North Carolina State University, Raleigh, NC 27695, USA
| | - Igor Siveroni
- Department of Infectious Disease Epidemiology, Imperial College London, Norfolk Place, W2 1PG, UK
| | - Marc A. Suchard
- Department of Biomathematics, David Geffen School of Medicine, University of California, Los Angeles, CA, USA
| | - Chieh-Hsi Wu
- Department of Statistics, University of Oxford, OX1 3LB, UK
| | - Dong Xie
- Centre of Computational Evolution, University of Auckland, Auckland, New Zealand
| | - Chi Zhang
- Institute of Vertebrate Paleontology and Paleoanthropology, Chinese Academy of Sciences, Beijing, China
| | - Tanja Stadler
- ETH Zürich, Department of Biosystems Science and Engineering, 4058 Basel, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Alexei J. Drummond
- Centre of Computational Evolution, University of Auckland, Auckland, New Zealand
|
48
|
Volz EM, Siveroni I. Bayesian phylodynamic inference with complex models. PLoS Comput Biol 2018; 14:e1006546. [PMID: 30422979 PMCID: PMC6258546 DOI: 10.1371/journal.pcbi.1006546] [Citation(s) in RCA: 47] [Impact Index Per Article: 6.7] [Received: 02/23/2018] [Revised: 11/27/2018] [Accepted: 10/05/2018] [Indexed: 12/20/2022]
Abstract
Population genetic modeling can enhance Bayesian phylogenetic inference by providing a realistic prior on the distribution of branch lengths and times of common ancestry. The parameters of a population genetic model may also have intrinsic importance, and simultaneous estimation of a phylogeny and model parameters has enabled phylodynamic inference of population growth rates, reproduction numbers, and effective population size through time. Phylodynamic inference based on pathogen genetic sequence data has emerged as a useful supplement to epidemic surveillance; however, commonly used mechanistic models that are typically fitted to non-genetic surveillance data are rarely fitted to pathogen genetic data due to a dearth of software tools, and the theory required to conduct such inference has been developed only recently. We present a framework for coalescent-based phylogenetic and phylodynamic inference which enables highly flexible modeling of demographic and epidemiological processes. This approach builds upon previous structured coalescent approaches and includes enhancements for computational speed, accuracy, and stability. A flexible markup language is described for translating parametric demographic or epidemiological models into a structured coalescent model, enabling simultaneous estimation of demographic or epidemiological parameters and time-scaled phylogenies. We demonstrate the utility of these approaches by fitting compartmental epidemiological models to Ebola virus and Influenza A virus sequence data, demonstrating how important features of these epidemics, such as the reproduction number and epidemic curves, can be gleaned from genetic data. These approaches are provided as an open-source package PhyDyn for the BEAST2 phylogenetics platform.
Affiliation(s)
- Erik M. Volz
- Department of Infectious Disease Epidemiology and the MRC Centre for Global Infectious Disease Analysis, Imperial College London, London, United Kingdom
| | - Igor Siveroni
- Department of Infectious Disease Epidemiology and the MRC Centre for Global Infectious Disease Analysis, Imperial College London, London, United Kingdom
|
49
|
Volz EM, Didelot X. Modeling the Growth and Decline of Pathogen Effective Population Size Provides Insight into Epidemic Dynamics and Drivers of Antimicrobial Resistance. Syst Biol 2018; 67:719-728. [PMID: 29432602 PMCID: PMC6005154 DOI: 10.1093/sysbio/syy007] [Citation(s) in RCA: 73] [Impact Index Per Article: 10.4] [Received: 10/19/2017] [Accepted: 02/04/2018] [Indexed: 12/15/2022]
Abstract
Nonparametric population genetic modeling provides a simple and flexible approach for studying demographic history and epidemic dynamics using pathogen sequence data. Existing Bayesian approaches are premised on stochastic processes with stationary increments, which may provide an unrealistic prior for epidemic histories that feature extended periods of exponential growth or decline. We show that nonparametric models defined in terms of the growth rate of the effective population size can provide a more realistic prior for epidemic history. We propose a nonparametric autoregressive model on the growth rate as a prior for effective population size, which corresponds to the dynamics expected under many epidemic situations. We demonstrate the use of this model within a Bayesian phylodynamic inference framework. Our method correctly reconstructs trends of epidemic growth and decline from pathogen genealogies even when genealogical data are sparse and conventional skyline estimators erroneously predict stable population size. We also propose a regression approach for relating growth rates of pathogen effective population size and time-varying variables that may impact the replicative fitness of a pathogen. The model is applied to real data from rabies virus and Staphylococcus aureus epidemics. We find a close correspondence between the estimated growth rates of a lineage of methicillin-resistant S. aureus and population-level prescription rates of β-lactam antibiotics. The new models are implemented in an open source R package called skygrowth which is available at https://github.com/mrc-ide/skygrowth.
Affiliation(s)
- Erik M Volz
- Department of Infectious Disease Epidemiology, Imperial College London, Norfolk Place, W2 1PG, UK
| | - Xavier Didelot
- Department of Infectious Disease Epidemiology, Imperial College London, Norfolk Place, W2 1PG, UK
|
50
|
Malunguza NJ, Hove-Musekwa SD, Dube S, Mukandavire Z. Dynamical properties and thresholds of an HIV model with super-infection. Mathematical Medicine and Biology: A Journal of the IMA 2018; 34:493-522. [PMID: 27672183 DOI: 10.1093/imammb/dqw014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/10/2015] [Accepted: 08/15/2016] [Indexed: 11/13/2022]
Abstract
Super-infection by multiple HIV-1 subtypes, previously thought restricted to high risk groups, has now been reported in the general heterosexual population at roughly the same incidence rate as in high risk groups. We present a simple deterministic HIV model with super-infection by two HIV-1 subtypes. Mathematical characteristics including the basic reproductive number $(\mathcal{R}_0)$, invasion thresholds $(\mathcal{R}_{21},\mathcal{R}_{12})$ and conditions for asymptotic stability are derived. In the absence of super-infection the model exhibits competitive exclusion, and all equilibria are globally attracting if they exist, except for the disease-free equilibrium, which is a saddle for $\mathcal{R}_0>1.$ The results show that the subtype with the dominant reproductive number exceeding unity dominates the weaker subtype, forcing it to extinction regardless of the size of the reproductive number. On the other hand, super-infection may promote subtype co-existence whenever the minimum of the subtype specific reproductive numbers $(\mathcal{R}_1,\mathcal{R}_2)$ and the invasion reproductive numbers $(\mathcal{R}_{12},\mathcal{R}_{21})$ exceeds unity. Our results demonstrate that if the partial reproductive numbers $(\mathcal{R}_1~\mbox{and}~\mathcal{R}_2 )$ and the invasion reproductive number for the weaker subtype $(\mathcal{R}_{21})$ satisfy $\mathcal{R}_2<1,~\mathcal{R}_1>1~\mbox{and}~\mathcal{R}_{21}>1,$ then primary infection by subtype $1$ may stay the extinction of subtype $2$ despite its relatively low reproductive fitness. For certain parameter ranges, hysteresis (including backward bifurcation) occurs with possible differences in the asymptotic level of disease prevalence. Super-infection may thus facilitate the continued re-generation of reproductively noncompetent subtypes whose subtype specific reproductive numbers will be less than unity while at the same time allowing for the mutual coexistence and persistence of multiple strains.
Persistence and co-existence of multiple strains have a detrimental effect on vaccine design and development and on the administration of ART where one or more of the strains are drug resistant.
Affiliation(s)
- N J Malunguza
- Department of Applied Mathematics, National University of Science and Technology, Bulawayo, Zimbabwe
| | - S D Hove-Musekwa
- Department of Applied Mathematics, National University of Science and Technology, Bulawayo, Zimbabwe
| | - S Dube
- Department of Applied Biology, National University of Science and Technology, Bulawayo, Zimbabwe
| | - Z Mukandavire
- Social and Mathematical Epidemiology Group, London School of Hygiene and Tropical Medicine, London, UK
|