1
Van der Roest BR, Bootsma MCJ, Fischer EAJ, Klinkenberg D, Kretzschmar MEE. A Bayesian inference method to estimate transmission trees with multiple introductions; applied to SARS-CoV-2 in Dutch mink farms. PLoS Comput Biol 2023; 19:e1010928. [PMID: 38011266 PMCID: PMC10703282 DOI: 10.1371/journal.pcbi.1010928] [Received: 02/06/2023] [Revised: 12/07/2023] [Accepted: 11/12/2023] [Indexed: 11/29/2023]
Abstract
Knowledge of who infected whom during an outbreak of an infectious disease is important for determining risk factors for transmission and for designing effective control measures. Both whole-genome sequencing of pathogens and epidemiological data provide useful information about transmission events and the underlying processes. Existing models to infer transmission trees usually assume that the pathogen is introduced into the population of interest from outside only once. However, this is not always true. For instance, SARS-CoV-2 is thought to have been introduced into mink farms in the Netherlands multiple times from the SARS-CoV-2 pandemic among humans. Here, we developed a Bayesian inference method combining whole-genome sequencing data and epidemiological data, allowing for multiple introductions of the pathogen into the population. Our method does not a priori split the outbreak into multiple phylogenetic clusters, nor does it break the dependency between the processes of mutation, within-host dynamics, transmission, and observation. We implemented our method as an additional feature in the R package phybreak. On simulated data, our method correctly identifies the number of introductions, with an accuracy that depends on the proportion of all observed cases that are introductions. Moreover, when a single introduction was simulated, our method produced estimates of parameters and transmission trees similar to those of the existing package. When applied to data from a SARS-CoV-2 outbreak in Dutch mink farms, the method provides strong evidence for independent introductions of the pathogen at 13 of the 63 infected farms. Using the new feature of the phybreak package, transmission routes can be inferred for a more complex class of infectious disease outbreaks, which will aid infection control in future outbreaks.
Affiliation(s)
- Bastiaan R. Van der Roest
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, Netherlands
- Martin C. J. Bootsma
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, Netherlands
- Department of Mathematics, Faculty of Science, Utrecht University, Utrecht, Netherlands
- Egil A. J. Fischer
- Department of Population Health Sciences, Faculty of Veterinary Medicine, Utrecht University, Utrecht, Netherlands
- Don Klinkenberg
- Centre for Infectious Disease Control, National Institute for Public Health and the Environment (RIVM), Bilthoven, Netherlands
- Mirjam E. E. Kretzschmar
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, Netherlands
- Centre for Infectious Disease Control, National Institute for Public Health and the Environment (RIVM), Bilthoven, Netherlands
2
Ward C, Brown GD, Oleson JJ. Incorporating infectious duration-dependent transmission into Bayesian epidemic models. Biom J 2023; 65:e2100401. [PMID: 36285663 DOI: 10.1002/bimj.202100401] [Received: 12/17/2021] [Revised: 09/02/2022] [Accepted: 09/13/2022] [Indexed: 11/11/2022]
Abstract
Compartmental models are commonly used to describe the spread of infectious diseases by estimating the probabilities of transitions between important disease states. A significant challenge in fitting Bayesian compartmental models lies in estimating the duration of the infectious period from limited data that provide only the symptom onset date or another proxy for the start of infectiousness. Commonly, the infectious duration is described by an exponential distribution, an overly simplistic choice that is not biologically plausible. More flexible distributions can be used, but parameter identifiability and computational cost can worsen for moderately sized or large epidemics. In this article, we present a novel approach that considers a curve of transmissibility over a fixed infectious duration. Incorporating infectious duration-dependent (IDD) transmissibility, which decays to zero during the infectious period, is biologically reasonable for many viral infections, and fixing the length of the infectious period reduces the computational complexity of model fitting. Through simulation, we evaluate different functional forms of IDD transmissibility curves and show that the proposed approach offers improved estimation of the time-varying reproductive number. We illustrate the benefit of our approach through a new analysis of the 1995 outbreak of Ebola Virus Disease in the Democratic Republic of the Congo.
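To make the idea of an IDD transmissibility curve concrete, the following minimal Python sketch shows one way such a curve can enter a discrete-time force of infection. The linear decay shape, the function names `linear_idd_curve` and `idd_infection_probability`, and the frequency-dependent force of infection are hypothetical choices for illustration; the article compares several functional forms that are not reproduced here.

```python
import math


def linear_idd_curve(duration):
    """Hypothetical IDD curve: relative transmissibility on infectious days 1..duration.

    Transmissibility is 1 on the first infectious day and decays linearly
    to exactly 0 on the last day of the fixed infectious period.
    """
    if duration < 2:
        raise ValueError("duration must be at least 2 days")
    return [(duration - d) / (duration - 1) for d in range(1, duration + 1)]


def idd_infection_probability(beta, counts_by_day, curve, n_pop):
    """Probability that one susceptible is infected during a single time step.

    counts_by_day[d - 1] is the number of individuals on day d of their
    infectious period; each adds beta * curve[d - 1] / n_pop to the force
    of infection (assumed frequency-dependent mixing).
    """
    force = beta * sum(c * w for c, w in zip(counts_by_day, curve)) / n_pop
    return 1.0 - math.exp(-force)


curve = linear_idd_curve(5)  # [1.0, 0.75, 0.5, 0.25, 0.0]
early = idd_infection_probability(0.5, [10, 0, 0, 0, 0], curve, 1000)
late = idd_infection_probability(0.5, [0, 0, 0, 10, 0], curve, 1000)
print(early, late)
```

Because every infective stops contributing after a fixed number of days, this sketch only needs counts by day of infectious period rather than a distribution of infectious durations, which illustrates why fixing the duration can simplify computation.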
Affiliation(s)
- Caitlin Ward
- Department of Biostatistics, University of Iowa, Iowa City, Iowa, USA
- Grant D Brown
- Department of Biostatistics, University of Iowa, Iowa City, Iowa, USA
- Jacob J Oleson
- Department of Biostatistics, University of Iowa, Iowa City, Iowa, USA
3
Eck DJ, Morozova O, Crawford FW. Randomization for the susceptibility effect of an infectious disease intervention. J Math Biol 2022; 85:37. [PMID: 36127558 PMCID: PMC9809173 DOI: 10.1007/s00285-022-01801-8] [Received: 10/14/2020] [Revised: 06/07/2022] [Accepted: 07/05/2022] [Indexed: 01/05/2023]
Abstract
Randomized trials of infectious disease interventions, such as vaccines, often focus on groups of connected or potentially interacting individuals. When the pathogen of interest is transmissible between study subjects, interference may occur: individual infection outcomes may depend on treatments received by others. Epidemiologists have defined the primary parameter of interest, called the "susceptibility effect", as a contrast in infection risk under treatment versus no treatment, while holding exposure to infectiousness constant. A related quantity, the "direct effect", is defined as an unconditional contrast between the infection risk under treatment versus no treatment. The purpose of this paper is to show that under a widely recommended randomization design, the direct effect may fail to recover the sign of the true susceptibility effect of the intervention in a randomized trial when outcomes are contagious. The analytical approach uses structural features of infectious disease transmission to define the susceptibility effect. A new probabilistic coupling argument reveals stochastic dominance relations between potential infection outcomes under different treatment allocations. The results suggest that estimating the direct effect under randomization may lead to misleading conclusions about the effect of an intervention, such as a vaccine, when outcomes are contagious. Investigators who estimate the direct effect may wrongly conclude that an intervention that protects treated individuals from infection is harmful, or that a harmful treatment is beneficial.
Affiliation(s)
- Daniel J Eck
- Department of Statistics, University of Illinois Urbana-Champaign, Champaign, USA
- Olga Morozova
- Department of Public Health Sciences, Biological Sciences Division, The University of Chicago, Chicago, USA
- Forrest W Crawford
- Department of Biostatistics, Yale School of Public Health, New Haven, USA
- Department of Statistics and Data Science, Yale University, New Haven, USA
- Department of Ecology and Evolutionary Biology, Yale University, New Haven, USA
- Yale School of Management, New Haven, USA
4
Engebretsen S, Rø G, de Blasio BF. A compelling demonstration of why traditional statistical regression models cannot be used to identify risk factors from case data on infectious diseases: a simulation study. BMC Med Res Methodol 2022; 22:146. [PMID: 35596137 PMCID: PMC9123765 DOI: 10.1186/s12874-022-01565-1] [Received: 11/18/2021] [Accepted: 03/03/2022] [Indexed: 11/10/2022]
Abstract
BACKGROUND Regression models are often used to explain the relative risk of infectious diseases among groups. For example, overrepresentation of immigrants among COVID-19 cases has been found in multiple countries. Several studies apply regression models to investigate whether different risk factors can explain this overrepresentation among immigrants without considering dependence between the cases. METHODS We study the appropriateness of traditional statistical regression methods for identifying risk factors for infectious diseases by means of a simulation study. We model infectious disease spread with a simple, population-structured version of an SIR (susceptible-infected-recovered) model, which is one of the best-known and most well-established models for infectious disease spread. The population is thus divided into different sub-groups. We vary the contact structure between the sub-groups of the population. We analyse the relation between individual-level risk of infection and group-level relative risk. We analyse whether Poisson regression estimators can capture the true, underlying parameters of transmission. We assess both the quantitative and qualitative accuracy of the estimated regression coefficients. RESULTS We illustrate that there is no clear relationship between differences in individual characteristics and group-level overrepresentation: small differences on the individual level can result in arbitrarily high overrepresentation. We demonstrate that individual risk of infection cannot be properly defined without simultaneous specification of the infection level of the population. We argue that the estimated regression coefficients are not interpretable and show that it is not possible to adjust for other variables by standard regression methods. Finally, we illustrate that regression models can find variables that are unrelated to infection risk in the constructed simulation example (e.g. ethnicity) to be statistically significant, particularly when a large proportion of contacts is within the same group. CONCLUSIONS Traditional regression models, which are valid for modelling risk between groups for non-communicable diseases, are not valid for infectious diseases. By applying such methods to identify risk factors of infectious diseases, one risks reaching wrong conclusions. Output from such analyses should therefore be treated with great caution.
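The point that group-level overrepresentation need not reflect individual-level differences can be illustrated with a deterministic two-group SIR model. The Python sketch below is a hypothetical illustration, not the authors' simulation code: both groups share the same recovery rate and differ only in how often they mix within their own group. The function name, parameter values, and the forward Euler solver are assumptions chosen for readability.

```python
def two_group_sir(beta, gamma, pop, i0, days, dt=0.1):
    """Deterministic two-group SIR solved with forward Euler steps.

    beta[a][b]: transmission rate to a susceptible in group a from infectives
    in group b, scaled by group b's size (frequency-dependent mixing).
    gamma: recovery rate (per day), identical in both groups.
    Returns the final attack rate (fraction ever infected) for each group.
    """
    s = [float(pop[g] - i0[g]) for g in range(2)]
    i = [float(i0[g]) for g in range(2)]
    for _ in range(int(round(days / dt))):
        # Force of infection acting on each group.
        lam = [sum(beta[a][b] * i[b] / pop[b] for b in range(2)) for a in range(2)]
        new_inf = [lam[a] * s[a] * dt for a in range(2)]
        new_rec = [gamma * i[a] * dt for a in range(2)]
        s = [s[a] - new_inf[a] for a in range(2)]
        i = [i[a] + new_inf[a] - new_rec[a] for a in range(2)]
    return [1.0 - s[a] / pop[a] for a in range(2)]


# Hypothetical contact structure: group 0 mixes more within itself.
attack = two_group_sir([[0.5, 0.2], [0.2, 0.1]], 0.25, [1000, 1000], [1, 1], 200)
print("attack rates:", attack, "relative risk:", attack[0] / attack[1])
```

Since no individual-level covariate differs between the groups, any relative risk above one here comes purely from contact structure, which is the kind of dependence that a Poisson regression on case counts would not capture.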
Affiliation(s)
- Gunnar Rø
- Department of Method Development and Analytics, Norwegian Institute of Public Health, Oslo, Norway
- Birgitte Freiesleben de Blasio
- Department of Method Development and Analytics, Norwegian Institute of Public Health, Oslo, Norway
- Oslo Centre for Biostatistics and Epidemiology, Department of Biostatistics, Institute of Basic Medical Sciences, University of Oslo, Oslo, Norway
5
Farthing TS, Dawson DE, Sanderson MW, Seger H, Lanzas C. Combining epidemiological and ecological methods to quantify social effects on Escherichia coli transmission. R Soc Open Sci 2021; 8:210328. [PMID: 34754493 PMCID: PMC8493196 DOI: 10.1098/rsos.210328] [Received: 02/26/2021] [Accepted: 09/09/2021] [Indexed: 06/13/2023]
Abstract
Enteric microparasites like Escherichia coli use multiple transmission pathways to propagate within and between host populations. Characterizing the relative transmission risk attributable to host social relationships and direct physical contact between individuals is paramount for understanding how microparasites like E. coli spread within affected communities and estimating colonization rates. To measure these effects, we carried out commensal E. coli transmission experiments in two cattle (Bos taurus) herds, wherein all individuals were equipped with real-time location tracking devices. Following transmission experiments in this model system, we derived temporally dynamic social and contact networks from location data. Estimated social affiliations and dyadic contact frequencies during transmission experiments informed pairwise accelerated failure time models that we used to quantify effects of these sociobehavioural variables on weekly E. coli colonization risk in these populations. We found that sociobehavioural variables alone were ultimately poor predictors of E. coli colonization in feedlot cattle, but can have significant effects on colonization hazard rates (p ≤ 0.05). We show, however, that observed effects were not consistent between similar populations. This work demonstrates that transmission experiments can be combined with real-time location data collection and processing procedures to create an effective framework for quantifying sociobehavioural effects on microparasite transmission.
Affiliation(s)
- Trevor S. Farthing
- Department of Population Health and Pathobiology, College of Veterinary Medicine, North Carolina State University, Raleigh, NC 27606, USA
- Daniel E. Dawson
- Department of Population Health and Pathobiology, College of Veterinary Medicine, North Carolina State University, Raleigh, NC 27606, USA
- Mike W. Sanderson
- Department of Diagnostic Medicine and Pathobiology, College of Veterinary Medicine, Kansas State University, Manhattan, KS 66506, USA
- Hannah Seger
- Department of Diagnostic Medicine and Pathobiology, College of Veterinary Medicine, Kansas State University, Manhattan, KS 66506, USA
- Cristina Lanzas
- Department of Population Health and Pathobiology, College of Veterinary Medicine, North Carolina State University, Raleigh, NC 27606, USA
6
Crawford FW, Marx FM, Zelner J, Cohen T. Transmission Modeling with Regression Adjustment for Analyzing Household-based Studies of Infectious Disease: Application to Tuberculosis. Epidemiology 2020; 31:238-47. [PMID: 31764276 DOI: 10.1097/EDE.0000000000001143] [Indexed: 11/26/2022]
Abstract
BACKGROUND Household contacts of people infected with a transmissible disease may be at risk due to this proximate exposure, or from other unobserved sources. Understanding variation in infection risk is essential for targeting interventions. METHODS We develop an analytical approach to estimate household and exogenous forces of infection, while accounting for individual-level characteristics that affect susceptibility to disease and transmissibility. We apply this approach to a cohort study conducted in Lima, Peru, of 18,544 subjects in 4,500 households with at least one active tuberculosis (TB) case and compare the results to those obtained by Poisson and logistic regression. RESULTS HIV-coinfected (susceptibility hazard ratio [SHR] = 3.80, 1.56-9.29), child (SHR = 1.72, 1.32-2.23), and teenage (SHR = 2.00, 1.49-2.68) household contacts of TB cases experience a higher hazard of TB than do adult contacts. Isoniazid preventive therapy (SHR = 0.30, 0.21-0.42) and Bacillus Calmette-Guérin (BCG) vaccination (SHR = 0.66, 0.51-0.86) reduce the risk of disease among household contacts. TB cases without microbiological confirmation exert a smaller hazard of TB among their close contacts compared with smear- or culture-positive cases (excess hazard ratio = 0.88, 0.82-0.93 for HIV- cases and 0.82, 0.57-0.94 for HIV+ cases). The extra household force of infection amounts to 0.01 (95% confidence interval [CI] = 0.004, 0.028) TB cases per susceptible household contact per year, and the rate of transmission between a microbiologically confirmed TB case and a susceptible household contact is 0.08 (95% CI = 0.045, 0.129) TB cases per pair per year. CONCLUSIONS Accounting for exposure to infected household contacts permits estimation of risk factors for disease susceptibility and transmissibility and comparison of within-household and exogenous forces of infection.
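The separation of household and exogenous forces of infection can be sketched with a constant-hazard model in which covariates act multiplicatively on susceptibility and transmissibility. This Python sketch is a simplified assumption for illustration, not the authors' estimator: the function name and the parameter values in the example are hypothetical, and it only computes infection probabilities rather than fitting anything to data.

```python
import math


def household_infection_probability(years, exo_rate, pair_rate, trans_hrs, susc_hr=1.0):
    """Probability that one susceptible household contact is infected within `years`.

    Assumed constant hazard: susc_hr * (exo_rate + pair_rate * sum(trans_hrs))
    exo_rate:  force of infection from outside the household (per year)
    pair_rate: transmission rate per infectious-susceptible pair (per year)
    trans_hrs: relative transmissibility of each infectious household member
    susc_hr:   susceptibility hazard ratio of the contact (e.g. age group)
    """
    hazard = susc_hr * (exo_rate + pair_rate * sum(trans_hrs))
    return 1.0 - math.exp(-hazard * years)


# Hypothetical values: one reference case and one less infectious case.
p_adult = household_infection_probability(1.0, 0.01, 0.05, [1.0, 0.5])
p_child = household_infection_probability(1.0, 0.01, 0.05, [1.0, 0.5], susc_hr=1.7)
print(p_adult, p_child)
```

Passing an empty `trans_hrs` list leaves only the exogenous hazard, which is how this sketch keeps the two sources of risk distinct.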
7
Sharker Y, Kenah E. Estimating and interpreting secondary attack risk: Binomial considered biased. PLoS Comput Biol 2021; 17:e1008601. [PMID: 33471806 PMCID: PMC7850487 DOI: 10.1371/journal.pcbi.1008601] [Received: 07/17/2020] [Revised: 02/01/2021] [Accepted: 12/02/2020] [Indexed: 11/18/2022]
Abstract
The household secondary attack risk (SAR), often called the secondary attack rate or secondary infection risk, is the probability of infectious contact from an infectious household member A to a given household member B, where we define infectious contact to be a contact sufficient to infect B if he or she is susceptible. Estimation of the SAR is an important part of understanding and controlling the transmission of infectious diseases. In practice, it is most often estimated using binomial models such as logistic regression, which implicitly attribute all secondary infections in a household to the primary case. In the simplest case, the number of secondary infections in a household with m susceptibles and a single primary case is modeled as a binomial(m, p) random variable where p is the SAR. Although it has long been understood that transmission within households is not binomial, it is thought that multiple generations of transmission can be neglected safely when p is small. We use probability generating functions and simulations to show that this is a mistake. The proportion of susceptible household members infected can be substantially larger than the SAR even when p is small. As a result, binomial estimates of the SAR are biased upward and their confidence intervals have poor coverage probabilities even if adjusted for clustering. Accurate point and interval estimates of the SAR can be obtained using longitudinal chain binomial models or pairwise survival analysis, which account for multiple generations of transmission within households, the ongoing risk of infection from outside the household, and incomplete follow-up. We illustrate the practical implications of these results in an analysis of household surveillance data collected by the Los Angeles County Department of Public Health during the 2009 influenza A (H1N1) pandemic. 
The household secondary attack risk (SAR), often called the secondary attack rate or secondary infection risk, is the probability of infectious contact from an infectious household member A to a given household member B, where we define infectious contact to be a contact sufficient to infect B if he or she is susceptible. The most common statistical models used to estimate the SAR are binomial models such as logistic regression, which implicitly assume that all secondary infections in a household are infected by the primary case. Here, we use analytical calculations and simulations to show that estimation of the SAR must account for multiple generations of transmission within households. As an example, we show that binomial models and statistical models that account for multiple generations of within-household transmission reach different conclusions about the household SAR for 2009 influenza A (H1N1) in Los Angeles County, with the latter models fitting the data better. In an epidemic, accurate estimation of the SAR allows rigorous evaluation of the effectiveness of public health interventions such as social distancing, prophylaxis or treatment, and vaccination.
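Why the binomial proportion overestimates the SAR can be checked with a small exact calculation. The Python sketch below assumes a Reed-Frost chain binomial within one household: a single primary case, m susceptibles, escape probability 1 - p per infectious pair per generation, and no infection from outside the household. The paper itself uses probability generating functions and simulations, so this recursion illustrates the same point under simpler assumptions; the function name is hypothetical.

```python
from functools import lru_cache
from math import comb


def expected_attack_proportion(m, p):
    """Expected fraction of m household susceptibles eventually infected.

    Reed-Frost chain binomial: with i infectives in the current generation,
    each remaining susceptible is infected with probability 1 - (1 - p)**i.
    """
    q = 1.0 - p

    @lru_cache(maxsize=None)
    def expected_new(s, i):
        # Expected number of further infections among s susceptibles,
        # given i infectives in the current generation.
        if s == 0 or i == 0:
            return 0.0
        p_inf = 1.0 - q ** i
        total = 0.0
        for k in range(s + 1):
            p_k = comb(s, k) * p_inf ** k * (1.0 - p_inf) ** (s - k)
            # k new infectives become the next generation.
            total += p_k * (k + expected_new(s - k, k))
        return total

    return expected_new(m, 1) / m


for p in (0.05, 0.1, 0.2):
    print(p, expected_attack_proportion(4, p))
```

The returned value is what a binomial(m, p) model would treat as its estimate of p, so any excess over p is bias from later generations of within-household transmission. For m = 2 the recursion gives p + p^2(1 - p), which already exceeds p for every 0 < p < 1.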
Affiliation(s)
- Yushuf Sharker
- Division of Biometrics, Center for Drug Evaluation and Research, Food and Drug Administration, Silver Spring, Maryland, United States of America
- Eben Kenah
- Biostatistics Division, College of Public Health, The Ohio State University, Columbus, Ohio, United States of America
8
Abstract
Defining and identifying causal intervention effects for transmissible infectious disease outcomes is challenging because a treatment, such as a vaccine, given to one individual may affect the infection outcomes of others. Epidemiologists have proposed causal estimands to quantify effects of interventions under contagion using a two-person partnership model. These simple conceptual models have helped researchers develop causal estimands relevant to clinical evaluation of vaccine effects. However, many of these partnership models are formulated under structural assumptions that preclude realistic infectious disease transmission dynamics, limiting their conceptual usefulness in defining and identifying causal treatment effects in empirical intervention trials. In this paper, we propose causal intervention effects in two-person partnerships under arbitrary infectious disease transmission dynamics, and give nonparametric identification results showing how effects can be estimated in empirical trials using time-to-infection or binary outcome data. The key insight is that contagion is a causal phenomenon that induces conditional independencies on infection outcomes that can be exploited for the identification of clinically meaningful causal estimands. These new estimands are compared to existing quantities, and results are illustrated using a realistic simulation of an HIV vaccine trial.
Affiliation(s)
- Xiaoxuan Cai
- Department of Biostatistics, Yale School of Public Health
- Wen Wei Loh
- Department of Data Analysis, University of Ghent
- Forrest W Crawford
- Department of Biostatistics, Yale School of Public Health
- Department of Statistics & Data Science, Yale University
- Department of Ecology and Evolutionary Biology, Yale University
- Yale School of Management
9
Mahsin MD, Deardon R, Brown P. Geographically dependent individual-level models for infectious diseases transmission. Biostatistics 2020; 23:1-17. [PMID: 32118253 DOI: 10.1093/biostatistics/kxaa009] [Received: 06/25/2019] [Revised: 11/22/2019] [Accepted: 01/29/2020] [Indexed: 11/14/2022]
Abstract
Infectious disease models can be of great use for understanding the underlying mechanisms that influence the spread of diseases and for predicting future disease progression. Modeling has been increasingly used to evaluate the potential impact of different control measures and to guide public health policy decisions. In recent years, there has been rapid progress in spatio-temporal modeling of infectious diseases; one example of such developments is the class of discrete-time individual-level models (ILMs). These models are well developed and provide a common framework for modeling many disease systems; however, they assume that the probability of disease transmission between two individuals depends only on their spatial separation and not on their spatial locations. In cases where spatial location itself is important for understanding the spread of emerging infectious diseases and identifying their causes, it would be beneficial to incorporate the effect of spatial location in the model. In this study, we thus generalize the ILMs to a new class of geographically dependent ILMs, to allow for the evaluation of the effect of spatially varying risk factors (e.g., education, social deprivation, environmental), as well as unobserved spatial structure, upon the transmission of infectious disease. Specifically, we consider a conditional autoregressive (CAR) model to capture the effects of unobserved spatially structured latent covariates or measurement error. This results in flexible infectious disease models that can be used for formulating etiological hypotheses and identifying geographical regions of unusually high risk to formulate preventive action. The reliability of these models is investigated on a combination of simulated epidemic data and Alberta seasonal influenza outbreak data (2009). This new class of models is fitted to data within a Bayesian statistical framework using Markov chain Monte Carlo methods.
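The discrete-time ILM mentioned above can be sketched as follows. In the commonly used spatial ILM, the probability that a susceptible individual is infected in a time step is one minus the exponential of minus (its susceptibility times the summed spatial kernel over current infectives, plus a "sparks" term for unexplained infections). This Python sketch assumes a power-law kernel and, to mimic the geographically dependent extension, a hypothetical log-linear susceptibility with a region-level covariate and a region random effect `phi`, for example one draw from a CAR prior. Function names and values are illustrative, and estimating the parameters by MCMC is not shown.

```python
import math


def ilm_infection_prob(target, infectious, susceptibility, beta, epsilon=0.0):
    """Spatial ILM probability that susceptible `target` is infected next time step.

    target:         (x, y) location of the susceptible individual
    infectious:     list of (x, y) locations of current infectives, all
                    distinct from `target` (a zero distance would be undefined)
    susceptibility: susceptibility multiplier for the target
    beta:           power-law kernel parameter, kernel = distance ** -beta
    epsilon:        sparks term for infections from unmodelled sources
    """
    pressure = sum(math.dist(target, src) ** (-beta) for src in infectious)
    return 1.0 - math.exp(-(susceptibility * pressure + epsilon))


def region_susceptibility(alpha0, alpha1, covariate, phi):
    """Hypothetical log-linear susceptibility with a region covariate and region effect phi."""
    return math.exp(alpha0 + alpha1 * covariate + phi)


s = region_susceptibility(-1.0, 0.3, 2.0, 0.4)
print(ilm_infection_prob((0.0, 0.0), [(1.0, 1.0), (3.0, 0.0)], s, 2.0, epsilon=0.001))
```

Two individuals at the same distance from the same infectives can receive different infection probabilities here only through `region_susceptibility`, which is the location dependence that a distance-only ILM cannot express.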
Affiliation(s)
- M D Mahsin
- Department of Mathematics and Statistics and Faculty of Veterinary Medicine, University of Calgary, 2500 University Dr NW, Calgary AB T2N 1N4, Canada
- Rob Deardon
- Department of Mathematics and Statistics and Faculty of Veterinary Medicine, University of Calgary, 2500 University Dr NW, Calgary AB T2N 1N4, Canada
- Patrick Brown
- Department of Statistical Sciences, University of Toronto, Canada
10
Abstract
Epidemiologists commonly use the risk ratio to summarize the relationship between a binary covariate and outcome, even when outcomes may be dependent. Investigations of transmissible diseases in clusters (households, villages or small groups) often report risk ratios. Epidemiologists have warned that risk ratios may be misleading when outcomes are contagious, but the nature of this error is poorly understood. In this study, we assess the meaning of the risk ratio when outcomes are contagious. We provide a mathematical definition of infectious disease transmission within clusters, based on the canonical stochastic susceptible-infective model. From this characterization, we define the individual-level ratio of instantaneous infection risks as the inferential target, and evaluate the properties of the risk ratio as an approximation of this quantity. We exhibit analytically and by simulation the circumstances under which the risk ratio implies an effect whose direction is opposite to that of the true effect of the covariate. In particular, the risk ratio can be greater than one even when the covariate reduces both individual-level susceptibility to infection and transmissibility once infected. We explain these findings in the epidemiologic language of confounding and Simpson's paradox, underscoring the pitfalls of failing to account for transmission when outcomes are contagious.
Affiliation(s)
- Olga Morozova
- Department of Epidemiology of Microbial Diseases, Yale School of Public Health, 60 College Street, New Haven, CT 06510, USA
- Ted Cohen
- Department of Epidemiology of Microbial Diseases, Yale School of Public Health, 60 College Street, New Haven, CT 06510, USA
- Forrest W Crawford
- Department of Biostatistics, Yale School of Public Health, 60 College Street, New Haven, CT 06510, USA
- Department of Ecology and Evolutionary Biology, Yale University, 165 Prospect St, New Haven, CT 06511, USA
- Yale School of Management, 165 Whitney Ave, New Haven, CT 06511, USA
11
Klinkenberg D, Backer JA, Didelot X, Colijn C, Wallinga J. Simultaneous inference of phylogenetic and transmission trees in infectious disease outbreaks. PLoS Comput Biol 2017; 13:e1005495. [PMID: 28545083 PMCID: PMC5436636 DOI: 10.1371/journal.pcbi.1005495] [Received: 08/16/2016] [Accepted: 04/03/2017] [Indexed: 01/22/2023]
Abstract
Whole-genome sequencing of pathogens from host samples is becoming increasingly routine during infectious disease outbreaks. These data provide information on possible transmission events which can be used for further epidemiologic analyses, such as identification of risk factors for infectivity and transmission. However, the relationship between transmission events and sequence data is obscured by uncertainty arising from four largely unobserved processes: transmission, case observation, within-host pathogen dynamics and mutation. To properly resolve transmission events, these processes need to be taken into account. Recent years have seen much progress in theory and method development, but existing applications make simplifying assumptions that often break up the dependency between the four processes, or are tailored to specific datasets with matching model assumptions and code. To obtain a method with wider applicability, we have developed a novel approach to reconstruct transmission trees with sequence data. Our approach combines elementary models for transmission, case observation, within-host pathogen dynamics, and mutation, under the assumption that the outbreak is over and all cases have been observed. We use Bayesian inference with MCMC, for which we have designed novel proposal steps to efficiently traverse the posterior distribution, taking account of all unobserved processes at once. This allows for efficient sampling of transmission trees from the posterior distribution, and robust estimation of consensus transmission trees. We implemented the proposed method in a new R package, phybreak. The method performs well in tests on both new and published simulated data. We apply the model to five datasets on densely sampled infectious disease outbreaks, covering a wide range of epidemiological settings. Using only sampling times and sequences as data, our analyses confirmed or improved on the original results: the more realistic infection times give more confidence in the inferred transmission trees.
It is becoming easier and cheaper to obtain (whole-genome) sequences of pathogen samples during outbreaks of infectious diseases. If all hosts during an outbreak are sampled, and these samples are sequenced, the small differences between the sequences (single nucleotide polymorphisms, SNPs) give information on the transmission tree, i.e. who infected whom, and when. However, correctly inferring this tree is not straightforward, because SNPs arise from unobserved processes including infection events, as well as pathogen growth and mutation within the hosts. Several methods have been developed in recent years, but often for specific applications or with limiting assumptions, so that they are not easily applied to new settings and datasets. We have developed a new model and method to infer transmission trees without putting prior limiting constraints on the order of unobserved events. The method is easily accessible in an R package implementation. We show that the method performs well on new and previously published simulated data. We illustrate applicability to a wide range of infectious diseases and settings by analysing five published datasets on densely sampled infectious disease outbreaks, confirming or improving the original results.
Affiliation(s)
- Don Klinkenberg
- Department of Epidemiology and Surveillance, National Institute for Public Health and the Environment, Bilthoven, The Netherlands
- Jantien A. Backer
- Department of Epidemiology and Surveillance, National Institute for Public Health and the Environment, Bilthoven, The Netherlands
- Xavier Didelot
- Department of Infectious Disease Epidemiology, Imperial College London, London, United Kingdom
- Caroline Colijn
- Department of Mathematics, Imperial College London, London, United Kingdom
- Jacco Wallinga
- Department of Epidemiology and Surveillance, National Institute for Public Health and the Environment, Bilthoven, The Netherlands
- Department of Medical Statistics and Bio-Informatics, Leiden University Medical Center, Leiden, The Netherlands

12
Kenah E, Britton T, Halloran ME, Longini IM. Molecular Infectious Disease Epidemiology: Survival Analysis and Algorithms Linking Phylogenies to Transmission Trees. PLoS Comput Biol 2016; 12:e1004869. [PMID: 27070316 PMCID: PMC4829193 DOI: 10.1371/journal.pcbi.1004869] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.9] [Received: 07/15/2015] [Accepted: 03/15/2016] [Indexed: 12/20/2022]
Abstract
Recent work has attempted to use whole-genome sequence data from pathogens to reconstruct the transmission trees linking infectors and infectees in outbreaks. However, transmission trees from one outbreak do not generalize to future outbreaks. Reconstruction of transmission trees is most useful to public health if it leads to generalizable scientific insights about disease transmission. In a survival analysis framework, estimation of transmission parameters is based on sums or averages over the possible transmission trees. A phylogeny can increase the precision of these estimates by providing partial information about who infected whom. The leaves of the phylogeny represent sampled pathogens, which have known hosts. The interior nodes represent common ancestors of sampled pathogens, which have unknown hosts. Starting from assumptions about disease biology and epidemiologic study design, we prove that there is a one-to-one correspondence between the possible assignments of interior node hosts and the transmission trees simultaneously consistent with the phylogeny and the epidemiologic data on person, place, and time. We develop algorithms to enumerate these transmission trees and show these can be used to calculate likelihoods that incorporate both epidemiologic data and a phylogeny. A simulation study confirms that this leads to more efficient estimates of hazard ratios for infectiousness and baseline hazards of infectious contact, and we use these methods to analyze data from a foot-and-mouth disease virus outbreak in the United Kingdom in 2001. These results demonstrate the importance of data on individuals who escape infection, which is often overlooked. The combination of survival analysis and algorithms linking phylogenies to transmission trees is a rigorous but flexible statistical foundation for molecular infectious disease epidemiology. 
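A minimal sketch of the correspondence between interior-node host assignments and transmission trees, under simplifying assumptions that are ours rather than necessarily the paper's: every host is sampled exactly once, the phylogeny is rooted, each interior node lies in the host of one of its children, and no time or place constraints are applied. Each interior node then "chooses" which child's host it takes, and every branch along which the assigned host changes is read as a direct transmission from the parent node's host to the child node's host. The function names `host_assignments` and `to_transmission_tree` are hypothetical, and this is not the authors' algorithm, which also checks consistency with epidemiologic data on person, place, and time.

```python
from itertools import product
from typing import Dict, List, Optional

Children = Dict[str, List[str]]  # interior node -> its child nodes


def host_assignments(node: str, children: Children,
                     leaf_host: Dict[str, str]) -> List[Dict[str, str]]:
    """All host assignments for the subtree below `node`, with leaves fixed to their hosts."""
    if node not in children:  # a leaf: its host is known from the sample
        return [{node: leaf_host[node]}]
    per_child = [host_assignments(c, children, leaf_host) for c in children[node]]
    results = []
    for combo in product(*per_child):
        merged: Dict[str, str] = {}
        for sub in combo:
            merged.update(sub)
        # The interior node lies in the host of exactly one of its children
        for c in children[node]:
            assignment = dict(merged)
            assignment[node] = merged[c]
            results.append(assignment)
    return results


def to_transmission_tree(assignment: Dict[str, str], children: Children,
                         root: str) -> Dict[str, Optional[str]]:
    """Read infectors off the branches where the assigned host changes."""
    infector: Dict[str, Optional[str]] = {assignment[root]: None}
    for parent, kids in children.items():
        for c in kids:
            if assignment[c] != assignment[parent]:
                infector[assignment[c]] = assignment[parent]
    return infector


if __name__ == "__main__":
    # Phylogeny ((a, b), c): interior nodes r (root) and u
    children = {"r": ["u", "c"], "u": ["a", "b"]}
    leaf_host = {"a": "A", "b": "B", "c": "C"}
    for asg in host_assignments("r", children, leaf_host):
        print(to_transmission_tree(asg, children, "r"))
```

For a binary phylogeny with n leaves this produces 2^(n-1) assignments (four for the three-leaf example); sampling times and other epidemiologic data would then prune the set.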
Recent work has attempted to use whole-genome sequence data from pathogens to reconstruct the transmission trees linking infectors and infectees in outbreaks. However, transmission trees from one outbreak do not generalize to future outbreaks. Reconstruction of transmission trees is most useful to public health if it leads to generalizable scientific insights about disease transmission. Accurate estimates of transmission parameters can help identify risk factors for transmission and aid the design and evaluation of public health interventions for emerging infections. Using statistical methods for time-to-event data (survival analysis), estimation of transmission parameters is based on sums or averages over the possible transmission trees. By providing partial information about who infected whom, a pathogen phylogeny can reduce the set of possible transmission trees and increase the precision of transmission parameter estimates. We derive algorithms that enumerate the transmission trees consistent with a pathogen phylogeny and epidemiologic data, show how to calculate likelihoods for transmission data with a phylogeny, and apply these methods to a foot-and-mouth disease outbreak in the United Kingdom in 2001. These methods will allow pathogen genetic sequences to be incorporated into the analysis of outbreak investigations, vaccine trials, and other studies of infectious disease transmission.
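The statement that estimation "is based on sums or averages over the possible transmission trees" can be sketched as follows, assuming infection times are fixed: each candidate tree contributes the product of the hazards of infectious contact from each infectee's assigned infector at the time of infection, the survival term (which also covers individuals who escape infection) is the same for every tree, and the likelihood is the sum over candidate trees. The hazard values, the precomputed survival factor and the function names are hypothetical; a phylogeny enters only by shortening the list of candidate trees.

```python
from typing import Dict, List, Optional, Tuple

Tree = Dict[str, Optional[str]]         # infectee -> infector (None for the index case)
Hazards = Dict[Tuple[str, str], float]  # (infector, infectee) -> hazard at infection time


def tree_likelihood(infector: Tree, hazard: Hazards) -> float:
    """Product over infectees of the hazard of infectious contact from their infector."""
    lik = 1.0
    for infectee, source in infector.items():
        if source is not None:
            lik *= hazard[(source, infectee)]
    return lik


def marginal_likelihood(trees: List[Tree], hazard: Hazards, survival: float) -> float:
    """Sum over candidate trees; the survival factor is common to all of them."""
    return survival * sum(tree_likelihood(t, hazard) for t in trees)


if __name__ == "__main__":
    hazard = {("A", "B"): 0.2, ("A", "C"): 0.1, ("B", "C"): 0.3}
    # B was infected before C, so B can only have been infected by A
    all_trees = [{"A": None, "B": "A", "C": "A"},
                 {"A": None, "B": "A", "C": "B"}]
    print(marginal_likelihood(all_trees, hazard, survival=0.5))
    # Suppose a phylogeny were consistent only with B -> C: one tree remains
    print(marginal_likelihood(all_trees[1:], hazard, survival=0.5))
```

When every combination of infectors is allowed, the sum factorises into a product over infectees of their total hazard at infection, which is the usual survival-analysis likelihood; restricting the set of trees with a phylogeny is what changes the estimate.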
Affiliation(s)
- Eben Kenah
- Biostatistics Department and Emerging Pathogens Institute, University of Florida, Gainesville, Florida, United States of America
- Center for Inference and Dynamics of Infectious Diseases, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
- Tom Britton
- Department of Mathematics, Stockholm University, Stockholm, Sweden
- M. Elizabeth Halloran
- Center for Inference and Dynamics of Infectious Diseases, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
- Vaccine and Infectious Diseases Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
- Department of Biostatistics, University of Washington, Seattle, Washington, United States of America
- Ira M. Longini
- Biostatistics Department and Emerging Pathogens Institute, University of Florida, Gainesville, Florida, United States of America
- Center for Inference and Dynamics of Infectious Diseases, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America

13
Sugimoto JD, Koepke AA, Kenah EE, Halloran ME, Chowdhury F, Khan AI, LaRocque RC, Yang Y, Ryan ET, Qadri F, Calderwood SB, Harris JB, Longini IM Jr. Household Transmission of Vibrio cholerae in Bangladesh. PLoS Negl Trop Dis 2014; 8:e3314. [PMID: 25411971 DOI: 10.1371/journal.pntd.0003314] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.2] [Received: 07/30/2014] [Accepted: 10/03/2014] [Indexed: 11/19/2022]
Abstract
Background: Vibrio cholerae infections cluster in households. This study's objective was to quantify the relative contribution of direct, within-household exposure (for example, via contamination of household food, water, or surfaces) to endemic cholera transmission. Quantifying the relative contribution of direct exposure is important for planning effective prevention and control measures.
Methodology/Principal Findings: Symptom histories and multiple blood and fecal specimens were prospectively collected from household members of hospital-ascertained cholera cases in Bangladesh from 2001 to 2006. We estimated the probabilities of cholera transmission through 1) direct exposure within the household and 2) contact with community-based sources of infection. The natural history of cholera infection and covariate effects on transmission were considered. Significant direct transmission (p < 0.0001) occurred among 1414 members of 364 households. Fecal shedding of O1 El Tor Ogawa was associated with a 4.9% (95% confidence interval: 0.9%–22.8%) risk of infection among household contacts through direct exposure during an 11-day infectious period (mean length). The estimated 11-day risk of O1 El Tor Ogawa infection through exposure to community-based sources was 2.5% (0.8%–8.0%). The corresponding estimated risks for O1 El Tor Inaba and O139 infection were 3.7% (0.7%–16.6%) and 8.2% (2.1%–27.1%) through direct exposure, and 3.4% (1.7%–6.7%) and 2.0% (0.5%–7.3%) through community-based exposure. Children under 5 years old were at elevated risk of infection. Limitations of the study may have led to an underestimation of the true risk of cholera infection. For instance, available covariate data may have incompletely characterized levels of pre-existing immunity to cholera infection. Transmission via direct exposure occurring outside of the household was not considered.
Conclusions: Direct exposure contributes substantially to endemic transmission of symptomatic cholera in an urban setting. We provide the first estimate of the transmissibility of endemic cholera within prospectively followed members of households. The role of direct transmission must be considered when planning cholera control activities.
Since John Snow's ground-breaking investigations of the devastating outbreaks in 19th-century London, cholera has been considered the quintessential waterborne human infection, transmitted via fecal contamination of environmental water sources. Recently, renewed interest has been paid to the potential importance of transmission through direct exposure within close-contact groups, such as via fecal contamination of surfaces, food, or drinking water within households. Significant direct transmission of cholera within close-contact groups would represent a new target for innovative prevention and control strategies. We estimated the probability of transmission 1) via direct contact within 364 urban households located in an endemic cholera setting (Dhaka, Bangladesh) and 2) via exposure to sources located outside of these households. In this setting, we estimated a 4 to 8 percent probability of becoming infected with cholera via direct exposure within households, versus a 2 to 3 percent probability of infection due to exposure to external sources over a comparable time period. Our results demonstrate that direct (within-household) transmission is a significant component of endemic cholera transmission, suggesting that biomedical and behavioral-modification interventions specifically targeting this mode of transmission could substantially reduce the cholera burden in this type of setting.
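The risks in this abstract are reported over an 11-day period. Purely as a worked illustration, and assuming a constant daily infection probability and independence between household and community exposure (neither assumption is stated in the abstract, and the study's own model may differ), the O1 El Tor Ogawa point estimates can be converted to a per-day probability and combined into an overall 11-day risk. The function names are hypothetical.

```python
def daily_probability(period_risk: float, days: float) -> float:
    """Constant per-day infection probability implied by a cumulative risk over `days`."""
    return 1.0 - (1.0 - period_risk) ** (1.0 / days)


def combined_risk(*risks: float) -> float:
    """Probability of infection from at least one of several independent sources."""
    escape = 1.0
    for r in risks:
        escape *= 1.0 - r  # probability of escaping every source so far
    return 1.0 - escape


if __name__ == "__main__":
    household, community = 0.049, 0.025  # O1 El Tor Ogawa, 11-day point estimates
    print(f"{daily_probability(household, 11):.4f}")     # 0.0046
    print(f"{combined_risk(household, community):.4f}")  # 0.0728
```

Under these assumptions, a household contact of a shedding case would face roughly a 7.3% combined 11-day risk, with household exposure alone contributing about 0.46% per day; the confidence intervals are not propagated here.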