1
Carson J, Keeling M, Wyllie D, Ribeca P, Didelot X. Inference of Infectious Disease Transmission through a Relaxed Bottleneck Using Multiple Genomes Per Host. Mol Biol Evol 2024; 41:msad288. [PMID: 38168711 PMCID: PMC10798190 DOI: 10.1093/molbev/msad288] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2023] [Revised: 12/21/2023] [Accepted: 12/29/2023] [Indexed: 01/05/2024]
Abstract
In recent times, pathogen genome sequencing has become increasingly used to investigate infectious disease outbreaks. When genomic data is sampled densely enough amongst infected individuals, it can help resolve who infected whom. However, transmission analysis cannot rely solely on a phylogeny of the genomes but must account for the within-host evolution of the pathogen, which blurs the relationship between phylogenetic and transmission trees. When only a single genome is sampled for each host, the uncertainty about who infected whom can be quite high. Consequently, transmission analysis based on multiple genomes of the same pathogen per host has a clear potential for delivering more precise results, even though it is more laborious to achieve. Here, we present a new methodology that can use any number of genomes sampled from a set of individuals to reconstruct their transmission network. Furthermore, we remove the need for the assumption of a complete transmission bottleneck. We use simulated data to show that our method becomes more accurate as more genomes per host are provided, and that it can infer key infectious disease parameters such as the size of the transmission bottleneck, within-host growth rate, basic reproduction number, and sampling fraction. We demonstrate the usefulness of our method in applications to real datasets from an outbreak of Pseudomonas aeruginosa amongst cystic fibrosis patients and a nosocomial outbreak of Klebsiella pneumoniae.
Affiliation(s)
- Jake Carson
- Mathematics Institute, University of Warwick, Coventry CV4 7AL, UK
- School of Life Sciences, University of Warwick, Coventry CV4 7AL, UK
- Zeeman Institute for Systems Biology and Infectious Disease Epidemiology Research (SBIDER), University of Warwick, Coventry CV4 7AL, UK
- Matt Keeling
- Mathematics Institute, University of Warwick, Coventry CV4 7AL, UK
- School of Life Sciences, University of Warwick, Coventry CV4 7AL, UK
- Zeeman Institute for Systems Biology and Infectious Disease Epidemiology Research (SBIDER), University of Warwick, Coventry CV4 7AL, UK
- Xavier Didelot
- School of Life Sciences, University of Warwick, Coventry CV4 7AL, UK
- Zeeman Institute for Systems Biology and Infectious Disease Epidemiology Research (SBIDER), University of Warwick, Coventry CV4 7AL, UK
- Department of Statistics, University of Warwick, Coventry CV4 7AL, UK
2
Warren JL, Chitwood MH, Sobkowiak B, Colijn C, Cohen T. Spatial modeling of Mycobacterium tuberculosis transmission with dyadic genetic relatedness data. Biometrics 2023; 79:3650-3663. [PMID: 36745619 PMCID: PMC10404301 DOI: 10.1111/biom.13836] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2022] [Accepted: 01/31/2023] [Indexed: 02/07/2023]
Abstract
Understanding factors that contribute to the increased likelihood of pathogen transmission between two individuals is important for infection control. However, analyzing measures of pathogen relatedness to estimate these associations is complicated due to correlation arising from the presence of the same individual across multiple dyadic outcomes, potential spatial correlation caused by unmeasured transmission dynamics, and the distinctive distributional characteristics of some of the outcomes. We develop two novel hierarchical Bayesian spatial methods for analyzing dyadic pathogen genetic relatedness data, in the form of patristic distances and transmission probabilities, that simultaneously address each of these complications. Using individual-level spatially correlated random effect parameters, we account for multiple sources of correlation between the outcomes as well as other important features of their distribution. Through simulation, we show the limitations of existing approaches in terms of estimating key associations of interest, and the ability of the new methodology to correct for these issues across datasets with different levels of correlation. All methods are applied to Mycobacterium tuberculosis data from the Republic of Moldova, where we identify previously unknown factors associated with disease transmission and, through analysis of the random effect parameters, key individuals, and areas with increased transmission activity. Model comparisons show the importance of the new methodology in this setting. The methods are implemented in the R package GenePair.
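As a rough illustration of why dyadic outcomes are correlated, the sketch below enumerates every unordered pair among a handful of individuals and counts how many dyads each person belongs to. It is not taken from the GenePair package; the function names and the example labels are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def build_dyads(individuals):
    """Return every unordered pair (dyad) of distinct individuals."""
    return list(combinations(individuals, 2))


def dyads_per_individual(dyads):
    """Count how many dyads each individual appears in."""
    counts = Counter()
    for a, b in dyads:
        counts[a] += 1
        counts[b] += 1
    return dict(counts)


# Hypothetical labels, for illustration only.
people = ["A", "B", "C", "D"]
dyads = build_dyads(people)
membership = dyads_per_individual(dyads)
```

With n individuals there are n(n-1)/2 dyads and each individual contributes to n-1 of them, so outcomes that share an individual cannot be treated as independent observations.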
Affiliation(s)
- Melanie H. Chitwood
- Department of Epidemiology of Microbial Diseases, Yale University, Connecticut, USA
- Caroline Colijn
- Department of Mathematics, Simon Fraser University, Burnaby, Canada
- Ted Cohen
- Department of Epidemiology of Microbial Diseases, Yale University, Connecticut, USA
3
Susvitasari K, Tupper P, Stockdale JE, Colijn C. A method to estimate the serial interval distribution under partially-sampled data. Epidemics 2023; 45:100733. [PMID: 38056165 DOI: 10.1016/j.epidem.2023.100733] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2023] [Revised: 11/22/2023] [Accepted: 11/26/2023] [Indexed: 12/08/2023]
Abstract
The serial interval of an infectious disease is an important variable in epidemiology. It is defined as the period of time between the symptom onset times of the infector and infectee in a direct transmission pair. Under partially sampled data, purported infector-infectee pairs may actually be separated by one or more unsampled cases in between. Misunderstanding such pairs as direct transmissions will result in overestimating the length of serial intervals. On the other hand, two cases that are infected by an unseen third case (known as coprimary transmission) may be classified as a direct transmission pair, leading to an underestimation of the serial interval. Here, we introduce a method to jointly estimate the distribution of serial intervals factoring in these two sources of error. We simultaneously estimate the distribution of the number of unsampled intermediate cases between purported infector-infectee pairs, as well as the fraction of such pairs that are coprimary. We also extend our method to situations where each infectee has multiple possible infectors, and show how to factor this additional source of uncertainty into our estimates. We assess our method's performance on simulated data sets and find that our method provides consistent and robust estimates. We also apply our method to data from real-life outbreaks of four infectious diseases and compare our results with published results. With similar accuracy, our method of estimating serial interval distribution provides unique advantages, allowing its application in settings of low sampling rates and large population sizes, such as widespread community transmission tracked by routine public health surveillance.
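The overestimation described above can be made concrete with a small calculation. The sketch below assumes, purely for illustration, that each link in a transmission chain has the same mean serial interval, so a purported pair separated by k unsampled intermediates has an apparent interval with mean (k + 1) times the direct mean. The function name, the 4-day mean, and the probabilities are hypothetical and are not values from the paper.

```python
def expected_apparent_interval(mean_direct, intermediates_pmf):
    """Mean apparent serial interval when pairs may hide unsampled cases.

    mean_direct: mean serial interval of a true direct pair (e.g. in days).
    intermediates_pmf: dict mapping k (number of unsampled intermediates)
        to its probability; the probabilities must sum to 1.
    Assumes every link in the chain has the same mean interval.
    """
    total = sum(intermediates_pmf.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * (k + 1) * mean_direct for k, p in intermediates_pmf.items())


# Hypothetical inputs: a 4-day direct mean, with 30% of purported pairs
# hiding one or two unsampled intermediate cases.
apparent = expected_apparent_interval(4.0, {0: 0.7, 1: 0.2, 2: 0.1})
```

Under these made-up inputs the apparent mean is about 5.6 days against a true direct mean of 4 days, which is the direction of bias the abstract describes. Coprimary pairs, which bias the estimate the other way, are not modelled in this sketch.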
Affiliation(s)
- Paul Tupper
- Department of Mathematics, Simon Fraser University, Canada
4
Hayati M, Sobkowiak B, Stockdale JE, Colijn C. Phylogenetic identification of influenza virus candidates for seasonal vaccines. SCIENCE ADVANCES 2023; 9:eabp9185. [PMID: 37922357 PMCID: PMC10624341 DOI: 10.1126/sciadv.abp9185] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2022] [Accepted: 10/05/2023] [Indexed: 11/05/2023]
Abstract
The seasonal influenza (flu) vaccine is designed to protect against those influenza viruses predicted to circulate during the upcoming flu season, but identifying which viruses are likely to circulate is challenging. We use features from phylogenetic trees reconstructed from hemagglutinin (HA) and neuraminidase (NA) sequences, together with a support vector machine, to predict future circulation. We obtain accuracies of 0.75 to 0.89 (AUC 0.83 to 0.91) over 2016-2020. We explore ways to select potential candidates for a seasonal vaccine and find that the machine learning model has a moderate ability to select strains that are close to future populations. However, consensus sequences among the most recent 3 years also do well at this task. We identify similar candidate strains to those proposed by the World Health Organization, suggesting that this approach can help inform vaccine strain selection.
Affiliation(s)
- Maryam Hayati
- School of Computing Science, Simon Fraser University, Burnaby, BC V5A 1S6, Canada
- Benjamin Sobkowiak
- Department of Mathematics, Simon Fraser University, Burnaby, BC V5A 1S6, Canada
- Caroline Colijn
- Department of Mathematics, Simon Fraser University, Burnaby, BC V5A 1S6, Canada
5
Van der Roest BR, Bootsma MCJ, Fischer EAJ, Klinkenberg D, Kretzschmar MEE. A Bayesian inference method to estimate transmission trees with multiple introductions; applied to SARS-CoV-2 in Dutch mink farms. PLoS Comput Biol 2023; 19:e1010928. [PMID: 38011266 PMCID: PMC10703282 DOI: 10.1371/journal.pcbi.1010928] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2023] [Revised: 12/07/2023] [Accepted: 11/12/2023] [Indexed: 11/29/2023]
Abstract
Knowledge of who infected whom during an outbreak of an infectious disease is important to determine risk factors for transmission and to design effective control measures. Both whole-genome sequencing of pathogens and epidemiological data provide useful information about the transmission events and underlying processes. Existing models to infer transmission trees usually assume that the pathogen is introduced only once from outside into the population of interest. However, this is not always true. For instance, SARS-CoV-2 is suggested to be introduced multiple times in mink farms in the Netherlands from the SARS-CoV-2 pandemic among humans. Here, we developed a Bayesian inference method combining whole-genome sequencing data and epidemiological data, allowing for multiple introductions of the pathogen in the population. Our method does not a priori split the outbreak into multiple phylogenetic clusters, nor does it break the dependency between the processes of mutation, within-host dynamics, transmission, and observation. We implemented our method as an additional feature in the R-package phybreak. On simulated data, our method correctly identifies the number of introductions, with an accuracy depending on the proportion of all observed cases that are introductions. Moreover, when a single introduction was simulated, our method produced similar estimates of parameters and transmission trees as the existing package. When applied to data from a SARS-CoV-2 outbreak in Dutch mink farms, the method provides strong evidence for independent introductions of the pathogen at 13 farms, infecting a total of 63 farms. Using the new feature of the phybreak package, transmission routes of a more complex class of infectious disease outbreaks can be inferred which will aid infection control in future outbreaks.
Affiliation(s)
- Bastiaan R. Van der Roest
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, Netherlands
- Martin C. J. Bootsma
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, Netherlands
- Department of Mathematics, Faculty of Science, Utrecht University, Utrecht, Netherlands
- Egil A. J. Fischer
- Department of Population Health Sciences, Faculty of Veterinary Medicine, Utrecht University, Utrecht, Netherlands
- Don Klinkenberg
- Centre for Infectious Disease Control, National Institute for Public Health and the Environment (RIVM), Bilthoven, Netherlands
- Mirjam E. E. Kretzschmar
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, Netherlands
- Centre for Infectious Disease Control, National Institute for Public Health and the Environment (RIVM), Bilthoven, Netherlands
6
Specht IOA, Petros BA, Moreno GK, Brock-Fisher T, Krasilnikova LA, Schifferli M, Yang K, Cronan P, Glennon O, Schaffner SF, Park DJ, MacInnis BL, Ozonoff A, Fry B, Mitzenmacher MD, Varilly P, Sabeti PC. Inferring Viral Transmission Pathways from Within-Host Variation. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2023:2023.10.14.23297039. [PMID: 37873325 PMCID: PMC10593003 DOI: 10.1101/2023.10.14.23297039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2023]
Abstract
Genome sequencing can offer critical insight into pathogen spread in viral outbreaks, but existing transmission inference methods use simplistic evolutionary models and only incorporate a portion of available genetic data. Here, we develop a robust evolutionary model for transmission reconstruction that tracks the genetic composition of within-host viral populations over time and the lineages transmitted between hosts. We confirm that our model reliably describes within-host variant frequencies in a dataset of 134,682 SARS-CoV-2 deep-sequenced genomes from Massachusetts, USA. We then demonstrate that our reconstruction approach infers transmissions more accurately than two leading methods on synthetic data, as well as in a controlled outbreak of bovine respiratory syncytial virus and an epidemiologically-investigated SARS-CoV-2 outbreak in South Africa. Finally, we apply our transmission reconstruction tool to 5,692 outbreaks among the 134,682 Massachusetts genomes. Our methods and results demonstrate the utility of within-host variation for transmission inference of SARS-CoV-2 and other pathogens, and provide an adaptable mathematical framework for tracking within-host evolution.
Affiliation(s)
- Ivan O. A. Specht
- The Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Harvard College, Faculty of Arts and Sciences, Harvard University, Cambridge, MA 02138, USA
- Brittany A. Petros
- The Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Harvard-MIT Program in Health Sciences and Technology, Cambridge, MA 02139, USA
- Harvard/MIT MD-PhD Program, Boston, MA 02115, USA
- Systems, Synthetic, and Quantitative Biology PhD Program, Department of Systems Biology, Harvard Medical School, Boston, MA 02115, USA
- Gage K. Moreno
- The Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Taylor Brock-Fisher
- The Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Department of Organismic and Evolutionary Biology, Faculty of Arts and Sciences, Harvard University, Cambridge, MA 02138, USA
- Lydia A. Krasilnikova
- The Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Howard Hughes Medical Institute, Chevy Chase, MD 20815, USA
- Paul Cronan
- Fathom Information Design, Boston, MA 02114, USA
- Daniel J. Park
- The Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Bronwyn L. MacInnis
- The Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Department of Immunology and Infectious Diseases, Harvard T.H. Chan School of Public Health, Harvard University, Boston, MA 02115, USA
- Massachusetts Consortium on Pathogen Readiness, Harvard Medical School, Harvard University, Boston, MA 02115, USA
- Al Ozonoff
- The Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Ben Fry
- Fathom Information Design, Boston, MA 02114, USA
- Michael D. Mitzenmacher
- Department of Computer Science, School of Engineering and Applied Sciences, Harvard University, Cambridge, MA 02138, USA
- Patrick Varilly
- The Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Pardis C. Sabeti
- The Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Department of Organismic and Evolutionary Biology, Faculty of Arts and Sciences, Harvard University, Cambridge, MA 02138, USA
- Department of Immunology and Infectious Diseases, Harvard T.H. Chan School of Public Health, Harvard University, Boston, MA 02115, USA
- Massachusetts Consortium on Pathogen Readiness, Harvard Medical School, Harvard University, Boston, MA 02115, USA
- Howard Hughes Medical Institute, Chevy Chase, MD 20815, USA
7
Li M, Lu L, Jiang Q, Jiang Y, Yang C, Li J, Zhang Y, Zou J, Li Y, Dai W, Hong J, Takiff H, Shen X, Guo X, Yuan Z, Gao Q. Genotypic and spatial analysis of transmission dynamics of tuberculosis in Shanghai, China: a 10-year prospective population-based surveillance study. THE LANCET REGIONAL HEALTH. WESTERN PACIFIC 2023; 38:100833. [PMID: 37790084 PMCID: PMC10544272 DOI: 10.1016/j.lanwpc.2023.100833] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/10/2023] [Revised: 06/02/2023] [Accepted: 06/15/2023] [Indexed: 10/05/2023]
Abstract
Background With improved tuberculosis (TB) control programs, the incidence of TB in China declined dramatically over the past few decades, but recently the rate of decrease has slowed, especially in large cities such as Shanghai. To help formulate strategies to further reduce TB incidence, we performed a 10-year study in Songjiang, a district of Shanghai, to delineate the characteristics, transmission patterns, and dynamic changes of the local TB burden. Methods We conducted a population-based study of culture-positive pulmonary TB patients diagnosed in Songjiang during 2011-2020. Genomic clusters were defined with a threshold distance of 12-single-nucleotide-polymorphisms based on whole-genome sequencing, and risk factors for clustering were identified by logistic regression. Transmission inference was performed using phybreak. The distances between the residences of patients were compared to the genomic distances of their isolates. Spatial patient hotspots were defined with kernel density estimation. Findings Of 2212 enrolled patients, 74.7% (1652/2212) were internal migrants. The clustering rate (25.2%, 558/2212) and spatial concentrations of clustered and unclustered patients were unchanged over the study period. Migrants had significantly higher TB rates but less clustering than residents. Clustering was highest in male migrants, younger patients and both residents and migrants employed in physical labor. Only 22.1% of transmission events occurred between residents and migrants, with residents more likely to transmit to migrants. The clustering risk decreased rapidly with increasing distances between patient residences, but more than half of clustered patient pairs lived ≥5 km apart. Epidemiologic links were identified for only 15.6% of clustered patients, mostly in close contacts. 
Interpretation Although some of the TB in Songjiang's migrant population is caused by strains brought by infected migrants, local, recent transmission is an important driver of the TB burden. These results suggest that further reductions in TB incidence require novel strategies to detect TB early and interrupt urban transmission. Funding Shanghai Municipal Science and Technology Major Project (ZD2021CY001), National Natural Science Foundation of China (82272376), National Research Council of Science and Technology Major Project of China (2017ZX10201302-006).
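A minimal sketch of threshold-based genomic clustering follows. The abstract states only that clusters were defined with a 12-SNP threshold, so the single-linkage rule (an isolate joins a cluster if it is within the threshold of any member), the inclusive comparison (<= 12), and all names and distances here are assumptions for illustration and not the authors' pipeline.

```python
def snp_clusters(distances, n, threshold=12):
    """Group isolates 0..n-1 into clusters by single linkage.

    distances: dict mapping (i, j) index pairs to pairwise SNP distance.
    Returns only clusters with at least two isolates.
    """
    parent = list(range(n))

    def find(x):
        # Follow parent links to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (i, j), d in distances.items():
        if d <= threshold:  # assumed inclusive threshold
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[rj] = ri

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return [members for members in groups.values() if len(members) >= 2]


def clustering_rate(clusters, n):
    """Fraction of isolates that belong to any cluster."""
    return sum(len(c) for c in clusters) / n


# Hypothetical pairwise SNP distances among six isolates.
example = {(0, 1): 3, (1, 2): 10, (0, 2): 15, (2, 3): 40, (3, 4): 12, (4, 5): 30}
clusters = snp_clusters(example, n=6)
rate = clustering_rate(clusters, n=6)
```

In this toy input isolates 0, 1 and 2 form one cluster even though 0 and 2 differ by 15 SNPs, because single linkage chains through isolate 1. The reported clustering rate in the abstract is the same kind of ratio: 558 clustered patients out of 2212, or about 25.2%.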
Affiliation(s)
- Meng Li
- Key Laboratory of Medical Molecular Virology (MOE/NHC/CAMS), School of Basic Medical Science, Shanghai Medical College, Shanghai Institute of Infectious Disease and Biosecurity, Fudan University, Shanghai, China
- Liping Lu
- Department of Tuberculosis Control, Songjiang District Center for Disease Control and Prevention, Shanghai, China
- Qi Jiang
- Key Laboratory of Medical Molecular Virology (MOE/NHC/CAMS), School of Basic Medical Science, Shanghai Medical College, Shanghai Institute of Infectious Disease and Biosecurity, Fudan University, Shanghai, China
- School of Public Health, Renmin Hospital Public Health Research Institute, Wuhan University, Wuhan, China
- Yuan Jiang
- Shanghai Municipal Center for Disease Control and Prevention, Shanghai, China
- Shanghai Institute of Preventive Medicine, Shanghai, China
- Chongguang Yang
- Key Laboratory of Medical Molecular Virology (MOE/NHC/CAMS), School of Basic Medical Science, Shanghai Medical College, Shanghai Institute of Infectious Disease and Biosecurity, Fudan University, Shanghai, China
- School of Public Health (Shenzhen), Shenzhen Campus of Sun Yat-sen University, Shenzhen, China
- Jing Li
- Shanghai Municipal Center for Disease Control and Prevention, Shanghai, China
- Shanghai Institute of Preventive Medicine, Shanghai, China
- Yangyi Zhang
- Shanghai Municipal Center for Disease Control and Prevention, Shanghai, China
- Shanghai Institute of Preventive Medicine, Shanghai, China
- Jinyan Zou
- Department of Tuberculosis Control, Songjiang District Center for Disease Control and Prevention, Shanghai, China
- Yong Li
- Department of Tuberculosis Control, Songjiang District Center for Disease Control and Prevention, Shanghai, China
- Wenqi Dai
- Department of Clinical Laboratory, Songjiang District Central Hospital, Shanghai, China
- Jianjun Hong
- Department of Tuberculosis Control, Songjiang District Center for Disease Control and Prevention, Shanghai, China
- Howard Takiff
- Laboratorio de Genética Molecular, CMBC, Instituto Venezolano de Investigaciones Científicas, IVIC, Caracas, Venezuela
- Xin Shen
- Shanghai Municipal Center for Disease Control and Prevention, Shanghai, China
- Shanghai Institute of Preventive Medicine, Shanghai, China
- Xiaoqin Guo
- Department of Tuberculosis Control, Songjiang District Center for Disease Control and Prevention, Shanghai, China
- Zhengan Yuan
- Shanghai Municipal Center for Disease Control and Prevention, Shanghai, China
- Shanghai Institute of Preventive Medicine, Shanghai, China
- Qian Gao
- Key Laboratory of Medical Molecular Virology (MOE/NHC/CAMS), School of Basic Medical Science, Shanghai Medical College, Shanghai Institute of Infectious Disease and Biosecurity, Fudan University, Shanghai, China
8
Tang M, Dudas G, Bedford T, Minin VN. Fitting stochastic epidemic models to gene genealogies using linear noise approximation. Ann Appl Stat 2023. [DOI: 10.1214/21-aoas1583] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/26/2023]
Affiliation(s)
- Mingwei Tang
- Department of Statistics, University of Washington, Seattle
- Gytis Dudas
- Gothenburg Global Biodiversity Centre (GGBC)
- Trevor Bedford
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center
9
Didelot X, Helekal D, Kendall M, Ribeca P. Distinguishing imported cases from locally acquired cases within a geographically limited genomic sample of an infectious disease. Bioinformatics 2022; 39:6849542. [PMID: 36440957 PMCID: PMC9805578 DOI: 10.1093/bioinformatics/btac761] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2022] [Revised: 11/17/2022] [Accepted: 11/24/2022] [Indexed: 11/30/2022]
Abstract
MOTIVATION The ability to distinguish imported cases from locally acquired cases has important consequences for the selection of public health control strategies. Genomic data can be useful for this, for example, using a phylogeographic analysis in which genomic data from multiple locations are compared to determine likely migration events between locations. However, these methods typically require good samples of genomes from all locations, which is rarely available. RESULTS Here, we propose an alternative approach that only uses genomic data from a location of interest. By comparing each new case with previous cases from the same location, we are able to detect imported cases, as they have a different genealogical distribution than that of locally acquired cases. We show that, when variations in the size of the local population are accounted for, our method has good sensitivity and excellent specificity for the detection of imports. We applied our method to data simulated under the structured coalescent model and demonstrate relatively good performance even when the local population has the same size as the external population. Finally, we applied our method to several recent genomic datasets from both bacterial and viral pathogens, and show that it can, in a matter of seconds or minutes, deliver important insights on the number of imports to a geographically limited sample of a pathogen population. AVAILABILITY AND IMPLEMENTATION The R package DetectImports is freely available from https://github.com/xavierdidelot/DetectImports. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- David Helekal
- Centre for Doctoral Training in Mathematics for Real-World Systems, University of Warwick, Coventry CV4 7AL, UK
- Michelle Kendall
- School of Life Sciences and Department of Statistics, University of Warwick, Coventry CV4 7AL, UK
- Paolo Ribeca
- Gastrointestinal Bacteria Reference Unit, UK Health Security Agency, London NW9 5EQ, UK; Biomathematics and Statistics Scotland, The James Hutton Institute, Edinburgh EH9 3FD, UK
10
Gut to lung translocation and antibiotic mediated selection shape the dynamics of Pseudomonas aeruginosa in an ICU patient. Nat Commun 2022; 13:6523. [PMID: 36414617 PMCID: PMC9681761 DOI: 10.1038/s41467-022-34101-2] [Citation(s) in RCA: 23] [Impact Index Per Article: 11.5] [Received: 01/17/2022] [Accepted: 10/13/2022] [Indexed: 11/23/2022]
Abstract
Bacteria have the potential to translocate between sites in the human body, but the dynamics and consequences of within-host bacterial migration remain poorly understood. Here we investigate the link between gut and lung Pseudomonas aeruginosa populations in an intensively sampled ICU patient using a combination of genomics, isolate phenotyping, host immunity profiling, and clinical data. Crucially, we show that lung colonization in the ICU was driven by the translocation of P. aeruginosa from the gut. Meropenem treatment for a suspected urinary tract infection selected for elevated resistance in both the gut and lung. However, resistance was driven by parallel evolution in the gut and lung coupled with organ specific selective pressures, and translocation had only a minor impact on AMR. These findings suggest that reducing intestinal colonization of Pseudomonas may be an effective way to prevent lung infections in critically ill patients.
11
Chao E, Chato C, Vender R, Olabode AS, Ferreira RC, Poon AFY. Molecular source attribution. PLoS Comput Biol 2022; 18:e1010649. [PMID: 36395093 PMCID: PMC9671344 DOI: 10.1371/journal.pcbi.1010649] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/19/2022]
Affiliation(s)
- Elisa Chao
- Department of Pathology and Laboratory Medicine, Western University, London, Ontario, Canada
- Connor Chato
- Department of Pathology and Laboratory Medicine, Western University, London, Ontario, Canada
- Reid Vender
- Department of Pathology and Laboratory Medicine, Western University, London, Ontario, Canada
- School of Medicine, Queen’s University, Kingston, Ontario, Canada
- Abayomi S. Olabode
- Department of Pathology and Laboratory Medicine, Western University, London, Ontario, Canada
- Roux-Cil Ferreira
- Department of Pathology and Laboratory Medicine, Western University, London, Ontario, Canada
- Art F. Y. Poon
- Department of Pathology and Laboratory Medicine, Western University, London, Ontario, Canada
12
Yin J, Zhang H, Gao Z, Jiang H, Qin L, Zhu C, Gao Q, He X, Li W. Transmission of multidrug-resistant tuberculosis in Beijing, China: An epidemiological and genomic analysis. Front Public Health 2022; 10:1019198. [PMID: 36408017 PMCID: PMC9672842 DOI: 10.3389/fpubh.2022.1019198] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2022] [Accepted: 10/13/2022] [Indexed: 11/06/2022]
Abstract
Background Understanding multidrug-resistant tuberculosis (MDR-TB) transmission patterns is crucial for controlling the disease. We aimed to identify high-risk populations and geographic settings of MDR-TB transmission. Methods We conducted a population-based retrospective study of MDR-TB patients in Beijing from 2018 to 2020, and assessed MDR-TB recent transmission using whole-genome sequencing of isolates. Geospatial analysis was conducted with kernel density estimation. We combined TransPhylo software with epidemiological investigation data to construct transmission networks. Logistic regression analysis was utilized to identify risk factors for recent transmission. Results We included 241 MDR-TB patients, of which 146 (60.58%) were available for genomic analysis. Drug resistance prediction showed that resistance to fluoroquinolones (FQs) was as high as 39.74% among new cases. 36 (24.66%) of the 146 MDR strains were grouped into 12 genome clusters, suggesting recent transmission of MDR strains. 44.82% (13/29) of the clustered patients lived in the same residential community, adjacent residential community or the same street as other cases. The inferred transmission chain found a total of 6 transmission events in 3 clusters; of these, 4 transmission events occurred in residential areas and nearby public places. Logistic regression analysis revealed that being aged 25-34 years-old was a risk factor for recent transmission. Conclusions The recent transmission of MDR-TB in Beijing is severe, and residential areas are common sites of transmission; high levels of FQs drug resistance suggest that FQs should be used with caution unless resistance can be ruled out by laboratory testing.
Affiliation(s)
- Jinfeng Yin
- Beijing Chest Hospital, Capital Medical University, Beijing, China; National Tuberculosis Clinical Laboratory, Beijing Tuberculosis and Thoracic Tumor Research Institute, Beijing, China
- Hongwei Zhang
- Tuberculosis Prevention and Control Institute, Beijing Center for Disease Prevention and Control, Beijing, China
- Zhidong Gao
- Tuberculosis Prevention and Control Institute, Beijing Center for Disease Prevention and Control, Beijing, China
| | - Hui Jiang
- Beijing Chest Hospital, Capital Medical University, Beijing, China; National Tuberculosis Clinical Laboratory, Beijing Tuberculosis and Thoracic Tumor Research Institute, Beijing, China
| | - Liyi Qin
- Beijing Chest Hospital, Capital Medical University, Beijing, China; National Tuberculosis Clinical Laboratory, Beijing Tuberculosis and Thoracic Tumor Research Institute, Beijing, China
| | - Chendi Zhu
- Beijing Chest Hospital, Capital Medical University, Beijing, China; National Tuberculosis Clinical Laboratory, Beijing Tuberculosis and Thoracic Tumor Research Institute, Beijing, China
| | - Qian Gao
- Key Laboratory of Medical Molecular Virology, School of Basic Medical Sciences, Fudan University, Shanghai, China
| | - Xiaoxin He
- Tuberculosis Prevention and Control Institute, Beijing Center for Disease Prevention and Control, Beijing, China
| | - Weimin Li
- Beijing Chest Hospital, Capital Medical University, Beijing, China; National Tuberculosis Clinical Laboratory, Beijing Tuberculosis and Thoracic Tumor Research Institute, Beijing, China; *Correspondence: Weimin Li
|
13
|
Skums P, Mohebbi F, Tsyvina V, Baykal PI, Nemira A, Ramachandran S, Khudyakov Y. SOPHIE: Viral outbreak investigation and transmission history reconstruction in a joint phylogenetic and network theory framework. Cell Syst 2022; 13:844-856.e4. [PMID: 36265470 PMCID: PMC9590096 DOI: 10.1016/j.cels.2022.07.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/06/2022] [Revised: 07/05/2022] [Accepted: 07/19/2022] [Indexed: 01/26/2023]
Abstract
Genomic epidemiology is now widely used for viral outbreak investigations. Still, this methodology faces many challenges. First, few methods account for intra-host viral diversity. Second, the maximum parsimony principle continues to be employed for phylogenetic inference of transmission histories, even though maximum likelihood or Bayesian models are usually more consistent. Third, many methods utilize case-specific data, such as sampling times or infection exposure intervals. This impedes the study of persistent infections in vulnerable groups, where such information is of limited use. Finally, most methods implicitly assume that transmission events are independent, although common source outbreaks violate this assumption. We propose a maximum likelihood framework, SOPHIE, based on the integration of phylogenetic and random graph models. It infers transmission networks from viral phylogenies and expected properties of inter-host social networks modeled as random graphs with given expected degree distributions. SOPHIE is scalable, accounts for intra-host diversity, and accurately infers transmissions without case-specific epidemiological data.
Affiliation(s)
- Pavel Skums
- Department of Computer Science, Georgia State University, Atlanta, GA, USA.
| | - Fatemeh Mohebbi
- Department of Computer Science, Georgia State University, Atlanta, GA, USA
| | - Vyacheslav Tsyvina
- Department of Computer Science, Georgia State University, Atlanta, GA, USA
| | - Pelin Icer Baykal
- Department of Biosystems Science & Engineering, ETH Zurich, Basel, Switzerland
| | - Alina Nemira
- Department of Computer Science, Georgia State University, Atlanta, GA, USA
| | - Sumathi Ramachandran
- Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, GA, USA
| | - Yury Khudyakov
- Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, GA, USA
|
14
|
Lundgren E, Romero-Severson E, Albert J, Leitner T. Combining biomarker and virus phylogenetic models improves HIV-1 epidemiological source identification. PLoS Comput Biol 2022; 18:e1009741. [PMID: 36026480 PMCID: PMC9455879 DOI: 10.1371/journal.pcbi.1009741] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 12/09/2021] [Revised: 09/08/2022] [Accepted: 08/02/2022] [Indexed: 01/07/2023] Open
Abstract
To identify and stop active HIV transmission chains, new epidemiological techniques are needed. Here, we describe the development of a multi-biomarker augmentation to phylogenetic inference of the underlying transmission history in a local population. HIV biomarkers are measurable biological quantities that have some relationship to the amount of time someone has been infected with HIV. To train our model, we used five biomarkers based on real data from serological assays, HIV sequence data, and target cell counts in longitudinally followed, untreated patients with known infection times. The biomarkers were modeled with a mixed effects framework to allow for patient-specific variation and general trends, and fit to patient data using Markov Chain Monte Carlo (MCMC) methods. Subsequently, the density of the unobserved infection time conditional on observed biomarkers was obtained by integrating out the random effects from the model fit. This probabilistic information about infection times was incorporated into the likelihood function for the transmission history and phylogenetic tree reconstruction, informed by the HIV sequence data. To critically test our methodology, we developed a coalescent-based simulation framework that generates phylogenies and biomarkers given a specific or general transmission history. Testing on many epidemiological scenarios showed that biomarker-augmented phylogenetics can reach 90% accuracy under idealized situations. Under realistic within-host HIV-1 evolution, involving substantial within-host diversification and frequent transmission of multiple lineages, the average accuracy was about 50% in transmission clusters involving 5-50 hosts. Realistic biomarker data added on average 16 percentage points over using the phylogeny alone. Using more biomarkers improved the performance. Shorter temporal spacing between transmission events and increased transmission heterogeneity reduced reconstruction accuracy, but larger clusters were not harder to get right. More sequence data per infected host also improved accuracy. We show that the method is robust to incomplete sampling and that adding biomarkers improves reconstructions of real HIV-1 transmission histories. The technology presented here could allow for better prevention programs by providing data for locally informed and tailored strategies.
Affiliation(s)
- Erik Lundgren
- Theoretical Biology and Biophysics Group, Los Alamos National Laboratory, Los Alamos, New Mexico, United States of America
| | - Ethan Romero-Severson
- Theoretical Biology and Biophysics Group, Los Alamos National Laboratory, Los Alamos, New Mexico, United States of America
| | - Jan Albert
- Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, Stockholm, Sweden
- Department of Clinical Microbiology, Karolinska University Hospital, Stockholm, Sweden
| | - Thomas Leitner
- Theoretical Biology and Biophysics Group, Los Alamos National Laboratory, Los Alamos, New Mexico, United States of America
|
15
|
Abstract
Phylogenetic models have long assumed that lineages diverge independently. Processes of diversification that are of interest in biogeography, epidemiology, and genome evolution violate this assumption by affecting multiple evolutionary lineages. To relax the assumption of independent divergences and infer patterns of divergences predicted by such processes, we introduce a way of conceptualizing, modeling, and inferring phylogenetic trees. We apply the approach to genomic data from geckos distributed across the Philippines and find support for patterns of shared divergences predicted by repeated fragmentation of the archipelago by interglacial rises in sea level. Many processes of biological diversification can simultaneously affect multiple evolutionary lineages. Examples include multiple members of a gene family diverging when a region of a chromosome is duplicated, multiple viral strains diverging at a “super-spreading” event, and a geological event fragmenting whole communities of species. It is difficult to test for patterns of shared divergences predicted by such processes because all phylogenetic methods assume that lineages diverge independently. We introduce a Bayesian phylogenetic approach to relax the assumption of independent, bifurcating divergences by expanding the space of topologies to include trees with shared and multifurcating divergences. This allows us to jointly infer phylogenetic relationships, divergence times, and patterns of divergences predicted by processes of diversification that affect multiple evolutionary lineages simultaneously or lead to more than two descendant lineages. Using simulations, we find that the method accurately infers shared and multifurcating divergence events when they occur and performs as well as current phylogenetic methods when divergences are independent and bifurcating. 
We apply our approach to genomic data from two genera of geckos from across the Philippines to test if past changes to the islands’ landscape caused bursts of speciation. Unlike previous analyses restricted to only pairs of gecko populations, we find evidence for patterns of shared divergences. By generalizing the space of phylogenetic trees in a way that is independent from the likelihood model, our approach opens many avenues for future research into processes of diversification across the life sciences.
|
16
|
Leavitt SV, Jenkins HE, Sebastiani P, Lee RS, Horsburgh CR, Tibbs AM, White LF. Estimation of the generation interval using pairwise relative transmission probabilities. Biostatistics 2022; 23:807-824. [PMID: 33527996 PMCID: PMC9291635 DOI: 10.1093/biostatistics/kxaa059] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/23/2020] [Revised: 12/07/2020] [Accepted: 12/08/2020] [Indexed: 11/13/2022] Open
Abstract
The generation interval (the time between infection of primary and secondary cases) and its often-used proxy, the serial interval (the time between symptom onset of primary and secondary cases), are critical parameters in understanding infectious disease dynamics. Because it is difficult to determine who infected whom, these important outbreak characteristics are not well understood for many diseases. We present a novel method for estimating transmission intervals using surveillance or outbreak investigation data that, unlike existing methods, does not require contact tracing data or pathogen whole genome sequence data on all cases. We start with an expectation maximization algorithm and incorporate relative transmission probabilities with noise reduction. We use simulations to show that our method can accurately estimate the generation interval distribution for diseases with different reproductive numbers, generation intervals, and mutation rates. We then apply our method to routinely collected surveillance data from Massachusetts (2010-2016) to estimate the serial interval of tuberculosis in this setting.
Affiliation(s)
- Sarah V Leavitt
- Department of Biostatistics, Boston University School of Public Health, 801 Massachusetts Ave, Boston, MA 02118; Epidemiology Division, University of Toronto Dalla Lana School of Public Health, 155 College St Room 500, Toronto, ON M5T 3M7, Canada; Department of Epidemiology, Boston University School of Public Health, 801 Massachusetts Ave, Boston, MA 02118; and Massachusetts Department of Public Health, 250 Washington St, Boston, MA 02108
| | - Helen E Jenkins
- Department of Biostatistics, Boston University School of Public Health, 801 Massachusetts Ave, Boston, MA 02118; Epidemiology Division, University of Toronto Dalla Lana School of Public Health, 155 College St Room 500, Toronto, ON M5T 3M7, Canada; Department of Epidemiology, Boston University School of Public Health, 801 Massachusetts Ave, Boston, MA 02118; and Massachusetts Department of Public Health, 250 Washington St, Boston, MA 02108
| | - Paola Sebastiani
- Department of Biostatistics, Boston University School of Public Health, 801 Massachusetts Ave, Boston, MA 02118; Epidemiology Division, University of Toronto Dalla Lana School of Public Health, 155 College St Room 500, Toronto, ON M5T 3M7, Canada; Department of Epidemiology, Boston University School of Public Health, 801 Massachusetts Ave, Boston, MA 02118; and Massachusetts Department of Public Health, 250 Washington St, Boston, MA 02108
| | - Robyn S Lee
- Department of Biostatistics, Boston University School of Public Health, 801 Massachusetts Ave, Boston, MA 02118; Epidemiology Division, University of Toronto Dalla Lana School of Public Health, 155 College St Room 500, Toronto, ON M5T 3M7, Canada; Department of Epidemiology, Boston University School of Public Health, 801 Massachusetts Ave, Boston, MA 02118; and Massachusetts Department of Public Health, 250 Washington St, Boston, MA 02108
| | - C Robert Horsburgh
- Department of Biostatistics, Boston University School of Public Health, 801 Massachusetts Ave, Boston, MA 02118; Epidemiology Division, University of Toronto Dalla Lana School of Public Health, 155 College St Room 500, Toronto, ON M5T 3M7, Canada; Department of Epidemiology, Boston University School of Public Health, 801 Massachusetts Ave, Boston, MA 02118; and Massachusetts Department of Public Health, 250 Washington St, Boston, MA 02108
| | - Andrew M Tibbs
- Department of Biostatistics, Boston University School of Public Health, 801 Massachusetts Ave, Boston, MA 02118; Epidemiology Division, University of Toronto Dalla Lana School of Public Health, 155 College St Room 500, Toronto, ON M5T 3M7, Canada; Department of Epidemiology, Boston University School of Public Health, 801 Massachusetts Ave, Boston, MA 02118; and Massachusetts Department of Public Health, 250 Washington St, Boston, MA 02108
| | - Laura F White
- Department of Biostatistics, Boston University School of Public Health, 801 Massachusetts Ave, Boston, MA 02118; Epidemiology Division, University of Toronto Dalla Lana School of Public Health, 155 College St Room 500, Toronto, ON M5T 3M7, Canada; Department of Epidemiology, Boston University School of Public Health, 801 Massachusetts Ave, Boston, MA 02118; and Massachusetts Department of Public Health, 250 Washington St, Boston, MA 02108
|
17
|
Carson J, Ledda A, Ferretti L, Keeling M, Didelot X. The bounded coalescent model: Conditioning a genealogy on a minimum root date. J Theor Biol 2022; 548:111186. [PMID: 35697144 DOI: 10.1016/j.jtbi.2022.111186] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/04/2022] [Revised: 05/05/2022] [Accepted: 06/02/2022] [Indexed: 01/27/2023]
Abstract
The coalescent model represents how individuals sampled from a population may have originated from a last common ancestor. The bounded coalescent model is obtained by conditioning the coalescent model such that the last common ancestor must have existed after a certain date. This conditioned model arises in a variety of applications, such as speciation, horizontal gene transfer or transmission analysis, and yet the bounded coalescent model has not been previously analysed in detail. Here we describe a new algorithm to simulate from this model directly, without resorting to rejection sampling. We show that this direct simulation algorithm is more computationally efficient than the rejection sampling approach. We also show how to calculate the probability of the last common ancestor occurring after a given date, which is required to compute the probability density of realisations under the bounded coalescent model. Our results are applicable in both the isochronous (when all samples have the same date) and heterochronous (where samples can have different dates) settings. We explore the effect of setting a bound on the date of the last common ancestor, and show that it affects a number of properties of the resulting phylogenies. All our methods are implemented in a new R package called BoundedCoalescent which is freely available online.
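The rejection-sampling baseline mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the BoundedCoalescent package or the authors' direct algorithm: it assumes the isochronous case, a constant population size, and hypothetical names (`simulate_coalescent_times`, `bounded_coalescent_rejection`, `ne_g`). It draws standard coalescent event times and discards every draw whose last common ancestor is older than the bound.

```python
import random


def simulate_coalescent_times(n_samples, ne_g, rng):
    """Draw coalescent event times (measured backwards from sampling).

    Assumes all samples share one date and a constant population size.
    With k lineages, the waiting time to the next coalescence is
    exponential with rate k*(k-1) / (2*ne_g), where ne_g is the
    effective population size times the generation time (hypothetical name).
    """
    t = 0.0
    times = []
    for k in range(n_samples, 1, -1):
        rate = k * (k - 1) / (2.0 * ne_g)
        t += rng.expovariate(rate)
        times.append(t)
    return times  # the last entry is the time to the last common ancestor


def bounded_coalescent_rejection(n_samples, ne_g, bound, rng, max_tries=1_000_000):
    """Rejection sampler: keep only draws whose root is no older than `bound`."""
    for attempt in range(1, max_tries + 1):
        times = simulate_coalescent_times(n_samples, ne_g, rng)
        if times[-1] <= bound:
            return times, attempt
    raise RuntimeError("no draw satisfied the bound; it may be too strict")


if __name__ == "__main__":
    rng = random.Random(1)
    times, attempts = bounded_coalescent_rejection(5, 1.0, 1.0, rng)
    print(f"accepted after {attempts} attempt(s); root age = {times[-1]:.3f}")
```

The number of attempts grows as the bound tightens, which is the inefficiency that a direct simulation algorithm avoids.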
Affiliation(s)
- Jake Carson
- Mathematics Institute, University of Warwick, United Kingdom
| | - Alice Ledda
- HCAI, Fungal, AMR, AMU & Sepsis Division, UK Health Security Agency, United Kingdom
| | - Luca Ferretti
- Big Data Institute, University of Oxford, United Kingdom
| | - Matt Keeling
- Mathematics Institute, University of Warwick, United Kingdom
| | - Xavier Didelot
- Department of Statistics and School of Life Sciences, University of Warwick, United Kingdom
|
18
|
Ko KKK, Chng KR, Nagarajan N. Metagenomics-enabled microbial surveillance. Nat Microbiol 2022; 7:486-496. [PMID: 35365786 DOI: 10.1038/s41564-022-01089-w] [Citation(s) in RCA: 54] [Impact Index Per Article: 27.0] [Received: 08/29/2021] [Accepted: 02/22/2022] [Indexed: 12/13/2022]
Abstract
Lessons learnt from the COVID-19 pandemic include increased awareness of the potential for zoonoses and emerging infectious diseases that can adversely affect human health. Although emergent viruses are currently in the spotlight, we must not forget the ongoing toll of morbidity and mortality owing to antimicrobial resistance in bacterial pathogens and to vector-borne, foodborne and waterborne diseases. Population growth, planetary change, international travel and medical tourism all contribute to the increasing frequency of infectious disease outbreaks. Surveillance is therefore of crucial importance, but the diversity of microbial pathogens, coupled with resource-intensive methods, compromises our ability to scale-up such efforts. Innovative technologies that are both easy to use and able to simultaneously identify diverse microorganisms (viral, bacterial or fungal) with precision are necessary to enable informed public health decisions. Metagenomics-enabled surveillance methods offer the opportunity to improve detection of both known and yet-to-emerge pathogens.
Affiliation(s)
- Karrie K K Ko
- Laboratory of Metagenomic Technologies and Microbial Systems, Genome Institute of Singapore, Singapore, Singapore; Department of Microbiology, Singapore General Hospital, Singapore, Singapore; Department of Molecular Pathology, Singapore General Hospital, Singapore, Singapore; Duke-NUS Medical School, Singapore, Singapore; Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
| | - Kern Rei Chng
- Laboratory of Metagenomic Technologies and Microbial Systems, Genome Institute of Singapore, Singapore, Singapore; National Centre for Food Science, Singapore Food Agency, Singapore, Singapore
| | - Niranjan Nagarajan
- Laboratory of Metagenomic Technologies and Microbial Systems, Genome Institute of Singapore, Singapore, Singapore; Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
|
19
|
Methods Combining Genomic and Epidemiological Data in the Reconstruction of Transmission Trees: A Systematic Review. Pathogens 2022; 11:pathogens11020252. [PMID: 35215195 PMCID: PMC8875843 DOI: 10.3390/pathogens11020252] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 12/09/2021] [Revised: 02/08/2022] [Accepted: 02/11/2022] [Indexed: 11/17/2022] Open
Abstract
In order to better understand transmission dynamics and appropriately target control and preventive measures, studies have aimed to identify who-infected-whom in actual outbreaks. Numerous reconstruction methods exist, each with its own assumptions, types of data, and inference strategy. Thus, selecting a method can be difficult. Following PRISMA guidelines, we systematically reviewed the literature for methods combining epidemiological and genomic data in transmission tree reconstruction. We identified 22 methods from the 41 selected articles. We defined three families according to how genomic data was handled: a non-phylogenetic family, a sequential phylogenetic family, and a simultaneous phylogenetic family. We discussed methods according to the data needed as well as the underlying sequence mutation, within-host evolution, transmission, and case observation. In the non-phylogenetic family consisting of eight methods, pairwise genetic distances were estimated. In the phylogenetic families, transmission trees were inferred from phylogenetic trees either simultaneously (nine methods) or sequentially (five methods). While a majority of methods (17/22) modeled the transmission process, few (8/22) took into account imperfect case detection. Within-host evolution was generally (7/8) modeled as a coalescent process. These practical and theoretical considerations were highlighted in order to help select the appropriate method for an outbreak.
|
20
|
Luo T, Wang J, Wang Q, Wang X, Zhao P, Zeng DD, Zhang Q, Cao Z. Reconstruction of the Transmission Chain of COVID-19 Outbreak in Beijing's Xinfadi Market, China. Int J Infect Dis 2022; 116:411-417. [PMID: 35074519 PMCID: PMC8776627 DOI: 10.1016/j.ijid.2022.01.035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2021] [Revised: 01/09/2022] [Accepted: 01/16/2022] [Indexed: 12/03/2022] Open
Abstract
Objectives: The aim of the study was to reconstruct the complete transmission chain of the COVID-19 outbreak in Beijing's Xinfadi Market using data from epidemiological investigations, in order to characterize transmission dynamics and transmission risk factors. Methods: We set up a transmission model and estimated its parameters from the survey data via Markov chain Monte Carlo sampling. Bayesian data augmentation approaches were used to account for uncertainty in the source of infection, unobserved onset, and infection dates. Results: The rate of transmission of COVID-19 within households was 9.2%. Older people were more susceptible to infection. The accuracy of our reconstructed transmission chain was 67.26%. In the gathering place of this outbreak, the Beef and Mutton Trading Hall of Xinfadi Market, most transmission occurred within 20 m; only 19.61% of transmission occurred over a wider area (>20 m), with an overall average transmission distance of 13.00 m. The deepest transmission generation was 9. In this outbreak, there were 2 abnormally high transmission events. Conclusions: The statistical method of reconstructing transmission trees from incomplete epidemic data provides a valuable tool to help understand the complex transmission factors and provides a practical guideline for investigating the characteristics of the development of epidemics and the formulation of control measures.
|
21
|
Dhar S, Zhang C, Măndoiu II, Bansal MS. TNet: Transmission Network Inference Using Within-Host Strain Diversity and its Application to Geographical Tracking of COVID-19 Spread. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:230-242. [PMID: 34255632 PMCID: PMC8956368 DOI: 10.1109/tcbb.2021.3096455] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/06/2020] [Revised: 07/03/2021] [Accepted: 07/08/2021] [Indexed: 06/13/2023]
Abstract
The inference of disease transmission networks is an important problem in epidemiology. One popular approach for building transmission networks is to reconstruct a phylogenetic tree using sequences from disease strains sampled from infected hosts and infer transmissions based on this tree. However, most existing phylogenetic approaches for transmission network inference are highly computationally intensive and cannot take within-host strain diversity into account. Here, we introduce a new phylogenetic approach for inferring transmission networks, TNet, that addresses these limitations. TNet uses multiple strain sequences from each sampled host to infer transmissions and is simpler and more accurate than existing approaches. Furthermore, TNet is highly scalable and able to distinguish between ambiguous and unambiguous transmission inferences. We evaluated TNet on a large collection of 560 simulated transmission networks of various sizes and diverse host, sequence, and transmission characteristics, as well as on 10 real transmission datasets with known transmission histories. Our results show that TNet outperforms two other recently developed methods, phyloscanner and SharpTNI, that also consider within-host strain diversity. We also applied TNet to a large collection of SARS-CoV-2 genomes sampled from infected individuals in many countries around the world, demonstrating how our inference framework can be adapted to accurately infer geographical transmission networks. TNet is freely available from https://compbio.engr.uconn.edu/software/TNet/.
Affiliation(s)
- Saurav Dhar
- Department of Computer Science & Engineering, University of Connecticut, Storrs, CT 06269, USA
| | - Chengchen Zhang
- Department of Computer Science & Engineering, University of Connecticut, Storrs, CT 06269, USA
| | - Ion I. Măndoiu
- Department of Computer Science & Engineering, University of Connecticut, Storrs, CT 06269, USA
| | - Mukul S. Bansal
- Department of Computer Science & Engineering, University of Connecticut, Storrs, CT 06269, USA
|
22
|
Orlovich Y, Kukharenko K, Kaibel V, Skums P. Scale-Free Spanning Trees and Their Application in Genomic Epidemiology. J Comput Biol 2021; 28:945-960. [PMID: 34491104 PMCID: PMC8670573 DOI: 10.1089/cmb.2020.0500] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/03/2022] Open
Abstract
We study the algorithmic problem of finding the most “scale-free-like” spanning tree of a connected graph. This problem is motivated by the fundamental problem of genomic epidemiology: given viral genomes sampled from infected individuals, reconstruct the transmission network (“who infected whom”). We use two possible objective functions for this problem and introduce the corresponding algorithmic problems, termed the m-SF (-scale free) and s-SF Spanning Tree problems. We prove that those problems are APX- and NP-hard, respectively, even in the classes of cubic and bipartite graphs. We propose two integer linear programming (ILP) formulations for the s-SF Spanning Tree problem, and experimentally assess their performance using simulated and experimental data. In particular, we demonstrate that the ILP-based approach allows for accurate reconstruction of transmission histories of several hepatitis C outbreaks.
Affiliation(s)
- Yury Orlovich
- Faculty of Applied Mathematics and Computer Science, Belarusian State University, Minsk, Belarus
| | - Kirill Kukharenko
- Institute for Mathematical Optimization, Otto von Guericke University Magdeburg, Magdeburg, Germany
| | - Volker Kaibel
- Institute for Mathematical Optimization, Otto von Guericke University Magdeburg, Magdeburg, Germany
| | - Pavel Skums
- Department of Computer Science, Georgia State University, Atlanta, Georgia, USA
|
23
|
Hulseberg CE, Kumar R, Di Paola N, Larson P, Nagle ER, Richardson J, Hanson J, Wauquier N, Fair JN, Makuwa M, Mulembakani P, Muyembe-Tamfum JJ, Schoepp RJ, Sanchez-Lockhart M, Palacios GF, Kuhn JH, Kugelman JR. Molecular analysis of the 2012 Bundibugyo virus disease outbreak. Cell Rep Med 2021; 2:100351. [PMID: 34467242 PMCID: PMC8385243 DOI: 10.1016/j.xcrm.2021.100351] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/07/2020] [Revised: 04/25/2021] [Accepted: 06/24/2021] [Indexed: 01/24/2023]
Abstract
Bundibugyo virus (BDBV) is one of four ebolaviruses known to cause disease in humans. Bundibugyo virus disease (BVD) outbreaks occurred in 2007-2008 in Bundibugyo District, Uganda, and in 2012 in Isiro, Province Orientale, Democratic Republic of the Congo. The 2012 BVD outbreak resulted in 38 laboratory-confirmed cases of human infection, 13 of whom died. However, only 4 BDBV specimens from the 2012 outbreak have been sequenced. Here, we provide BDBV sequences from seven additional patients. Analysis of the molecular epidemiology and evolutionary dynamics of the 2012 outbreak with these additional isolates challenges the current hypothesis that the outbreak was the result of a single spillover event. In addition, one patient record indicates that BDBV's initial emergence in Isiro occurred 50 days earlier than previously accepted. Collectively, this work demonstrates how retrospective sequencing can be used to elucidate outbreak origins and provide epidemiological contexts to a medically relevant pathogen.
Affiliation(s)
- Christine E. Hulseberg
- Center for Genome Sciences, United States Army Medical Research Institute of Infectious Diseases, Frederick, MD 21702, USA
| | - Raina Kumar
- Center for Genome Sciences, United States Army Medical Research Institute of Infectious Diseases, Frederick, MD 21702, USA
| | - Nicholas Di Paola
- Center for Genome Sciences, United States Army Medical Research Institute of Infectious Diseases, Frederick, MD 21702, USA
| | - Peter Larson
- Center for Genome Sciences, United States Army Medical Research Institute of Infectious Diseases, Frederick, MD 21702, USA
| | - Elyse R. Nagle
- National Biodefense Analysis and Countermeasures Center, Frederick, MD 21702, USA
| | - Joshua Richardson
- Center for Genome Sciences, United States Army Medical Research Institute of Infectious Diseases, Frederick, MD 21702, USA
| | - Jarod Hanson
- Center for Genome Sciences, United States Army Medical Research Institute of Infectious Diseases, Frederick, MD 21702, USA
| | - Nadia Wauquier
- Metabiota, Inc., Kinshasa, Democratic Republic of the Congo
| | - Joseph N. Fair
- Metabiota, Inc., Kinshasa, Democratic Republic of the Congo
| | - Maria Makuwa
- Metabiota, Inc., Kinshasa, Democratic Republic of the Congo
| | - Randal J. Schoepp
- Diagnostic Systems Division, United States Army Medical Research Institute of Infectious Diseases, Frederick, MD 21702, USA
| | - Mariano Sanchez-Lockhart
- Center for Genome Sciences, United States Army Medical Research Institute of Infectious Diseases, Frederick, MD 21702, USA
| | - Gustavo F. Palacios
- Center for Genome Sciences, United States Army Medical Research Institute of Infectious Diseases, Frederick, MD 21702, USA
| | - Jens H. Kuhn
- Integrated Research Facility at Fort Detrick, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Frederick, MD 21702, USA
| | - Jeffrey R. Kugelman
- Center for Genome Sciences, United States Army Medical Research Institute of Infectious Diseases, Frederick, MD 21702, USA
|
24
|
Wohl S, Giles JR, Lessler J. Sample size calculation for phylogenetic case linkage. PLoS Comput Biol 2021; 17:e1009182. [PMID: 34228722 PMCID: PMC8284614 DOI: 10.1371/journal.pcbi.1009182] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 12/02/2020] [Revised: 07/16/2021] [Accepted: 06/14/2021] [Indexed: 12/16/2022] Open
Abstract
Sample size calculations are an essential component of the design and evaluation of scientific studies. However, there is a lack of clear guidance for determining the sample size needed for phylogenetic studies, which are becoming an essential part of studying pathogen transmission. We introduce a statistical framework for determining the number of true infector-infectee transmission pairs identified by a phylogenetic study, given the size and population coverage of that study. We then show how characteristics of the criteria used to determine linkage and aspects of the study design can influence our ability to correctly identify transmission links, in sometimes counterintuitive ways. We test the overall approach using outbreak simulations and provide guidance for calculating the sensitivity and specificity of the linkage criteria, the key inputs to our approach. The framework is freely available as the R package phylosamp, and is broadly applicable to designing and evaluating a wide array of pathogen phylogenetic studies. Sequencing the genetic material of viral and bacterial pathogens has become an important part of tracking and combating human infectious diseases. Specifically, comparing the pathogen DNA or RNA sequences collected from infected individuals can allow researchers and public health experts to determine who infected whom, or detect when a pathogen entered a specific country or geographic area. However, it is often impossible to collect samples from every single infected person, and these missing sequences can pose problems for this type of analysis, especially if there is some bias behind which samples were selected for sequencing. We have developed a mathematical framework that allows users to determine the probability their conclusions about pathogen transmission are correct given the number and proportion of samples from a pathogen outbreak they have sequenced. 
This framework is freely available, easy to use, and broadly generalizable to any pathogen, and we hope that it can be used to inform the design and sampling strategies behind future sequencing-based studies.
Affiliation(s)
- Shirlee Wohl
- Johns Hopkins Bloomberg School of Public Health, Department of Epidemiology, Baltimore, Maryland, United States of America
| | - John R Giles
- Johns Hopkins Bloomberg School of Public Health, Department of Epidemiology, Baltimore, Maryland, United States of America
| | - Justin Lessler
- Johns Hopkins Bloomberg School of Public Health, Department of Epidemiology, Baltimore, Maryland, United States of America
|
25
|
Gygli SM, Loiseau C, Jugheli L, Adamia N, Trauner A, Reinhard M, Ross A, Borrell S, Aspindzelashvili R, Maghradze N, Reither K, Beisel C, Tukvadze N, Avaliani Z, Gagneux S. Prisons as ecological drivers of fitness-compensated multidrug-resistant Mycobacterium tuberculosis. Nat Med 2021; 27:1171-1177. [PMID: 34031604 PMCID: PMC9400913 DOI: 10.1038/s41591-021-01358-x] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Received: 09/08/2019] [Accepted: 04/19/2021] [Indexed: 02/04/2023]
Abstract
Multidrug-resistant tuberculosis (MDR-TB) accounts for one third of the annual deaths due to antimicrobial resistance [1]. Drug resistance-conferring mutations frequently cause fitness costs in bacteria [2-5]. Experimental work indicates that these drug resistance-related fitness costs might be mitigated by compensatory mutations [6-10]. However, the clinical relevance of compensatory evolution remains poorly understood. Here we show that, in the country of Georgia, during a 6-year nationwide study, 63% of MDR-TB was due to patient-to-patient transmission. Compensatory mutations and patient incarceration were independently associated with transmission. Furthermore, compensatory mutations were overrepresented among isolates from incarcerated individuals that also frequently spilled over into the non-incarcerated population. As a result, up to 31% of MDR-TB in Georgia was directly or indirectly linked to prisons. We conclude that prisons fuel the epidemic of MDR-TB in Georgia by acting as ecological drivers of fitness-compensated strains with high transmission potential.
Affiliation(s)
- Sebastian M. Gygli
- Swiss Tropical and Public Health Institute, Basel, Switzerland; University of Basel, Basel, Switzerland. These authors contributed equally: Sebastian M. Gygli, Chloé Loiseau
| | - Chloé Loiseau
- Swiss Tropical and Public Health Institute, Basel, Switzerland; University of Basel, Basel, Switzerland. These authors contributed equally: Sebastian M. Gygli, Chloé Loiseau
| | - Levan Jugheli
- Swiss Tropical and Public Health Institute, Basel, Switzerland; University of Basel, Basel, Switzerland; National Center for Tuberculosis and Lung Diseases (NCTLD), Tbilisi, Georgia
| | - Natia Adamia
- National Center for Tuberculosis and Lung Diseases (NCTLD), Tbilisi, Georgia
| | - Andrej Trauner
- Swiss Tropical and Public Health Institute, Basel, Switzerland; University of Basel, Basel, Switzerland
| | - Miriam Reinhard
- Swiss Tropical and Public Health Institute, Basel, Switzerland; University of Basel, Basel, Switzerland
| | - Amanda Ross
- Swiss Tropical and Public Health Institute, Basel, Switzerland; University of Basel, Basel, Switzerland
| | - Sonia Borrell
- Swiss Tropical and Public Health Institute, Basel, Switzerland; University of Basel, Basel, Switzerland
| | | | - Nino Maghradze
- Swiss Tropical and Public Health Institute, Basel, Switzerland; University of Basel, Basel, Switzerland; National Center for Tuberculosis and Lung Diseases (NCTLD), Tbilisi, Georgia
| | - Klaus Reither
- Swiss Tropical and Public Health Institute, Basel, Switzerland; University of Basel, Basel, Switzerland
| | - Christian Beisel
- Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
| | - Nestani Tukvadze
- Swiss Tropical and Public Health Institute, Basel, Switzerland; University of Basel, Basel, Switzerland; National Center for Tuberculosis and Lung Diseases (NCTLD), Tbilisi, Georgia
| | - Zaza Avaliani
- National Center for Tuberculosis and Lung Diseases (NCTLD), Tbilisi, Georgia
| | - Sebastien Gagneux
- Swiss Tropical and Public Health Institute, Basel, Switzerland; University of Basel, Basel, Switzerland. Correspondence and requests for materials should be addressed to S.G.
|
26
|
Winglee K, McDaniel CJ, Linde L, Kammerer S, Cilnis M, Raz KM, Noboa W, Knorr J, Cowan L, Reynolds S, Posey J, Sullivan Meissner J, Poonja S, Shaw T, Talarico S, Silk BJ. Logically Inferred Tuberculosis Transmission (LITT): A Data Integration Algorithm to Rank Potential Source Cases. Front Public Health 2021; 9:667337. [PMID: 34235130 PMCID: PMC8255782 DOI: 10.3389/fpubh.2021.667337] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/12/2021] [Accepted: 05/10/2021] [Indexed: 11/22/2022] Open
Abstract
Understanding tuberculosis (TB) transmission chains can help public health staff target their resources to prevent further transmission, but currently there are few tools to automate this process. We have developed the Logically Inferred Tuberculosis Transmission (LITT) algorithm to systematize the integration and analysis of whole-genome sequencing, clinical, and epidemiological data. Based on the work typically performed by hand during a cluster investigation, LITT identifies and ranks potential source cases for each case in a TB cluster. We evaluated LITT using a diverse dataset of 534 cases in 56 clusters (size range: 2–69 cases), which were investigated locally in three different U.S. jurisdictions. Investigators and LITT agreed on the most likely source case for 145 (80%) of 181 cases. By reviewing discrepancies, we found that many of the remaining differences resulted from errors in the dataset used for the LITT algorithm. In addition, we developed a graphical user interface, user's manual, and training resources to improve LITT accessibility for frontline staff. While LITT cannot replace thorough field investigation, the algorithm can help investigators systematically analyze and interpret complex data over the course of a TB cluster investigation. Code available at: https://github.com/CDCgov/TB_molecular_epidemiology/tree/1.0; https://zenodo.org/badge/latestdoi/166261171.
Affiliation(s)
- Kathryn Winglee
- Division of Tuberculosis Elimination, Centers for Disease Control and Prevention, Atlanta, GA, United States
| | - Clinton J McDaniel
- Division of Tuberculosis Elimination, Centers for Disease Control and Prevention, Atlanta, GA, United States
| | - Lauren Linde
- TB Control Branch, California Department of Public Health, Richmond, CA, United States
| | - Steve Kammerer
- Division of Tuberculosis Elimination, Centers for Disease Control and Prevention, Atlanta, GA, United States
| | - Martin Cilnis
- TB Control Branch, California Department of Public Health, Richmond, CA, United States
| | - Kala M Raz
- Division of Tuberculosis Elimination, Centers for Disease Control and Prevention, Atlanta, GA, United States
| | - Wendy Noboa
- Division of Tuberculosis Elimination, Centers for Disease Control and Prevention, Atlanta, GA, United States; Los Angeles County Department of Public Health, Los Angeles, CA, United States
| | - Jillian Knorr
- New York City Department of Health and Mental Hygiene, Queens, NY, United States
| | - Lauren Cowan
- Division of Tuberculosis Elimination, Centers for Disease Control and Prevention, Atlanta, GA, United States
| | - Sue Reynolds
- Division of Tuberculosis Elimination, Centers for Disease Control and Prevention, Atlanta, GA, United States
| | - James Posey
- Division of Tuberculosis Elimination, Centers for Disease Control and Prevention, Atlanta, GA, United States
| | | | - Shameer Poonja
- Division of Tuberculosis Elimination, Centers for Disease Control and Prevention, Atlanta, GA, United States; Los Angeles County Department of Public Health, Los Angeles, CA, United States
| | - Tambi Shaw
- TB Control Branch, California Department of Public Health, Richmond, CA, United States
| | - Sarah Talarico
- Division of Tuberculosis Elimination, Centers for Disease Control and Prevention, Atlanta, GA, United States
| | - Benjamin J Silk
- Division of Tuberculosis Elimination, Centers for Disease Control and Prevention, Atlanta, GA, United States
|
27
|
Dawson D, Rasmussen D, Peng X, Lanzas C. Inferring environmental transmission using phylodynamics: a case-study using simulated evolution of an enteric pathogen. J R Soc Interface 2021; 18:20210041. [PMID: 34102084 DOI: 10.1098/rsif.2021.0041] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022] Open
Abstract
Indirect (environmental) and direct (host-host) transmission pathways cannot easily be distinguished when they co-occur in epidemics, particularly when they occur on similar time scales. Phylodynamic reconstruction is a potential approach to this problem that combines epidemiological information (temporal, spatial information) with pathogen whole-genome sequencing data to infer transmission trees of epidemics. However, factors such as differences in mutation and transmission rates between host and non-host environments may obscure phylogenetic inference from these methods. In this study, we used a network-based transmission model that explicitly models pathogen evolution to simulate epidemics with both direct and indirect transmission. Epidemics were simulated according to factorial combinations of direct/indirect transmission proportions, host mutation rates and conditions of environmental pathogen growth. Transmission trees were then reconstructed using the phylodynamic approach SCOTTI (structured coalescent transmission tree inference) and evaluated. We found that although insufficient diversity sets a lower bound on when accurate phylodynamic inferences can be made, transmission routes and assumed pathogen lifestyle affected pathogen population structure and subsequently influenced both reconstruction success and the likelihood of direct versus indirect pathways being reconstructed. We conclude that prior knowledge of the likely ecology and population structure of pathogens in host and non-host environments is critical to fully using phylodynamic techniques.
Affiliation(s)
- Daniel Dawson
- Department of Population Health and Pathobiology, College of Veterinary Medicine, North Carolina State University, Raleigh, NC, USA
| | - David Rasmussen
- Bioinformatics Research Center, North Carolina State University, Raleigh, NC, USA; Department of Entomology and Plant Pathology, North Carolina State University, Raleigh, NC, USA
| | - Xinxia Peng
- Bioinformatics Research Center, North Carolina State University, Raleigh, NC, USA; Department of Molecular Biomedical Sciences, College of Veterinary Medicine, North Carolina State University, Raleigh, NC, USA
| | - Cristina Lanzas
- Department of Population Health and Pathobiology, College of Veterinary Medicine, North Carolina State University, Raleigh, NC, USA
|
28
|
Quantifying transmission fitness costs of multi-drug resistant tuberculosis. Epidemics 2021; 36:100471. [PMID: 34256273 DOI: 10.1016/j.epidem.2021.100471] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 02/01/2019] [Revised: 01/14/2020] [Accepted: 05/17/2021] [Indexed: 11/22/2022] Open
Abstract
As multi-drug resistant tuberculosis (MDR-TB) continues to spread, investigating the transmission potential of different drug-resistant strains becomes an ever more pressing topic in public health. While phylogenetic and transmission tree inferences provide valuable insight into possible transmission chains, phylodynamic inference combines evolutionary and epidemiological analyses to estimate the parameters of the underlying epidemiological processes, allowing us to describe the overall dynamics of disease spread in the population. In this study, we introduce an approach to Mycobacterium tuberculosis (M. tuberculosis) phylodynamic analysis employing an existing computationally efficient model to quantify the transmission fitness costs of drug resistance with respect to drug-sensitive strains. To determine the accuracy and precision of our approach, we first perform a simulation study, mimicking the simultaneous spread of drug-sensitive and drug-resistant tuberculosis (TB) strains. We analyse the simulated transmission trees using the phylodynamic multi-type birth-death model (MTBD; Kühnert et al., 2016) within the BEAST2 framework and show that this model can estimate the parameters of the epidemic well, despite the simplifying assumptions that MTBD makes compared to the complex TB transmission dynamics used for simulation. We then apply the MTBD model to an M. tuberculosis lineage 4 dataset that primarily consists of MDR sequences. Some of the MDR strains additionally exhibit resistance to pyrazinamide, an important first-line anti-tuberculosis drug. Our results support the previously proposed hypothesis that pyrazinamide resistance confers a transmission fitness cost to the bacterium, which we quantify for the given dataset. Importantly, our sensitivity analyses show that the estimates are robust to different prior distributions on the resistance acquisition rate, but are affected by the size of the dataset, i.e., we estimate a higher fitness cost when using fewer sequences for analysis. Overall, we propose that MTBD can be used to quantify the transmission fitness cost for a wide range of pathogens where the strains can be appropriately divided into two or more categories with distinct properties.
|
29
|
White LF, Moser CB, Thompson RN, Pagano M. Statistical Estimation of the Reproductive Number From Case Notification Data. Am J Epidemiol 2021; 190:611-620. [PMID: 33034345 DOI: 10.1093/aje/kwaa211] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 03/17/2020] [Revised: 09/24/2020] [Accepted: 10/02/2020] [Indexed: 12/20/2022] Open
Abstract
The reproductive number, or reproduction number, is a valuable metric in understanding infectious disease dynamics. There is a large body of literature related to its use and estimation. In the last 15 years, there has been tremendous progress in statistically estimating this number using case notification data. These approaches are appealing because they are relevant in an ongoing outbreak (e.g., for assessing the effectiveness of interventions) and do not require substantial modeling expertise to be implemented. In this article, we describe these methods and the extensions that have been developed. We provide insight into the distinct interpretations of the estimators proposed and provide real data examples to illustrate how they are implemented. Finally, we conclude with a discussion of available software and opportunities for future development.
|
30
|
Leavitt SV, Lee RS, Sebastiani P, Horsburgh CR, Jenkins HE, White LF. Estimating the relative probability of direct transmission between infectious disease patients. Int J Epidemiol 2021; 49:764-775. [PMID: 32211747 DOI: 10.1093/ije/dyaa031] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 01/23/2020] [Accepted: 02/07/2020] [Indexed: 11/14/2022] Open
Abstract
BACKGROUND Estimating infectious disease parameters such as the serial interval (time between symptom onset in primary and secondary cases) and reproductive number (average number of secondary cases produced by a primary case) are important in understanding infectious disease dynamics. Many estimation methods require linking cases by direct transmission, a difficult task for most diseases. METHODS Using a subset of cases with detailed genetic and/or contact investigation data to develop a training set of probable transmission events, we build a model to estimate the relative transmission probability for all case-pairs from demographic, spatial and clinical data. Our method is based on naive Bayes, a machine learning classification algorithm which uses the observed frequencies in the training dataset to estimate the probability that a pair is linked given a set of covariates. RESULTS In simulations, we find that the probabilities estimated using genetic distance between cases to define training transmission events are able to distinguish between truly linked and unlinked pairs with high accuracy (area under the receiver operating curve value of 95%). Additionally, only a subset of the cases, 10-50% depending on sample size, need to have detailed genetic data for our method to perform well. We show how these probabilities can be used to estimate the average effective reproductive number and apply our method to a tuberculosis outbreak in Hamburg, Germany. CONCLUSIONS Our method is a novel way to infer transmission dynamics in any dataset when only a subset of cases has rich contact investigation and/or genetic data.
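The naive Bayes step described in the METHODS above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `fit_pair_model` and `prob_linked`, the covariate names, the toy training pairs, and the add-one (Laplace) smoothing are all assumptions made for illustration. The sketch only shows how observed frequencies among linked and unlinked training pairs can be turned into a probability that a new case pair is linked.

```python
from collections import Counter


def fit_pair_model(pairs, labels, smoothing=1.0):
    """Count covariate values among linked and unlinked training pairs.

    pairs  : list of dicts mapping covariate name -> categorical value
    labels : list of bools, True if the pair is a probable transmission link
    """
    levels = {}                               # covariate -> set of observed values
    counts = {True: Counter(), False: Counter()}
    totals = Counter(labels)                  # number of linked / unlinked pairs
    for covariates, linked in zip(pairs, labels):
        for name, value in covariates.items():
            levels.setdefault(name, set()).add(value)
            counts[linked][(name, value)] += 1
    return {"levels": levels, "counts": counts,
            "totals": totals, "smoothing": smoothing}


def prob_linked(model, covariates):
    """Naive Bayes posterior P(linked | covariates) with additive smoothing."""
    s = model["smoothing"]
    n = sum(model["totals"].values())
    score = {}
    for cls in (True, False):
        # Smoothed class prior, then one conditional-frequency factor per covariate.
        p = (model["totals"][cls] + s) / (n + 2 * s)
        for name, value in covariates.items():
            k = len(model["levels"][name])
            p *= (model["counts"][cls][(name, value)] + s) / (model["totals"][cls] + s * k)
        score[cls] = p
    return score[True] / (score[True] + score[False])


# Hypothetical training pairs (illustrative values only, not real data).
train_pairs = [
    {"same_city": True, "same_sex": True},
    {"same_city": True, "same_sex": False},
    {"same_city": False, "same_sex": True},
    {"same_city": False, "same_sex": False},
    {"same_city": False, "same_sex": True},
]
train_labels = [True, True, False, False, False]

model = fit_pair_model(train_pairs, train_labels)
p_near = prob_linked(model, {"same_city": True, "same_sex": True})
p_far = prob_linked(model, {"same_city": False, "same_sex": True})
```

In this toy setting, a pair from the same city receives a higher linkage probability than a pair from different cities, because shared city is more frequent among the linked training pairs.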
Affiliation(s)
- Sarah V Leavitt
- School of Public Health, Department of Biostatistics, Boston University, Boston, MA, USA
| | - Robyn S Lee
- Harvard T.H. Chan School of Public Health, Boston, MA, USA; University of Toronto Dalla Lana School of Public Health Epidemiology Division, Toronto, ON, Canada
| | - Paola Sebastiani
- School of Public Health, Department of Biostatistics, Boston University, Boston, MA, USA
| | - C Robert Horsburgh
- School of Public Health, Department of Epidemiology, Boston University, Boston, MA, USA
| | - Helen E Jenkins
- School of Public Health, Department of Biostatistics, Boston University, Boston, MA, USA
| | - Laura F White
- School of Public Health, Department of Biostatistics, Boston University, Boston, MA, USA
|
31
|
Didelot X, Kendall M, Xu Y, White PJ, McCarthy N. Genomic Epidemiology Analysis of Infectious Disease Outbreaks Using TransPhylo. Curr Protoc 2021; 1:e60. [PMID: 33617114 PMCID: PMC7995038 DOI: 10.1002/cpz1.60] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Indexed: 12/14/2022]
Abstract
Comparing the pathogen genomes from several cases of an infectious disease has the potential to help us understand and control outbreaks. Many methods exist to reconstruct a phylogeny from such genomes, which represents how the genomes are related to one another. However, such a phylogeny is not directly informative about transmission events between individuals. TransPhylo is a software tool implemented as an R package designed to bridge the gap between pathogen phylogenies and transmission trees. TransPhylo is based on a combined model of transmission between hosts and pathogen evolution within each host. It can simulate both phylogenies and transmission trees jointly under this combined model. TransPhylo can also reconstruct a transmission tree based on a dated phylogeny, by exploring the space of transmission trees compatible with the phylogeny. A transmission tree can be represented as a coloring of a phylogeny where each color represents a different host of the pathogen, and TransPhylo provides convenient ways to plot these colorings and explore the results. This article presents the basic protocols that can be used to make the most of TransPhylo. © 2021 The Authors. Basic Protocol 1: First steps with TransPhylo Basic Protocol 2: Simulation of outbreak data Basic Protocol 3: Inference of transmission Basic Protocol 4: Exploring the results of inference.
Affiliation(s)
- Xavier Didelot
- School of Life Sciences and Department of Statistics, University of Warwick, United Kingdom
| | - Michelle Kendall
- School of Life Sciences and Department of Statistics, University of Warwick, United Kingdom
| | - Yuanwei Xu
- Center for Computational Biology, Institute of Cancer and Genomic Sciences, University of Birmingham, United Kingdom
| | - Peter J. White
- Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, United Kingdom
- Medical Research Council Centre for Global Infectious Disease Analysis, School of Public Health, Imperial College London, United Kingdom
- National Institute for Health Research Health Protection Research Unit in Modelling and Health Economics, School of Public Health, Imperial College London, United Kingdom
- Modelling and Economics Unit, National Infection Service, Public Health England, London, United Kingdom
| | - Noel McCarthy
- Warwick Medical School, University of Warwick, United Kingdom
|
32
|
Knyazev S, Hughes L, Skums P, Zelikovsky A. Epidemiological data analysis of viral quasispecies in the next-generation sequencing era. Brief Bioinform 2021; 22:96-108. [PMID: 32568371 PMCID: PMC8485218 DOI: 10.1093/bib/bbaa101] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 12/23/2019] [Revised: 04/24/2020] [Accepted: 05/04/2020] [Indexed: 01/04/2023] Open
Abstract
The unprecedented coverage offered by next-generation sequencing (NGS) technology has facilitated the assessment of the population complexity of intra-host RNA viral populations at an unprecedented level of detail. Consequently, analysis of NGS datasets could be used to extract and infer crucial epidemiological and biomedical information on the levels of both infected individuals and susceptible populations, thus enabling the development of more effective prevention strategies and antiviral therapeutics. Such information includes drug resistance, infection stage, transmission clusters and structures of transmission networks. However, NGS data require sophisticated analysis dealing with millions of error-prone short reads per patient. Prior to the NGS era, epidemiological and phylogenetic analyses were geared toward Sanger sequencing technology; now, they must be redesigned to handle the large-scale NGS datasets and properly model the evolution of heterogeneous rapidly mutating viral populations. Additionally, dedicated epidemiological surveillance systems require big data analytics to handle millions of reads obtained from thousands of patients for rapid outbreak investigation and management. We survey bioinformatics tools analyzing NGS data for (i) characterization of intra-host viral population complexity including single nucleotide variant and haplotype calling; (ii) downstream epidemiological analysis and inference of drug-resistant mutations, age of infection and linkage between patients; and (iii) data collection and analytics in surveillance systems for fast response and control of outbreaks.
|
33
|
Montazeri H, Little S, Legha MM, Beerenwinkel N, DeGruttola V. Bayesian reconstruction of transmission trees from genetic sequences and uncertain infection times. Stat Appl Genet Mol Biol 2020; 19:/j/sagmb.ahead-of-print/sagmb-2019-0026/sagmb-2019-0026.xml. [PMID: 33085643 PMCID: PMC8212962 DOI: 10.1515/sagmb-2019-0026] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/20/2019] [Accepted: 09/16/2020] [Indexed: 11/15/2022]
Abstract
Genetic sequence data of pathogens are increasingly used to investigate transmission dynamics in both endemic diseases and disease outbreaks. Such research can aid in the development of appropriate interventions and in the design of studies to evaluate them. Several computational methods have been proposed to infer transmission chains from sequence data; however, existing methods do not generally reliably reconstruct transmission trees because genetic sequence data or inferred phylogenetic trees from such data contain insufficient information for accurate estimation of transmission chains. Here, we show by simulation studies that incorporating infection times, even when they are uncertain, can greatly improve the accuracy of reconstruction of transmission trees. To achieve this improvement, we propose a Bayesian inference method using Markov chain Monte Carlo that directly draws samples from the space of transmission trees under the assumption of complete sampling of the outbreak. The likelihood of each transmission tree is computed by a phylogenetic model by treating its internal nodes as transmission events. By a simulation study, we demonstrate that the accuracy of the reconstructed transmission trees depends mainly on the amount of information available on times of infection; we show the superiority of the proposed method to two alternative approaches when infection times are known up to specified degrees of certainty. In addition, we illustrate the use of a multiple imputation framework to study features of epidemic dynamics, such as the relationship between characteristics of nodes and average number of outbound edges or inbound edges, signifying possible transmission events from and to nodes. We apply the proposed method to a transmission cluster in San Diego and to a dataset from the 2014 Sierra Leone Ebola virus outbreak and investigate the impact of biological, behavioral, and demographic factors.
Affiliation(s)
- Hesam Montazeri
- Department of Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
| | - Susan Little
- Department of Medicine, University of California San Diego, California, USA
| | - Mozhgan Mozaffari Legha
- Department of Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
| | - Niko Beerenwinkel
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel, Switzerland
|
34
|
Boskova V, Stadler T. PIQMEE: Bayesian Phylodynamic Method for Analysis of Large Data Sets with Duplicate Sequences. Mol Biol Evol 2020; 37:3061-3075. [PMID: 32492139 PMCID: PMC7530608 DOI: 10.1093/molbev/msaa136] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 12/25/2022] Open
Abstract
Next-generation sequencing of pathogen quasispecies within a host yields data sets of tens to hundreds of unique sequences. However, the full data set often contains thousands of sequences, because many of those unique sequences have multiple identical copies. Data sets of this size represent a computational challenge for currently available Bayesian phylogenetic and phylodynamic methods. Through simulations, we explore how large data sets with duplicate sequences affect the speed and accuracy of phylogenetic and phylodynamic analysis within BEAST 2. We show that using unique sequences only leads to biases, and using a random subset of sequences yields imprecise parameter estimates. To overcome these shortcomings, we introduce PIQMEE, a BEAST 2 add-on that produces reliable parameter estimates from full data sets with increased computational efficiency as compared with the currently available methods within BEAST 2. The principle behind PIQMEE is to resolve the tree structure of the unique sequences only, while simultaneously estimating the branching times of the duplicate sequences. Distinguishing between unique and duplicate sequences allows our method to perform well even for very large data sets. Although the classic method converges poorly for data sets of 6,000 sequences when allowed to run for 7 days, our method converges in slightly more than 1 day. In fact, PIQMEE can handle data sets of around 21,000 sequences with 20 unique sequences in 14 days. Finally, we apply the method to a real, within-host HIV sequencing data set with several thousand sequences per patient.
Affiliation(s)
- Veronika Boskova
- Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
- Swiss Institute of Bioinformatics (SIB), Switzerland
- Center for Integrative Bioinformatics Vienna, Max Perutz Labs, University of Vienna and Medical University of Vienna, Vienna, Austria
| | - Tanja Stadler
- Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
- Swiss Institute of Bioinformatics (SIB), Switzerland
|
35
|
What Should Health Departments Do with HIV Sequence Data? Viruses 2020; 12:v12091018. [PMID: 32932642 PMCID: PMC7551807 DOI: 10.3390/v12091018] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 08/04/2020] [Revised: 09/09/2020] [Accepted: 09/11/2020] [Indexed: 11/27/2022] Open
Abstract
Many countries and US states have mandatory statutes that require reporting of HIV clinical data, including genetic sequencing results, to public health departments. Because genetic sequencing is a part of routine care for HIV-infected persons, health departments have extensive sequence collections spanning years and even decades of the HIV epidemic. How should these data be used (or not) in public health practice? This is a complex, multi-faceted question that weighs personal risks against public health benefit. The answer is neither straightforward nor universal. However, to make that judgement of how genetic sequence data should be used in describing and combating the HIV epidemic, we need a clear image of what a phylogenetically enhanced HIV surveillance system can do and what benefit it might provide. In this paper, we present a positive case for how up-to-date analysis of HIV sequence databases managed by health departments can provide unique and actionable information on how HIV is spreading in local communities. We discuss this question broadly, with examples from the US, as it is globally relevant for all health authorities that collect HIV genetic data.
|
36
|
Abstract
Infectious disease research spans scales from the molecular to the global—from specific mechanisms of pathogen drug resistance, virulence, and replication to the movement of people, animals, and pathogens around the world. All of these research areas have been impacted by the recent growth of large-scale data sources and data analytics. Some of these advances rely on data or analytic methods that are common to most biomedical data science, while others leverage the unique nature of infectious disease, namely its communicability. This review outlines major research progress in the past few years and highlights some remaining opportunities, focusing on data or methodological approaches particular to infectious disease.
Affiliation(s)
- Peter M. Kasson
- Department of Biomedical Engineering and Department of Molecular Physiology, University of Virginia, Charlottesville, Virginia 22908, USA
- Science for Life Laboratory, Department of Cell and Molecular Biology, Uppsala University, 752 37 Uppsala, Sweden
| |
|
37
|
Firestone SM, Hayama Y, Lau MSY, Yamamoto T, Nishi T, Bradhurst RA, Demirhan H, Stevenson MA, Tsutsui T. Transmission network reconstruction for foot-and-mouth disease outbreaks incorporating farm-level covariates. PLoS One 2020; 15:e0235660. [PMID: 32667952 PMCID: PMC7363093 DOI: 10.1371/journal.pone.0235660] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 07/26/2019] [Accepted: 06/22/2020] [Indexed: 11/19/2022] Open
Abstract
Transmission network modelling to infer ‘who infected whom’ in infectious disease outbreaks is a highly active area of research. Outbreaks of foot-and-mouth disease have been a key focus of transmission network models that integrate genomic and epidemiological data. The aim of this study was to extend Lau’s systematic Bayesian inference framework to incorporate additional parameters representing predominant species and numbers of animals held on a farm. Lau’s Bayesian Markov chain Monte Carlo algorithm was reformulated, verified and pseudo-validated on 100 simulated outbreaks populated with demographic data from Japan and Australia. The modified model was then implemented on genomic and epidemiological data from the 2010 outbreak of foot-and-mouth disease in Japan, and outputs compared to those from the SCOTTI model implemented in BEAST2. The modified model achieved improvements in overall accuracy when tested on the simulated outbreaks. When implemented on the actual outbreak data from Japan, infected farms that held predominantly pigs were estimated to have five times the transmissibility of infected cattle farms and be 49% less susceptible. The farm-level incubation period was 1 day shorter than the latent period; the timing of the seeding of the outbreak in Japan was inferred, as were key linkages between clusters and features of farms involved in widespread dissemination of this outbreak. To improve accessibility, the modified model has been implemented as the R package ‘BORIS’ for use in future outbreaks.
Affiliation(s)
- Simon M. Firestone
- Melbourne Veterinary School, Faculty of Veterinary and Agricultural Sciences, The University of Melbourne, Parkville, Victoria, Australia
- * E-mail:
| | - Yoko Hayama
- Viral Disease and Epidemiology Research Division, National Institute of Animal Health, National Agriculture Research Organization, Tsukuba, Ibaraki, Japan
| | - Max S. Y. Lau
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, Georgia, United States of America
| | - Takehisa Yamamoto
- Viral Disease and Epidemiology Research Division, National Institute of Animal Health, National Agriculture Research Organization, Tsukuba, Ibaraki, Japan
| | - Tatsuya Nishi
- Exotic Disease Research Station, National Institute of Animal Health, National Agriculture and Food Research Organization, Kodaira, Tokyo, Japan
| | - Richard A. Bradhurst
- Centre of Excellence for Biosecurity Risk Analysis, The University of Melbourne, Parkville, VIC, Australia
| | - Haydar Demirhan
- Mathematical Sciences Discipline, School of Science, RMIT University, Melbourne, VIC, Australia
| | - Mark A. Stevenson
- Melbourne Veterinary School, Faculty of Veterinary and Agricultural Sciences, The University of Melbourne, Parkville, Victoria, Australia
| | - Toshiyuki Tsutsui
- Viral Disease and Epidemiology Research Division, National Institute of Animal Health, National Agriculture Research Organization, Tsukuba, Ibaraki, Japan
| |
|
38
|
Llanes A, Restrepo CM, Caballero Z, Rajeev S, Kennedy MA, Lleonart R. Betacoronavirus Genomes: How Genomic Information has been Used to Deal with Past Outbreaks and the COVID-19 Pandemic. Int J Mol Sci 2020; 21:E4546. [PMID: 32604724 PMCID: PMC7352669 DOI: 10.3390/ijms21124546] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 05/29/2020] [Revised: 06/22/2020] [Accepted: 06/23/2020] [Indexed: 12/22/2022] Open
Abstract
In the 21st century, three highly pathogenic betacoronaviruses have emerged, with an alarming rate of human morbidity and case fatality. Genomic information has been widely used to understand the pathogenesis, animal origin and mode of transmission of coronaviruses in the aftermath of the 2002-2003 severe acute respiratory syndrome (SARS) and 2012 Middle East respiratory syndrome (MERS) outbreaks. Furthermore, genome sequencing and bioinformatic analysis have had an unprecedented relevance in the battle against the 2019-2020 coronavirus disease 2019 (COVID-19) pandemic, the newest and most devastating outbreak caused by a coronavirus in the history of mankind. Here, we review how genomic information has been used to tackle outbreaks caused by emerging, highly pathogenic betacoronavirus strains, with emphasis on SARS-CoV, MERS-CoV and SARS-CoV-2. We focus on shared genomic features of the betacoronaviruses and the application of genomic information to phylogenetic analysis, molecular epidemiology and the design of diagnostic systems, potential drugs and vaccine candidates.
Affiliation(s)
- Alejandro Llanes
- Centro de Biología Celular y Molecular de Enfermedades, Instituto de Investigaciones Científicas y Servicios de Alta Tecnología (INDICASAT AIP), Panama City 0801, Panama
| | - Carlos M. Restrepo
- Centro de Biología Celular y Molecular de Enfermedades, Instituto de Investigaciones Científicas y Servicios de Alta Tecnología (INDICASAT AIP), Panama City 0801, Panama
| | - Zuleima Caballero
- Centro de Biología Celular y Molecular de Enfermedades, Instituto de Investigaciones Científicas y Servicios de Alta Tecnología (INDICASAT AIP), Panama City 0801, Panama
| | - Sreekumari Rajeev
- College of Veterinary Medicine, University of Florida, Gainesville, FL 32610, USA
| | - Melissa A. Kennedy
- College of Veterinary Medicine, University of Tennessee, Knoxville, TN 37996, USA
| | - Ricardo Lleonart
- Centro de Biología Celular y Molecular de Enfermedades, Instituto de Investigaciones Científicas y Servicios de Alta Tecnología (INDICASAT AIP), Panama City 0801, Panama
| |
|
39
|
Cassidy R, Kypraios T, O'Neill PD. Modelling, Bayesian inference, and model assessment for nosocomial pathogens using whole-genome-sequence data. Stat Med 2020; 39:1746-1765. [PMID: 32142587 PMCID: PMC7217057 DOI: 10.1002/sim.8510] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/12/2019] [Revised: 01/15/2020] [Accepted: 01/31/2020] [Indexed: 12/28/2022]
Abstract
Whole‐genome sequencing of pathogens in outbreaks of infectious disease provides the potential to reconstruct transmission pathways and enhance the information contained in conventional epidemiological data. In recent years, there have been numerous new methods and models developed to exploit such high‐resolution genetic data. However, corresponding methods for model assessment have been largely overlooked. In this article, we develop both new modelling methods and new model assessment methods, specifically by building on the work of Worby et al. Although the methods are generic in nature, we focus specifically on nosocomial pathogens and analyze a dataset collected during an outbreak of MRSA in a hospital setting.
Affiliation(s)
- Rosanna Cassidy
- School of Mathematical Sciences, University of Nottingham, Nottingham, UK
| | - Theodore Kypraios
- School of Mathematical Sciences, University of Nottingham, Nottingham, UK
| | - Philip D O'Neill
- School of Mathematical Sciences, University of Nottingham, Nottingham, UK
| |
|
40
|
Polonsky JA, Baidjoe A, Kamvar ZN, Cori A, Durski K, Edmunds WJ, Eggo RM, Funk S, Kaiser L, Keating P, de Waroux OLP, Marks M, Moraga P, Morgan O, Nouvellet P, Ratnayake R, Roberts CH, Whitworth J, Jombart T. Outbreak analytics: a developing data science for informing the response to emerging pathogens. Philos Trans R Soc Lond B Biol Sci 2020; 374:20180276. [PMID: 31104603 PMCID: PMC6558557 DOI: 10.1098/rstb.2018.0276] [Citation(s) in RCA: 81] [Impact Index Per Article: 20.3] [Indexed: 12/16/2022] Open
Abstract
Despite continued efforts to improve health systems worldwide, emerging pathogen epidemics remain a major public health concern. Effective response to such outbreaks relies on timely intervention, ideally informed by all available sources of data. The collection, visualization and analysis of outbreak data are becoming increasingly complex, owing to the diversity in types of data, questions and available methods to address them. Recent advances have led to the rise of outbreak analytics, an emerging data science focused on the technological and methodological aspects of the outbreak data pipeline, from collection to analysis, modelling and reporting to inform outbreak response. In this article, we assess the current state of the field. After laying out the context of outbreak response, we critically review the most common analytics components, their inter-dependencies, data requirements and the type of information they can provide to inform operations in real time. We discuss some challenges and opportunities and conclude on the potential role of outbreak analytics for improving our understanding of, and response to, outbreaks of emerging pathogens. This article is part of the theme issue ‘Modelling infectious disease outbreaks in humans, animals and plants: epidemic forecasting and control’. This theme issue is linked with the earlier issue ‘Modelling infectious disease outbreaks in humans, animals and plants: approaches and important themes’.
Affiliation(s)
- Jonathan A Polonsky
- Department of Health Emergency Information and Risk Assessment, World Health Organization, Avenue Appia 20, 1211 Geneva, Switzerland
- Faculty of Medicine, University of Geneva, 1 rue Michel-Servet, 1211 Geneva, Switzerland
| | - Amrish Baidjoe
- Department of Infectious Disease Epidemiology, School of Public Health, MRC Centre for Global Infectious Disease Analysis, Imperial College London, Medical School Building, St Mary's Campus, Norfolk Place London W2 1PG, UK
| | - Zhian N Kamvar
- Department of Infectious Disease Epidemiology, School of Public Health, MRC Centre for Global Infectious Disease Analysis, Imperial College London, Medical School Building, St Mary's Campus, Norfolk Place London W2 1PG, UK
| | - Anne Cori
- Department of Infectious Disease Epidemiology, School of Public Health, MRC Centre for Global Infectious Disease Analysis, Imperial College London, Medical School Building, St Mary's Campus, Norfolk Place London W2 1PG, UK
| | - Kara Durski
- Department of Infectious Hazard Management, World Health Organization, Avenue Appia 20, 1211 Geneva, Switzerland
| | - W John Edmunds
- Department of Infectious Disease Epidemiology, London School of Hygiene and Tropical Medicine, Keppel St, London WC1E 7HT, UK
- Centre for Mathematical Modelling of Infectious Diseases, London School of Hygiene and Tropical Medicine, Keppel St, London WC1E 7HT, UK
| | - Rosalind M Eggo
- Department of Infectious Disease Epidemiology, London School of Hygiene and Tropical Medicine, Keppel St, London WC1E 7HT, UK
- Centre for Mathematical Modelling of Infectious Diseases, London School of Hygiene and Tropical Medicine, Keppel St, London WC1E 7HT, UK
| | - Sebastian Funk
- Department of Infectious Disease Epidemiology, London School of Hygiene and Tropical Medicine, Keppel St, London WC1E 7HT, UK
- Centre for Mathematical Modelling of Infectious Diseases, London School of Hygiene and Tropical Medicine, Keppel St, London WC1E 7HT, UK
| | - Laurent Kaiser
- Faculty of Medicine, University of Geneva, 1 rue Michel-Servet, 1211 Geneva, Switzerland
| | - Patrick Keating
- Department of Infectious Disease Epidemiology, London School of Hygiene and Tropical Medicine, Keppel St, London WC1E 7HT, UK
- UK Public Health Rapid Support Team, London School of Hygiene and Tropical Medicine, Keppel St, London WC1E 7HT, UK
| | - Olivier le Polain de Waroux
- Department of Infectious Disease Epidemiology, London School of Hygiene and Tropical Medicine, Keppel St, London WC1E 7HT, UK
- UK Public Health Rapid Support Team, London School of Hygiene and Tropical Medicine, Keppel St, London WC1E 7HT, UK
- Public Health England, Wellington House, 133-155 Waterloo Road, London SE1 8UG, UK
| | - Michael Marks
- Clinical Research Department, Faculty of Infectious and Tropical Diseases, London School of Hygiene and Tropical Medicine, Keppel St, London WC1E 7HT, UK
| | - Paula Moraga
- Centre for Health Informatics, Computing and Statistics (CHICAS), Lancaster Medical School, Lancaster University, Lancaster LA1 4YW, UK
| | - Oliver Morgan
- Department of Health Emergency Information and Risk Assessment, World Health Organization, Avenue Appia 20, 1211 Geneva, Switzerland
| | - Pierre Nouvellet
- Department of Infectious Disease Epidemiology, School of Public Health, MRC Centre for Global Infectious Disease Analysis, Imperial College London, Medical School Building, St Mary's Campus, Norfolk Place London W2 1PG, UK
- School of Life Sciences, University of Sussex, Sussex House, Brighton BN1 9RH, UK
| | - Ruwan Ratnayake
- Department of Infectious Disease Epidemiology, London School of Hygiene and Tropical Medicine, Keppel St, London WC1E 7HT, UK
- Centre for Mathematical Modelling of Infectious Diseases, London School of Hygiene and Tropical Medicine, Keppel St, London WC1E 7HT, UK
| | - Chrissy H Roberts
- Clinical Research Department, Faculty of Infectious and Tropical Diseases, London School of Hygiene and Tropical Medicine, Keppel St, London WC1E 7HT, UK
| | - Jimmy Whitworth
- Department of Infectious Disease Epidemiology, London School of Hygiene and Tropical Medicine, Keppel St, London WC1E 7HT, UK
- UK Public Health Rapid Support Team, London School of Hygiene and Tropical Medicine, Keppel St, London WC1E 7HT, UK
| | - Thibaut Jombart
- Department of Infectious Disease Epidemiology, School of Public Health, MRC Centre for Global Infectious Disease Analysis, Imperial College London, Medical School Building, St Mary's Campus, Norfolk Place London W2 1PG, UK
- Department of Infectious Disease Epidemiology, London School of Hygiene and Tropical Medicine, Keppel St, London WC1E 7HT, UK
- UK Public Health Rapid Support Team, London School of Hygiene and Tropical Medicine, Keppel St, London WC1E 7HT, UK
| |
|
41
|
Abstract
PURPOSE OF REVIEW Within-host diversity complicates transmission models because it recognizes that between-host virus phylogenies are not identical to the transmission history among the infected hosts. This review presents the biological and theoretical foundations for recent development in this field, and shows that modern phylodynamic methods are capable of inferring realistic transmission histories from HIV sequence data. RECENT FINDINGS Transmission of single or multiple genetic variants from a donor's HIV population results in donor-recipient phylogenies with combinations of monophyletic, paraphyletic, and polyphyletic patterns. Large-scale simulations and analyses of many real HIV datasets have established that transmission direction, directness, or common source often can be inferred based on HIV sequence data. Phylodynamic reconstruction of HIV transmissions that include within-host HIV diversity have recently been established and made available in several software packages. SUMMARY Phylodynamic methods that include realistic features of HIV genetic diversification have come of age, significantly improving inference of key epidemiological parameters. This opens the door to more accurate surveillance and better-informed prevention campaigns.
|
42
|
Hayati M, Biller P, Colijn C. Predicting the short-term success of human influenza virus variants with machine learning. Proc Biol Sci 2020; 287:20200319. [PMID: 32259469 PMCID: PMC7209065 DOI: 10.1098/rspb.2020.0319] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 02/13/2020] [Accepted: 03/16/2020] [Indexed: 12/13/2022] Open
Abstract
Seasonal influenza viruses are constantly changing and produce a different set of circulating strains each season. Small genetic changes can accumulate over time and result in antigenically different viruses; this may prevent the body's immune system from recognizing those viruses. Due to rapid mutations, in particular, in the haemagglutinin (HA) gene, seasonal influenza vaccines must be updated frequently. This requires choosing strains to include in the updates to maximize the vaccines' benefits, according to estimates of which strains will be circulating in upcoming seasons. This is a challenging prediction task. In this paper, we use longitudinally sampled phylogenetic trees based on HA sequences from human influenza viruses, together with counts of epitope site polymorphisms in HA, to predict which influenza virus strains are likely to be successful. We extract small groups of taxa (subtrees) and use a suite of features of these subtrees as key inputs to the machine learning tools. Using a range of training and testing strategies, including training on H3N2 and testing on H1N1, we find that successful prediction of future expansion of small subtrees is possible from these data, with accuracies of 0.71-0.85 and a classifier 'area under the curve' 0.75-0.9.
Affiliation(s)
- Maryam Hayati
- Department of Computing Science, Simon Fraser University, Burnaby, British Columbia, Canada V5A 1S6
| | - Priscila Biller
- Department of Mathematics, Simon Fraser University, Burnaby, British Columbia, Canada V5A 1S6
| | - Caroline Colijn
- Department of Mathematics, Simon Fraser University, Burnaby, British Columbia, Canada V5A 1S6
- Department of Mathematics, Imperial College London, London SW7 2BU, UK
| |
|
43
|
Numminen E, Laine AL. The spread of a wild plant pathogen is driven by the road network. PLoS Comput Biol 2020; 16:e1007703. [PMID: 32231370 PMCID: PMC7108725 DOI: 10.1371/journal.pcbi.1007703] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 04/02/2019] [Accepted: 01/31/2020] [Indexed: 12/12/2022] Open
Abstract
Spatial analyses of pathogen occurrence in their natural surroundings entail unique opportunities for assessing in vivo drivers of disease epidemiology. Such studies are, however, confronted by the complexity of the landscape driving epidemic spread and disease persistence. Since relevant information on how the landscape influences epidemiological dynamics is rarely available, simple spatial models of spread are often used. In the current study we demonstrate both how more complex transmission pathways could be incorporated into epidemiological analyses and how this can offer novel insights into understanding disease spread across the landscape. Our study is focused on Podosphaera plantaginis, a powdery mildew pathogen that transmits from one host plant to another by wind-dispersed spores. Its host populations often reside next to roads and thus we hypothesize that the road network influences the epidemiology of P. plantaginis. To analyse the impact of roads on the transmission dynamics, we consider a spatial dataset on the presence-absence records on the pathogen collected from a fragmented landscape of host populations. Using both mechanistic transmission modeling and statistical modeling with road-network summary statistics as predictors, we conclude the evident role of the road network in the progression of the epidemics: a phenomenon which is manifested both in the enhanced transmission along the roads and in infections typically occurring at the central hub locations of the road network. We also demonstrate how the road network affects the spread of the pathogen using simulations. Jointly our results highlight how human alteration of natural landscapes may increase disease spread.
Affiliation(s)
- Elina Numminen
- Research Centre for Ecological Change, University of Helsinki, Helsinki, Finland
- Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland
| | - Anna-Liisa Laine
- Research Centre for Ecological Change, University of Helsinki, Helsinki, Finland
- Department of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland
| |
|
44
|
Xu Y, Cancino-Muñoz I, Torres-Puente M, Villamayor LM, Borrás R, Borrás-Máñez M, Bosque M, Camarena JJ, Colomer-Roig E, Colomina J, Escribano I, Esparcia-Rodríguez O, Gil-Brusola A, Gimeno C, Gimeno-Gascón A, Gomila-Sard B, González-Granda D, Gonzalo-Jiménez N, Guna-Serrano MR, López-Hontangas JL, Martín-González C, Moreno-Muñoz R, Navarro D, Navarro M, Orta N, Pérez E, Prat J, Rodríguez JC, Ruiz-García MM, Vanaclocha H, Colijn C, Comas I. High-resolution mapping of tuberculosis transmission: Whole genome sequencing and phylogenetic modelling of a cohort from Valencia Region, Spain. PLoS Med 2019; 16:e1002961. [PMID: 31671150 PMCID: PMC6822721 DOI: 10.1371/journal.pmed.1002961] [Citation(s) in RCA: 49] [Impact Index Per Article: 9.8] [Received: 04/01/2019] [Accepted: 10/07/2019] [Indexed: 11/28/2022] Open
Abstract
BACKGROUND Whole genome sequencing provides better delineation of transmission clusters in Mycobacterium tuberculosis than traditional methods. However, its ability to reveal individual transmission links within clusters is limited. Here, we used a 2-step approach based on Bayesian transmission reconstruction to (1) identify likely index and missing cases, (2) determine risk factors associated with transmitters, and (3) estimate when transmission happened. METHODS AND FINDINGS We developed our transmission reconstruction method using genomic and epidemiological data from a population-based study from Valencia Region, Spain. Tuberculosis (TB) incidence during the study period was 8.4 cases per 100,000 people. While the study is ongoing, the sampling frame for this work includes notified TB cases between 1 January 2014 and 31 December 2016. We identified a total of 21 transmission clusters that fulfilled the criteria for analysis. These contained a total of 117 individuals diagnosed with active TB (109 with epidemiological data). Demographic characteristics of the study population were as follows: 80/109 (73%) individuals were Spanish-born, 76/109 (70%) individuals were men, and the mean age was 42.51 years (SD 18.46). We found that 66/109 (61%) TB patients were sputum positive at diagnosis, and 10/109 (9%) were HIV positive. We used the data to reveal individual transmission links, and to identify index cases, missing cases, likely transmitters, and associated transmission risk factors. Our Bayesian inference approach suggests that at least 60% of index cases are likely misidentified by local public health. Our data also suggest that factors associated with likely transmitters are different to those of simply being in a transmission cluster, highlighting the importance of differentiating between these 2 phenomena. Our data suggest that type 2 diabetes mellitus is a risk factor associated with being a transmitter (odds ratio 0.19 [95% CI 0.02-1.10], p < 0.003). 
Finally, we used the most likely timing for transmission events to study when TB transmission occurred; we identified that 5/14 (35.7%) cases likely transmitted TB well before symptom onset, and these were largely sputum negative at diagnosis. Limited within-cluster diversity does not allow us to extrapolate our findings to the whole TB population in Valencia Region. CONCLUSIONS In this study, we found that index cases are often misidentified, with downstream consequences for epidemiological investigations because likely transmitters can be missed. Our findings regarding inferred transmission timing suggest that TB transmission can occur before patient symptom onset, suggesting also that TB transmits during sub-clinical disease. This result has direct implications for diagnosing TB and reducing transmission. Overall, we show that a transition to individual-based genomic epidemiology will likely close some of the knowledge gaps in TB transmission and may redirect efforts towards cost-effective contact investigations for improved TB control.
Affiliation(s)
- Yuanwei Xu
- Centre for Mathematics of Precision Healthcare, Department of Mathematics, Imperial College London, London, United Kingdom
| | - Irving Cancino-Muñoz
- Instituto de Biomedicina de Valencia, Consejo Superior de Investigaciones Científicas, Valencia, Spain
| | - Manuela Torres-Puente
- Instituto de Biomedicina de Valencia, Consejo Superior de Investigaciones Científicas, Valencia, Spain
| | - Rafael Borrás
- Microbiology Service, Hospital Clínico Universitario, Valencia, Spain
| | - María Borrás-Máñez
- Microbiology and Parasitology Service, Hospital Universitario de La Ribera, Alzira, Spain
| | | | - Juan J. Camarena
- Microbiology Service, Hospital Universitario Dr. Peset, Valencia, Spain
| | - Ester Colomer-Roig
- Genomics and Health Unit, FISABIO Public Health, Valencia, Spain
- Microbiology Service, Hospital Universitario Dr. Peset, Valencia, Spain
| | - Javier Colomina
- Microbiology and Parasitology Service, Hospital Universitario de La Ribera, Alzira, Spain
| | - Isabel Escribano
- Microbiology Laboratory, Hospital Virgen de los Lírios, Alcoy, Spain
| | - Ana Gil-Brusola
- Microbiology Service, Hospital Universitari i Politècnic La Fe, Valencia, Spain
| | - Concepción Gimeno
- Microbiology Service, Hospital General Universitario de Valencia, Valencia, Spain
| | - Bárbara Gomila-Sard
- Microbiology Service, Hospital General Universitario de Castellón, Castellon, Spain
| | - Coral Martín-González
- Microbiology Service, Hospital Universitario de San Juan de Alicante, Alicante, Spain
| | - Rosario Moreno-Muñoz
- Microbiology Service, Hospital General Universitario de Castellón, Castellon, Spain
| | - David Navarro
- Microbiology Service, Hospital Clínico Universitario, Valencia, Spain
| | - María Navarro
- Microbiology Service, Hospital de la Vega Baixa, Orihuela, Spain
| | - Nieves Orta
- Microbiology Service, Hospital San Francesc de Borja, Gandía, Spain
| | - Elvira Pérez
- Subdirección General de Epidemiología y Vigilancia de la Salud, Dirección General de Salud Pública, Valencia, Spain
| | - Josep Prat
- Microbiology Service, Hospital de Sagunto, Sagunto, Spain
| | - Herme Vanaclocha
- Subdirección General de Epidemiología y Vigilancia de la Salud, Dirección General de Salud Pública, Valencia, Spain
| | - Caroline Colijn
- Centre for Mathematics of Precision Healthcare, Department of Mathematics, Imperial College London, London, United Kingdom
- Department of Mathematics, Simon Fraser University, Burnaby, British Columbia, Canada
- * E-mail: (CC); (IC)
| | - Iñaki Comas
- Instituto de Biomedicina de Valencia, Consejo Superior de Investigaciones Científicas, Valencia, Spain
- * E-mail: (CC); (IC)
| |
|
45
|
Theys K, Lemey P, Vandamme AM, Baele G. Advances in Visualization Tools for Phylogenomic and Phylodynamic Studies of Viral Diseases. Front Public Health 2019; 7:208. [PMID: 31428595 PMCID: PMC6688121 DOI: 10.3389/fpubh.2019.00208] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 04/07/2019] [Accepted: 07/12/2019] [Indexed: 01/28/2023] Open
Abstract
Genomic and epidemiological monitoring have become an integral part of our response to emerging and ongoing epidemics of viral infectious diseases. Advances in high-throughput sequencing, including portable genomic sequencing at reduced costs and turnaround time, are paralleled by continuing developments in methodology to infer evolutionary histories (dynamics/patterns) and to identify factors driving viral spread in space and time. The traditionally static nature of visualizing phylogenetic trees that represent these evolutionary relationships/processes has also evolved, albeit perhaps at a slower rate. Advanced visualization tools with increased resolution assist in drawing conclusions from phylogenetic estimates and may even have potential to better inform public health and treatment decisions, but the design (and choice of what analyses are shown) is hindered by the complexity of information embedded within current phylogenetic models and the integration of available meta-data. In this review, we discuss visualization challenges for the interpretation and exploration of reconstructed histories of viral epidemics that arose from increasing volumes of sequence data and the wealth of additional data layers that can be integrated. We focus on solutions that address joint temporal and spatial visualization but also consider what the future may bring in terms of visualization and how this may become of value for the coming era of real-time digital pathogen surveillance, where actionable results and adequate intervention strategies need to be obtained within days.
Affiliation(s)
- Kristof Theys
- Department of Microbiology, Immunology and Transplantation, Rega Institute for Medical Research, Clinical and Epidemiological Virology, KU Leuven, Leuven, Belgium
| | - Philippe Lemey
- Department of Microbiology, Immunology and Transplantation, Rega Institute for Medical Research, Clinical and Epidemiological Virology, KU Leuven, Leuven, Belgium
| | - Anne-Mieke Vandamme
- Department of Microbiology, Immunology and Transplantation, Rega Institute for Medical Research, Clinical and Epidemiological Virology, KU Leuven, Leuven, Belgium
| | - Guy Baele
- Department of Microbiology, Immunology and Transplantation, Rega Institute for Medical Research, Clinical and Epidemiological Virology, KU Leuven, Leuven, Belgium
46
Abstract
One approach to the reconstruction of infectious disease transmission trees from pathogen genomic data has been to use a phylogenetic tree, reconstructed from pathogen sequences, and annotate its internal nodes to provide a reconstruction of which host each lineage was in at each point in time. If only one pathogen lineage can be transmitted to a new host (i.e., the transmission bottleneck is complete), this corresponds to partitioning the nodes of the phylogeny into connected regions, each of which represents evolution in an individual host. These partitions define the possible transmission trees that are consistent with a given phylogenetic tree. However, the mathematical properties of the transmission trees given a phylogeny remain largely unexplored. Here, we describe a procedure to calculate the number of possible transmission trees for a given phylogeny, and we then show how to uniformly sample from these transmission trees. The procedure is outlined for situations where one sample is available from each host and trees do not have branch lengths, and we also provide extensions for incomplete sampling, multiple sampling, and the application to time trees in a situation where limits on the period during which each host could have been infected and infectious are known. The sampling algorithm is available as an R package (STraTUS).
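The counting idea in this abstract can be illustrated with a small sketch. It assumes the simplest setting the abstract mentions (one sample per host, every host sampled, no branch lengths) and reads the definition literally: a transmission tree is a partition of the phylogeny's nodes into connected regions, each containing exactly one tip. The tree encoding and the function names below are hypothetical, and this is not the STraTUS implementation or necessarily the recursion used in the paper.

```python
from typing import Tuple, Union

# Hypothetical encoding: a tip is a host label (str);
# an internal node is a tuple of child subtrees.
Tree = Union[str, tuple]


def _count(node: Tree) -> Tuple[int, int]:
    """Return (zero, one) for the subtree rooted at `node`.

    zero: ways to split the subtree so that every region not containing
          `node` holds exactly one tip and `node`'s region holds no tip yet.
    one:  the same, but `node`'s region already holds exactly one tip.
    """
    if isinstance(node, str):
        return 0, 1  # a tip's region trivially contains one tip
    zero, one = 1, 0  # an internal node alone holds no tip
    for child in node:
        c_zero, c_one = _count(child)
        zero, one = (
            # cut the edge (child's region is complete) or keep a tip-less child
            zero * (c_one + c_zero),
            # node's region already had its tip (cut, or absorb a tip-less child),
            # or it gains its single tip by absorbing the child's region
            one * (c_one + c_zero) + zero * c_one,
        )
    return zero, one


def count_transmission_trees(tree: Tree) -> int:
    """Count partitions of the nodes into connected regions with one tip each."""
    return _count(tree)[1]


if __name__ == "__main__":
    cherry = ("A", "B")
    print(count_transmission_trees(cherry))  # 2: the root belongs to A or to B
```

The recursion tracks only whether the region containing the current node has already absorbed a tip, so it runs in time linear in the number of nodes. Incomplete sampling, multiple samples per host, and timing constraints, which the abstract lists as extensions, are outside the scope of this sketch.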
Affiliation(s)
- Matthew D Hall
  - Nuffield Department of Medicine, Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, United Kingdom
- Caroline Colijn
  - Department of Mathematics, Simon Fraser University, Burnaby, Canada
47
Besser JM, Carleton HA, Trees E, Stroika SG, Hise K, Wise M, Gerner-Smidt P. Interpretation of Whole-Genome Sequencing for Enteric Disease Surveillance and Outbreak Investigation. Foodborne Pathog Dis 2019; 16:504-512. [PMID: 31246502 PMCID: PMC6653782 DOI: 10.1089/fpd.2019.2650] [Citation(s) in RCA: 51] [Impact Index Per Article: 10.2] [Indexed: 11/13/2022]
Abstract
The routine use of whole-genome sequencing (WGS) as part of enteric disease surveillance is substantially enhancing our ability to detect and investigate outbreaks and to monitor disease trends. At the same time, it is revealing as never before the vast complexity of microbial and human interactions that contribute to outbreak ecology. Since WGS analysis is primarily used to characterize and compare microbial genomes with the goal of addressing epidemiological questions, it must be interpreted in an epidemiological context. In this article, we identify common challenges and pitfalls encountered when interpreting sequence data in an enteric disease surveillance and investigation context, and explain how to address them.
Affiliation(s)
- John M Besser
  - Division of Foodborne, Waterborne, and Environmental Diseases, Centers for Disease Control and Prevention, National Center for Emerging and Zoonotic Diseases, Atlanta, Georgia
- Heather A Carleton
  - Division of Foodborne, Waterborne, and Environmental Diseases, Centers for Disease Control and Prevention, National Center for Emerging and Zoonotic Diseases, Atlanta, Georgia
- Eija Trees
  - Division of Foodborne, Waterborne, and Environmental Diseases, Centers for Disease Control and Prevention, National Center for Emerging and Zoonotic Diseases, Atlanta, Georgia
- Steven G Stroika
  - Division of Foodborne, Waterborne, and Environmental Diseases, Centers for Disease Control and Prevention, National Center for Emerging and Zoonotic Diseases, Atlanta, Georgia
- Kelley Hise
  - Division of Foodborne, Waterborne, and Environmental Diseases, Centers for Disease Control and Prevention, National Center for Emerging and Zoonotic Diseases, Atlanta, Georgia
- Matthew Wise
  - Division of Foodborne, Waterborne, and Environmental Diseases, Centers for Disease Control and Prevention, National Center for Emerging and Zoonotic Diseases, Atlanta, Georgia
- Peter Gerner-Smidt
  - Division of Foodborne, Waterborne, and Environmental Diseases, Centers for Disease Control and Prevention, National Center for Emerging and Zoonotic Diseases, Atlanta, Georgia
48
Whole genome sequencing of Mycobacterium tuberculosis: current standards and open issues. Nat Rev Microbiol 2019; 17:533-545. [DOI: 10.1038/s41579-019-0214-5] [Citation(s) in RCA: 155] [Impact Index Per Article: 31.0] [Indexed: 12/15/2022]
49
Hayama Y, Firestone SM, Stevenson MA, Yamamoto T, Nishi T, Shimizu Y, Tsutsui T. Reconstructing a transmission network and identifying risk factors of secondary transmissions in the 2010 foot-and-mouth disease outbreak in Japan. Transbound Emerg Dis 2019; 66:2074-2086. [PMID: 31131968 DOI: 10.1111/tbed.13256] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 09/27/2018] [Revised: 05/17/2019] [Accepted: 05/17/2019] [Indexed: 11/27/2022]
Abstract
Research aimed at understanding transmission networks, which represent "who infected whom" in an infectious disease outbreak, has been actively conducted in recent years. Transmission network models incorporating epidemiological and genetic data are valuable for elucidating disease transmission pathways. In this study, we reconstructed the transmission network of the foot-and-mouth disease (FMD) epidemic in Japan in 2010, and explored farm-level risk factors associated with increased risk of secondary transmission. A published, systematic Bayesian transmission network model was applied to epidemiological data of 292 infected farms and whole genome sequence data of 104 of the infected farms. This model can make inferences for known infected farms even when they lack genetic data. After estimating the consensus network, the accuracy of the network was examined by comparison with epidemiological data. Then, risk factors associated with farms inferred to have been sources of secondary transmission were explored using a zero-inflated Poisson regression model. As far as we are aware, this study represents the largest FMD outbreak transmission network to be published by such means combining epidemiological and genetic data. The consensus network reasonably reproduced the epidemiological links, which were estimated from the actual epidemiological investigation. Among 292 farms, 101 farms (35%) were inferred to have been the sources of secondary transmission, and amongst these farms, the median number of secondary cases was 2 farms (range: 1 to 18). The farm type (small- and large-sized pig farms), the number of days from onset to notification, and the number of susceptible farms within a 1-km radius were significantly associated with secondary transmission. Transmission network modelling enabled inference of the connections between infected farms during the FMD epidemic and identified important factors for controlling the risk of secondary transmission.
This study demonstrated that the predominant susceptible species held on a farm, farm size, and animal density were associated with increased onwards transmission.
Affiliation(s)
- Yoko Hayama
  - Viral Disease and Epidemiology Research Division, National Institute of Animal Health, National Agriculture Research Organization, Tsukuba, Japan
- Simon M Firestone
  - Faculty of Veterinary and Agricultural Sciences, Melbourne Veterinary School, Asia-Pacific Centre for Animal Health, The University of Melbourne, Parkville, Victoria, Australia
- Mark A Stevenson
  - Faculty of Veterinary and Agricultural Sciences, Melbourne Veterinary School, Asia-Pacific Centre for Animal Health, The University of Melbourne, Parkville, Victoria, Australia
- Takehisa Yamamoto
  - Viral Disease and Epidemiology Research Division, National Institute of Animal Health, National Agriculture Research Organization, Tsukuba, Japan
- Tatsuya Nishi
  - Exotic Disease Research Station, National Institute of Animal Health, National Agriculture and Food Research Organization, Kodaira, Japan
- Yumiko Shimizu
  - Viral Disease and Epidemiology Research Division, National Institute of Animal Health, National Agriculture Research Organization, Tsukuba, Japan
- Toshiyuki Tsutsui
  - Viral Disease and Epidemiology Research Division, National Institute of Animal Health, National Agriculture Research Organization, Tsukuba, Japan
50
Lewnard JA, Reingold AL. Emerging Challenges and Opportunities in Infectious Disease Epidemiology. Am J Epidemiol 2019; 188:873-882. [PMID: 30877295 PMCID: PMC7109842 DOI: 10.1093/aje/kwy264] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 09/07/2018] [Revised: 11/28/2018] [Accepted: 11/29/2018] [Indexed: 12/12/2022]
Abstract
Much of the intellectual tradition of modern epidemiology stems from efforts to understand and combat chronic diseases persisting through the 20th century epidemiologic transition of countries such as the United States and United Kingdom. After decades of relative obscurity, infectious disease epidemiology has undergone an intellectual rebirth in recent years amid increasing recognition of the threat posed by both new and familiar pathogens. Here, we review the emerging coalescence of infectious disease epidemiology around a core set of study designs and statistical methods bearing little resemblance to the chronic disease epidemiology toolkit. We offer our outlook on challenges and opportunities facing the field, including the integration of novel molecular and digital information sources into disease surveillance, the assimilation of such data into models of pathogen spread, and the increasing contribution of models to public health practice. We next consider emerging paradigms in causal inference for infectious diseases, ranging from approaches to evaluating vaccines and antimicrobial therapies to the task of ascribing clinical syndromes to etiologic microorganisms, an age-old problem transformed by our increasing ability to characterize human-associated microbiota. These areas represent an increasingly important component of epidemiology training programs for future generations of researchers and practitioners.
Affiliation(s)
- Joseph A Lewnard
  - Division of Epidemiology and Biostatistics, School of Public Health, University of California, Berkeley, Berkeley, California
  - Correspondence to Dr. Joseph A. Lewnard, Division of Epidemiology and Biostatistics, School of Public Health, University of California, Berkeley, 2121 Berkeley Way, Berkeley, CA 94720
- Arthur L Reingold
  - Division of Epidemiology and Biostatistics, School of Public Health, University of California, Berkeley, Berkeley, California