1
Judge C, Vaughan T, Russell T, Abbott S, du Plessis L, Stadler T, Brady O, Hill S. EpiFusion: Joint inference of the effective reproduction number by integrating phylodynamic and epidemiological modelling with particle filtering. PLoS Comput Biol 2024; 20:e1012528. [PMID: 39527637 PMCID: PMC11581393 DOI: 10.1371/journal.pcbi.1012528] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2024] [Revised: 11/21/2024] [Accepted: 10/01/2024] [Indexed: 11/16/2024]
Abstract
Accurately estimating the effective reproduction number (Rt) of a circulating pathogen is a fundamental challenge in the study of infectious disease. The fields of epidemiology and pathogen phylodynamics share this goal, but to date, the methodologies and data employed by each remain largely distinct. Here we present EpiFusion: a joint approach that can be used to harness the complementary strengths of each field to improve estimation of outbreak dynamics for large and poorly sampled epidemics, such as arboviral or respiratory virus outbreaks, and validate it for retrospective analysis. We propose a model of Rt that estimates outbreak trajectories conditional upon both phylodynamic (time-scaled trees estimated from genetic sequences) and epidemiological (case incidence) data. We simulate stochastic outbreak trajectories that are weighted according to epidemiological and phylodynamic observation models and fit using particle Markov chain Monte Carlo. To assess performance, we test EpiFusion on simulated outbreaks in which transmission and/or surveillance rapidly changes and find that using EpiFusion to combine epidemiological and phylodynamic data maintains accuracy and increases certainty in trajectory and Rt estimates, compared to when each data type is used alone. We benchmark EpiFusion's performance against existing methods to estimate Rt and demonstrate advances in speed and accuracy. Importantly, our approach scales efficiently with dataset size. Finally, we apply our model to estimate Rt during the 2014 Ebola outbreak in Sierra Leone. EpiFusion is designed to accommodate future extensions that will improve its utility, such as explicitly modelling population structure, accommodations for phylogenetic uncertainty, and the ability to weight the contributions of genomic or case incidence data to the inference.
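The idea of weighting simulated outbreak trajectories by an observation model can be illustrated with a bootstrap particle filter. The following is a minimal sketch under stated assumptions, not EpiFusion's implementation: it uses case incidence only (no phylogenetic likelihood), and the log-normal growth step, the Poisson case-observation model, and every name and parameter (`particle_filter_loglik`, `log_poisson`, `beta`, `gamma`, `rho`, `sigma`, `i0`) are hypothetical choices made for illustration.

```python
import math
import random


def log_poisson(k, lam):
    """Log probability of observing k events under Poisson(lam)."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def particle_filter_loglik(cases, beta, gamma=0.2, rho=0.5, sigma=0.1,
                           i0=10.0, n_particles=200, seed=0):
    """Estimate log p(cases | beta) with a bootstrap particle filter.

    Assumed toy model (not taken from the paper):
      infected_t = infected_{t-1} * exp(beta - gamma + sigma * noise)
      cases_t    ~ Poisson(rho * infected_t)
    Under this toy model Rt would be beta / gamma.
    """
    rng = random.Random(seed)
    particles = [i0] * n_particles
    loglik = 0.0
    for k in cases:
        # Propagate each particle one step through the stochastic model.
        particles = [
            max(x * math.exp(beta - gamma + sigma * rng.gauss(0.0, 1.0)), 1e-9)
            for x in particles
        ]
        # Weight each particle by how well it explains the observed cases.
        logw = [log_poisson(k, rho * x) for x in particles]
        m = max(logw)
        w = [math.exp(lw - m) for lw in logw]
        # Accumulate the log of the average (unnormalised) weight.
        loglik += m + math.log(sum(w) / n_particles)
        # Resample particles in proportion to their weights.
        particles = rng.choices(particles, weights=w, k=n_particles)
    return loglik
```

Calling `particle_filter_loglik(cases, beta)` for several values of `beta` gives likelihood estimates that a particle MCMC sampler could use in its acceptance step; a joint method would multiply in a second weight derived from the tree.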
Affiliation(s)
- Ciara Judge
  - Department of Infectious Disease Epidemiology and Dynamics, Faculty of Epidemiology and Public Health, London School of Hygiene and Tropical Medicine, United Kingdom
  - Centre for Mathematical Modelling of Infectious Diseases, London School of Hygiene and Tropical Medicine, United Kingdom
  - Department of Pathobiology and Population Sciences, Royal Veterinary College, United Kingdom
- Timothy Vaughan
  - Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Timothy Russell
  - Department of Infectious Disease Epidemiology and Dynamics, Faculty of Epidemiology and Public Health, London School of Hygiene and Tropical Medicine, United Kingdom
  - Centre for Mathematical Modelling of Infectious Diseases, London School of Hygiene and Tropical Medicine, United Kingdom
- Sam Abbott
  - Department of Infectious Disease Epidemiology and Dynamics, Faculty of Epidemiology and Public Health, London School of Hygiene and Tropical Medicine, United Kingdom
  - Centre for Mathematical Modelling of Infectious Diseases, London School of Hygiene and Tropical Medicine, United Kingdom
- Louis du Plessis
  - Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Tanja Stadler
  - Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Oliver Brady
  - Department of Infectious Disease Epidemiology and Dynamics, Faculty of Epidemiology and Public Health, London School of Hygiene and Tropical Medicine, United Kingdom
  - Centre for Mathematical Modelling of Infectious Diseases, London School of Hygiene and Tropical Medicine, United Kingdom
- Sarah Hill
  - Department of Pathobiology and Population Sciences, Royal Veterinary College, United Kingdom
2
Meng L, Huo Z. Outcome-guided Bayesian clustering for disease subtype discovery using high-dimensional transcriptomic data. J Appl Stat 2024; 52:183-207. [PMID: 39811087 PMCID: PMC11727188 DOI: 10.1080/02664763.2024.2362275] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2022] [Accepted: 05/23/2024] [Indexed: 01/16/2025]
Abstract
Due to the tremendous heterogeneity of disease manifestations, many complex diseases that were once thought to be single diseases are now considered to have disease subtypes. Disease subtyping analysis, that is, the identification of subgroups of patients with similar characteristics, is the first step toward precision medicine. With the advancement of high-throughput technologies, omics data offer an unprecedented opportunity to reveal disease subtypes. As a result, unsupervised clustering analysis has been widely used for this purpose. Though promising, the subtypes obtained from traditional quantitative approaches may not always be clinically meaningful (i.e. correlate with clinical outcomes). On the other hand, the collection of rich clinical data in modern epidemiology studies has great potential to facilitate the disease subtyping process via omics data and to discover clinically meaningful disease subtypes. Thus, we developed an outcome-guided Bayesian clustering (GuidedBayesianClustering) method to fully integrate the clinical data and the high-dimensional omics data. A Gaussian mixed model framework was applied to perform sample clustering; a spike-and-slab prior was utilized to perform gene selection; a mixture model prior was employed to incorporate the guidance from a clinical outcome variable; and a decision framework was adopted to infer the false discovery rate of the selected genes. We deployed conjugate priors to facilitate efficient Gibbs sampling. Our proposed full Bayesian method is capable of simultaneously (i) obtaining sample clustering (disease subtype discovery); (ii) performing feature selection (selecting genes related to the disease subtype); and (iii) utilizing a clinical outcome variable to guide the disease subtype discovery. The superior performance of GuidedBayesianClustering was demonstrated through simulations and applications to breast cancer expression data and Alzheimer's disease. An R package has been made publicly available on GitHub to improve the applicability of our method.
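To illustrate why conjugate priors make Gibbs sampling convenient for clustering, the sketch below samples a one-dimensional Gaussian mixture with unit variance. It is an assumption-laden toy, not the GuidedBayesianClustering method: it omits the spike-and-slab gene selection and the outcome guidance entirely, and the function name `gibbs_gmm` and the hyperparameter `tau2` (prior variance of the component means) are hypothetical.

```python
import math
import random


def gibbs_gmm(xs, k=2, n_iter=200, tau2=100.0, seed=0):
    """Gibbs sampler for a 1-D Gaussian mixture with known unit variance.

    Assumed priors (illustrative): each mean ~ Normal(0, tau2) and the
    mixing weights ~ Dirichlet(1, ..., 1). Both are conjugate, so every
    conditional distribution can be sampled directly.
    Returns the final sampled means, weights, and cluster labels.
    """
    rng = random.Random(seed)
    mus = rng.sample(xs, k)          # start the means at random data points
    weights = [1.0 / k] * k
    z = [0] * len(xs)
    for _ in range(n_iter):
        # 1. Sample each cluster label given the means and weights.
        for i, x in enumerate(xs):
            logp = [math.log(weights[j]) - 0.5 * (x - mus[j]) ** 2
                    for j in range(k)]
            m = max(logp)
            p = [math.exp(lp - m) for lp in logp]
            z[i] = rng.choices(range(k), weights=p)[0]
        # 2. Sample each mean from its conjugate normal posterior.
        for j in range(k):
            members = [x for x, zi in zip(xs, z) if zi == j]
            precision = len(members) + 1.0 / tau2
            mus[j] = rng.gauss(sum(members) / precision,
                               math.sqrt(1.0 / precision))
        # 3. Sample the weights from their Dirichlet posterior
        #    (normalised independent gamma draws).
        g = [rng.gammavariate(1.0 + z.count(j), 1.0) for j in range(k)]
        total = sum(g)
        weights = [gj / total for gj in g]
    return mus, weights, z
```

In a full analysis one would keep the draws from every iteration after burn-in rather than only the last state; the sketch returns the last state to stay short.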
Affiliation(s)
- Lingsong Meng
  - Department of Biostatistics, University of Florida, Gainesville, FL, USA
- Zhiguang Huo
  - Department of Biostatistics, University of Florida, Gainesville, FL, USA
3
Inoue H, Hukushima K, Omori T. Estimating Distributions of Parameters in Nonlinear State Space Models with Replica Exchange Particle Marginal Metropolis–Hastings Method. Entropy 2022; 24:e24010115. [PMID: 35052141 PMCID: PMC8774595 DOI: 10.3390/e24010115] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/25/2021] [Revised: 12/29/2021] [Accepted: 01/07/2022] [Indexed: 02/04/2023]
Abstract
Extracting latent nonlinear dynamics from observed time-series data is important for understanding a dynamic system against the background of the observed data. A state space model is a probabilistic graphical model for time-series data, which describes the probabilistic dependence between latent variables at subsequent times and between latent variables and observations. Since, in many situations, the values of the parameters in the state space model are unknown, estimating the parameters from observations is an important task. The particle marginal Metropolis–Hastings (PMMH) method estimates the marginal posterior distribution of parameters obtained by marginalization over the distribution of latent variables in the state space model. Although, in principle, we can estimate the marginal posterior distribution of parameters by iterating this method infinitely, in practice, with a finite number of iterations, the estimated result depends on the initial values. In this paper, we propose a replica exchange particle marginal Metropolis–Hastings (REPMMH) method to mitigate this problem by combining the PMMH method with the replica exchange method. By using the proposed method, we simultaneously realize a global search at a high temperature and a local fine search at a low temperature. We evaluate the proposed method using simulated data obtained from the Izhikevich neuron model and the Lévy-driven stochastic volatility model, and we show that the proposed REPMMH method reduces the initial value dependence of the PMMH method and realizes efficient sampling of parameters in state space models compared with existing methods.
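The combination of PMMH with replica exchange can be sketched on a toy linear-Gaussian state space model. This is a hedged illustration, not the authors' algorithm: the abstract does not say how the tempered targets are defined, so the sketch assumes each replica raises the particle-filter likelihood estimate to an inverse temperature in `betas`. The model, the uniform prior, and all names (`pf_loglik`, `repmmh`, `step`) are hypothetical.

```python
import math
import random


def pf_loglik(ys, a, rng, n_particles=50):
    """Bootstrap particle filter estimate of log p(ys | a) for the assumed
    model x_t = a * x_{t-1} + N(0, 1), y_t = x_t + N(0, 1)."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    loglik = 0.0
    for y in ys:
        xs = [a * x + rng.gauss(0.0, 1.0) for x in xs]
        logw = [-0.5 * (y - x) ** 2 - 0.5 * math.log(2 * math.pi) for x in xs]
        m = max(logw)
        w = [math.exp(lw - m) for lw in logw]
        loglik += m + math.log(sum(w) / n_particles)
        xs = rng.choices(xs, weights=w, k=n_particles)
    return loglik


def repmmh(ys, n_iter=200, betas=(1.0, 0.5, 0.2), step=0.2, seed=1):
    """Replica exchange PMMH sketch with a Uniform(-1, 1) prior on a.

    Replica r targets prior * (likelihood estimate) ** betas[r]; the
    betas[0] = 1 replica targets the posterior, and its states are returned.
    """
    rng = random.Random(seed)
    thetas = [rng.uniform(-0.9, 0.9) for _ in betas]
    logliks = [pf_loglik(ys, th, rng) for th in thetas]
    samples = []
    for _ in range(n_iter):
        # PMMH move within each replica (symmetric random-walk proposal).
        for r, beta in enumerate(betas):
            prop = thetas[r] + step * rng.gauss(0.0, 1.0)
            if not -1.0 < prop < 1.0:
                continue  # zero prior density: reject
            ll_prop = pf_loglik(ys, prop, rng)
            log_acc = beta * (ll_prop - logliks[r])
            if rng.random() < math.exp(min(0.0, log_acc)):
                # Keep the stored estimate with the accepted state.
                thetas[r], logliks[r] = prop, ll_prop
        # Exchange move between a random pair of neighbouring temperatures.
        r = rng.randrange(len(betas) - 1)
        log_swap = (betas[r] - betas[r + 1]) * (logliks[r + 1] - logliks[r])
        if rng.random() < math.exp(min(0.0, log_swap)):
            thetas[r], thetas[r + 1] = thetas[r + 1], thetas[r]
            logliks[r], logliks[r + 1] = logliks[r + 1], logliks[r]
        samples.append(thetas[0])
    return samples
```

The hot replicas (small `betas`) accept moves more freely and so explore globally, while the exchange step lets good states migrate to the `betas[0] = 1` replica, which is the mechanism the abstract credits with reducing dependence on initial values.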
Affiliation(s)
- Hiroaki Inoue
  - Graduate School of Engineering, Kobe University, 1-1 Rokkodai-cho, Nada-ku, Kobe 657-8501, Japan
- Koji Hukushima
  - Graduate School of Arts and Sciences, The University of Tokyo, 3-8-1 Komaba, Meguro-ku, Tokyo 153-8902, Japan
  - Komaba Institute for Science, The University of Tokyo, 3-8-1 Komaba, Meguro-ku, Tokyo 153-8902, Japan
- Toshiaki Omori
  - Graduate School of Engineering, Kobe University, 1-1 Rokkodai-cho, Nada-ku, Kobe 657-8501, Japan
  - Organization for Advanced and Integrated Research, Kobe University, 1-1 Rokkodai-cho, Nada-ku, Kobe 657-8501, Japan
  - Center for Mathematical and Data Sciences, Kobe University, 1-1 Rokkodai-cho, Nada-ku, Kobe 657-8501, Japan
4
Meng L, Avram D, Tseng G, Huo Z. Outcome‐guided sparse K‐means for disease subtype discovery via integrating phenotypic data with high‐dimensional transcriptomic data. J R Stat Soc Ser C Appl Stat 2021. [DOI: 10.1111/rssc.12536] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Affiliation(s)
- Lingsong Meng
  - Department of Biostatistics, University of Florida, Gainesville, USA
- Dorina Avram
  - Department of Immunology, H. Lee Moffitt Cancer Center and Research Institute, Tampa, USA
- George Tseng
  - Department of Biostatistics, University of Pittsburgh, Pittsburgh, USA
- Zhiguang Huo
  - Department of Biostatistics, University of Florida, Gainesville, USA