1
|
Truszkowski J, Perrigo A, Broman D, Ronquist F, Antonelli A. Online tree expansion could help solve the problem of scalability in Bayesian phylogenetics. Syst Biol 2023; 72:1199-1206. [PMID: 37498209 PMCID: PMC10627553 DOI: 10.1093/sysbio/syad045] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/20/2022] [Revised: 06/22/2023] [Accepted: 07/11/2023] [Indexed: 07/28/2023] Open
Abstract
Bayesian phylogenetics is now facing a critical point. Over the last 20 years, Bayesian methods have reshaped phylogenetic inference and gained widespread popularity due to their high accuracy, the ability to quantify the uncertainty of inferences and the possibility of accommodating multiple aspects of evolutionary processes in the models that are used. Unfortunately, Bayesian methods are computationally expensive, and typical applications involve at most a few hundred sequences. This is problematic in the age of rapidly expanding genomic data and increasing scope of evolutionary analyses, forcing researchers to resort to less accurate but faster methods, such as maximum parsimony and maximum likelihood. Does this spell doom for Bayesian methods? Not necessarily. Here, we discuss some recently proposed approaches that could help scale up Bayesian analyses of evolutionary problems considerably. We focus on two particular aspects: online phylogenetics, where new data sequences are added to existing analyses, and alternatives to Markov chain Monte Carlo (MCMC) for scalable Bayesian inference. We identify 5 specific challenges and discuss how they might be overcome. We believe that online phylogenetic approaches and Sequential Monte Carlo hold great promise and could potentially speed up tree inference by orders of magnitude. We call for collaborative efforts to speed up the development of methods for real-time tree expansion through online phylogenetics.
Collapse
Affiliation(s)
- Jakub Truszkowski
- Department of Biological and Environmental Sciences, University of Gothenburg, P. O. Box 461, SE.405 30 Gothenburg, Sweden
- Gothenburg Global Biodiversity Centre, Box 461, 405 30 Gothenburg, Sweden
| | - Allison Perrigo
- Department of Biological and Environmental Sciences, University of Gothenburg, P. O. Box 461, SE.405 30 Gothenburg, Sweden
- Gothenburg Global Biodiversity Centre, Box 461, 405 30 Gothenburg, Sweden
| | - David Broman
- Department of Computer Science and Digital Futures, KTH Royal Institute of Technology, SE.100 44 Stockholm, Sweden
| | - Fredrik Ronquist
- Department of Bioinformatics and Genetics, Swedish Museum of Natural History, P. O. Box 50007, SE.104 05 Stockholm, Sweden
| | - Alexandre Antonelli
- Department of Biological and Environmental Sciences, University of Gothenburg, P. O. Box 461, SE.405 30 Gothenburg, Sweden
- Gothenburg Global Biodiversity Centre, Box 461, 405 30 Gothenburg, Sweden
- Royal Botanic Gardens, Kew, Richmond, Surrey TW9 3AE, UK
- Department of Plant Sciences, University of Oxford, South Parks Road, Oxford OX1 3 RB, UK
| |
Collapse
|
2
|
Wang S, Ge S, Sobkowiak B, Wang L, Grandjean L, Colijn C, Elliott LT. Genome-Wide Association with Uncertainty in the Genetic Similarity Matrix. J Comput Biol 2023; 30:189-203. [PMID: 36374242 DOI: 10.1089/cmb.2022.0067] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022] Open
Abstract
Genome-wide association studies (GWASs) are often confounded by population stratification and structure. Linear mixed models (LMMs) are a powerful class of methods for uncovering genetic effects, while controlling for such confounding. LMMs include random effects for a genetic similarity matrix, and they assume that a true genetic similarity matrix is known. However, uncertainty about the phylogenetic structure of a study population may degrade the quality of LMM results. This may happen in bacterial studies in which the number of samples or loci is small, or in studies with low-quality genotyping. In this study, we develop methods for linear mixed models in which the genetic similarity matrix is unknown and is derived from Markov chain Monte Carlo estimates of the phylogeny. We apply our model to a GWAS of multidrug resistance in tuberculosis, and illustrate our methods on simulated data.
Collapse
Affiliation(s)
- Shijia Wang
- School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin, China
| | - Shufei Ge
- Institute of Mathematical Sciences, ShanghaiTech University, Shanghai, China
| | | | - Liangliang Wang
- Department of Statistics and Actuarial Science, Simon Fraser University, Burnaby, Canada
| | - Louis Grandjean
- Department of Infectious Diseases, University College London, London, United Kingdom
| | - Caroline Colijn
- Department of Mathematics and Simon Fraser University, Burnaby, Canada
| | - Lloyd T Elliott
- Department of Statistics and Actuarial Science, Simon Fraser University, Burnaby, Canada
| |
Collapse
|
3
|
Fisher AA, Hassler GW, Ji X, Baele G, Suchard MA, Lemey P. Scalable Bayesian phylogenetics. Philos Trans R Soc Lond B Biol Sci 2022; 377:20210242. [PMID: 35989603 PMCID: PMC9393558 DOI: 10.1098/rstb.2021.0242] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/20/2021] [Accepted: 04/20/2022] [Indexed: 02/01/2023] Open
Abstract
Recent advances in Bayesian phylogenetics offer substantial computational savings to accommodate increased genomic sampling that challenges traditional inference methods. In this review, we begin with a brief summary of the Bayesian phylogenetic framework, and then conceptualize a variety of methods to improve posterior approximations via Markov chain Monte Carlo (MCMC) sampling. Specifically, we discuss methods to improve the speed of likelihood calculations, reduce MCMC burn-in, and generate better MCMC proposals. We apply several of these techniques to study the evolution of HIV virulence along a 1536-tip phylogeny and estimate the internal node heights of a 1000-tip SARS-CoV-2 phylogenetic tree in order to illustrate the speed-up of such analyses using current state-of-the-art approaches. We conclude our review with a discussion of promising alternatives to MCMC that approximate the phylogenetic posterior. This article is part of a discussion meeting issue 'Genomic population structures of microbial pathogens'.
Collapse
Affiliation(s)
| | - Gabriel W. Hassler
- Department of Computational Medicine, David Geffen School of Medicine at UCLA, University of California, Los Angeles, CA 90095, USA
| | - Xiang Ji
- Department of Mathematics, School of Science and Engineering, Tulane University, New Orleans, LA 70118, USA
| | - Guy Baele
- Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, 3000 Leuven, Belgium
| | - Marc A. Suchard
- Department of Computational Medicine, David Geffen School of Medicine at UCLA, University of California, Los Angeles, CA 90095, USA
- Department of Biostatistics, Jonathan and Karin Fielding School of Public Health, University of California, Los Angeles, CA 90095, USA
- Department of Human Genetics, David Geffen School of Medicine at UCLA, University of California, Los Angeles, CA 90095, USA
| | - Philippe Lemey
- Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, 3000 Leuven, Belgium
| |
Collapse
|
4
|
Inoue H, Hukushima K, Omori T. Estimating Distributions of Parameters in Nonlinear State Space Models with Replica Exchange Particle Marginal Metropolis–Hastings Method. ENTROPY 2022; 24:e24010115. [PMID: 35052141 PMCID: PMC8774595 DOI: 10.3390/e24010115] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 11/25/2021] [Revised: 12/29/2021] [Accepted: 01/07/2022] [Indexed: 02/04/2023]
Abstract
Extracting latent nonlinear dynamics from observed time-series data is important for understanding a dynamic system against the background of the observed data. A state space model is a probabilistic graphical model for time-series data, which describes the probabilistic dependence between latent variables at subsequent times and between latent variables and observations. Since, in many situations, the values of the parameters in the state space model are unknown, estimating the parameters from observations is an important task. The particle marginal Metropolis–Hastings (PMMH) method is a method for estimating the marginal posterior distribution of parameters obtained by marginalization over the distribution of latent variables in the state space model. Although, in principle, we can estimate the marginal posterior distribution of parameters by iterating this method infinitely, the estimated result depends on the initial values for a finite number of times in practice. In this paper, we propose a replica exchange particle marginal Metropolis–Hastings (REPMMH) method as a method to improve this problem by combining the PMMH method with the replica exchange method. By using the proposed method, we simultaneously realize a global search at a high temperature and a local fine search at a low temperature. We evaluate the proposed method using simulated data obtained from the Izhikevich neuron model and Lévy-driven stochastic volatility model, and we show that the proposed REPMMH method improves the problem of the initial value dependence in the PMMH method, and realizes efficient sampling of parameters in the state space models compared with existing methods.
Collapse
Affiliation(s)
- Hiroaki Inoue
- Graduate School of Engineering, Kobe University, 1-1 Rokkodai-cho, Nada-ku, Kobe 657-8501, Japan;
| | - Koji Hukushima
- Graduate School of Arts and Sciences, The University of Tokyo, 3-8-1 Komaba, Meguro-ku, Tokyo 153-8902, Japan;
- Komaba Institute for Science, The University of Tokyo, 3-8-1 Komaba, Meguro-ku, Tokyo 153-8902, Japan
| | - Toshiaki Omori
- Graduate School of Engineering, Kobe University, 1-1 Rokkodai-cho, Nada-ku, Kobe 657-8501, Japan;
- Organization for Advanced and Integrated Research, Kobe University, 1-1 Rokkodai-cho, Nada-ku, Kobe 657-8501, Japan
- Center for Mathematical and Data Sciences, Kobe University, 1-1 Rokkodai-cho, Nada-ku, Kobe 657-8501, Japan
- Correspondence:
| |
Collapse
|