1
Alamoudi E, Reck F, Bundgaard N, Graw F, Brusch L, Hasenauer J, Schälte Y. A wall-time minimizing parallelization strategy for approximate Bayesian computation. PLoS One 2024; 19:e0294015. [PMID: 38386671 PMCID: PMC10883530 DOI: 10.1371/journal.pone.0294015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2023] [Accepted: 10/24/2023] [Indexed: 02/24/2024]
Abstract
Approximate Bayesian Computation (ABC) is a widely applicable and popular approach to estimating unknown parameters of mechanistic models. As ABC analyses are computationally expensive, parallelization on high-performance infrastructure is often necessary. However, existing parallelization strategies leave computing resources unused at times and thus do not yet leverage them optimally. We present look-ahead scheduling, a wall-time minimizing parallelization strategy for ABC Sequential Monte Carlo algorithms, which avoids idle times of computing units by preemptive sampling of subsequent generations. This allows all available resources to be utilized. The strategy can be integrated with, for example, adaptive distance function and summary statistic selection schemes, which is essential in practice. Our key contribution is the theoretical assessment of the strategy of preemptive sampling and the proof of unbiasedness. In addition, we provide an implementation and evaluate the strategy on different problems and numbers of parallel cores, showing speed-ups of typically 10-20% and up to 50% compared to the best established approach, with some variability. Thus, the proposed strategy improves the cost and run-time efficiency of ABC methods on high-performance infrastructure.
Affiliation(s)
- Emad Alamoudi
- Life and Medical Sciences Institute, University of Bonn, Bonn, Germany
- Felipe Reck
- Life and Medical Sciences Institute, University of Bonn, Bonn, Germany
- Nils Bundgaard
- BioQuant—Center for Quantitative Biology, Heidelberg University, Heidelberg, Germany
- Frederik Graw
- BioQuant—Center for Quantitative Biology, Heidelberg University, Heidelberg, Germany
- Interdisciplinary Center for Scientific Computing, Heidelberg University, Heidelberg, Germany
- Department of Medicine 5, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
- Lutz Brusch
- Center of Information Services and High Performance Computing (ZIH), Technische Universität Dresden, Dresden, Germany
- Jan Hasenauer
- Life and Medical Sciences Institute, University of Bonn, Bonn, Germany
- Helmholtz Zentrum München, Institute of Computational Biology, Neuherberg, Germany
- Center for Mathematics, Technische Universität München, Garching, Germany
- Yannik Schälte
- Life and Medical Sciences Institute, University of Bonn, Bonn, Germany
- Helmholtz Zentrum München, Institute of Computational Biology, Neuherberg, Germany
- Center for Mathematics, Technische Universität München, Garching, Germany
2
Schälte Y, Hasenauer J. Informative and adaptive distances and summary statistics in sequential approximate Bayesian computation. PLoS One 2023; 18:e0285836. [PMID: 37216372 DOI: 10.1371/journal.pone.0285836] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2022] [Accepted: 05/02/2023] [Indexed: 05/24/2023]
Abstract
Calibrating model parameters on heterogeneous data can be challenging and inefficient. This holds especially for likelihood-free methods such as approximate Bayesian computation (ABC), which rely on the comparison of relevant features in simulated and observed data and are popular for otherwise intractable problems. To address this problem, methods have been developed to scale-normalize data, and to derive informative low-dimensional summary statistics using inverse regression models of parameters on data. However, while approaches only correcting for scale can be inefficient on partly uninformative data, the use of summary statistics can lead to information loss and relies on the accuracy of employed methods. In this work, we first show that the combination of adaptive scale normalization with regression-based summary statistics is advantageous on heterogeneous parameter scales. Second, we present an approach employing regression models not to transform data, but to inform sensitivity weights quantifying data informativeness. Third, we discuss problems for regression models under non-identifiability, and present a solution using target augmentation. We demonstrate improved accuracy and efficiency of the presented approach on various problems, in particular robustness and wide applicability of the sensitivity weights. Our findings demonstrate the potential of the adaptive approach. The developed algorithms have been made available in the open-source Python toolbox pyABC.
Affiliation(s)
- Yannik Schälte
- Faculty of Mathematics and Natural Sciences, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany
- Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg, Germany
- Center for Mathematics, Technische Universität München, Garching, Germany
- Jan Hasenauer
- Faculty of Mathematics and Natural Sciences, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany
- Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg, Germany
- Center for Mathematics, Technische Universität München, Garching, Germany
3
Avecilla G, Chuong JN, Li F, Sherlock G, Gresham D, Ram Y. Neural networks enable efficient and accurate simulation-based inference of evolutionary parameters from adaptation dynamics. PLoS Biol 2022; 20:e3001633. [PMID: 35622868 PMCID: PMC9140244 DOI: 10.1371/journal.pbio.3001633] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 09/30/2021] [Accepted: 04/14/2022] [Indexed: 11/24/2022]
Abstract
The rate of adaptive evolution depends on the rate at which beneficial mutations are introduced into a population and the fitness effects of those mutations. The rate of beneficial mutations and their expected fitness effects is often difficult to empirically quantify. As these 2 parameters determine the pace of evolutionary change in a population, the dynamics of adaptive evolution may enable inference of their values. Copy number variants (CNVs) are a pervasive source of heritable variation that can facilitate rapid adaptive evolution. Previously, we developed a locus-specific fluorescent CNV reporter to quantify CNV dynamics in evolving populations maintained in nutrient-limiting conditions using chemostats. Here, we use CNV adaptation dynamics to estimate the rate at which beneficial CNVs are introduced through de novo mutation and their fitness effects using simulation-based likelihood-free inference approaches. We tested the suitability of 2 evolutionary models: a standard Wright-Fisher model and a chemostat model. We evaluated 2 likelihood-free inference algorithms: the well-established Approximate Bayesian Computation with Sequential Monte Carlo (ABC-SMC) algorithm, and the recently developed Neural Posterior Estimation (NPE) algorithm, which applies an artificial neural network to directly estimate the posterior distribution. By systematically evaluating the suitability of different inference methods and models, we show that NPE has several advantages over ABC-SMC and that a Wright-Fisher evolutionary model suffices in most cases. Using our validated inference framework, we estimate the CNV formation rate at the GAP1 locus in the yeast Saccharomyces cerevisiae to be $10^{-4.7}$ to $10^{-4}$ CNVs per cell division and a fitness coefficient of 0.04 to 0.1 per generation for GAP1 CNVs in glutamine-limited chemostats. We experimentally validated our inference-based estimates using 2 distinct experimental methods (barcode lineage tracking and pairwise fitness assays), which provide independent confirmation of the accuracy of our approach. Our results are consistent with a beneficial CNV supply rate that is 10-fold greater than the estimated rates of beneficial single-nucleotide mutations, explaining the outsized importance of CNVs in rapid adaptive evolution. More generally, our study demonstrates the utility of novel neural network-based likelihood-free inference methods for inferring the rates and effects of evolutionary processes from empirical data with possible applications ranging from tumor to viral evolution.
Affiliation(s)
- Grace Avecilla
- Department of Biology, New York University, New York, New York, United States of America
- Center for Genomics and Systems Biology, New York University, New York, New York, United States of America
- Julie N. Chuong
- Department of Biology, New York University, New York, New York, United States of America
- Center for Genomics and Systems Biology, New York University, New York, New York, United States of America
- Fangfei Li
- Department of Genetics, Stanford University, Stanford, California, United States of America
- Gavin Sherlock
- Department of Genetics, Stanford University, Stanford, California, United States of America
- David Gresham
- Department of Biology, New York University, New York, New York, United States of America
- Center for Genomics and Systems Biology, New York University, New York, New York, United States of America
- Yoav Ram
- School of Zoology, Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
4
Raynal L, Marin JM, Pudlo P, Ribatet M, Robert CP, Estoup A. ABC random forests for Bayesian parameter inference. Bioinformatics 2018; 35:1720-1728. [DOI: 10.1093/bioinformatics/bty867] [Citation(s) in RCA: 91] [Impact Index Per Article: 13.0] [Received: 11/28/2017] [Revised: 09/17/2018] [Accepted: 10/12/2018] [Indexed: 01/07/2023]
Affiliation(s)
- Louis Raynal
- IMAG, Univ Montpellier, CNRS, Montpellier, France
- Jean-Michel Marin
- IMAG, Univ Montpellier, CNRS, Montpellier, France
- IBC, Univ Montpellier, CNRS, Montpellier, France
- Pierre Pudlo
- Institut de Mathématiques de Marseille, Aix-Marseille Université, Marseille, France
- Christian P Robert
- Université Paris Dauphine, PSL Research University, Paris, France
- Department of Statistics, University of Warwick, Coventry, UK
- Arnaud Estoup
- IBC, Univ Montpellier, CNRS, Montpellier, France
- CBGP, INRA, CIRAD, IRD, Montpellier SupAgro, Univ Montpellier, Montpellier, France