1. Rimella L, Jewell C, Fearnhead P. Simulation based composite likelihood. Statistics and Computing 2025; 35:58. [PMID: 40017662; PMCID: PMC11861035; DOI: 10.1007/s11222-025-10584-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2024] [Accepted: 02/06/2025] [Indexed: 03/01/2025]
Abstract
Inference for high-dimensional hidden Markov models is challenging due to the exponential-in-dimension computational cost of calculating the likelihood. To address this issue, we introduce an innovative composite likelihood approach called "Simulation Based Composite Likelihood" (SimBa-CL). With SimBa-CL, we approximate the likelihood by the product of its marginals, which we estimate using Monte Carlo sampling. In a similar vein to approximate Bayesian computation (ABC), SimBa-CL requires multiple simulations from the model, but, in contrast to ABC, it provides a likelihood approximation that guides the optimization of the parameters. Leveraging automatic differentiation libraries, it is simple to calculate gradients and Hessians to not only speed up optimization but also to build approximate confidence sets. We present extensive empirical results which validate our theory and demonstrate its advantage over SMC, and apply SimBa-CL to real-world Aphtovirus data.
Supplementary Information: The online version contains supplementary material available at 10.1007/s11222-025-10584-z.
Affiliation(s)
- Lorenzo Rimella
  - ESOMAS, University of Turin, Via Verdi 8, 10124 Turin, Italy
  - Statistics Initiative, Collegio Carlo Alberto, Piazza Arbarello 8, 10122 Turin, Italy
- Chris Jewell
  - Mathematical Sciences, Lancaster University, Lancaster, LA1 4YF, UK
- Paul Fearnhead
  - Mathematical Sciences, Lancaster University, Lancaster, LA1 4YF, UK
2. Jallais M, Palombo M. Introducing µGUIDE for quantitative imaging via generalized uncertainty-driven inference using deep learning. eLife 2024; 13:RP101069. [PMID: 39589260; PMCID: PMC11594529; DOI: 10.7554/elife.101069] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/27/2024] [Open Access]
Abstract
This work proposes µGUIDE: a general Bayesian framework to estimate posterior distributions of tissue microstructure parameters from any given biophysical model or signal representation, with exemplar demonstration in diffusion-weighted magnetic resonance imaging. Harnessing a new deep learning architecture for automatic signal feature selection combined with simulation-based inference and efficient sampling of the posterior distributions, µGUIDE bypasses the high computational and time cost of conventional Bayesian approaches and does not rely on acquisition constraints to define model-specific summary statistics. The obtained posterior distributions make it possible to highlight degeneracies present in the model definition and to quantify the uncertainty and ambiguity of the estimated parameters.
Affiliation(s)
- Maëliss Jallais
  - Cardiff University Brain Research Imaging Centre (CUBRIC), Cardiff University, Cardiff, United Kingdom
  - School of Computer Science and Informatics, Cardiff University, Cardiff, United Kingdom
- Marco Palombo
  - Cardiff University Brain Research Imaging Centre (CUBRIC), Cardiff University, Cardiff, United Kingdom
  - School of Computer Science and Informatics, Cardiff University, Cardiff, United Kingdom
3. Rmus M, Pan TF, Xia L, Collins AGE. Artificial neural networks for model identification and parameter estimation in computational cognitive models. PLoS Comput Biol 2024; 20:e1012119. [PMID: 38748770; PMCID: PMC11132492; DOI: 10.1371/journal.pcbi.1012119] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2023] [Revised: 05/28/2024] [Accepted: 04/27/2024] [Indexed: 05/28/2024] [Open Access]
Abstract
Computational cognitive models have been used extensively to formalize cognitive processes. Model parameters offer a simple way to quantify individual differences in how humans process information. Similarly, model comparison allows researchers to identify which theories, embedded in different models, provide the best accounts of the data. Cognitive modeling uses statistical tools to quantitatively relate models to data that often rely on computing/estimating the likelihood of the data under the model. However, this likelihood is computationally intractable for a substantial number of models. These relevant models may embody reasonable theories of cognition, but are often under-explored due to the limited range of tools available to relate them to data. We contribute to filling this gap in a simple way using artificial neural networks (ANNs) to map data directly onto model identity and parameters, bypassing the likelihood estimation. We test our instantiation of an ANN as a cognitive model fitting tool on classes of cognitive models with strong inter-trial dependencies (such as reinforcement learning models), which offer unique challenges to most methods. We show that we can adequately perform both parameter estimation and model identification using our ANN approach, including for models that cannot be fit using traditional likelihood-based methods. We further discuss our work in the context of the ongoing research leveraging simulation-based approaches to parameter estimation and model identification, and how these approaches broaden the class of cognitive models researchers can quantitatively investigate.
Affiliation(s)
- Milena Rmus
  - Department of Psychology, University of California, Berkeley, Berkeley, California, United States of America
- Ti-Fen Pan
  - Department of Psychology, University of California, Berkeley, Berkeley, California, United States of America
- Liyu Xia
  - Department of Mathematics, University of California, Berkeley, Berkeley, California, United States of America
- Anne G. E. Collins
  - Department of Psychology, University of California, Berkeley, Berkeley, California, United States of America
  - Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, California, United States of America
4. Saubin M, Tellier A, Stoeckel S, Andrieux A, Halkett F. Approximate Bayesian Computation applied to time series of population genetic data disentangles rapid genetic changes and demographic variations in a pathogen population. Mol Ecol 2024; 33:e16965. [PMID: 37150947; DOI: 10.1111/mec.16965] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/07/2022] [Revised: 04/04/2023] [Accepted: 04/12/2023] [Indexed: 05/09/2023]
Abstract
Adaptation can occur at remarkably short timescales in natural populations, leading to drastic changes in phenotypes and genotype frequencies over a few generations only. The inference of demographic parameters can allow understanding how evolutionary forces interact and shape the genetic trajectories of populations during rapid adaptation. Here we propose a new Approximate Bayesian Computation (ABC) framework that couples a forward and individual-based model with temporal genetic data to disentangle genetic changes and demographic variations in a case of rapid adaptation. We test the accuracy of our inferential framework and evaluate the benefit of considering a dense versus sparse sampling. Theoretical investigations demonstrate high accuracy in both model and parameter estimations, even if a strong thinning is applied to time series data. Then, we apply our ABC inferential framework to empirical data describing the population genetic changes of the poplar rust pathogen following a major event of resistance overcoming. We successfully estimate key demographic and genetic parameters, including the proportion of resistant hosts deployed in the landscape and the level of standing genetic variation from which selection occurred. Inferred values are in accordance with our empirical knowledge of this biological system. This new inferential framework, which contrasts with coalescent-based ABC analyses, is promising for a better understanding of evolutionary trajectories of populations subjected to rapid adaptation.
Affiliation(s)
- Méline Saubin
  - Université de Lorraine, INRAE, IAM, Nancy, France
  - Department for Life Science Systems, Technical University of Munich, Freising, Germany
- Aurélien Tellier
  - Department for Life Science Systems, Technical University of Munich, Freising, Germany
- Solenn Stoeckel
  - INRAE, Agrocampus Ouest, Université de Rennes, IGEPP, Le Rheu, France
5. Smiley O, Hoffmann T, Onnela JP. Approximate inference for longitudinal mechanistic HIV contact network. Applied Network Science 2024; 9:12. [PMID: 38699247; PMCID: PMC11060975; DOI: 10.1007/s41109-024-00616-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2023] [Accepted: 04/06/2024] [Indexed: 05/05/2024]
Abstract
Network models are increasingly used to study infectious disease spread. Exponential Random Graph models have a history in this area, with scalable inference methods now available. An alternative approach uses mechanistic network models. Mechanistic network models directly capture individual behaviors, making them suitable for studying sexually transmitted diseases. Combining mechanistic models with Approximate Bayesian Computation allows flexible modeling using domain-specific interaction rules among agents, avoiding network model oversimplifications. These models are ideal for longitudinal settings as they explicitly incorporate network evolution over time. We implemented a discrete-time version of a previously published continuous-time model of evolving contact networks for men who have sex with men and proposed an ABC-based approximate inference scheme for it. As expected, we found that a two-wave longitudinal study design improves the accuracy of inference compared to a cross-sectional design. However, the gains in precision in collecting data twice, up to 18%, depend on the spacing of the two waves and are sensitive to the choice of summary statistics. In addition to methodological developments, our results inform the design of future longitudinal network studies in sexually transmitted diseases, specifically in terms of what data to collect from participants and when to do so.
Affiliation(s)
- Octavious Smiley
  - Biostatistics, Harvard University, 677 Huntington Ave, Boston, MA 02115 USA
- Till Hoffmann
  - Biostatistics, Harvard University, 677 Huntington Ave, Boston, MA 02115 USA
- Jukka-Pekka Onnela
  - Biostatistics, Harvard University, 677 Huntington Ave, Boston, MA 02115 USA
6. Wang MH, Onnela JP. Flexible Bayesian inference on partially observed epidemics. Journal of Complex Networks 2024; 12:cnae017. [PMID: 38533184; PMCID: PMC10962317; DOI: 10.1093/comnet/cnae017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2023] [Accepted: 03/02/2024] [Indexed: 03/28/2024]
Abstract
Individual-based models of contagious processes are useful for predicting epidemic trajectories and informing intervention strategies. In such models, the incorporation of contact network information can capture the non-randomness and heterogeneity of realistic contact dynamics. In this article, we consider Bayesian inference on the spreading parameters of an SIR contagion on a known, static network, where information regarding individual disease status is known only from a series of tests (positive or negative disease status). When the contagion model is complex or information such as infection and removal times is missing, the posterior distribution can be difficult to sample from. Previous work has considered the use of Approximate Bayesian Computation (ABC), which allows for simulation-based Bayesian inference on complex models. However, ABC methods usually require the user to select reasonable summary statistics. Here, we consider an inference scheme based on the Mixture Density Network compressed ABC, which minimizes the expected posterior entropy in order to learn informative summary statistics. This allows us to conduct Bayesian inference on the parameters of a partially observed contagious process while also circumventing the need for manual summary statistic selection. This methodology can be extended to incorporate additional simulation complexities, including behavioural change after positive tests or false test results.
Affiliation(s)
- Maxwell H Wang
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, 677 Huntington Ave, Boston, MA 02115, USA
- Jukka-Pekka Onnela
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, 677 Huntington Ave, Boston, MA 02115, USA
7. Alamoudi E, Reck F, Bundgaard N, Graw F, Brusch L, Hasenauer J, Schälte Y. A wall-time minimizing parallelization strategy for approximate Bayesian computation. PLoS One 2024; 19:e0294015. [PMID: 38386671; PMCID: PMC10883530; DOI: 10.1371/journal.pone.0294015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2023] [Accepted: 10/24/2023] [Indexed: 02/24/2024] [Open Access]
Abstract
Approximate Bayesian Computation (ABC) is a widely applicable and popular approach to estimating unknown parameters of mechanistic models. As ABC analyses are computationally expensive, parallelization on high-performance infrastructure is often necessary. However, the existing parallelization strategies leave computing resources unused at times and thus do not yet leverage them optimally. We present look-ahead scheduling, a wall-time minimizing parallelization strategy for ABC Sequential Monte Carlo algorithms, which avoids idle times of computing units by preemptive sampling of subsequent generations. This makes it possible to utilize all available resources. The strategy can be integrated with e.g. adaptive distance function and summary statistic selection schemes, which is essential in practice. Our key contribution is the theoretical assessment of the strategy of preemptive sampling and the proof of unbiasedness. In addition, we provide an implementation and evaluate the strategy on different problems and numbers of parallel cores, showing speed-ups of typically 10-20% and up to 50% compared to the best established approach, with some variability. Thus, the proposed strategy can improve the cost and run-time efficiency of ABC methods on high-performance infrastructure.
Affiliation(s)
- Emad Alamoudi
  - Life and Medical Sciences Institute, University of Bonn, Bonn, Germany
- Felipe Reck
  - Life and Medical Sciences Institute, University of Bonn, Bonn, Germany
- Nils Bundgaard
  - BioQuant—Center for Quantitative Biology, Heidelberg University, Heidelberg, Germany
- Frederik Graw
  - BioQuant—Center for Quantitative Biology, Heidelberg University, Heidelberg, Germany
  - Interdisciplinary Center for Scientific Computing, Heidelberg University, Heidelberg, Germany
  - Department of Medicine 5, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
- Lutz Brusch
  - Center of Information Services and High Performance Computing (ZIH), Technische Universität Dresden, Dresden, Germany
- Jan Hasenauer
  - Life and Medical Sciences Institute, University of Bonn, Bonn, Germany
  - Helmholtz Zentrum München, Institute of Computational Biology, Neuherberg, Germany
  - Center for Mathematics, Technische Universität München, Garching, Germany
- Yannik Schälte
  - Life and Medical Sciences Institute, University of Bonn, Bonn, Germany
  - Helmholtz Zentrum München, Institute of Computational Biology, Neuherberg, Germany
  - Center for Mathematics, Technische Universität München, Garching, Germany
8. Alamoudi E, Schälte Y, Müller R, Starruß J, Bundgaard N, Graw F, Brusch L, Hasenauer J. FitMultiCell: simulating and parameterizing computational models of multi-scale and multi-cellular processes. Bioinformatics 2023; 39:btad674. [PMID: 37947308; PMCID: PMC10666203; DOI: 10.1093/bioinformatics/btad674] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2023] [Revised: 10/25/2023] [Accepted: 11/07/2023] [Indexed: 11/12/2023] [Open Access]
Abstract
Motivation: Biological tissues are dynamic and highly organized. Multi-scale models are helpful tools to analyse and understand the processes determining tissue dynamics. These models usually depend on parameters that need to be inferred from experimental data to achieve a quantitative understanding, to predict the response to perturbations, and to evaluate competing hypotheses. However, even advanced inference approaches such as approximate Bayesian computation (ABC) are difficult to apply due to the computational complexity of the simulation of multi-scale models. Thus, there is a need for a scalable pipeline for modeling, simulating, and parameterizing multi-scale models of multi-cellular processes.
Results: Here, we present FitMultiCell, a computationally efficient and user-friendly open-source pipeline that can handle the full workflow of modeling, simulating, and parameterizing for multi-scale models of multi-cellular processes. The pipeline is modular and integrates the modeling and simulation tool Morpheus and the statistical inference tool pyABC. The easy integration of high-performance infrastructure makes it possible to scale to computationally expensive problems. The introduction of a novel standard for the formulation of parameter inference problems for multi-scale models additionally ensures reproducibility and reusability. By applying the pipeline to multiple biological problems, we demonstrate its broad applicability, which will benefit in particular image-based systems biology.
Availability and implementation: FitMultiCell is available open-source at https://gitlab.com/fitmulticell/fit.
Affiliation(s)
- Emad Alamoudi
  - Life and Medical Sciences Institute, University of Bonn, Bonn 53113, Germany
- Yannik Schälte
  - Life and Medical Sciences Institute, University of Bonn, Bonn 53113, Germany
  - Institute of Computational Biology, Helmholtz Zentrum München—German Research Center for Environmental Health, Neuherberg 85764, Germany
  - Center for Mathematics, Chair of Mathematical Modeling of Biological Systems, Technische Universität München, Garching 85748, Germany
- Robert Müller
  - Center of Information Services and High Performance Computing (ZIH), Technische Universität Dresden, Dresden 01062, Germany
- Jörn Starruß
  - Center of Information Services and High Performance Computing (ZIH), Technische Universität Dresden, Dresden 01062, Germany
- Nils Bundgaard
  - BioQuant—Center for Quantitative Biology, Heidelberg University, Heidelberg 69120, Germany
- Frederik Graw
  - BioQuant—Center for Quantitative Biology, Heidelberg University, Heidelberg 69120, Germany
  - Interdisciplinary Center for Scientific Computing, Heidelberg University, Heidelberg 69120, Germany
  - Department of Medicine 5, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen 91054, Germany
- Lutz Brusch
  - Center of Information Services and High Performance Computing (ZIH), Technische Universität Dresden, Dresden 01062, Germany
- Jan Hasenauer
  - Life and Medical Sciences Institute, University of Bonn, Bonn 53113, Germany
  - Institute of Computational Biology, Helmholtz Zentrum München—German Research Center for Environmental Health, Neuherberg 85764, Germany
  - Center for Mathematics, Chair of Mathematical Modeling of Biological Systems, Technische Universität München, Garching 85748, Germany
9. Konstantinou K, Ghorbanpour F, Picchini U, Loavenbruck A, Särkkä A. Statistical modeling of diabetic neuropathy: Exploring the dynamics of nerve mortality. Stat Med 2023; 42:4128-4146. [PMID: 37485617; DOI: 10.1002/sim.9851] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2023] [Revised: 06/01/2023] [Accepted: 07/13/2023] [Indexed: 07/25/2023]
Abstract
Diabetic neuropathy is a disorder characterized by impaired nerve function and reduction of the number of epidermal nerve fibers per epidermal surface. Additionally, as neuropathy related nerve fiber loss and regrowth progresses over time, the two-dimensional spatial arrangement of the nerves becomes more clustered. These observations suggest that with development of neuropathy, the spatial pattern of diminished skin innervation is defined by a thinning process which remains incompletely characterized. We regard samples obtained from healthy controls and subjects suffering from diabetic neuropathy as realisations of planar point processes consisting of nerve entry points and nerve endings, and propose point process models based on spatial thinning to describe the change as neuropathy advances. Initially, the hypothesis that the nerve removal occurs completely at random is tested using independent random thinning of healthy patterns. Then, a dependent parametric thinning model that favors the removal of isolated nerve trees is proposed. Approximate Bayesian computation is used to infer the distribution of the model parameters, and the goodness-of-fit of the models is evaluated using both non-spatial and spatial summary statistics. Our findings suggest that the nerve mortality process changes as neuropathy advances.
Affiliation(s)
- Konstantinos Konstantinou
  - Department of Mathematical Sciences, Chalmers University of Technology, Gothenburg, Sweden
  - Department of Mathematical Sciences, University of Gothenburg, Gothenburg, Sweden
- Farnaz Ghorbanpour
  - Department of Mathematical Sciences, Allameh Tabataba'i University, Tehran, Iran
- Umberto Picchini
  - Department of Mathematical Sciences, Chalmers University of Technology, Gothenburg, Sweden
  - Department of Mathematical Sciences, University of Gothenburg, Gothenburg, Sweden
- Adam Loavenbruck
  - Department of Neurology, Kennedy Laboratory, University of Minnesota, Minneapolis, Minnesota, USA
- Aila Särkkä
  - Department of Mathematical Sciences, Chalmers University of Technology, Gothenburg, Sweden
  - Department of Mathematical Sciences, University of Gothenburg, Gothenburg, Sweden
10. Paz-Linares D, Gonzalez-Moreira E, Areces-Gonzalez A, Wang Y, Li M, Martinez-Montes E, Bosch-Bayard J, Bringas-Vega ML, Valdes-Sosa M, Valdes-Sosa PA. Identifying oscillatory brain networks with hidden Gaussian graphical spectral models of MEEG. Sci Rep 2023; 13:11466. [PMID: 37454235; PMCID: PMC10349891; DOI: 10.1038/s41598-023-38513-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 04/05/2023] [Accepted: 07/11/2023] [Indexed: 07/18/2023] [Open Access]
Abstract
Identifying the functional networks underpinning indirectly observed processes poses an inverse problem for neurosciences or other fields. A solution of such inverse problems estimates as a first step the activity emerging within functional networks from EEG or MEG data. These EEG or MEG estimates are a direct reflection of functional brain network activity with a temporal resolution that no other in vivo neuroimage may provide. A second step, estimating functional connectivity from such activity pseudodata, unveils the oscillatory brain networks that strongly correlate with all cognition and behavior. Simulations of such MEG or EEG inverse problems also reveal estimation errors of the functional connectivity determined by any of the state-of-the-art inverse solutions. We disclose a significant cause of estimation errors originating from misspecification of the functional network model incorporated into either inverse solution step. We introduce the Bayesian identification of a Hidden Gaussian Graphical Spectral (HIGGS) model specifying such an oscillatory brain network model. In human EEG alpha rhythm simulations, the estimation errors measured as ROC performance do not surpass 2% in our HIGGS inverse solution and reach 20% in state-of-the-art methods. Macaque simultaneous EEG/ECoG recordings provide experimental confirmation for our results with 1/3 times larger congruence according to Riemannian distances than state-of-the-art methods.
Affiliation(s)
- Deirel Paz-Linares
  - The Clinical Hospital of Chengdu Brain Science Institute, MOE Key Lab for Neuroinformation, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
  - Department of Neuroinformatics, Cuban Neuroscience Center, Havana, Cuba
- Eduardo Gonzalez-Moreira
  - The Clinical Hospital of Chengdu Brain Science Institute, MOE Key Lab for Neuroinformation, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
  - School of Electrical Engineering, Central University "Marta Abreu" of Las Villas, Santa Clara, Cuba
  - Center for Biomedical Imaging and Neuromodulation, Nathan Kline Institute for Psychiatric Research, Orangeburg, NY, USA
- Ariosky Areces-Gonzalez
  - The Clinical Hospital of Chengdu Brain Science Institute, MOE Key Lab for Neuroinformation, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
  - School of Technical Sciences, University of Pinar del Río "Hermanos Saiz Montes de Oca", Pinar del Rio, Cuba
- Ying Wang
  - The Clinical Hospital of Chengdu Brain Science Institute, MOE Key Lab for Neuroinformation, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Min Li
  - The Clinical Hospital of Chengdu Brain Science Institute, MOE Key Lab for Neuroinformation, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Jorge Bosch-Bayard
  - Department of Neuroinformatics, Cuban Neuroscience Center, Havana, Cuba
  - McGill Centre for Integrative Neurosciences MCIN, Ludmer Centre for Mental Health, Montreal Neurological Institute, McGill University, Montreal, Canada
- Maria L Bringas-Vega
  - The Clinical Hospital of Chengdu Brain Science Institute, MOE Key Lab for Neuroinformation, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
  - Department of Neuroinformatics, Cuban Neuroscience Center, Havana, Cuba
- Mitchell Valdes-Sosa
  - The Clinical Hospital of Chengdu Brain Science Institute, MOE Key Lab for Neuroinformation, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
  - Department of Neuroinformatics, Cuban Neuroscience Center, Havana, Cuba
- Pedro A Valdes-Sosa
  - The Clinical Hospital of Chengdu Brain Science Institute, MOE Key Lab for Neuroinformation, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
  - Department of Neuroinformatics, Cuban Neuroscience Center, Havana, Cuba
11. Schälte Y, Hasenauer J. Informative and adaptive distances and summary statistics in sequential approximate Bayesian computation. PLoS One 2023; 18:e0285836. [PMID: 37216372; DOI: 10.1371/journal.pone.0285836] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2022] [Accepted: 05/02/2023] [Indexed: 05/24/2023] [Open Access]
Abstract
Calibrating model parameters on heterogeneous data can be challenging and inefficient. This holds especially for likelihood-free methods such as approximate Bayesian computation (ABC), which rely on the comparison of relevant features in simulated and observed data and are popular for otherwise intractable problems. To address this problem, methods have been developed to scale-normalize data, and to derive informative low-dimensional summary statistics using inverse regression models of parameters on data. However, while approaches only correcting for scale can be inefficient on partly uninformative data, the use of summary statistics can lead to information loss and relies on the accuracy of employed methods. In this work, we first show that the combination of adaptive scale normalization with regression-based summary statistics is advantageous on heterogeneous parameter scales. Second, we present an approach employing regression models not to transform data, but to inform sensitivity weights quantifying data informativeness. Third, we discuss problems for regression models under non-identifiability, and present a solution using target augmentation. We demonstrate improved accuracy and efficiency of the presented approach on various problems, in particular robustness and wide applicability of the sensitivity weights. Our findings demonstrate the potential of the adaptive approach. The developed algorithms have been made available in the open-source Python toolbox pyABC.
Affiliation(s)
- Yannik Schälte
  - Faculty of Mathematics and Natural Sciences, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany
  - Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg, Germany
  - Center for Mathematics, Technische Universität München, Garching, Germany
- Jan Hasenauer
  - Faculty of Mathematics and Natural Sciences, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany
  - Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg, Germany
  - Center for Mathematics, Technische Universität München, Garching, Germany
12. Lu R, Zhu H, Wu X. Estimating mutation rates in a Markov branching process using approximate Bayesian computation. J Theor Biol 2023; 565:111467. [PMID: 36963627; DOI: 10.1016/j.jtbi.2023.111467] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2022] [Revised: 02/15/2023] [Accepted: 03/15/2023] [Indexed: 03/26/2023]
Abstract
Estimating microbial mutation rates is an essential task in evolutionary biology, with wide-ranging applications in related fields such as virology, epidemiology, clinical and public health, and antibiotic research. Significant progress has been made on this research since 1943 when Luria-Delbrück fluctuation analysis was first introduced. However, existing estimators of mutation rates are heavily reliant on model assumptions in fluctuation analysis, and become less applicable to real microbial experiments which deviate from the model assumptions. To overcome this difficulty, we propose to model fluctuation experimental data by a two-type Markov branching process (MBP) and use approximate Bayesian computation (ABC) to estimate the mutation probability parameters. Such an ABC-based mutation rate estimator is based on intensive simulations from the mutation process, thereby taking advantage of modern computing power. Most importantly, its likelihood-free feature allows more complex and realistic setups of the mutation process, especially when the distribution of the number of mutants cannot be easily derived. To further improve computational efficiency, we use a Gaussian process surrogate to substitute the simulator in the ABC algorithm, and call the resulting estimator GPS-ABC. Simulation studies show that, when used to estimate a constant mutation rate in MBP, ABC-based estimators generally outperform traditional moment or likelihood-based estimators. When mutations occur in two stages, i.e., in MBP with a piece-wise constant mutation rate function, traditional mutation rate estimators become inapplicable, yet GPS-ABC still achieves reasonable estimates. Finally, the proposed GPS-ABC estimator is used to analyze real fluctuation experimental datasets for studying drug resistance.
Affiliation(s)
- Ruijin Lu
- Department of Statistics, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, United States of America
| | - Hongxiao Zhu
- Department of Statistics, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, United States of America
| | - Xiaowei Wu
- Department of Statistics, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, United States of America.
| |
|
13
|
Baey C, Smith HG, Rundlöf M, Olsson O, Clough Y, Sahlin U. Calibration of a bumble bee foraging model using Approximate Bayesian Computation. Ecol Modell 2023. [DOI: 10.1016/j.ecolmodel.2022.110251] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/27/2022]
|
14
|
Järvenpää M, Corander J. On predictive inference for intractable models via approximate Bayesian computation. Statistics and Computing 2023; 33:42. [PMID: 36785730 PMCID: PMC9911513 DOI: 10.1007/s11222-022-10163-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2022] [Accepted: 10/02/2022] [Indexed: 06/18/2023]
Abstract
Approximate Bayesian computation (ABC) is commonly used for parameter estimation and model comparison for intractable simulator-based statistical models whose likelihood function cannot be evaluated. In this paper we instead investigate the feasibility of ABC as a generic approximate method for predictive inference, in particular, for computing the posterior predictive distribution of future observations or missing data of interest. We consider three complementary ABC approaches for this goal, each based on different assumptions regarding which predictive density of the intractable model can be sampled from. The case where only simulation from the joint density of the observed and future data given the model parameters can be used for inference is given particular attention and it is shown that the ideal summary statistic in this setting is minimal predictive sufficient instead of merely minimal sufficient (in the ordinary sense). An ABC prediction approach that takes advantage of a certain latent variable representation is also investigated. We additionally show how common ABC sampling algorithms can be used in the predictive settings considered. Our main results are first illustrated by using simple time-series models that facilitate analytical treatment, and later by using two common intractable dynamic models. Supplementary Information: The online version contains supplementary material available at 10.1007/s11222-022-10163-6.
Affiliation(s)
- Marko Järvenpää
- Department of Biostatistics, University of Oslo, Oslo, Norway
| | - Jukka Corander
- Department of Biostatistics, University of Oslo, Oslo, Norway
- Department of Mathematics and Statistics, Helsinki Institute of Information Technology (HIIT), University of Helsinki, Helsinki, Finland
- Wellcome Sanger Institute, Hinxton, Cambridgeshire, UK
| |
|
15
|
Cheng CY, Calderazzo S, Schramm C, Schlander M. Modeling the Natural History and Screening Effects of Colorectal Cancer Using Both Adenoma and Serrated Neoplasia Pathways: The Development, Calibration, and Validation of a Discrete Event Simulation Model. MDM Policy Pract 2023; 8:23814683221145701. [PMID: 36698854 PMCID: PMC9869210 DOI: 10.1177/23814683221145701] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 05/08/2022] [Accepted: 11/28/2022] [Indexed: 01/22/2023] Open
Abstract
Background. Existing colorectal cancer (CRC) screening models mostly focus on the adenoma pathway of CRC development, overlooking the serrated neoplasia pathway, which might result in overly optimistic screening predictions. In addition, Bayesian inference methods have not been widely used for model calibration. We aimed to develop a CRC screening model accounting for both pathways, calibrate it with approximate Bayesian computation (ABC) methods, and validate it with large CRC screening trials. Methods. A discrete event simulation (DES) of the CRC natural history (DECAS) was constructed using the adenoma and serrated pathways in R software. The model simulates CRC-related events in a specific birth cohort through various natural history states. Calibration took advantage of 74 prevalence data points from the German screening colonoscopy program of 5.2 million average-risk participants using an ABC method. CRC incidence outputs from DECAS were validated with the German national cancer registry data; screening effects were validated using 17-y data from the UK Flexible Sigmoidoscopy Screening sigmoidoscopy trial and a German screening colonoscopy cohort study. Results. The Bayesian calibration rendered 1,000 sets of posterior parameter samples. With the calibrated parameters, the observed age- and sex-specific CRC prevalences from the German registries were within the 95% DECAS-predicted intervals. Regarding screening effects, DECAS predicted a 41% (95% intervals 30%-51%) and 62% (95% intervals 55%-68%) reduction in 17-y cumulative CRC mortality for a single screening sigmoidoscopy and colonoscopy, respectively, falling within 95% confidence intervals reported in the 2 clinical studies used for validation. Conclusions. We presented DECAS, the first Bayesian-calibrated DES model for CRC natural history and screening, accounting for 2 CRC tumorigenesis pathways. 
The validated model can serve as a valid tool to evaluate the (cost-)effectiveness of CRC screening strategies.
Highlights
- This article presents a new discrete event simulation model, DECAS, which models both adenoma-carcinoma and serrated neoplasia pathways for colorectal cancer (CRC) development and CRC screening effects.
- DECAS is calibrated based on a Bayesian inference method using the data from the German screening colonoscopy program, which consists of more than 5 million first-time average-risk participants aged 55 years and older in 2003 to 2014.
- DECAS is flexible for evaluating various CRC screening strategies and can differentiate screening effects in different parts of the colon.
- DECAS is validated with large screening sigmoidoscopy and colonoscopy clinical study data and can be further used to evaluate the (cost-)effectiveness of German colorectal cancer screening strategies.
Affiliation(s)
- Chih-Yuan Cheng
- Division of Health Economics, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Mannheim Medical Faculty, University of Heidelberg, Mannheim, Germany
| | - Silvia Calderazzo
- Division of Biostatistics, German Cancer Research Center (DKFZ), Heidelberg, Germany
| | - Christoph Schramm
- Clinics of Gastroenterology, Hepatology and Transplantation Medicine, Essen University Hospital, Essen, Germany
| | - Michael Schlander
- Division of Health Economics, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Mannheim Medical Faculty, University of Heidelberg, Mannheim, Germany
- Alfred Weber Institute, University of Heidelberg, Heidelberg, Germany
| |
|
16
|
Gaskell J, Campioni N, Morales JM, Husmeier D, Torney CJ. Inferring the interaction rules of complex systems with graph neural networks and approximate Bayesian computation. J R Soc Interface 2023; 20:20220676. [PMID: 36596456 PMCID: PMC9810425 DOI: 10.1098/rsif.2022.0676] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/13/2022] [Accepted: 12/06/2022] [Indexed: 01/05/2023] Open
Abstract
Inferring the underlying processes that drive collective behaviour in biological and social systems is a significant statistical and computational challenge. While simulation models have been successful in qualitatively capturing many of the phenomena observed in these systems in a variety of domains, formally fitting these models to data remains intractable. Recently, approximate Bayesian computation (ABC) has been shown to be an effective approach to inference if the likelihood function for a model is unavailable. However, a key difficulty in successfully implementing ABC lies with the design, selection and weighting of appropriate summary statistics, a challenge that is especially acute when modelling high dimensional complex systems. In this work, we combine a Gaussian process accelerated ABC method with the automatic learning of summary statistics via graph neural networks. Our approach bypasses the need to design a model-specific set of summary statistics for inference. Instead, we encode relational inductive biases into a neural network using a graph embedding and then extract summary statistics automatically from simulation data. To evaluate our framework, we use a model of collective animal movement as a test bed and compare our method to a standard summary statistics approach and a linear regression-based algorithm.
Affiliation(s)
- Jennifer Gaskell
- School of Mathematics and Statistics, University of Glasgow, Glasgow G12 8SQ, UK
| | - Nazareno Campioni
- School of Mathematics and Statistics, University of Glasgow, Glasgow G12 8SQ, UK
| | - Juan M. Morales
- Grupo de Ecología Cuantitativa, INIBIOMA-CONICET, Universidad Nacional del Comahue, Bariloche, Argentina
- School of Biodiversity, One Health and Veterinary Medicine, University of Glasgow, Glasgow G12 8SQ, UK
| | - Dirk Husmeier
- School of Mathematics and Statistics, University of Glasgow, Glasgow G12 8SQ, UK
| | - Colin J. Torney
- School of Mathematics and Statistics, University of Glasgow, Glasgow G12 8SQ, UK
| |
|
17
|
Martin GM, Frazier DT, Robert CP. Approximating Bayes in the 21st Century. Stat Sci 2023. [DOI: 10.1214/22-sts875] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/25/2023]
Affiliation(s)
- Gael M. Martin
- Gael M. Martin is Professor, Department of Econometrics and Business Statistics, Monash University, Melbourne, Australia
| | - David T. Frazier
- David T. Frazier is Associate Professor, Department of Econometrics and Business Statistics, Monash University, Melbourne, Australia
| | | |
|
18
|
Coulier A, Singh P, Sturrock M, Hellander A. Systematic comparison of modeling fidelity levels and parameter inference settings applied to negative feedback gene regulation. PLoS Comput Biol 2022; 18:e1010683. [PMID: 36520957 PMCID: PMC9799300 DOI: 10.1371/journal.pcbi.1010683] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2022] [Revised: 12/29/2022] [Accepted: 10/25/2022] [Indexed: 12/23/2022] Open
Abstract
Quantitative stochastic models of gene regulatory networks are important tools for studying cellular regulation. Such models can be formulated at many different levels of fidelity. A practical challenge is to determine what model fidelity to use in order to get accurate and representative results. The choice is important, because models of successively higher fidelity come at a rapidly increasing computational cost. In some situations, the level of detail is clearly motivated by the question under study. In many situations however, many model options could qualitatively agree with available data, depending on the amount of data and the nature of the observations. Here, an important distinction is whether we are interested in inferring the true (but unknown) physical parameters of the model or if it is sufficient to be able to capture and explain available data. The situation becomes complicated from a computational perspective because inference needs to be approximate. Most often it is based on likelihood-free Approximate Bayesian Computation (ABC) and here determining which summary statistics to use, as well as how much data is needed to reach the desired level of accuracy, are difficult tasks. Ultimately, all of these aspects-the model fidelity, the available data, and the numerical choices for inference-interplay in a complex manner. In this paper we develop a computational pipeline designed to systematically evaluate inference accuracy for a wide range of true known parameters. We then use it to explore inference settings for negative feedback gene regulation. In particular, we compare a detailed spatial stochastic model, a coarse-grained compartment-based multiscale model, and the standard well-mixed model, across several data-scenarios and for multiple numerical options for parameter inference. Practically speaking, this pipeline can be used as a preliminary step to guide modelers prior to gathering experimental data. 
By training Gaussian processes to approximate the distance function values, we are able to substantially reduce the computational cost of running the pipeline.
Affiliation(s)
- Adrien Coulier
- Department of Information Technology, Uppsala University, Uppsala, Sweden
| | - Prashant Singh
- Science for Life Laboratory, Department of Information Technology, Uppsala University, Uppsala, Sweden
| | - Marc Sturrock
- Department of Physiology, Royal College of Surgeons in Ireland, Dublin, Ireland
| | - Andreas Hellander
- Department of Information Technology, Uppsala University, Uppsala, Sweden
| |
|
19
|
Moshe A, Wygoda E, Ecker N, Loewenthal G, Avram O, Israeli O, Hazkani-Covo E, Pe’er I, Pupko T. An Approximate Bayesian Computation Approach for Modeling Genome Rearrangements. Mol Biol Evol 2022; 39:msac231. [PMID: 36282896 PMCID: PMC9692237 DOI: 10.1093/molbev/msac231] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 06/30/2024] Open
Abstract
The inference of genome rearrangement events has been extensively studied, as they play a major role in molecular evolution. However, probabilistic evolutionary models that explicitly imitate the evolutionary dynamics of such events, as well as methods to infer model parameters, are yet to be fully utilized. Here, we developed a probabilistic approach to infer genome rearrangement rate parameters using an Approximate Bayesian Computation (ABC) framework. We developed two genome rearrangement models, a basic model, which accounts for genomic changes in gene order, and a more sophisticated one which also accounts for changes in chromosome number. We characterized the ABC inference accuracy using simulations and applied our methodology to both prokaryotic and eukaryotic empirical datasets. Knowledge of genome-rearrangement rates can help elucidate their role in evolution as well as help simulate genomes with evolutionary dynamics that reflect empirical genomes.
Affiliation(s)
- Asher Moshe
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
| | - Elya Wygoda
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
| | - Noa Ecker
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
| | - Gil Loewenthal
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
| | - Oren Avram
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
| | - Omer Israeli
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
| | - Einat Hazkani-Covo
- Department of Natural and Life Sciences, Open University of Israel, Ra'anana, Israel
| | - Itsik Pe’er
- Department of Computer Science, Columbia University, New York, USA
| | - Tal Pupko
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
| |
|
20
|
Åkesson M, Singh P, Wrede F, Hellander A. Convolutional Neural Networks as Summary Statistics for Approximate Bayesian Computation. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:3353-3365. [PMID: 34460381 PMCID: PMC9847490 DOI: 10.1109/tcbb.2021.3108695] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 05/25/2023]
Abstract
Approximate Bayesian Computation is widely used in systems biology for inferring parameters in stochastic gene regulatory network models. Its performance hinges critically on the ability to summarize high-dimensional system responses such as time series into a few informative, low-dimensional summary statistics. The quality of those statistics acutely impacts the accuracy of the inference task. Existing methods to select the best subset out of a pool of candidate statistics do not scale well with large pools of several tens to hundreds of candidate statistics. Since high-quality statistics are imperative for good performance, this becomes a serious bottleneck when performing inference on complex and high-dimensional problems. This paper proposes a convolutional neural network architecture for automatically learning informative summary statistics of temporal responses. We show that the proposed network can effectively circumvent the statistics selection problem of the preprocessing step for ABC inference. The proposed approach is demonstrated on two benchmark problems and one challenging inference problem of learning parameters in a high-dimensional stochastic genetic oscillator. We also study the impact of experimental design on network performance by comparing different data richness and data acquisition strategies.
|
21
|
Pesonen H, Simola U, Köhn‐Luque A, Vuollekoski H, Lai X, Frigessi A, Kaski S, Frazier DT, Maneesoonthorn W, Martin GM, Corander J. ABC of the future. Int Stat Rev 2022. [DOI: 10.1111/insr.12522] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Affiliation(s)
- Henri Pesonen
- Oslo Centre for Biostatistics and Epidemiology University of Oslo Oslo Norway
| | - Umberto Simola
- Helsinki Institute of Information Technology, Department of Mathematics and Statistics University of Helsinki Helsinki Finland
| | - Alvaro Köhn‐Luque
- Oslo Centre for Biostatistics and Epidemiology University of Oslo Oslo Norway
| | - Henri Vuollekoski
- Helsinki Institute of Information Technology, Department of Computer Science Aalto University Helsinki Finland
| | - Xiaoran Lai
- Oslo Centre for Biostatistics and Epidemiology University of Oslo Oslo Norway
| | - Arnoldo Frigessi
- Oslo Centre for Biostatistics and Epidemiology University of Oslo Oslo Norway
- Oslo Centre for Biostatistics and Epidemiology Oslo University Hospital Oslo Norway
| | - Samuel Kaski
- Helsinki Institute of Information Technology, Department of Computer Science Aalto University Helsinki Finland
- Department of Computer Science University of Manchester Manchester UK
| | - David T. Frazier
- Department of Econometrics & Business Statistics Monash University Clayton Victoria Australia
| | | | - Gael M. Martin
- Department of Econometrics & Business Statistics Monash University Clayton Victoria Australia
| | - Jukka Corander
- Oslo Centre for Biostatistics and Epidemiology University of Oslo Oslo Norway
- Helsinki Institute of Information Technology, Department of Mathematics and Statistics University of Helsinki Helsinki Finland
- Parasites and Microbes Wellcome Sanger Institute Hinxton UK
| |
|
22
|
Wang Y, Wang P, Zhang S, Pan H. Uncertainty Modeling of a Modified SEIR Epidemic Model for COVID-19. Biology 2022; 11:biology11081157. [PMID: 36009784 PMCID: PMC9404969 DOI: 10.3390/biology11081157] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 06/29/2022] [Revised: 07/27/2022] [Accepted: 07/30/2022] [Indexed: 06/01/2023]
Abstract
Based on the SEIR (susceptible-exposed-infectious-removed) epidemic model, we propose a modified epidemic mathematical model to describe the spread of the coronavirus disease 2019 (COVID-19) epidemic in Wuhan, China. Using public data, the uncertain parameters of the proposed model for COVID-19 in Wuhan were calibrated. The uncertainty of the control basic reproduction number was studied with the posterior probability density function of the uncertain model parameters. Because information on the initial conditions of the model is lacking, the mathematical model was used to inversely deduce the earliest start date of COVID-19 infection in Wuhan. The result of the uncertainty analysis of the model is in line with the observed data for COVID-19 in Wuhan, China. The numerical results show that the modified mathematical model could model the spread of COVID-19 epidemics.
|
23
|
Craiu RV, Gustafson P, Rosenthal JS. Reflections on Bayesian inference and Markov chain Monte Carlo. Can J Stat 2022. [DOI: 10.1002/cjs.11707] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Affiliation(s)
- Radu V. Craiu
- Department of Statistical Sciences University of Toronto Toronto Ontario Canada
| | - Paul Gustafson
- Department of Statistics University of British Columbia Vancouver British Columbia Canada
| | | |
|
24
|
Muratore F, Ramos F, Turk G, Yu W, Gienger M, Peters J. Robot Learning From Randomized Simulations: A Review. Front Robot AI 2022; 9:799893. [PMID: 35494543 PMCID: PMC9038844 DOI: 10.3389/frobt.2022.799893] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/22/2021] [Accepted: 01/21/2022] [Indexed: 11/13/2022] Open
Abstract
The rise of deep learning has caused a paradigm shift in robotics research, favoring methods that require large amounts of data. Unfortunately, it is prohibitively expensive to generate such data sets on a physical platform. Therefore, state-of-the-art approaches learn in simulation where data generation is fast as well as inexpensive and subsequently transfer the knowledge to the real robot (sim-to-real). Despite becoming increasingly realistic, all simulators are by construction based on models, hence inevitably imperfect. This raises the question of how simulators can be modified to facilitate learning robot control policies and overcome the mismatch between simulation and reality, often called the "reality gap." We provide a comprehensive review of sim-to-real research for robotics, focusing on a technique named "domain randomization" which is a method for learning from randomized simulations.
Affiliation(s)
- Fabio Muratore
- Intelligent Autonomous Systems Group, Technical University of Darmstadt, Darmstadt, Germany
- Honda Research Institute Europe, Offenbach am Main, Germany
| | - Fabio Ramos
- School of Computer Science, University of Sydney, Sydney, NSW, Australia
- NVIDIA, Seattle, WA, United States
| | - Greg Turk
- Georgia Institute of Technology, Atlanta, GA, United States
| | - Wenhao Yu
- Robotics at Google, Mountain View, CA, United States
| | | | - Jan Peters
- Intelligent Autonomous Systems Group, Technical University of Darmstadt, Darmstadt, Germany
| |
|
25
|
Wang S. Self-Supervised Metric Learning in Multi-View Data: A Downstream Task Perspective. J Am Stat Assoc 2022. [DOI: 10.1080/01621459.2022.2057317] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
Affiliation(s)
- Shulei Wang
- Department of Statistics, University of Illinois at Urbana-Champaign, Champaign, IL 61820
| |
|
26
|
Computationally efficient parameter estimation for spatial individual-level models of infectious disease transmission. Spat Spatiotemporal Epidemiol 2022; 41:100497. [DOI: 10.1016/j.sste.2022.100497] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/04/2021] [Revised: 11/26/2021] [Accepted: 03/02/2022] [Indexed: 11/19/2022]
|
27
|
Dutta R, Zouaoui Boudjeltia K, Kotsalos C, Rousseau A, Ribeiro de Sousa D, Desmet JM, Van Meerhaeghe A, Mira A, Chopard B. Personalized pathology test for Cardio-vascular disease: Approximate Bayesian computation with discriminative summary statistics learning. PLoS Comput Biol 2022; 18:e1009910. [PMID: 35271585 PMCID: PMC8939803 DOI: 10.1371/journal.pcbi.1009910] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/26/2021] [Revised: 03/22/2022] [Accepted: 02/09/2022] [Indexed: 11/19/2022] Open
Abstract
Cardio/cerebrovascular diseases (CVD) have become one of the major health issues in our societies. However, recent studies show that the present pathology tests to detect CVD are ineffectual, as they do not consider different stages of platelet activation or the molecular dynamics involved in platelet interactions and are unable to account for inter-individual variability. Here we propose a stochastic platelet deposition model and an inferential scheme to estimate the biologically meaningful model parameters using approximate Bayesian computation with a summary statistic that maximally discriminates between different types of patients. Inferred parameters from data collected on healthy volunteers and different patient types help us to identify specific biological parameters and hence the biological reasoning behind the dysfunction for each type of patient. This work opens up an unprecedented opportunity for personalized pathology tests for CVD detection and medical treatment.
Affiliation(s)
| | - Karim Zouaoui Boudjeltia
- Laboratory of Experimental Medicine (ULB 222), Medicine Faculty, Université Libre de Bruxelles, ISPPC CHU de Charleroi, Charleroi, Belgium
| | | | - Alexandre Rousseau
- Laboratory of Experimental Medicine (ULB 222), Medicine Faculty, Université Libre de Bruxelles, ISPPC CHU de Charleroi, Charleroi, Belgium
| | - Daniel Ribeiro de Sousa
- Laboratory of Experimental Medicine (ULB 222), Medicine Faculty, Université Libre de Bruxelles, ISPPC CHU de Charleroi, Charleroi, Belgium
| | - Jean-Marc Desmet
- Nephrology Department, ISPPC CHU de Charleroi, Charleroi, Belgium
| | | | - Antonietta Mira
- Università della Svizzera italiana, Lugano, Switzerland
- University of Insubria, Varese, Italy
| | | |
|
28
|
Cereda G, Corradi F, Viscardi C. Learning the two parameters of the Poisson‐Dirichlet distribution with a forensic application. Scand Stat Theory Appl 2022. [DOI: 10.1111/sjos.12575] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Affiliation(s)
- Giulia Cereda
- Dipartimento di Statistica, Informatica Applicazioni (DISIA), University of Florence Italy
| | - Fabio Corradi
- Dipartimento di Statistica, Informatica Applicazioni (DISIA), University of Florence Italy
| | - Cecilia Viscardi
- Dipartimento di Statistica, Informatica Applicazioni (DISIA), University of Florence Italy
| |
|
29
|
Alawamy EA, Liu Y, Zhao YQ. Bayesian analysis for single-server Markovian queues based on the No-U-Turn sampler. Commun Stat-Simul C 2022. [DOI: 10.1080/03610918.2022.2025841] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2022]
Affiliation(s)
- Eman Ahmed Alawamy
- School of Mathematics and Statistics, Central South University, Changsha, Hunan, China
| | - Yuanyuan Liu
- School of Mathematics and Statistics, Central South University, Changsha, Hunan, China
| | - Yiqiang Q. Zhao
- School of Mathematics and Statistics, Carleton University, Ottawa, Canada
| |
|
30
|
Cortell-Nicolau A, García-Puchol O, Barrera-Cruz M, García-Rivero D. The spread of agriculture in Iberia through Approximate Bayesian Computation and Neolithic projectile tools. PLoS One 2021; 16:e0261813. [PMID: 34962962 PMCID: PMC8714124 DOI: 10.1371/journal.pone.0261813] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/21/2021] [Accepted: 12/13/2021] [Indexed: 11/18/2022] Open
Abstract
In the present article we use geometric microliths (a specific type of arrowhead) and Approximate Bayesian Computation (ABC) to evaluate possible origin points and expansion routes for the Neolithic in the Iberian Peninsula. To do so, we divide the Iberian Peninsula into four areas (Ebro river, Catalan shores, Xúquer river and Guadalquivir river) and we sample the geometric microliths existing in the sites with the oldest radiocarbon dates for each zone. On these data, we perform a partial Mantel test with three matrices: a geographic distance matrix, a cultural distance matrix and a chronological distance matrix. We then simulate a series of partial Mantel tests in which we alter the chronological matrix by using an expansion model with randomised origin points, and use the distribution of the observed partial Mantel test’s results as a summary statistic within an Approximate Bayesian Computation-Sequential Monte-Carlo (ABC-SMC) algorithm framework. Our results point clearly to a Neolithic expansion route following the Northern Mediterranean, whilst the Southern Mediterranean route could also find support and should be further discussed. The most probable origin points focus on the Xúquer river area.
Affiliation(s)
- Alfredo Cortell-Nicolau
- Departament de Prehistòria, Arqueologia i Història Antiga, Facultat de Geografia i Història, Universitat de València, València, Spain
- Department of Archaeology, McDonald Institute for Archaeological Research, Faculty of Human, Social, and Political Science, University of Cambridge, Cambridge, United Kingdom
| | - Oreto García-Puchol
- Departament de Prehistòria, Arqueologia i Història Antiga, Facultat de Geografia i Història, Universitat de València, València, Spain
| | - María Barrera-Cruz
- Departament de Prehistòria, Arqueologia i Història Antiga, Facultat de Geografia i Història, Universitat de València, València, Spain
| | - Daniel García-Rivero
- Department of Prehistory and Archaeology, Faculty of Geography and History, University of Seville, Seville, Spain
| |
|
31
|
Warne DJ, Baker RE, Simpson MJ. Rapid Bayesian Inference for Expensive Stochastic Models. J Comput Graph Stat 2021. [DOI: 10.1080/10618600.2021.2000419] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
Affiliation(s)
- David J. Warne
- School of Mathematical Sciences, Queensland University of Technology, Brisbane, Australia
| | - Ruth E. Baker
- Mathematical Institute, University of Oxford, Oxford, UK
| | - Matthew J. Simpson
- School of Mathematical Sciences, Queensland University of Technology, Brisbane, Australia
| |
|
32
|
Exact simulation of coupled Wright–Fisher diffusions. ADV APPL PROBAB 2021. [DOI: 10.1017/apr.2021.9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022]
Abstract
In this paper an exact rejection algorithm for simulating paths of the coupled Wright–Fisher diffusion is introduced. The coupled Wright–Fisher diffusion is a family of multivariate Wright–Fisher diffusions that have drifts depending on each other through a coupling term and that find applications in the study of networks of interacting genes. The proposed rejection algorithm uses independent neutral Wright–Fisher diffusions as candidate proposals, which are only needed at a finite number of points. Once a candidate is accepted, the remainder of the path can be recovered by sampling from neutral multivariate Wright–Fisher bridges, for which an exact sampling strategy is also provided. Finally, the algorithm’s complexity is derived and its performance demonstrated in a simulation study.
|
33
|
McCullough K, Dmitrieva T, Ebrahimi N. New approximate Bayesian computation algorithm for censored data. Comput Stat 2021. [DOI: 10.1007/s00180-021-01167-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/27/2022]
|
34
|
Bi J, Shen W, Zhu W. Random Forest Adjustment for Approximate Bayesian Computation. J Comput Graph Stat 2021. [DOI: 10.1080/10618600.2021.1981341] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
Affiliation(s)
- Jiefeng Bi
- Wang Yanan Institute for Studies in Economics (WISE), Xiamen University, Xiamen, China
| | - Weining Shen
- Department of Statistics, University of California, Irvine, CA
| | - Weixuan Zhu
- Wang Yanan Institute for Studies in Economics (WISE), Department of Statistics and Data Science, School of Economics, Xiamen University, Xiamen, China
| |
|
35
|
Priddle JW, Sisson SA, Frazier DT, Turner I, Drovandi C. Efficient Bayesian Synthetic Likelihood With Whitening Transformations. J Comput Graph Stat 2021. [DOI: 10.1080/10618600.2021.1979012] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 10/24/2022]
Affiliation(s)
- Jacob W. Priddle
- School of Mathematical Sciences, Queensland University of Technology, Brisbane, Australia
| | | | - David T. Frazier
- Department of Econometrics and Business Statistics, Monash University, Clayton, Australia
| | - Ian Turner
- School of Mathematical Sciences, Queensland University of Technology, Brisbane, Australia
| | - Christopher Drovandi
- School of Mathematical Sciences, Queensland University of Technology, Brisbane, Australia
| |
|
36
|
Perez MF, Bonatelli IAS, Romeiro-Brito M, Franco FF, Taylor NP, Zappi DC, Moraes EM. Coalescent-based species delimitation meets deep learning: Insights from a highly fragmented cactus system. Mol Ecol Resour 2021; 22:1016-1028. [PMID: 34669256 DOI: 10.1111/1755-0998.13534] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 01/03/2021] [Revised: 09/16/2021] [Accepted: 10/12/2021] [Indexed: 11/26/2022]
Abstract
Delimiting species boundaries is a major goal in evolutionary biology. An increasing volume of literature has focused on the challenges of investigating cryptic diversity within complex evolutionary scenarios of speciation, including gene flow and demographic fluctuations. New methods based on model selection, such as approximate Bayesian computation, approximate likelihoods, and machine learning are promising tools arising in this field. Here, we introduce a framework for species delimitation using the multispecies coalescent model coupled with a deep learning algorithm based on convolutional neural networks (CNNs). We compared this strategy with a similar ABC approach. We applied both methods to test species boundary hypotheses based on current and previous taxonomic delimitations as well as genetic data (sequences from 41 loci) in Pilosocereus aurisetus, a cactus species complex with a sky-island distribution and taxonomic uncertainty. To validate our method, we also applied the same strategy on data from widely accepted species from the genus Drosophila. The results show that our CNN approach has a high capacity to distinguish among the simulated species delimitation scenarios, with higher accuracy than ABC. For the cactus data set, a splitter hypothesis without gene flow showed the highest probability in both CNN and ABC approaches, a result agreeing with previous taxonomic classifications and in line with the sky-island distribution and low dispersal of P. aurisetus. Our results highlight the cryptic diversity within the P. aurisetus complex and show that CNNs are a promising approach for distinguishing complex evolutionary histories, even outperforming the accuracy of other model-based approaches such as ABC.
Affiliation(s)
- Manolo F Perez
- Departamento de Biologia, Universidade Federal de São Carlos, Sorocaba, Brazil; Departamento de Genética e Evolução, Universidade Federal de São Carlos, São Carlos, Brazil
| | - Isabel A S Bonatelli
- Departamento de Biologia, Universidade Federal de São Carlos, Sorocaba, Brazil; Departamento de Ecologia e Biologia Evolutiva, Universidade Federal de São Paulo, Diadema, Brazil
| | | | - Fernando F Franco
- Departamento de Biologia, Universidade Federal de São Carlos, Sorocaba, Brazil
| | | | - Daniela C Zappi
- Programa de Pós Graduação em Botânica, Instituto de Ciências Biológicas, Universidade de Brasília, Brasília, Brazil
| | - Evandro M Moraes
- Departamento de Biologia, Universidade Federal de São Carlos, Sorocaba, Brazil
| |
|
37
|
Carr MJ, Simpson MJ, Drovandi C. Estimating parameters of a stochastic cell invasion model with fluorescent cell cycle labelling using approximate Bayesian computation. J R Soc Interface 2021; 18:20210362. [PMID: 34547212 PMCID: PMC8455172 DOI: 10.1098/rsif.2021.0362] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 12/12/2022] Open
Abstract
We develop a parameter estimation method based on approximate Bayesian computation (ABC) for a stochastic cell invasion model using fluorescent cell cycle labelling with proliferation, migration and crowding effects. Previously, inference has been performed on a deterministic version of the model fitted to cell density data, and not all parameters were identifiable. Considering the stochastic model allows us to harness more features of experimental data, including cell trajectories and cell count data, which we show overcomes the parameter identifiability problem. We demonstrate that, while difficult to collect, cell trajectory data can provide more information about the parameters of the cell invasion model. To handle the intractability of the likelihood function of the stochastic model, we use an efficient ABC algorithm based on sequential Monte Carlo. Rcpp and MATLAB implementations of the simulation model and ABC algorithm used in this study are available at https://github.com/michaelcarr-stats/FUCCI.
Affiliation(s)
- Michael J Carr
- School of Mathematical Sciences, Queensland University of Technology, Brisbane, Australia
| | - Matthew J Simpson
- School of Mathematical Sciences, Queensland University of Technology, Brisbane, Australia
| | - Christopher Drovandi
- School of Mathematical Sciences, Queensland University of Technology, Brisbane, Australia
| |
|
38
|
Goodness of fit for models with intractable likelihood. TEST-SPAIN 2021. [DOI: 10.1007/s11749-020-00747-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/22/2022]
|
39
|
Burton J, Manning CS, Rattray M, Papalopulu N, Kursawe J. Inferring kinetic parameters of oscillatory gene regulation from single cell time-series data. J R Soc Interface 2021; 18:20210393. [PMID: 34583566 PMCID: PMC8479358 DOI: 10.1098/rsif.2021.0393] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/12/2021] [Accepted: 08/26/2021] [Indexed: 11/19/2022] Open
Abstract
Gene expression dynamics, such as stochastic oscillations and aperiodic fluctuations, have been associated with cell fate changes in multiple contexts, including development and cancer. Single cell live imaging of protein expression with endogenous reporters is widely used to observe such gene expression dynamics. However, the experimental investigation of regulatory mechanisms underlying the observed dynamics is challenging, since these mechanisms include complex interactions of multiple processes, including transcription, translation and protein degradation. Here, we present a Bayesian method to infer kinetic parameters of oscillatory gene expression regulation using an auto-negative feedback motif with delay. Specifically, we use a delay-adapted nonlinear Kalman filter within a Metropolis-adjusted Langevin algorithm to identify posterior probability distributions. Our method can be applied to time-series data on gene expression from single cells and is able to infer multiple parameters simultaneously. We apply it to published data on murine neural progenitor cells and show that it outperforms alternative methods. We further analyse how parameter uncertainty depends on the duration and time resolution of an imaging experiment, to make experimental design recommendations. This work demonstrates the utility of parameter inference on time course data from single cells and enables new studies on cell fate changes and population heterogeneity.
Affiliation(s)
- Joshua Burton
- Division of Informatics, Imaging and Data Sciences, Faculty of Biology Medicine and Health, The University of Manchester, Oxford Road, Manchester M13 9PT, UK
| | - Cerys S. Manning
- Division of Developmental Biology and Medicine, School of Medical Sciences, Faculty of Biology, Medicine and Health, The University of Manchester, Oxford Road, Manchester M13 9PT, UK
| | - Magnus Rattray
- Division of Informatics, Imaging and Data Sciences, Faculty of Biology Medicine and Health, The University of Manchester, Oxford Road, Manchester M13 9PT, UK
| | - Nancy Papalopulu
- Division of Developmental Biology and Medicine, School of Medical Sciences, Faculty of Biology, Medicine and Health, The University of Manchester, Oxford Road, Manchester M13 9PT, UK
| | - Jochen Kursawe
- School of Mathematics and Statistics, University of St Andrews, North Haugh, St Andrews, KY16 9SS, UK
| |
|
40
|
McCain JSP, Tagliabue A, Susko E, Achterberg EP, Allen AE, Bertrand EM. Cellular costs underpin micronutrient limitation in phytoplankton. SCIENCE ADVANCES 2021; 7:7/32/eabg6501. [PMID: 34362734 PMCID: PMC8346223 DOI: 10.1126/sciadv.abg6501] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 01/19/2021] [Accepted: 06/22/2021] [Indexed: 05/08/2023]
Abstract
Micronutrients control phytoplankton growth in the ocean, influencing carbon export and fisheries. It is currently unclear how micronutrient scarcity affects cellular processes and how interdependence across micronutrients arises. We show that proximate causes of micronutrient growth limitation and interdependence are governed by cumulative cellular costs of acquiring and using micronutrients. Using a mechanistic proteomic allocation model of a polar diatom focused on iron and manganese, we demonstrate how cellular processes fundamentally underpin micronutrient limitation, and how they interact and compensate for each other to shape cellular elemental stoichiometry and resource interdependence. We coupled our model with metaproteomic and environmental data, yielding an approach for estimating biogeochemical metrics, including taxon-specific growth rates. Our results show that cumulative cellular costs govern how environmental conditions modify phytoplankton growth.
Affiliation(s)
- J Scott P McCain
- Department of Biology, Dalhousie University, Halifax, Nova Scotia, Canada.
- Centre for Comparative Genomics and Evolutionary Bioinformatics, Dalhousie University, Halifax, Nova Scotia, Canada
| | | | - Edward Susko
- Centre for Comparative Genomics and Evolutionary Bioinformatics, Dalhousie University, Halifax, Nova Scotia, Canada
- Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia, Canada
| | - Eric P Achterberg
- GEOMAR Helmholtz Center for Ocean Research Kiel, Wischhofstrasse 1-3, 24148 Kiel, Germany
| | - Andrew E Allen
- Microbial and Environmental Genomics, J. Craig Venter Institute, La Jolla, CA 92037, USA
- Integrative Oceanography Division, Scripps Institution of Oceanography, University of California, San Diego, La Jolla, CA 92037, USA
| | - Erin M Bertrand
- Department of Biology, Dalhousie University, Halifax, Nova Scotia, Canada.
- Centre for Comparative Genomics and Evolutionary Bioinformatics, Dalhousie University, Halifax, Nova Scotia, Canada
| |
|
41
|
Howard-McCombe J, Ward D, Kitchener AC, Lawson D, Senn HV, Beaumont M. On the use of genome-wide data to model and date the time of anthropogenic hybridisation: An example from the Scottish wildcat. Mol Ecol 2021; 30:3688-3702. [PMID: 34042240 DOI: 10.1111/mec.16000] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 02/22/2021] [Revised: 05/06/2021] [Accepted: 05/07/2021] [Indexed: 11/28/2022]
Abstract
While hybridisation has long been recognised as an important natural phenomenon in evolution, the conservation of taxa subject to introgressive hybridisation from domesticated forms is a subject of intense debate. Hybridisation of Scottish wildcats and domestic cats is a good example in this regard. Here, we developed a modelling framework to determine the timescale of introgression using approximate Bayesian computation (ABC). Applying the model to ddRAD-seq data from 129 individuals, genotyped at 6546 loci, we show that a population of wildcats genetically distant from domestic cats is still present in Scotland. These individuals were found almost exclusively within the captive breeding programme. Most wild-living cats sampled were introgressed to some extent. The demographic model predicts high levels of gene-flow between domestic cats and Scottish wildcats (13% migrants per generation) over a short timeframe; the posterior mean for the onset of hybridisation (T1) was 3.3 generations (~10 years) before present. Although the model had limited power to detect signals of ancient admixture, we found evidence that significant recent hybridisation may have occurred subsequent to the founding of the captive breeding population (T2). The model consistently predicts T1 after T2, estimated here to be 19.3 generations (~60 years) ago, highlighting the importance of this population as a resource for conservation management. Additionally, we evaluate the effectiveness of current methods to classify hybrids. We show that an optimised 35 SNP panel is a better predictor of the ddRAD-based hybrid score in comparison with a morphological method.
Affiliation(s)
| | - Daniel Ward
- School of Mathematics, University of Bristol, Bristol, UK
| | - Andrew C Kitchener
- Department of Natural Sciences, National Museums Scotland, Edinburgh, UK
| | - Daniel Lawson
- School of Mathematics, University of Bristol, Bristol, UK
| | - Helen V Senn
- RZSS WildGenes Laboratory, Royal Zoological Society of Scotland, Edinburgh, UK
| | - Mark Beaumont
- School of Biological Sciences, University of Bristol, Bristol, UK
| |
|
42
|
Jiang RM, Wrede F, Singh P, Hellander A, Petzold LR. Accelerated regression-based summary statistics for discrete stochastic systems via approximate simulators. BMC Bioinformatics 2021; 22:339. [PMID: 34162329 PMCID: PMC8220802 DOI: 10.1186/s12859-021-04255-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/02/2020] [Accepted: 06/10/2021] [Indexed: 12/04/2022] Open
Abstract
BACKGROUND: Approximate Bayesian Computation (ABC) has become a key tool for calibrating the parameters of discrete stochastic biochemical models. For higher dimensional models and data, its performance is strongly dependent on having a representative set of summary statistics. While regression-based methods have been demonstrated to allow for the automatic construction of effective summary statistics, their reliance on first simulating a large training set creates a significant overhead when applying these methods to discrete stochastic models for which simulation is relatively expensive. In this work, we present a method to reduce this computational burden by leveraging approximate simulators of these systems, such as ordinary differential equations and τ-Leaping approximations. RESULTS: We have developed an algorithm to accelerate the construction of regression-based summary statistics for Approximate Bayesian Computation by selectively using the faster approximate algorithms for simulations. By posing the problem as one of ratio estimation, we use state-of-the-art methods in machine learning to show that, in many cases, our algorithm can significantly reduce the number of simulations from the full resolution model at a minimal cost to accuracy and little additional tuning from the user. We demonstrate the usefulness and robustness of our method with four different experiments. CONCLUSIONS: We provide a novel algorithm for accelerating the construction of summary statistics for stochastic biochemical systems. Compared to the standard practice of exclusively training from exact simulator samples, our method is able to dramatically reduce the number of required calls to the stochastic simulator at a minimal loss in accuracy. This can immediately be implemented to increase the overall speed of the ABC workflow for estimating parameters in complex systems.
Affiliation(s)
- Richard M. Jiang
- Department of Computer Science, University of California, Santa Barbara, Santa Barbara, USA
| | - Fredrik Wrede
- Department of Information Technology, Uppsala University, Uppsala, Sweden
| | - Prashant Singh
- Department of Information Technology, Uppsala University, Uppsala, Sweden
| | - Andreas Hellander
- Department of Information Technology, Uppsala University, Uppsala, Sweden
| | - Linda R. Petzold
- Department of Computer Science, University of California, Santa Barbara, Santa Barbara, USA
| |
|
43
|
Ebert A, Dutta R, Mengersen K, Mira A, Ruggeri F, Wu P. Likelihood‐free parameter estimation for dynamic queueing networks: Case study of passenger flow in an international airport terminal. J R Stat Soc Ser C Appl Stat 2021. [DOI: 10.1111/rssc.12487] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 11/30/2022]
Affiliation(s)
- Anthony Ebert
- Università della Svizzera italiana Lugano Switzerland
- Queensland University of Technology Brisbane Australia
| | | | | | - Antonietta Mira
- Università della Svizzera italiana Lugano Switzerland
- Università dell’Insubria Como Italy
| | - Fabrizio Ruggeri
- Queensland University of Technology Brisbane Australia
- CNR‐IMATI Milano Italy
| | - Paul Wu
- Queensland University of Technology Brisbane Australia
| |
|
44
|
Vihrs N, Møller J, Gelfand AE. Approximate Bayesian inference for a spatial point process model exhibiting regularity and random aggregation. Scand Stat Theory Appl 2021. [DOI: 10.1111/sjos.12509] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/28/2022]
Affiliation(s)
- Ninna Vihrs
- Department of Mathematical Sciences Aalborg University Aalborg Denmark
| | - Jesper Møller
- Department of Mathematical Sciences Aalborg University Aalborg Denmark
| | - Alan E. Gelfand
- Department of Statistical Science Duke University Durham North Carolina USA
| |
|
45
|
Vono M, Dobigeon N, Chainais P. Asymptotically Exact Data Augmentation: Models, Properties, and Algorithms. J Comput Graph Stat 2020. [DOI: 10.1080/10618600.2020.1826954] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 10/23/2022]
Affiliation(s)
- Maxime Vono
- University of Toulouse, IRIT/INP-ENSEEIHT, Toulouse, France
| | | | - Pierre Chainais
- University of Lille, Centrale Lille, UMR CNRS 9189—CRIStAL, Lille, France
| |
|
46
|
Clarté G, Robert CP, Ryder RJ, Stoehr J. Componentwise approximate Bayesian computation via Gibbs-like steps. Biometrika 2020. [DOI: 10.1093/biomet/asaa090] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 11/14/2022] Open
Abstract
Summary
Approximate Bayesian computation methods are useful for generative models with intractable likelihoods. These methods are, however, sensitive to the dimension of the parameter space, requiring exponentially increasing resources as this dimension grows. To tackle this difficulty we explore a Gibbs version of the approximate Bayesian computation approach that runs component-wise approximate Bayesian computation steps aimed at the corresponding conditional posterior distributions, and based on summary statistics of reduced dimensions. While lacking the standard justifications for the Gibbs sampler, the resulting Markov chain is shown to converge in distribution under some partial independence conditions. The associated stationary distribution can further be shown to be close to the true posterior distribution, and some hierarchical versions of the proposed mechanism enjoy a closed-form limiting distribution. Experiments also demonstrate the gain in efficiency brought by the Gibbs version over the standard solution.
Affiliation(s)
- Grégoire Clarté
- CEREMADE, Université Paris-Dauphine, Place du Maréchal de Lattre de Tassigny, 75775 Paris, Cedex 16, France
| | - Christian P Robert
- CEREMADE, Université Paris-Dauphine, Place du Maréchal de Lattre de Tassigny, 75775 Paris, Cedex 16, France
| | - Robin J Ryder
- CEREMADE, Université Paris-Dauphine, Place du Maréchal de Lattre de Tassigny, 75775 Paris, Cedex 16, France
| | - Julien Stoehr
- CEREMADE, Université Paris-Dauphine, Place du Maréchal de Lattre de Tassigny, 75775 Paris, Cedex 16, France
| |
|
47
|
Gardner E, Breeze TD, Clough Y, Smith HG, Baldock KCR, Campbell A, Garratt MPD, Gillespie MAK, Kunin WE, McKerchar M, Memmott J, Potts SG, Senapathi D, Stone GN, Wäckers F, Westbury DB, Wilby A, Oliver TH. Reliably predicting pollinator abundance: Challenges of calibrating process‐based ecological models. Methods Ecol Evol 2020. [DOI: 10.1111/2041-210x.13483] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Indexed: 11/29/2022]
Affiliation(s)
- Emma Gardner
- School of Biological Sciences University of Reading Reading UK
- Centre for Agri‐Environmental Research University of Reading Reading UK
| | - Tom D. Breeze
- Centre for Agri‐Environmental Research University of Reading Reading UK
| | - Yann Clough
- Centre for Environmental and Climate Research Lund University Lund Sweden
| | - Henrik G. Smith
- Centre for Environmental and Climate Research Lund University Lund Sweden
| | - Katherine C. R. Baldock
- School of Biological Sciences University of Bristol Bristol UK
- Cabot Institute University of Bristol Bristol UK
- Department of Geographical and Environmental Sciences Northumbria University Newcastle upon Tyne UK
| | | | | | - Mark A. K. Gillespie
- School of Biology University of Leeds Leeds UK
- Department of Environmental Sciences Western Norway University of Applied Sciences Sogndal Norway
| | | | - Megan McKerchar
- School of Science and the Environment University of Worcester Worcester UK
| | - Jane Memmott
- School of Biological Sciences University of Bristol Bristol UK
| | - Simon G. Potts
- Centre for Agri‐Environmental Research University of Reading Reading UK
| | - Deepa Senapathi
- Centre for Agri‐Environmental Research University of Reading Reading UK
| | - Graham N. Stone
- Institute of Evolutionary Biology University of Edinburgh Edinburgh UK
| | - Felix Wäckers
- Lancaster Environment Centre Lancaster University Lancaster UK
| | - Duncan B. Westbury
- School of Science and the Environment University of Worcester Worcester UK
| | - Andrew Wilby
- Lancaster Environment Centre Lancaster University Lancaster UK
| | - Tom H. Oliver
- School of Biological Sciences University of Reading Reading UK
| |
|
48
|
Harrison JU, Baker RE. An automatic adaptive method to combine summary statistics in approximate Bayesian computation. PLoS One 2020; 15:e0236954. [PMID: 32760106 PMCID: PMC7410215 DOI: 10.1371/journal.pone.0236954] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 03/02/2020] [Accepted: 07/16/2020] [Indexed: 11/18/2022] Open
Abstract
To infer the parameters of mechanistic models with intractable likelihoods, techniques such as approximate Bayesian computation (ABC) are increasingly being adopted. One of the main disadvantages of ABC in practical situations, however, is that parameter inference must generally rely on summary statistics of the data. This is particularly the case for problems involving high-dimensional data, such as biological imaging experiments. However, some summary statistics contain more information about parameters of interest than others, and it is not always clear how to weight their contributions within the ABC framework. We address this problem by developing an automatic, adaptive algorithm that chooses weights for each summary statistic. Our algorithm aims to maximize the distance between the prior and the approximate posterior by automatically adapting the weights within the ABC distance function. Computationally, we use a nearest neighbour estimator of the distance between distributions. We justify the algorithm theoretically based on properties of the nearest neighbour distance estimator. To demonstrate the effectiveness of our algorithm, we apply it to a variety of test problems, including several stochastic models of biochemical reaction networks, and a spatial model of diffusion, and compare our results with existing algorithms.
Affiliation(s)
- Jonathan U. Harrison
- Mathematical Institute, Mathematical Sciences Building, University of Warwick, Coventry, United Kingdom
| | - Ruth E. Baker
- Mathematical Institute, Andrew Wiles Building, University of Oxford, Oxford, United Kingdom
| |
|
49
|
Sanchez T, Cury J, Charpiat G, Jay F. Deep learning for population size history inference: Design, comparison and combination with approximate Bayesian computation. Mol Ecol Resour 2020; 21:2645-2660. [DOI: 10.1111/1755-0998.13224] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 01/17/2020] [Revised: 06/19/2020] [Accepted: 07/02/2020] [Indexed: 12/28/2022]
Affiliation(s)
- Théophile Sanchez
- Laboratoire de Recherche en Informatique CNRS UMR 8623 Université Paris‐Saclay Orsay France
| | - Jean Cury
- Laboratoire de Recherche en Informatique CNRS UMR 8623 Université Paris‐Saclay Orsay France
| | - Guillaume Charpiat
- Laboratoire de Recherche en Informatique CNRS UMR 8623 Université Paris‐Saclay Orsay France
| | - Flora Jay
- Laboratoire de Recherche en Informatique CNRS UMR 8623 Université Paris‐Saclay Orsay France
| |
|
50
|
Schälte Y, Hasenauer J. Efficient exact inference for dynamical systems with noisy measurements using sequential approximate Bayesian computation. Bioinformatics 2020; 36:i551-i559. [PMID: 32657404 PMCID: PMC7355286 DOI: 10.1093/bioinformatics/btaa397] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 02/06/2023] Open
Abstract
MOTIVATION: Approximate Bayesian computation (ABC) is an increasingly popular method for likelihood-free parameter inference in systems biology and other fields of research, as it allows analyzing complex stochastic models. However, the introduced approximation error is often not clear. It has been shown that ABC actually gives exact inference under the implicit assumption of a measurement noise model. Noise being common in biological systems, it is intriguing to exploit this insight. But this is difficult in practice, as ABC is in general highly computationally demanding. Thus, the question we want to answer here is how to efficiently account for measurement noise in ABC. RESULTS: We illustrate exemplarily how ABC yields erroneous parameter estimates when neglecting measurement noise. Then, we discuss practical ways of correctly including the measurement noise in the analysis. We present an efficient adaptive sequential importance sampling-based algorithm applicable to various model types and noise models. We test and compare it on several models, including ordinary and stochastic differential equations, Markov jump processes and stochastically interacting agents, and noise models including normal, Laplace and Poisson noise. We conclude that the proposed algorithm could improve the accuracy of parameter estimates for a broad spectrum of applications. AVAILABILITY AND IMPLEMENTATION: The developed algorithms are made publicly available as part of the open-source python toolbox pyABC (https://github.com/icb-dcm/pyabc). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yannik Schälte
- Helmholtz Zentrum München, Institute of Computational Biology, Neuherberg 85764, Germany
- Department of Mathematics, Chair of Mathematical Modeling of Biological Systems, Technical University Munich, Garching 85748, Germany
| | - Jan Hasenauer
- Helmholtz Zentrum München, Institute of Computational Biology, Neuherberg 85764, Germany
- Department of Mathematics, Chair of Mathematical Modeling of Biological Systems, Technical University Munich, Garching 85748, Germany
- Research Unit Biomathematics, University of Bonn, Bonn 53113, Germany
| |
|