1
Ceccarelli A, Browning AP, Baker RE. Approximate Solutions of a General Stochastic Velocity-Jump Model Subject to Discrete-Time Noisy Observations. Bull Math Biol 2025; 87:57. [PMID: 40131568 PMCID: PMC11937228 DOI: 10.1007/s11538-025-01437-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2024] [Accepted: 03/05/2025] [Indexed: 03/27/2025]
Abstract
Advances in experimental techniques allow the collection of high-resolution spatio-temporal data that track individual motile entities over time. These tracking data motivate the use of mathematical models to characterise the motion observed. In this paper, we aim to describe the solutions of velocity-jump models for single-agent motion in one spatial dimension, characterised by successive Markovian transitions within a finite network of n states, each with a specified velocity and a fixed rate of switching to every other state. In particular, we focus on obtaining the solutions of the model subject to noisy, discrete-time, observations, with no direct access to the agent state. The lack of direct observation of the hidden state makes the problem of finding the exact distributions generally intractable. Therefore, we derive a series of approximations for the data distributions. We verify the accuracy of these approximations by comparing them to the empirical distributions generated through simulations of four example model structures. These comparisons confirm that the approximations are accurate given sufficiently infrequent state switching relative to the imaging frequency. The approximate distributions computed can be used to obtain fast forwards predictions, to give guidelines on experimental design, and as likelihoods for inference and model selection.
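The model class described in this abstract can be illustrated with a short simulation. The following is a minimal sketch, not the authors' code: the function name, its arguments, and the choice of additive Gaussian measurement noise are assumptions made for illustration. It simulates an agent that switches between states with exponential waiting times, moves at the velocity of its current state, and is observed with noise at regular intervals.

```python
import numpy as np

def simulate_velocity_jump(velocities, rates, dt, n_obs, sigma, rng, x0=0.0, s0=0):
    """Simulate noisy discrete-time observations of a 1D velocity-jump process.

    velocities : (n,) velocity of each state
    rates      : (n, n) switching rates; rates[i, j] is the rate i -> j
                 (the diagonal is ignored)
    dt         : time between observations
    n_obs      : number of observations after time zero
    sigma      : std of the Gaussian measurement noise (an assumption here)
    """
    velocities = np.asarray(velocities, dtype=float)
    rates = np.array(rates, dtype=float)      # copy, so the input is not mutated
    np.fill_diagonal(rates, 0.0)
    total = rates.sum(axis=1)                 # total exit rate of each state

    x, s, t = x0, s0, 0.0
    positions = [x]
    for k in range(1, n_obs + 1):
        t_next = k * dt
        while True:
            # Exponential waiting time; infinite if the state cannot be left.
            # Redrawing it at each observation time is valid because the
            # exponential distribution is memoryless.
            wait = rng.exponential(1.0 / total[s]) if total[s] > 0 else np.inf
            if t + wait >= t_next:
                x += velocities[s] * (t_next - t)
                t = t_next
                break
            x += velocities[s] * wait
            t += wait
            s = rng.choice(len(velocities), p=rates[s] / total[s])
        positions.append(x)

    positions = np.array(positions)
    noisy = positions + rng.normal(0.0, sigma, size=positions.shape)
    return positions, noisy

# Example: a two-state model (forward and backward motion).
rng = np.random.default_rng(1)
true_x, observed_x = simulate_velocity_jump(
    velocities=[1.0, -1.0], rates=[[0.0, 0.5], [0.5, 0.0]],
    dt=0.1, n_obs=100, sigma=0.05, rng=rng,
)
```

Only `observed_x` would be available in an experiment; the hidden state sequence and `true_x` are not, which is the setting the abstract describes.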
Affiliation(s)
- Arianna Ceccarelli
  - Mathematical Institute, University of Oxford, Woodstock Road, Oxford, OX2 6GG, UK
- Alexander P Browning
  - Mathematical Institute, University of Oxford, Woodstock Road, Oxford, OX2 6GG, UK
- Ruth E Baker
  - Mathematical Institute, University of Oxford, Woodstock Road, Oxford, OX2 6GG, UK
2
Pflug FG, Haendeler S, Esk C, Lindenhofer D, Knoblich JA, von Haeseler A. Neutral competition explains the clonal composition of neural organoids. PLoS Comput Biol 2024; 20:e1012054. [PMID: 38648250 PMCID: PMC11065252 DOI: 10.1371/journal.pcbi.1012054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2023] [Revised: 05/02/2024] [Accepted: 04/03/2024] [Indexed: 04/25/2024] Open
Abstract
Neural organoids model the development of the human brain and are an indispensable tool for studying neurodevelopment. Whole-organoid lineage tracing has revealed the number of progenies arising from each initial stem cell to be highly diverse, with lineage sizes ranging from one to more than 20,000 cells. This high variability exceeds what can be explained by existing stochastic models of corticogenesis and indicates the existence of an additional source of stochasticity. To explain this variability, we introduce the SAN model which distinguishes Symmetrically dividing, Asymmetrically dividing, and Non-proliferating cells. In the SAN model, the additional source of stochasticity is the survival time of a lineage's pool of symmetrically dividing cells. These survival times result from neutral competition within the sub-population of all symmetrically dividing cells. We demonstrate that our model explains the experimentally observed variability of lineage sizes and derive the quantitative relationship between survival time and lineage size. We also show that our model implies the existence of a regulatory mechanism which keeps the size of the symmetrically dividing cell population constant. Our results provide quantitative insight into the clonal composition of neural organoids and how it arises. This is relevant for many applications of neural organoids, and similar processes may occur in other developing tissues both in vitro and in vivo.
Affiliation(s)
- Florian G. Pflug
  - Biological Complexity Unit, Okinawa Institute of Science and Technology Graduate University (OIST), Onna, Okinawa, Japan
  - Center for Integrative Bioinformatics Vienna (CIBIV), Max Perutz Labs, University of Vienna and Medical University of Vienna, Vienna BioCenter (VBC), Vienna, Austria
- Simon Haendeler
  - Center for Integrative Bioinformatics Vienna (CIBIV), Max Perutz Labs, University of Vienna and Medical University of Vienna, Vienna BioCenter (VBC), Vienna, Austria
  - Vienna Biocenter (VBC) PhD Program, a Doctoral School of the University of Vienna and the Medical University of Vienna, Vienna, Austria
- Christopher Esk
  - Institute of Molecular Biotechnology of the Austrian Academy of Science (IMBA), Vienna BioCenter (VBC), Vienna, Austria
  - Institute of Molecular Biology, University of Innsbruck, Innsbruck, Austria
- Dominik Lindenhofer
  - Institute of Molecular Biotechnology of the Austrian Academy of Science (IMBA), Vienna BioCenter (VBC), Vienna, Austria
  - Genome Biology Unit, European Molecular Biology Laboratory (EMBL), Heidelberg, Germany
- Jürgen A. Knoblich
  - Institute of Molecular Biotechnology of the Austrian Academy of Science (IMBA), Vienna BioCenter (VBC), Vienna, Austria
  - Department of Neurology, Medical University of Vienna, Vienna, Austria
- Arndt von Haeseler
  - Center for Integrative Bioinformatics Vienna (CIBIV), Max Perutz Labs, University of Vienna and Medical University of Vienna, Vienna BioCenter (VBC), Vienna, Austria
  - Faculty of Computer Science Bioinformatics and Computational Biology, University of Vienna, Vienna, Austria
3
Horvath ERB, Stein MG, Mulvey MA, Hernandez EJ, Winter JM. Resistance Gene Association and Inference Network (ReGAIN): A Bioinformatics Pipeline for Assessing Probabilistic Co-Occurrence Between Resistance Genes in Bacterial Pathogens. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.02.26.582197. [PMID: 38464005 PMCID: PMC10925210 DOI: 10.1101/2024.02.26.582197] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/12/2024]
Abstract
The rampant rise of multidrug resistant (MDR) bacterial pathogens poses a severe health threat, necessitating innovative tools to unravel the complex genetic underpinnings of antimicrobial resistance. Despite significant strides in developing genomic tools for detecting resistance genes, a gap remains in analyzing organism-specific patterns of resistance gene co-occurrence. Addressing this deficiency, we developed the Resistance Gene Association and Inference Network (ReGAIN), a novel web-based and command line genomic platform that uses Bayesian network structure learning to identify and map resistance gene networks in bacterial pathogens. ReGAIN not only detects resistance genes using well-established methods, but also elucidates their complex interplay, critical for understanding MDR phenotypes. Focusing on ESKAPE pathogens, ReGAIN yielded a queryable database for investigating resistance gene co-occurrence, enriching resistome analyses, and providing new insights into the dynamics of antimicrobial resistance. Furthermore, the versatility of ReGAIN extends beyond antibiotic resistance genes to include assessment of co-occurrence patterns among heavy metal resistance and virulence determinants, providing a comprehensive overview of key gene relationships impacting both disease progression and treatment outcomes.
Affiliation(s)
- Elijah R Bring Horvath
  - Department of Pharmacology and Toxicology, University of Utah, Salt Lake City, Utah, 84112, United States
  - Department of Medicinal Chemistry, University of Utah, Salt Lake City, Utah, 84112, United States
- Mathew G Stein
  - Department of Pharmacology and Toxicology, University of Utah, Salt Lake City, Utah, 84112, United States
  - Department of Medicinal Chemistry, University of Utah, Salt Lake City, Utah, 84112, United States
  - School of Biological Sciences, University of Utah, Salt Lake City, UT 84112, United States
  - Henry Eyring Center for Cell & Genome Science, University of Utah, Salt Lake City, UT 84112, United States
  - Department of Biomedical Informatics, University of Utah, Salt Lake City, Utah, 84112, United States
- Matthew A Mulvey
  - School of Biological Sciences, University of Utah, Salt Lake City, UT 84112, United States
  - Henry Eyring Center for Cell & Genome Science, University of Utah, Salt Lake City, UT 84112, United States
- Edgar J Hernandez
  - Department of Biomedical Informatics, University of Utah, Salt Lake City, Utah, 84112, United States
- Jaclyn M Winter
  - Department of Pharmacology and Toxicology, University of Utah, Salt Lake City, Utah, 84112, United States
4
Järvenpää M, Corander J. On predictive inference for intractable models via approximate Bayesian computation. STATISTICS AND COMPUTING 2023; 33:42. [PMID: 36785730 PMCID: PMC9911513 DOI: 10.1007/s11222-022-10163-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2022] [Accepted: 10/02/2022] [Indexed: 06/18/2023]
Abstract
Approximate Bayesian computation (ABC) is commonly used for parameter estimation and model comparison for intractable simulator-based statistical models whose likelihood function cannot be evaluated. In this paper we instead investigate the feasibility of ABC as a generic approximate method for predictive inference, in particular, for computing the posterior predictive distribution of future observations or missing data of interest. We consider three complementary ABC approaches for this goal, each based on different assumptions regarding which predictive density of the intractable model can be sampled from. The case where only simulation from the joint density of the observed and future data given the model parameters can be used for inference is given particular attention and it is shown that the ideal summary statistic in this setting is minimal predictive sufficient instead of merely minimal sufficient (in the ordinary sense). An ABC prediction approach that takes advantage of a certain latent variable representation is also investigated. We additionally show how common ABC sampling algorithms can be used in the predictive settings considered. Our main results are first illustrated by using simple time-series models that facilitate analytical treatment, and later by using two common intractable dynamic models. Supplementary information: The online version contains supplementary material available at 10.1007/s11222-022-10163-6.
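To make the idea of ABC-based prediction concrete, here is a minimal sketch of the simplest setting only: future data can be simulated directly given the parameters. It is not the authors' method or code. The toy Gaussian model, the prior, the sample-mean summary statistic, and the function name are all assumptions chosen for illustration.

```python
import numpy as np

def abc_posterior_predictive(y_obs, n_draws, eps, rng):
    """ABC rejection sampling followed by forward simulation of one future value.

    Toy model (an assumption for illustration): y_i ~ Normal(theta, 1),
    prior theta ~ Normal(0, 10^2), summary statistic = sample mean.
    """
    y_obs = np.asarray(y_obs, dtype=float)
    s_obs = y_obs.mean()

    # Draw parameters from the prior and simulate one data set per draw.
    theta = rng.normal(0.0, 10.0, size=n_draws)
    y_sim = rng.normal(theta[:, None], 1.0, size=(n_draws, y_obs.size))

    # Keep draws whose simulated summary lies within eps of the observed one.
    keep = np.abs(y_sim.mean(axis=1) - s_obs) <= eps
    theta_acc = theta[keep]

    # One future observation per accepted parameter value.
    y_future = rng.normal(theta_acc, 1.0)
    return theta_acc, y_future

rng = np.random.default_rng(0)
y_obs = rng.normal(3.0, 1.0, size=50)
theta_acc, y_future = abc_posterior_predictive(y_obs, n_draws=20000, eps=0.2, rng=rng)
```

The values in `y_future` approximate draws from the posterior predictive distribution. In this toy model the future value depends on the data only through the parameter, so a summary that is sufficient for the parameter is enough; the abstract's point about predictive sufficiency concerns settings where that does not hold.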
Affiliation(s)
- Marko Järvenpää
  - Department of Biostatistics, University of Oslo, Oslo, Norway
- Jukka Corander
  - Department of Biostatistics, University of Oslo, Oslo, Norway
  - Department of Mathematics and Statistics, Helsinki Institute of Information Technology (HIIT), University of Helsinki, Helsinki, Finland
  - Wellcome Sanger Institute, Hinxton, Cambridgeshire, UK
5
Warne DJ, Baker RE, Simpson MJ. Rapid Bayesian Inference for Expensive Stochastic Models. J Comput Graph Stat 2021. [DOI: 10.1080/10618600.2021.2000419] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
Affiliation(s)
- David J. Warne
  - School of Mathematical Sciences, Queensland University of Technology, Brisbane, Australia
- Ruth E. Baker
  - Mathematical Institute, University of Oxford, Oxford, UK
- Matthew J. Simpson
  - School of Mathematical Sciences, Queensland University of Technology, Brisbane, Australia
6
Jiang RM, Wrede F, Singh P, Hellander A, Petzold LR. Accelerated regression-based summary statistics for discrete stochastic systems via approximate simulators. BMC Bioinformatics 2021; 22:339. [PMID: 34162329 PMCID: PMC8220802 DOI: 10.1186/s12859-021-04255-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/02/2020] [Accepted: 06/10/2021] [Indexed: 12/04/2022] Open
Abstract
BACKGROUND: Approximate Bayesian Computation (ABC) has become a key tool for calibrating the parameters of discrete stochastic biochemical models. For higher dimensional models and data, its performance is strongly dependent on having a representative set of summary statistics. While regression-based methods have been demonstrated to allow for the automatic construction of effective summary statistics, their reliance on first simulating a large training set creates a significant overhead when applying these methods to discrete stochastic models for which simulation is relatively expensive. In this work, we present a method to reduce this computational burden by leveraging approximate simulators of these systems, such as ordinary differential equations and τ-Leaping approximations. RESULTS: We have developed an algorithm to accelerate the construction of regression-based summary statistics for Approximate Bayesian Computation by selectively using the faster approximate algorithms for simulations. By posing the problem as one of ratio estimation, we use state-of-the-art methods in machine learning to show that, in many cases, our algorithm can significantly reduce the number of simulations from the full resolution model at a minimal cost to accuracy and little additional tuning from the user. We demonstrate the usefulness and robustness of our method with four different experiments. CONCLUSIONS: We provide a novel algorithm for accelerating the construction of summary statistics for stochastic biochemical systems. Compared to the standard practice of exclusively training from exact simulator samples, our method is able to dramatically reduce the number of required calls to the stochastic simulator at a minimal loss in accuracy. This can immediately be implemented to increase the overall speed of the ABC workflow for estimating parameters in complex systems.
Affiliation(s)
- Richard M. Jiang
  - Department of Computer Science, University of California, Santa Barbara, Santa Barbara, USA
- Fredrik Wrede
  - Department of Information Technology, Uppsala University, Uppsala, Sweden
- Prashant Singh
  - Department of Information Technology, Uppsala University, Uppsala, Sweden
- Andreas Hellander
  - Department of Information Technology, Uppsala University, Uppsala, Sweden
- Linda R. Petzold
  - Department of Computer Science, University of California, Santa Barbara, Santa Barbara, USA
7
Browning AP, Warne DJ, Burrage K, Baker RE, Simpson MJ. Identifiability analysis for stochastic differential equation models in systems biology. J R Soc Interface 2020; 17:20200652. [PMID: 33323054 PMCID: PMC7811582 DOI: 10.1098/rsif.2020.0652] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Received: 08/11/2020] [Accepted: 11/24/2020] [Indexed: 12/26/2022] Open
Abstract
Mathematical models are routinely calibrated to experimental data, with goals ranging from building predictive models to quantifying parameters that cannot be measured. Whether or not reliable parameter estimates are obtainable from the available data can easily be overlooked. Such issues of parameter identifiability have important ramifications for both the predictive power of a model, and the mechanistic insight that can be obtained. Identifiability analysis is well-established for deterministic, ordinary differential equation (ODE) models, but there are no commonly adopted methods for analysing identifiability in stochastic models. We provide an accessible introduction to identifiability analysis and demonstrate how existing ideas for analysis of ODE models can be applied to stochastic differential equation (SDE) models through four practical case studies. To assess structural identifiability, we study ODEs that describe the statistical moments of the stochastic process using open-source software tools. Using practically motivated synthetic data and Markov chain Monte Carlo methods, we assess parameter identifiability in the context of available data. Our analysis shows that SDE models can often extract more information about parameters than deterministic descriptions. All code used to perform the analysis is available on Github.
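As a worked illustration of what "ODEs that describe the statistical moments" can look like, consider an Ornstein-Uhlenbeck process. This example and its symbols ($\theta$, $\mu$, $\sigma$, $m$, $v$) are assumptions chosen for illustration and are not claimed to be one of the paper's case studies. Taking expectations of the SDE, and applying Itô's formula to $X_t^2$, gives a closed pair of ODEs for the mean and variance:

```latex
\begin{aligned}
  \mathrm{d}X_t &= \theta\,(\mu - X_t)\,\mathrm{d}t + \sigma\,\mathrm{d}W_t, \\
  \frac{\mathrm{d}m}{\mathrm{d}t} &= \theta\,(\mu - m),
      & m(t) &= \mathbb{E}[X_t], \\
  \frac{\mathrm{d}v}{\mathrm{d}t} &= -2\theta\,v + \sigma^{2},
      & v(t) &= \operatorname{Var}[X_t].
\end{aligned}
```

The mean equation alone contains no $\sigma$, so observations of the mean cannot determine it, whereas the variance equation depends on both $\theta$ and $\sigma$. This is a simple instance of fluctuations carrying parameter information that a deterministic description discards.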
Affiliation(s)
- Alexander P. Browning
  - School of Mathematical Sciences, Queensland University of Technology, Brisbane, Australia
  - ARC Centre of Excellence for Mathematical and Statistical Frontiers, Queensland University of Technology, Brisbane, Australia
- David J. Warne
  - School of Mathematical Sciences, Queensland University of Technology, Brisbane, Australia
  - ARC Centre of Excellence for Mathematical and Statistical Frontiers, Queensland University of Technology, Brisbane, Australia
- Kevin Burrage
  - School of Mathematical Sciences, Queensland University of Technology, Brisbane, Australia
  - ARC Centre of Excellence for Mathematical and Statistical Frontiers, Queensland University of Technology, Brisbane, Australia
  - ARC Centre of Excellence for Plant Success in Nature and Agriculture, Queensland University of Technology, Brisbane, Australia
  - Department of Computer Science, University of Oxford, Oxford, UK
- Ruth E. Baker
  - Mathematical Institute, University of Oxford, Oxford, UK
- Matthew J. Simpson
  - School of Mathematical Sciences, Queensland University of Technology, Brisbane, Australia
  - ARC Centre of Excellence for Mathematical and Statistical Frontiers, Queensland University of Technology, Brisbane, Australia
8
Mikelson J, Khammash M. Likelihood-free nested sampling for parameter inference of biochemical reaction networks. PLoS Comput Biol 2020; 16:e1008264. [PMID: 33035218 PMCID: PMC7577508 DOI: 10.1371/journal.pcbi.1008264] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 11/15/2019] [Revised: 10/21/2020] [Accepted: 08/16/2020] [Indexed: 12/03/2022] Open
Abstract
The development of mechanistic models of biological systems is a central part of Systems Biology. One major challenge in developing these models is the accurate inference of model parameters. In recent years, nested sampling methods have gained increased attention in the Systems Biology community due to the fact that they are parallelizable and provide error estimates with no additional computations. One drawback that severely limits the usability of these methods, however, is that they require the likelihood function to be available, and thus cannot be applied to systems with intractable likelihoods, such as stochastic models. Here we present a likelihood-free nested sampling method for parameter inference which overcomes these drawbacks. This method gives an unbiased estimator of the Bayesian evidence as well as samples from the posterior. We derive a lower bound on the estimator's variance which we use to formulate a novel termination criterion for nested sampling. The presented method enables not only the reliable inference of the posterior of parameters for stochastic systems of a size and complexity that is challenging for traditional methods, but it also provides an estimate of the obtained variance. We illustrate our approach by applying it to several realistically sized models with simulated data as well as recently published biological data. We also compare our developed method with the two most popular other likelihood-free approaches: pMCMC and ABC-SMC. The C++ code of the proposed methods, together with test data, is available at the github web page https://github.com/Mijan/LFNS_paper.

The behaviour of mathematical models of biochemical reactions is governed by model parameters encoding for various reaction rates, molecule concentrations and other biochemical quantities. As the general purpose of these models is to reproduce and predict the true biological response to different stimuli, the inference of these parameters, given experimental observations, is a crucial part of Systems Biology. While plenty of methods have been published for the inference of model parameters, most of them require the availability of the likelihood function and thus cannot be applied to models that do not allow for the computation of the likelihood. Further, most established methods do not provide an estimate of the variance of the obtained estimator. In this paper, we present a novel inference method that accurately approximates the posterior distribution of parameters and does not require the evaluation of the likelihood function. Our method is based on the nested sampling algorithm and approximates the likelihood with a particle filter. We show that the resulting posterior estimates are unbiased and provide a way to estimate not just the posterior distribution, but also an error estimate of the final estimator. We illustrate our method on several stochastic models with simulated data as well as one model of transcription with real biological data.
9
Harrison JU, Baker RE. An automatic adaptive method to combine summary statistics in approximate Bayesian computation. PLoS One 2020; 15:e0236954. [PMID: 32760106 PMCID: PMC7410215 DOI: 10.1371/journal.pone.0236954] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 03/02/2020] [Accepted: 07/16/2020] [Indexed: 11/18/2022] Open
Abstract
To infer the parameters of mechanistic models with intractable likelihoods, techniques such as approximate Bayesian computation (ABC) are increasingly being adopted. One of the main disadvantages of ABC in practical situations, however, is that parameter inference must generally rely on summary statistics of the data. This is particularly the case for problems involving high-dimensional data, such as biological imaging experiments. However, some summary statistics contain more information about parameters of interest than others, and it is not always clear how to weight their contributions within the ABC framework. We address this problem by developing an automatic, adaptive algorithm that chooses weights for each summary statistic. Our algorithm aims to maximize the distance between the prior and the approximate posterior by automatically adapting the weights within the ABC distance function. Computationally, we use a nearest neighbour estimator of the distance between distributions. We justify the algorithm theoretically based on properties of the nearest neighbour distance estimator. To demonstrate the effectiveness of our algorithm, we apply it to a variety of test problems, including several stochastic models of biochemical reaction networks, and a spatial model of diffusion, and compare our results with existing algorithms.
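The role of the weights inside an ABC distance function can be sketched as follows. This is a minimal illustration under assumptions, not the authors' algorithm: the function name and the weighted Euclidean form are hypothetical choices, and the adaptive step that selects the weights (maximizing an estimated distance between prior and approximate posterior) is not implemented here.

```python
import numpy as np

def weighted_abc_distance(s_sim, s_obs, weights):
    """Weighted Euclidean distance between simulated and observed summaries.

    s_sim   : (..., k) simulated summary statistics (one row per simulation)
    s_obs   : (k,) observed summary statistics
    weights : (k,) non-negative weights, one per summary statistic
    """
    diff = np.asarray(s_sim, dtype=float) - np.asarray(s_obs, dtype=float)
    w = np.asarray(weights, dtype=float)
    return np.sqrt(np.sum((w * diff) ** 2, axis=-1))

# A weight of zero removes a summary statistic from the comparison entirely.
d_equal = weighted_abc_distance([3.0, 4.0], [0.0, 0.0], [1.0, 1.0])
d_first_only = weighted_abc_distance([3.0, 4.0], [0.0, 0.0], [1.0, 0.0])
```

In a rejection-ABC loop, a parameter draw would be accepted when this distance falls below a tolerance, so the weights control how much each summary statistic influences which draws are kept.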
Affiliation(s)
- Jonathan U. Harrison
  - Mathematical Institute, Mathematical Sciences Building, University of Warwick, Coventry, United Kingdom
- Ruth E. Baker
  - Mathematical Institute, Andrew Wiles Building, University of Oxford, Oxford, United Kingdom