1
Babel H, Omar O, Paul A, Bär J. Reducing Structural Nonidentifiabilities in Upstream Bioprocess Models Using Profile-Likelihood. Biotechnol Bioeng 2025; 122:833-845. [PMID: 39825521 DOI: 10.1002/bit.28922] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2024] [Revised: 11/28/2024] [Accepted: 12/28/2024] [Indexed: 01/20/2025]
Abstract
Process models are increasingly used to support upstream process development in the biopharmaceutical industry for process optimization, scale-up and to reduce experimental effort. Parametric unstructured models based on biological mechanisms are highly promising, since they do not require large amounts of data. The critical part in the application is the certainty of the parameter estimates, since uncertainty of the parameter estimates propagates to model predictions and can increase the risk associated with those predictions. Currently Fisher-Information-Matrix based approximations or Monte-Carlo approaches are used to estimate parameter confidence intervals and regularization approaches to decrease parameter uncertainty. Here we apply profile likelihood to determine parameter identifiability of a recent upstream process model. We have investigated the effect of data amount on identifiability and found out that addition of data reduces non-identifiability. The likelihood profiles of nonidentifiable parameters were then used to uncover structural model changes. These changes effectively alleviate the remaining non-identifiabilities except for a single parameter out of 21 total parameters. We present the first application of profile likelihood to a complete upstream process model. Profile likelihood is a highly suitable method to determine parameter confidence intervals in upstream process models and provides reliable estimates even with nonlinear models and limited data.
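As a rough illustration of the profile-likelihood idea named in this abstract, the sketch below profiles one parameter of a toy decay model. It is not the authors' bioprocess model: the model y = A·exp(-k·t), the function names `profile_chi2` and `profile_interval`, the known noise level `sigma`, and the deterministic stand-in for measurement noise are all assumptions made for the example. For each fixed k the nuisance parameter A is re-optimised, and the approximate 95% interval collects the grid points whose profile stays within 3.84 (the 0.95 quantile of a chi-square distribution with one degree of freedom) of the minimum.

```python
import numpy as np

def profile_chi2(t, y, sigma, k_grid):
    """For each fixed k, minimise chi^2 over the nuisance parameter A."""
    chi2 = np.empty(len(k_grid))
    for i, k in enumerate(k_grid):
        basis = np.exp(-k * t)                  # toy model: y = A * exp(-k t)
        a_hat = (basis @ y) / (basis @ basis)   # closed-form least-squares A
        chi2[i] = np.sum(((y - a_hat * basis) / sigma) ** 2)
    return chi2

def profile_interval(k_grid, chi2, threshold=3.84):
    """Smallest and largest grid point whose profile is within the threshold.

    Assumes the accepted region is a single connected interval.
    """
    inside = k_grid[chi2 - chi2.min() <= threshold]
    return inside.min(), inside.max()

# Hypothetical data: true A = 2.0, k = 0.7, plus a deterministic perturbation
# standing in for measurement noise.
t = np.linspace(0.0, 5.0, 11)
y = 2.0 * np.exp(-0.7 * t) + 0.05 * np.cos(3.0 * t)
k_grid = np.linspace(0.1, 2.0, 191)

chi2 = profile_chi2(t, y, sigma=0.05, k_grid=k_grid)
k_hat = k_grid[np.argmin(chi2)]
lo, hi = profile_interval(k_grid, chi2)
print(f"k_hat={k_hat:.2f}, approx. 95% interval=({lo:.2f}, {hi:.2f})")
```

In this picture, an interval that runs into either end of the scanned grid corresponds to a profile that never crosses the threshold on that side, which is the signature of a parameter that is not identifiable from the data.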
Affiliation(s)
- Heiko Babel
  - Boehringer Ingelheim Pharma GmbH & Co. KG, Biopharmaceuticals Germany, Biberach an der Riß, Germany
- Ola Omar
  - Boehringer Ingelheim Pharma GmbH & Co. KG, Biopharmaceuticals Germany, Biberach an der Riß, Germany
- Albert Paul
  - Boehringer Ingelheim Pharma GmbH & Co. KG, Biopharmaceuticals Germany, Biberach an der Riß, Germany
- Joachim Bär
  - Boehringer Ingelheim Pharma GmbH & Co. KG, Biopharmaceuticals Germany, Biberach an der Riß, Germany
2
Raimúndez E, Fedders M, Hasenauer J. Posterior marginalization accelerates Bayesian inference for dynamical models of biological processes. iScience 2023; 26:108083. [PMID: 37867942 PMCID: PMC10589897 DOI: 10.1016/j.isci.2023.108083] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2023] [Revised: 07/16/2023] [Accepted: 09/25/2023] [Indexed: 10/24/2023] Open
Abstract
Bayesian inference is an important method in the life and natural sciences for learning from data. It provides information about parameter and prediction uncertainties. Yet, generating representative samples from the posterior distribution is often computationally challenging. Here, we present an approach that lowers the computational complexity of sample generation for dynamical models with scaling, offset, and noise parameters. The proposed method is based on the marginalization of the posterior distribution. We provide analytical results for a broad class of problems with conjugate priors and show that the method is suitable for a large number of applications. Subsequently, we demonstrate the benefit of the approach for applications from the field of systems biology. We report an improvement up to 50 times in the effective sample size per unit of time. As the scheme is broadly applicable, it will facilitate Bayesian inference in different research fields.
Affiliation(s)
- Elba Raimúndez
  - Life and Medical Sciences (LIMES) Institute, University of Bonn, Bonn, Germany
  - Technische Universität München, Center for Mathematics, Garching, Germany
- Michael Fedders
  - Life and Medical Sciences (LIMES) Institute, University of Bonn, Bonn, Germany
- Jan Hasenauer
  - Life and Medical Sciences (LIMES) Institute, University of Bonn, Bonn, Germany
  - Technische Universität München, Center for Mathematics, Garching, Germany
  - Helmholtz Zentrum München - German Research Center for Environmental Health, Computational Health Center, Neuherberg, Germany
3
Beck RJ, Sloot S, Matsushita H, Kakimi K, Beltman JB. Mathematical modeling identifies LAG3 and HAVCR2 as biomarkers of T cell exhaustion in melanoma. iScience 2023; 26:106666. [PMID: 37182110 PMCID: PMC10173735 DOI: 10.1016/j.isci.2023.106666] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2022] [Revised: 12/15/2022] [Accepted: 04/09/2023] [Indexed: 05/16/2023] Open
Abstract
Cytotoxic T lymphocytes (CTLs) control tumors via lysis of antigen-presenting targets or through secretion of cytokines such as interferon-γ (IFNG), which inhibit tumor cell proliferation. Improved understanding of CTL interactions within solid tumors will aid the development of immunotherapeutic strategies against cancer. In this study, we take a systems biology approach to compare the importance of cytolytic versus IFNG-mediated cytostatic effects in a murine melanoma model (B16F10) and to dissect the contribution of immune checkpoints HAVCR2, LAG3, and PDCD1/CD274 to CTL exhaustion. We integrated multimodal data to inform an ordinary differential equation (ODE) model of CTL activities inside the tumor. Our model predicted that CTL cytotoxicity played only a minor role in tumor control relative to the cytostatic effects of IFNG. Furthermore, our analysis revealed that within B16F10 melanomas HAVCR2 and LAG3 better characterize the development of a dysfunctional CTL phenotype than does the PDCD1/CD274 axis.
Affiliation(s)
- Richard J. Beck
  - Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Leiden, the Netherlands
- Sander Sloot
  - Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Leiden, the Netherlands
- Hirokazu Matsushita
  - Translational Oncoimmunology, Aichi Cancer Center Research Institute, Nagoya, Japan
- Kazuhiro Kakimi
  - Department of Immunotherapeutics, The University of Tokyo Hospital, Tokyo, Japan
- Joost B. Beltman
  - Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Leiden, the Netherlands
  - Corresponding author
4
Zhang X, Su Y, Lane AN, Stromberg AJ, Fan TWM, Wang C. Bayesian kinetic modeling for tracer-based metabolomic data. BMC Bioinformatics 2023; 24:108. [PMID: 36949395 PMCID: PMC10035190 DOI: 10.1186/s12859-023-05211-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/23/2022] [Accepted: 02/24/2023] [Indexed: 03/24/2023] Open
Abstract
BACKGROUND Stable Isotope Resolved Metabolomics (SIRM) is a new biological approach that uses stable isotope tracers such as uniformly [Formula: see text]-enriched glucose ([Formula: see text]-Glc) to trace metabolic pathways or networks at the atomic level in complex biological systems. Non-steady-state kinetic modeling based on SIRM data uses sets of simultaneous ordinary differential equations (ODEs) to quantitatively characterize the dynamic behavior of metabolic networks. It has been increasingly used to understand the regulation of normal metabolism and dysregulation in the development of diseases. However, fitting a kinetic model is challenging because there are usually multiple sets of parameter values that fit the data equally well, especially for large-scale kinetic models. In addition, there is a lack of statistically rigorous methods to compare kinetic model parameters between different experimental groups. RESULTS We propose a new Bayesian statistical framework to enhance parameter estimation and hypothesis testing for non-steady-state kinetic modeling of SIRM data. For estimating kinetic model parameters, we leverage the prior distribution not only to allow incorporation of experts' knowledge but also to provide robust parameter estimation. We also introduce a shrinkage approach for borrowing information across the ensemble of metabolites to stably estimate the variance of an individual isotopomer. In addition, we use a component-wise adaptive Metropolis algorithm with delayed rejection to perform efficient Monte Carlo sampling of the posterior distribution over high-dimensional parameter space. For comparing kinetic model parameters between experimental groups, we propose a new reparameterization method that converts the complex hypothesis testing problem into a more tractable parameter estimation problem. We also propose an inference procedure based on credible interval and credible value. 
Our method is freely available for academic use at https://github.com/xuzhang0131/MCMCFlux . CONCLUSIONS Our new Bayesian framework provides robust estimation of kinetic model parameters and enables rigorous comparison of model parameters between experimental groups. Simulation studies and application to a lung cancer study demonstrate that our framework performs well for non-steady-state kinetic modeling of SIRM data.
Affiliation(s)
- Xu Zhang
  - Dr. Bing Zhang Department of Statistics, University of Kentucky, Lexington, 40536, USA
- Ya Su
  - Department of Statistical Sciences and Operations Research, Virginia Commonwealth University, Richmond, 23220, USA
- Andrew N Lane
  - Markey Cancer Center, University of Kentucky, Lexington, 40536, USA
  - Center for Environmental and Systems Biochemistry, University of Kentucky, Lexington, 40536, USA
  - Department of Toxicology and Cancer Biology, University of Kentucky, Lexington, 40536, USA
- Arnold J Stromberg
  - Dr. Bing Zhang Department of Statistics, University of Kentucky, Lexington, 40536, USA
- Teresa W M Fan
  - Markey Cancer Center, University of Kentucky, Lexington, 40536, USA
  - Center for Environmental and Systems Biochemistry, University of Kentucky, Lexington, 40536, USA
  - Department of Toxicology and Cancer Biology, University of Kentucky, Lexington, 40536, USA
- Chi Wang
  - Dr. Bing Zhang Department of Statistics, University of Kentucky, Lexington, 40536, USA
  - Markey Cancer Center, University of Kentucky, Lexington, 40536, USA
  - Division of Cancer Biostatistics, Department of Internal Medicine, University of Kentucky, Lexington, 40536, USA
5
Systematic Bayesian posterior analysis guided by Kullback-Leibler divergence facilitates hypothesis formation. J Theor Biol 2023; 558:111341. [PMID: 36335999 DOI: 10.1016/j.jtbi.2022.111341] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/14/2022] [Revised: 10/24/2022] [Accepted: 10/29/2022] [Indexed: 11/06/2022]
Abstract
Bayesian inference produces a posterior distribution for the parameters of a mathematical model that can be used to guide the formation of hypotheses; specifically, the posterior may be searched for evidence of alternative model hypotheses, which serves as a starting point for hypothesis formation and model refinement. Previous approaches to search for this evidence are largely qualitative and unsystematic; further, demonstrations of these approaches typically stop at hypothesis formation, leaving the questions they raise unanswered. Here, we introduce a Kullback-Leibler (KL) divergence-based ranking to expedite Bayesian hypothesis formation and investigate the hypotheses it generates, ultimately generating novel, biologically significant insights. Our approach uses KL divergence to rank parameters by how much information they gain from experimental data. Subsequently, rather than searching all model parameters at random, we use this ranking to prioritize examining the posteriors of the parameters that gained the most information from the data for evidence of alternative model hypotheses. We test our approach with two examples, which showcase the ability of our approach to systematically uncover different types of alternative hypothesis evidence. First, we test our KL divergence ranking on an established example of Bayesian hypothesis formation. Our top-ranked parameter matches the one previously identified to produce alternative hypotheses. In the second example, we apply our ranking in a novel study of a computational model of prolactin-induced JAK2-STAT5 signaling, a pathway that mediates beta cell proliferation. Within the top 3 ranked parameters (out of 33), we find a bimodal posterior revealing two possible ranges for the prolactin receptor degradation rate. We go on to refine the model, incorporating new data and determining which degradation rate is most plausible. 
Overall, while the effectiveness of our approach depends on having a properly formulated prior and on the form of the posterior distribution, we demonstrate that our approach offers a novel and generalizable quantitative framework for Bayesian hypothesis formation and use it to produce a novel, biologically-significant insight into beta cell signaling.
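The ranking step described in this abstract can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes prior and posterior samples are available as arrays with one column per parameter, estimates each marginal KL(posterior || prior) with a simple shared-bin histogram, and uses the hypothetical names `kl_from_samples` and `rank_parameters` together with synthetic samples.

```python
import numpy as np

def kl_from_samples(post, prior, bins=30):
    """Histogram estimate of KL(posterior || prior) for one parameter."""
    edges = np.histogram_bin_edges(np.concatenate([post, prior]), bins=bins)
    p, _ = np.histogram(post, bins=edges)
    q, _ = np.histogram(prior, bins=edges)
    eps = 1e-12                       # avoids log(0) in empty bins
    p = p / p.sum() + eps
    q = q / q.sum() + eps
    return float(np.sum(p * np.log(p / q)))

def rank_parameters(post_samples, prior_samples, names):
    """Sort parameters by estimated information gain, largest first."""
    scores = {
        name: kl_from_samples(post_samples[:, j], prior_samples[:, j])
        for j, name in enumerate(names)
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Synthetic example: parameter "a" is strongly constrained by the data,
# parameter "b" keeps its prior shape.
rng = np.random.default_rng(0)
prior = rng.uniform(0.0, 1.0, size=(5000, 2))
post = np.column_stack([
    np.clip(rng.normal(0.5, 0.02, size=5000), 0.0, 1.0),
    rng.uniform(0.0, 1.0, size=5000),
])
ranking = rank_parameters(post, prior, ["a", "b"])
print(ranking)
```

Parameters at the top of such a ranking would be the first candidates for a closer look at their posteriors, for example for bimodality. Histogram estimates are coarse, so a kernel or nearest-neighbour estimator may be preferable in practice.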
6
Argus F, Zhao D, Babarenda Gamage TP, Nash MP, Maso Talou GD. Automated model calibration with parallel MCMC: Applications for a cardiovascular system model. Front Physiol 2022; 13:1018134. [PMID: 36439250 PMCID: PMC9683692 DOI: 10.3389/fphys.2022.1018134] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/12/2022] [Accepted: 10/24/2022] [Indexed: 11/10/2022] Open
Abstract
Computational physiological models continue to increase in complexity, however, the task of efficiently calibrating the model to available clinical data remains a significant challenge. One part of this challenge is associated with long calibration times, which present a barrier for the routine application of model-based prediction in clinical practice. Another aspect of this challenge is the limited available data for the unique calibration of complex models. Therefore, to calibrate a patient-specific model, it may be beneficial to verify that task-specific model predictions have acceptable uncertainty, rather than requiring all parameters to be uniquely identified. We have developed a pipeline that reduces the set of fitting parameters to make them structurally identifiable and to improve the efficiency of a subsequent Markov Chain Monte Carlo (MCMC) analysis. MCMC was used to find the optimal parameter values and to determine the confidence interval of a task-specific prediction. This approach was demonstrated on numerical experiments where a lumped parameter model of the cardiovascular system was calibrated to brachial artery cuff pressure, echocardiogram volume measurements, and synthetic cerebral blood flow data that approximates what can be obtained from 4D-flow MRI data. This pipeline provides a cerebral arterial pressure prediction that may be useful for determining the risk of hemorrhagic stroke. For a set of three patients, this pipeline successfully reduced the parameter set of a cardiovascular system model from 12 parameters to 8–10 structurally identifiable parameters. This enabled a significant (>4×) efficiency improvement in determining confidence intervals on predictions of pressure compared to performing a naive MCMC analysis with the full parameter set. This demonstrates the potential that the proposed pipeline has in helping address one of the key challenges preventing clinical application of such models. 
Additionally, for each patient, the MCMC approach yielded a 95% confidence interval on systolic blood pressure prediction in the middle cerebral artery smaller than ±10 mmHg (±1.3 kPa). The proposed pipeline exploits available high-performance computing parallelism to allow straightforward automation for general models and arbitrary data sets, enabling automated calibration of a parameter set that is specific to the available clinical data with minimal user interaction.
Affiliation(s)
- Finbar Argus
  - Auckland Bioengineering Institute, University of Auckland, Auckland, New Zealand
  - *Correspondence: Finbar Argus
- Debbie Zhao
  - Auckland Bioengineering Institute, University of Auckland, Auckland, New Zealand
- Martyn P. Nash
  - Auckland Bioengineering Institute, University of Auckland, Auckland, New Zealand
  - Department of Engineering Science, University of Auckland, Auckland, New Zealand
7
Villaverde AF, Pathirana D, Fröhlich F, Hasenauer J, Banga JR. A protocol for dynamic model calibration. Brief Bioinform 2022; 23:bbab387. [PMID: 34619769 PMCID: PMC8769694 DOI: 10.1093/bib/bbab387] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 05/27/2021] [Revised: 08/06/2021] [Accepted: 08/29/2021] [Indexed: 12/23/2022] Open
Abstract
Ordinary differential equation models are nowadays widely used for the mechanistic description of biological processes and their temporal evolution. These models typically have many unknown and nonmeasurable parameters, which have to be determined by fitting the model to experimental data. In order to perform this task, known as parameter estimation or model calibration, the modeller faces challenges such as poor parameter identifiability, lack of sufficiently informative experimental data and the existence of local minima in the objective function landscape. These issues tend to worsen with larger model sizes, increasing the computational complexity and the number of unknown parameters. An incorrectly calibrated model is problematic because it may result in inaccurate predictions and misleading conclusions. For nonexpert users, there are a large number of potential pitfalls. Here, we provide a protocol that guides the user through all the steps involved in the calibration of dynamic models. We illustrate the methodology with two models and provide all the code required to reproduce the results and perform the same analysis on new models. Our protocol provides practitioners and researchers in biological modelling with a one-stop guide that is at the same time compact and sufficiently comprehensive to cover all aspects of the problem.
Affiliation(s)
- Alejandro F Villaverde
  - Universidade de Vigo, Department of Systems Engineering & Control, Vigo 36310, Galicia, Spain
- Dilan Pathirana
  - Faculty of Mathematics and Natural Sciences, University of Bonn, Bonn 53115, Germany
- Fabian Fröhlich
  - Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg 85764, Germany
- Jan Hasenauer
  - Center for Mathematics, Technische Universität München, Garching 85748, Germany
  - Harvard Medical School, Cambridge, MA 02115, USA
- Julio R Banga
  - Bioprocess Engineering Group, IIM-CSIC, Vigo 36208, Galicia, Spain
8
Yuan B, Shen C, Luna A, Korkut A, Marks DS, Ingraham J, Sander C. CellBox: Interpretable Machine Learning for Perturbation Biology with Application to the Design of Cancer Combination Therapy. Cell Syst 2020; 12:128-140.e4. [PMID: 33373583 DOI: 10.1016/j.cels.2020.11.013] [Citation(s) in RCA: 60] [Impact Index Per Article: 12.0] [Received: 03/17/2020] [Revised: 07/13/2020] [Accepted: 11/25/2020] [Indexed: 01/13/2023]
Abstract
Systematic perturbation of cells followed by comprehensive measurements of molecular and phenotypic responses provides informative data resources for constructing computational models of cell biology. Models that generalize well beyond training data can be used to identify combinatorial perturbations of potential therapeutic interest. Major challenges for machine learning on large biological datasets are to find global optima in a complex multidimensional space and mechanistically interpret the solutions. To address these challenges, we introduce a hybrid approach that combines explicit mathematical models of cell dynamics with a machine-learning framework, implemented in TensorFlow. We tested the modeling framework on a perturbation-response dataset of a melanoma cell line after drug treatments. The models can be efficiently trained to describe cellular behavior accurately. Even though completely data driven and independent of prior knowledge, the resulting de novo network models recapitulate some known interactions. The approach is readily applicable to various kinetic models of cell biology. A record of this paper's Transparent Peer Review process is included in the Supplemental Information.
Affiliation(s)
- Bo Yuan
  - Department of Cell Biology, Harvard Medical School, Boston, MA, USA
  - cBio Center, Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
  - Broad Institute, Cambridge, MA, USA
- Ciyue Shen
  - Department of Cell Biology, Harvard Medical School, Boston, MA, USA
  - cBio Center, Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
  - Broad Institute, Cambridge, MA, USA
- Augustin Luna
  - Department of Cell Biology, Harvard Medical School, Boston, MA, USA
  - cBio Center, Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
  - Broad Institute, Cambridge, MA, USA
- Anil Korkut
  - Department of Bioinformatics & Computational Biology, the University of Texas M D Anderson Cancer Center, Houston, TX, USA
- Debora S Marks
  - Broad Institute, Cambridge, MA, USA
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- John Ingraham
  - MIT Computer Science & Artificial Intelligence Laboratory, Boston, MA, USA
- Chris Sander
  - Department of Cell Biology, Harvard Medical School, Boston, MA, USA
  - cBio Center, Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
  - Broad Institute, Cambridge, MA, USA
9
Hass H, Loos C, Raimúndez-Álvarez E, Timmer J, Hasenauer J, Kreutz C. Benchmark problems for dynamic modeling of intracellular processes. Bioinformatics 2020; 35:3073-3082. [PMID: 30624608 PMCID: PMC6735869 DOI: 10.1093/bioinformatics/btz020] [Citation(s) in RCA: 39] [Impact Index Per Article: 7.8] [Received: 08/30/2018] [Revised: 11/19/2018] [Accepted: 01/06/2019] [Indexed: 12/19/2022] Open
Abstract
Motivation: Dynamic models are used in systems biology to study and understand cellular processes like gene regulation or signal transduction. Frequently, ordinary differential equation (ODE) models are used to model the time and dose dependency of the abundances of molecular compounds as well as interactions and translocations. A multitude of computational approaches, e.g. for parameter estimation or uncertainty analysis have been developed within recent years. However, many of these approaches lack proper testing in application settings because a comprehensive set of benchmark problems is yet missing. Results: We present a collection of 20 benchmark problems in order to evaluate new and existing methodologies, where an ODE model with corresponding experimental data is referred to as problem. In addition to the equations of the dynamical system, the benchmark collection provides observation functions as well as assumptions about measurement noise distributions and parameters. The presented benchmark models comprise problems of different size, complexity and numerical demands. Important characteristics of the models and methodological requirements are summarized, estimated parameters are provided, and some example studies were performed for illustrating the capabilities of the presented benchmark collection. Availability and implementation: The models are provided in several standardized formats, including an easy-to-use human readable form and machine-readable SBML files. The data is provided as Excel sheets. All files are available at https://github.com/Benchmarking-Initiative/Benchmark-Models, including step-by-step explanations and MATLAB code to process and simulate the models. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Helge Hass
  - Center for Systems Biology (ZBSA), University of Freiburg, Freiburg 79104, Germany
  - Institute of Physics, University of Freiburg, Freiburg 79104, Germany
- Carolin Loos
  - Helmholtz Zentrum München - German Research Center for Environmental Health, Institute of Computational Biology, Neuherberg 85764, Germany
  - Technische Universität München, Center for Mathematics, Chair of Mathematical Modeling of Biological Systems, Garching 85748, Germany
- Elba Raimúndez-Álvarez
  - Helmholtz Zentrum München - German Research Center for Environmental Health, Institute of Computational Biology, Neuherberg 85764, Germany
  - Technische Universität München, Center for Mathematics, Chair of Mathematical Modeling of Biological Systems, Garching 85748, Germany
- Jens Timmer
  - Center for Systems Biology (ZBSA), University of Freiburg, Freiburg 79104, Germany
  - Institute of Physics, University of Freiburg, Freiburg 79104, Germany
  - Center for Data Analysis and Modelling (FDM), University of Freiburg, Freiburg 79104, Germany
  - BIOSS Centre for Biological Signalling Studies, University of Freiburg, Freiburg 79104, Germany
- Jan Hasenauer
  - Helmholtz Zentrum München - German Research Center for Environmental Health, Institute of Computational Biology, Neuherberg 85764, Germany
  - Technische Universität München, Center for Mathematics, Chair of Mathematical Modeling of Biological Systems, Garching 85748, Germany
- Clemens Kreutz
  - Center for Systems Biology (ZBSA), University of Freiburg, Freiburg 79104, Germany
  - Institute of Physics, University of Freiburg, Freiburg 79104, Germany
  - Center for Data Analysis and Modelling (FDM), University of Freiburg, Freiburg 79104, Germany
10
Approximating multivariate posterior distribution functions from Monte Carlo samples for sequential Bayesian inference. PLoS One 2020; 15:e0230101. [PMID: 32168343 PMCID: PMC7069631 DOI: 10.1371/journal.pone.0230101] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 06/21/2019] [Accepted: 02/21/2020] [Indexed: 11/19/2022] Open
Abstract
An important feature of Bayesian statistics is the opportunity to do sequential inference: the posterior distribution obtained after seeing a dataset can be used as prior for a second inference. However, when Monte Carlo sampling methods are used for inference, we only have a set of samples from the posterior distribution. To do sequential inference, we then either have to evaluate the second posterior at only these locations and reweight the samples accordingly, or we can estimate a functional description of the posterior probability distribution from the samples and use that as prior for the second inference. Here, we investigated to what extent we can obtain an accurate joint posterior from two datasets if the inference is done sequentially rather than jointly, under the condition that each inference step is done using Monte Carlo sampling. To test this, we evaluated the accuracy of kernel density estimates, Gaussian mixtures, mixtures of factor analyzers, vine copulas and Gaussian processes in approximating posterior distributions, and then tested whether these approximations can be used in sequential inference. In low dimensionality, Gaussian processes are more accurate, whereas in higher dimensionality Gaussian mixtures, mixtures of factor analyzers or vine copulas perform better. In our test cases of sequential inference, using posterior approximations gives more accurate results than direct sample reweighting, but joint inference is still preferable over sequential inference whenever possible. Since the performance is case-specific, we provide an R package mvdens with a unified interface for the density approximation methods.
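The first of the two sequential options mentioned in this abstract, evaluating the second posterior only at the existing sample locations and reweighting, can be sketched on a toy problem. Everything below is an assumption made for illustration and is unrelated to the mvdens package: a Gaussian mean with known `sigma` and a flat prior, two small synthetic datasets, and first-stage samples drawn directly from the analytic posterior as a stand-in for a Monte Carlo run.

```python
import numpy as np

def reweight(samples, log_lik_new):
    """Importance weights that fold a second dataset into existing samples."""
    log_w = log_lik_new(samples)
    log_w = log_w - log_w.max()        # stabilise before exponentiating
    w = np.exp(log_w)
    w /= w.sum()
    ess = 1.0 / np.sum(w ** 2)         # effective sample size after reweighting
    return w, ess

sigma = 1.0
y1 = 1.0 + np.linspace(-1.0, 1.0, 20)  # first dataset, sample mean 1.0
y2 = 1.2 + np.linspace(-1.0, 1.0, 20)  # second dataset, sample mean 1.2

rng = np.random.default_rng(0)
# Stand-in for a sampler run on y1: with a flat prior and known sigma,
# the posterior of the mean is Gaussian.
samples = rng.normal(y1.mean(), sigma / np.sqrt(len(y1)), size=20000)

def log_lik_y2(mu):
    """Gaussian log-likelihood of y2 for each candidate mean (up to a constant)."""
    return -0.5 * np.sum((y2[None, :] - mu[:, None]) ** 2, axis=1) / sigma ** 2

weights, ess = reweight(samples, log_lik_y2)
mu_seq = np.sum(weights * samples)           # sequential estimate
mu_joint = np.concatenate([y1, y2]).mean()   # joint-inference posterior mean
print(f"sequential mean={mu_seq:.3f}, joint mean={mu_joint:.3f}, ESS={ess:.0f}")
```

The effective sample size drops as the two datasets disagree more, which is one way to see why reweighting alone can degrade and why a fitted density approximation of the first posterior may be preferred.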
11
Zubair A, Rosen IG, Nuzhdin SV, Marjoram P. Bayesian model selection for the Drosophila gap gene network. BMC Bioinformatics 2019; 20:327. [PMID: 31195954 PMCID: PMC6567646 DOI: 10.1186/s12859-019-2888-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 04/24/2019] [Accepted: 05/09/2019] [Indexed: 11/10/2022] Open
Abstract
Background: The gap gene system controls the early cascade of the segmentation pathway in Drosophila melanogaster as well as other insects. Owing to its tractability and key role in embryo patterning, this system has been the focus for both computational modelers and experimentalists. The gap gene expression dynamics can be considered strictly as a one-dimensional process and modeled as a system of reaction-diffusion equations. While substantial progress has been made in modeling this phenomenon, there still remains a deficit of approaches to evaluate competing hypotheses. Most of the model development has happened in isolation and there has been little attempt to compare candidate models. Results: The Bayesian framework offers a means of doing formal model evaluation. Here, we demonstrate how this framework can be used to compare different models of gene expression. We focus on the Papatsenko-Levine formalism, which exploits a fractional occupancy based approach to incorporate activation of the gap genes by the maternal genes and cross-regulation by the gap genes themselves. The Bayesian approach provides insight about relationship between system parameters. In the regulatory pathway of segmentation, the parameters for number of binding sites and binding affinity have a negative correlation. The model selection analysis supports a stronger binding affinity for Bicoid compared to other regulatory edges, as shown by a larger posterior mean. The procedure doesn’t show support for activation of Kruppel by Bicoid. Conclusions: We provide an efficient solver for the general representation of the Papatsenko-Levine model. We also demonstrate the utility of Bayes factor for evaluating candidate models for spatial patterning models. In addition, by using the parallel tempering sampler, the convergence of Markov chains can be remarkably improved and robust estimates of Bayes factors obtained.
Electronic supplementary material: The online version of this article (10.1186/s12859-019-2888-0) contains supplementary material, which is available to authorized users.
Collapse
Affiliation(s)
- Asif Zubair
- Molecular and Computational Biology, USC, 1050 Childs Way, Los Angeles, CA 90089-2532, US.
| | - I Gary Rosen
- Department of Mathematics, USC, 3620 S. Vermont Ave., Los Angeles, CA 90089-2532, US
| | - Sergey V Nuzhdin
- Molecular and Computational Biology, USC, 1050 Childs Way, Los Angeles, CA 90089-2532, US
| | - Paul Marjoram
- Molecular and Computational Biology, USC, 1050 Childs Way, Los Angeles, CA 90089-2532, US
|
12
|
Cao Z, Grima R. Accuracy of parameter estimation for auto-regulatory transcriptional feedback loops from noisy data. J R Soc Interface 2019; 16:20180967. [PMID: 30940028 PMCID: PMC6505555 DOI: 10.1098/rsif.2018.0967] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/09/2023] Open
Abstract
Bayesian and non-Bayesian moment-based inference methods are commonly used to estimate the parameters defining stochastic models of gene regulatory networks from noisy single cell or population snapshot data. However, a systematic investigation of the accuracy of the predictions of these methods remains missing. Here, we present the results of such a study using synthetic noisy data of a negative auto-regulatory transcriptional feedback loop, one of the most common building blocks of complex gene regulatory networks. We study the error in parameter estimation as a function of (i) number of cells in each sample; (ii) the number of time points; (iii) the highest-order moment of protein fluctuations used for inference; (iv) the moment-closure method used for likelihood approximation. We find that for sample sizes typical of flow cytometry experiments, parameter estimation by maximizing the likelihood is as accurate as using Bayesian methods but with a much reduced computational time. We also show that the choice of moment-closure method is the crucial factor determining the maximum achievable accuracy of moment-based inference methods. Common likelihood approximation methods based on the linear noise approximation or the zero cumulants closure perform poorly for feedback loops with large protein-DNA binding rates or large protein bursts; this is exacerbated for highly heterogeneous cell populations. By contrast, approximating the likelihood using the linear-mapping approximation or conditional derivative matching leads to highly accurate parameter estimates for a wide range of conditions.
|
13
|
Stapor P, Weindl D, Ballnus B, Hug S, Loos C, Fiedler A, Krause S, Hroß S, Fröhlich F, Hasenauer J. PESTO: Parameter EStimation TOolbox. Bioinformatics 2019; 34:705-707. [PMID: 29069312 PMCID: PMC5860618 DOI: 10.1093/bioinformatics/btx676] [Citation(s) in RCA: 56] [Impact Index Per Article: 9.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/24/2017] [Accepted: 10/20/2017] [Indexed: 11/15/2022] Open
Abstract
Summary PESTO is a widely applicable and highly customizable toolbox for parameter estimation in MathWorks MATLAB. It offers scalable algorithms for optimization, uncertainty and identifiability analysis, which work in a very generic manner, treating the objective function as a black box. Hence, PESTO can be used for any parameter estimation problem, for which the user can provide a deterministic objective function in MATLAB. Availability and implementation PESTO is a MATLAB toolbox, freely available under the BSD license. The source code, along with extensive documentation and example code, can be downloaded from https://github.com/ICB-DCM/PESTO/. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Paul Stapor
- Institute of Computational Biology, Helmholtz Zentrum München, 85764 Neuherberg, Germany; Center for Mathematics, Technische Universität München, 85748 Garching, Germany
- Daniel Weindl
- Institute of Computational Biology, Helmholtz Zentrum München, 85764 Neuherberg, Germany
- Benjamin Ballnus
- Institute of Computational Biology, Helmholtz Zentrum München, 85764 Neuherberg, Germany; Center for Mathematics, Technische Universität München, 85748 Garching, Germany
- Sabine Hug
- Institute of Computational Biology, Helmholtz Zentrum München, 85764 Neuherberg, Germany; Center for Mathematics, Technische Universität München, 85748 Garching, Germany
- Carolin Loos
- Institute of Computational Biology, Helmholtz Zentrum München, 85764 Neuherberg, Germany; Center for Mathematics, Technische Universität München, 85748 Garching, Germany
- Anna Fiedler
- Institute of Computational Biology, Helmholtz Zentrum München, 85764 Neuherberg, Germany; Center for Mathematics, Technische Universität München, 85748 Garching, Germany
- Sabrina Krause
- Institute of Computational Biology, Helmholtz Zentrum München, 85764 Neuherberg, Germany; Center for Mathematics, Technische Universität München, 85748 Garching, Germany
- Sabrina Hroß
- Institute of Computational Biology, Helmholtz Zentrum München, 85764 Neuherberg, Germany; Center for Mathematics, Technische Universität München, 85748 Garching, Germany
- Fabian Fröhlich
- Institute of Computational Biology, Helmholtz Zentrum München, 85764 Neuherberg, Germany; Center for Mathematics, Technische Universität München, 85748 Garching, Germany
- Jan Hasenauer
- Institute of Computational Biology, Helmholtz Zentrum München, 85764 Neuherberg, Germany; Center for Mathematics, Technische Universität München, 85748 Garching, Germany
|
14
|
An energetic reformulation of kinetic rate laws enables scalable parameter estimation for biochemical networks. J Theor Biol 2019; 461:145-156. [DOI: 10.1016/j.jtbi.2018.10.041] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/14/2018] [Revised: 09/20/2018] [Accepted: 10/19/2018] [Indexed: 11/18/2022]
|
15
|
Efficient Parameter Estimation Enables the Prediction of Drug Response Using a Mechanistic Pan-Cancer Pathway Model. Cell Syst 2018; 7:567-579.e6. [DOI: 10.1016/j.cels.2018.10.013] [Citation(s) in RCA: 74] [Impact Index Per Article: 10.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/06/2018] [Revised: 09/07/2018] [Accepted: 10/29/2018] [Indexed: 12/25/2022]
|
16
|
Meng Y, Cai XH, Wang L. Potential Genes and Pathways of Neonatal Sepsis Based on Functional Gene Set Enrichment Analyses. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2018; 2018:6708520. [PMID: 30154914 PMCID: PMC6091373 DOI: 10.1155/2018/6708520] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 01/18/2018] [Revised: 06/04/2018] [Accepted: 06/27/2018] [Indexed: 12/16/2022]
Abstract
BACKGROUND Neonatal sepsis (NS) is considered the most common cause of neonatal death. Although numerous studies focus on gene biomarkers of NS, the predictive value of the gene biomarkers is low. NS pathogenesis still needs to be investigated. METHODS After data preprocessing, we used the KEGG enrichment method to identify the differentially expressed pathways between NS and normal controls. Then, functional principal component analysis (FPCA) was adopted to calculate gene values in NS. In order to further study the key signaling pathways of NS, an elastic-net regression model, the Mann-Whitney U test, and a coexpression network were used to estimate the weights of signaling pathways and hub genes. RESULTS A total of 115 different pathways between NS and controls were first identified. FPCA made full use of time-series gene expression information and estimated F values of genes in the different pathways. The top 1000 genes were considered as the different genes and were further analyzed by elastic-net regression and the MWU test. There were 7 key signaling pathways between NS and controls, according to different sources. Among the genes involved in key pathways, 7 hub genes, PIK3CA, TGFBR2, CDKN1B, KRAS, E2F3, TRAF6, and CHUK, were determined based on the coexpression network. Most of them were cancer-related genes. PIK3CA was considered as the common marker, which is highly expressed in the lymphocyte group. Little was known about the correlation of PIK3CA with NS, which offers a new direction for NS research. CONCLUSION This research might provide useful information for exploring potential novel genes and pathways as NS therapy targets.
Affiliation(s)
- YuXiu Meng
- Department of Neonatology, First People's Hospital of Jining, Jining, Shandong 272000, China
- Xue Hong Cai
- Department of Pediatrics, Traditional Chinese Medicine Hospital of Yanzhou, Jining, Shandong 272100, China
- LiPei Wang
- Department of Neonatology, First People's Hospital of Jining, Jining, Shandong 272000, China
|
17
|
Ballnus B, Schaper S, Theis FJ, Hasenauer J. Bayesian parameter estimation for biochemical reaction networks using region-based adaptive parallel tempering. Bioinformatics 2018; 34:i494-i501. [PMID: 29949983 PMCID: PMC6022572 DOI: 10.1093/bioinformatics/bty229] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/09/2023] Open
Abstract
Motivation Mathematical models have become standard tools for the investigation of cellular processes and the unraveling of signal processing mechanisms. The parameters of these models are usually derived from the available data using optimization and sampling methods. However, the efficiency of these methods is limited by the properties of the mathematical model, e.g. non-identifiabilities, and the resulting posterior distribution. In particular, multi-modal distributions with long valleys or pronounced tails are difficult to optimize and sample. Thus, the development or improvement of optimization and sampling methods is subject to ongoing research. Results We suggest a region-based adaptive parallel tempering algorithm which adapts to the problem-specific posterior distributions, i.e. modes and valleys. The algorithm combines several established algorithms to overcome their individual shortcomings and to improve sampling efficiency. We assessed its properties for established benchmark problems and two ordinary differential equation models of biochemical reaction networks. The proposed algorithm outperformed state-of-the-art methods in terms of calculation efficiency and mixing. Since the algorithm does not rely on a specific problem structure, but adapts to the posterior distribution, it is suitable for a variety of model classes. Availability and implementation The code is available both as Supplementary Material and in a Git repository written in MATLAB. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Benjamin Ballnus
- Institute of Computational Biology, Helmholtz Zentrum München–German Research Center for Environmental Health, Neuherberg, Germany
- Technische Universität München, Center for Mathematics, Chair of Mathematical Modeling of Biological Systems, Garching, Germany
- Steffen Schaper
- Bayer AG, Engineering and Technologies, Applied Mathematics, Leverkusen, Germany
- Fabian J Theis
- Institute of Computational Biology, Helmholtz Zentrum München–German Research Center for Environmental Health, Neuherberg, Germany
- Technische Universität München, Center for Mathematics, Chair of Mathematical Modeling of Biological Systems, Garching, Germany
- Jan Hasenauer
- Institute of Computational Biology, Helmholtz Zentrum München–German Research Center for Environmental Health, Neuherberg, Germany
- Technische Universität München, Center for Mathematics, Chair of Mathematical Modeling of Biological Systems, Garching, Germany
|
18
|
Thijssen B, Dijkstra TMH, Heskes T, Wessels LFA. Bayesian data integration for quantifying the contribution of diverse measurements to parameter estimates. Bioinformatics 2018; 34:803-811. [PMID: 29069283 PMCID: PMC6192208 DOI: 10.1093/bioinformatics/btx666] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/31/2017] [Revised: 08/03/2017] [Accepted: 10/23/2017] [Indexed: 11/13/2022] Open
Abstract
Motivation Computational models in biology are frequently underdetermined, due to limits in our capacity to measure biological systems. In particular, mechanistic models often contain parameters whose values are not constrained by a single type of measurement. It may be possible to achieve better model determination by combining the information contained in different types of measurements. Bayesian statistics provides a convenient framework for this, allowing a quantification of the reduction in uncertainty with each additional measurement type. We wished to explore whether such integration is feasible and whether it can allow computational models to be more accurately determined. Results We created an ordinary differential equation model of cell cycle regulation in budding yeast and integrated data from 13 different studies covering different experimental techniques. We found that for some parameters, a single type of measurement, relative time course mRNA expression, is sufficient to constrain them. Other parameters, however, were only constrained when two types of measurements were combined, namely relative time course and absolute transcript concentration. Comparing the estimates to measurements from three additional, independent studies, we found that the degradation and transcription rates indeed matched the model predictions in order of magnitude. The predicted translation rate was incorrect however, thus revealing a deficiency in the model. Since this parameter was not constrained by any of the measurement types separately, it was only possible to falsify the model when integrating multiple types of measurements. In conclusion, this study shows that integrating multiple measurement types can allow models to be more accurately determined. Availability and implementation The models and files required for running the inference are included in the Supplementary information. Contact l.wessels@nki.nl. 
Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Bram Thijssen
- Computational Cancer Biology, Division of Molecular Carcinogenesis, Netherlands Cancer Institute, CX, Amsterdam, The Netherlands
- Tjeerd M H Dijkstra
- Department of Protein Evolution, Max Planck Institute for Developmental Biology, Tübingen, Germany
- Centre for Integrative Neuroscience, University Clinic Tübingen, Tübingen, Germany
- Tom Heskes
- Institute for Computing and Information Sciences, Radboud University Nijmegen, Nijmegen GL, The Netherlands
- Lodewyk F A Wessels
- Computational Cancer Biology, Division of Molecular Carcinogenesis, Netherlands Cancer Institute, CX, Amsterdam, The Netherlands
- Faculty of EEMCS, Delft University of Technology, Delft, CD, The Netherlands
|
19
|
Formulation, construction and analysis of kinetic models of metabolism: A review of modelling frameworks. Biotechnol Adv 2017; 35:981-1003. [PMID: 28916392 DOI: 10.1016/j.biotechadv.2017.09.005] [Citation(s) in RCA: 85] [Impact Index Per Article: 10.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/21/2017] [Revised: 08/30/2017] [Accepted: 09/10/2017] [Indexed: 12/13/2022]
Abstract
Kinetic models are critical to predict the dynamic behaviour of metabolic networks. Mechanistic kinetic models for large networks remain uncommon due to the difficulty of fitting their parameters. Recent modelling frameworks promise new ways to overcome this obstacle while retaining predictive capabilities. In this review, we present an overview of the relevant mathematical frameworks for kinetic formulation, construction and analysis. Starting with kinetic formalisms, we next review statistical methods for parameter inference, as well as recent computational frameworks applied to the construction and analysis of kinetic models. Finally, we discuss opportunities and limitations hindering the development of larger kinetic reconstructions.
|
20
|
Babtie AC, Stumpf MPH. How to deal with parameters for whole-cell modelling. J R Soc Interface 2017; 14:20170237. [PMID: 28768879 PMCID: PMC5582120 DOI: 10.1098/rsif.2017.0237] [Citation(s) in RCA: 52] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/30/2017] [Accepted: 06/22/2017] [Indexed: 11/12/2022] Open
Abstract
Dynamical systems describing whole cells are on the verge of becoming a reality. But as models of reality, they are only useful if we have realistic parameters for the molecular reaction rates and cell physiological processes. There is currently no suitable framework to reliably estimate hundreds, let alone thousands, of reaction rate parameters. Here, we map out the relative weaknesses and promises of different approaches aimed at redressing this issue. While suitable procedures for estimation or inference of the whole (vast) set of parameters will, in all likelihood, remain elusive, some hope can be drawn from the fact that much of the cellular behaviour may be explained in terms of smaller sets of parameters. Identifying such parameter sets and assessing their behaviour is now becoming possible even for very large systems of equations, and we expect such methods to become central tools in the development and analysis of whole-cell models.
Affiliation(s)
- Ann C Babtie
- Department of Life Sciences, Imperial College London, London, UK
|
21
|
Ballnus B, Hug S, Hatz K, Görlitz L, Hasenauer J, Theis FJ. Comprehensive benchmarking of Markov chain Monte Carlo methods for dynamical systems. BMC SYSTEMS BIOLOGY 2017; 11:63. [PMID: 28646868 PMCID: PMC5482939 DOI: 10.1186/s12918-017-0433-1] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 02/03/2017] [Accepted: 05/10/2017] [Indexed: 11/12/2022]
Abstract
BACKGROUND In quantitative biology, mathematical models are used to describe and analyze biological processes. The parameters of these models are usually unknown and need to be estimated from experimental data using statistical methods. In particular, Markov chain Monte Carlo (MCMC) methods have become increasingly popular as they allow for a rigorous analysis of parameter and prediction uncertainties without the need for assuming parameter identifiability or removing non-identifiable parameters. A broad spectrum of MCMC algorithms have been proposed, including single- and multi-chain approaches. However, selecting and tuning sampling algorithms suited for a given problem remains challenging and a comprehensive comparison of different methods is so far not available. RESULTS We present the results of a thorough benchmarking of state-of-the-art single- and multi-chain sampling methods, including Adaptive Metropolis, Delayed Rejection Adaptive Metropolis, Metropolis adjusted Langevin algorithm, Parallel Tempering and Parallel Hierarchical Sampling. Different initialization and adaptation schemes are considered. To ensure a comprehensive and fair comparison, we consider problems with a range of features such as bifurcations, periodical orbits, multistability of steady-state solutions and chaotic regimes. These problem properties give rise to various posterior distributions including uni- and multi-modal distributions and non-normally distributed mode tails. For an objective comparison, we developed a pipeline for the semi-automatic comparison of sampling results. CONCLUSION The comparison of MCMC algorithms, initialization and adaptation schemes revealed that overall multi-chain algorithms perform better than single-chain algorithms. In some cases this performance can be further increased by using a preceding multi-start local optimization scheme. 
These results can inform the selection of sampling methods and the benchmark collection can serve for the evaluation of new algorithms. Furthermore, our results confirm the need to address exploration quality of MCMC chains before applying the commonly used quality measure of effective sample size to prevent false analysis conclusions.
Affiliation(s)
- Benjamin Ballnus
- Helmholtz Zentrum München - German Research Center for Environmental Health, Institute of Computational Biology, Ingolstädter Landstraße 1, Neuherberg, 85764 Germany
- Technische Universität München, Center for Mathematics, Chair of Mathematical Modeling of Biological Systems, Boltzmannstraße 15, Garching, 85748 Germany
- Sabine Hug
- Helmholtz Zentrum München - German Research Center for Environmental Health, Institute of Computational Biology, Ingolstädter Landstraße 1, Neuherberg, 85764 Germany
- Kathrin Hatz
- Bayer AG, Engineering & Technologies, Applied Mathematics, Kaiser-Wilhelm-Allee, Leverkusen, 51368 Germany
- Linus Görlitz
- Bayer AG, Engineering & Technologies, Applied Mathematics, Kaiser-Wilhelm-Allee, Leverkusen, 51368 Germany
- Jan Hasenauer
- Helmholtz Zentrum München - German Research Center for Environmental Health, Institute of Computational Biology, Ingolstädter Landstraße 1, Neuherberg, 85764 Germany
- Technische Universität München, Center for Mathematics, Chair of Mathematical Modeling of Biological Systems, Boltzmannstraße 15, Garching, 85748 Germany
- Fabian J. Theis
- Helmholtz Zentrum München - German Research Center for Environmental Health, Institute of Computational Biology, Ingolstädter Landstraße 1, Neuherberg, 85764 Germany
- Technische Universität München, Center for Mathematics, Chair of Mathematical Modeling of Biological Systems, Boltzmannstraße 15, Garching, 85748 Germany
|
22
|
Fröhlich F, Kaltenbacher B, Theis FJ, Hasenauer J. Scalable Parameter Estimation for Genome-Scale Biochemical Reaction Networks. PLoS Comput Biol 2017; 13:e1005331. [PMID: 28114351 PMCID: PMC5256869 DOI: 10.1371/journal.pcbi.1005331] [Citation(s) in RCA: 90] [Impact Index Per Article: 11.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/04/2016] [Accepted: 12/20/2016] [Indexed: 01/06/2023] Open
Abstract
Mechanistic mathematical modeling of biochemical reaction networks using ordinary differential equation (ODE) models has improved our understanding of small- and medium-scale biological processes. While the same should in principle hold for large- and genome-scale processes, the computational methods for the analysis of ODE models which describe hundreds or thousands of biochemical species and reactions are missing so far. While individual simulations are feasible, the inference of the model parameters from experimental data is computationally too intensive. In this manuscript, we evaluate adjoint sensitivity analysis for parameter estimation in large scale biochemical reaction networks. We present the approach for time-discrete measurement and compare it to state-of-the-art methods used in systems and computational biology. Our comparison reveals a significantly improved computational efficiency and a superior scalability of adjoint sensitivity analysis. The computational complexity is effectively independent of the number of parameters, enabling the analysis of large- and genome-scale models. Our study of a comprehensive kinetic model of ErbB signaling shows that parameter estimation using adjoint sensitivity analysis requires a fraction of the computation time of established methods. The proposed method will facilitate mechanistic modeling of genome-scale cellular processes, as required in the age of omics. In this manuscript, we introduce a scalable method for parameter estimation for genome-scale biochemical reaction networks. Mechanistic models for genome-scale biochemical reaction networks describe the behavior of thousands of chemical species using thousands of parameters. Standard methods for parameter estimation are usually computationally intractable at these scales. Adjoint sensitivity based approaches have been suggested to have superior scalability but any rigorous evaluation is lacking. 
We implement a toolbox for adjoint sensitivity analysis for biochemical reaction networks which also supports the import of SBML models. We show by means of a set of benchmark models that adjoint sensitivity based approaches unequivocally outperform standard approaches for large-scale models and that the achieved speedup increases with respect to both the number of parameters and the number of chemical species in the model. This demonstrates the applicability of adjoint sensitivity based approaches to parameter estimation for genome-scale mechanistic models. The MATLAB toolbox implementing the developed methods is available from http://ICB-DCM.github.io/AMICI/.
Affiliation(s)
- Fabian Fröhlich
- Helmholtz Zentrum München - German Research Center for Environmental Health, Institute of Computational Biology, Neuherberg, Germany
- Technische Universität München, Center for Mathematics, Chair of Mathematical Modeling of Biological Systems, Garching, Germany
- Fabian J. Theis
- Helmholtz Zentrum München - German Research Center for Environmental Health, Institute of Computational Biology, Neuherberg, Germany
- Technische Universität München, Center for Mathematics, Chair of Mathematical Modeling of Biological Systems, Garching, Germany
- Jan Hasenauer
- Helmholtz Zentrum München - German Research Center for Environmental Health, Institute of Computational Biology, Neuherberg, Germany
- Technische Universität München, Center for Mathematics, Chair of Mathematical Modeling of Biological Systems, Garching, Germany
|
23
|
Parallelization and High-Performance Computing Enables Automated Statistical Inference of Multi-scale Models. Cell Syst 2017; 4:194-206.e9. [PMID: 28089542 DOI: 10.1016/j.cels.2016.12.002] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/12/2016] [Revised: 09/14/2016] [Accepted: 11/30/2016] [Indexed: 01/18/2023]
Abstract
Mechanistic understanding of multi-scale biological processes, such as cell proliferation in a changing biological tissue, is readily facilitated by computational models. While tools exist to construct and simulate multi-scale models, the statistical inference of the unknown model parameters remains an open problem. Here, we present and benchmark a parallel approximate Bayesian computation sequential Monte Carlo (pABC SMC) algorithm, tailored for high-performance computing clusters. pABC SMC is fully automated and returns reliable parameter estimates and confidence intervals. By running the pABC SMC algorithm for ∼10^6 hr, we parameterize multi-scale models that accurately describe quantitative growth curves and histological data obtained in vivo from individual tumor spheroid growth in media droplets. The models capture the hybrid deterministic-stochastic behaviors of 10^5-10^6 cells growing in a 3D dynamically changing nutrient environment. The pABC SMC algorithm reliably converges to a consistent set of parameters. Our study demonstrates a proof of principle for robust, data-driven modeling of multi-scale biological systems and the feasibility of multi-scale model parameterization through statistical inference.
|
24
|
Fiedler A, Raeth S, Theis FJ, Hausser A, Hasenauer J. Tailored parameter optimization methods for ordinary differential equation models with steady-state constraints. BMC SYSTEMS BIOLOGY 2016; 10:80. [PMID: 27549154 PMCID: PMC4994295 DOI: 10.1186/s12918-016-0319-7] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 03/22/2016] [Accepted: 07/12/2016] [Indexed: 12/21/2022]
Abstract
BACKGROUND Ordinary differential equation (ODE) models are widely used to describe (bio-)chemical and biological processes. To enhance the predictive power of these models, their unknown parameters are estimated from experimental data. These experimental data are mostly collected in perturbation experiments, in which the processes are pushed out of steady state by applying a stimulus. The information that the initial condition is a steady state of the unperturbed process provides valuable information, as it restricts the dynamics of the process and thereby the parameters. However, implementing steady-state constraints in the optimization often results in convergence problems. RESULTS In this manuscript, we propose two new methods for solving optimization problems with steady-state constraints. The first method exploits ideas from optimization algorithms on manifolds and introduces a retraction operator, essentially reducing the dimension of the optimization problem. The second method is based on the continuous analogue of the optimization problem. This continuous analogue is an ODE whose equilibrium points are the optima of the constrained optimization problem. This equivalence enables the use of adaptive numerical methods for solving optimization problems with steady-state constraints. Both methods are tailored to the problem structure and exploit the local geometry of the steady-state manifold and its stability properties. A parameterization of the steady-state manifold is not required. The efficiency and reliability of the proposed methods are evaluated using one toy example and two applications. The first application example uses published data while the second uses a novel dataset for Raf/MEK/ERK signaling. The proposed methods demonstrated better convergence properties than state-of-the-art methods employed in systems and computational biology. Furthermore, the average computation time per converged start is significantly lower.
In addition to the theoretical results, the analysis of the dataset for Raf/MEK/ERK signaling provides novel biological insights regarding the existence of feedback regulation. CONCLUSION Many optimization problems considered in systems and computational biology are subject to steady-state constraints. While most optimization methods have convergence problems if these steady-state constraints are highly nonlinear, the methods presented recover the convergence properties of optimizers which can exploit an analytical expression for the parameter-dependent steady state. This renders them an excellent alternative to methods which are currently employed in systems and computational biology.
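The continuous-analogue idea mentioned in the abstract above can be illustrated on a toy problem. The following is a minimal sketch and not the authors' method: it ignores the steady-state constraint entirely and only integrates the plain gradient flow dθ/dt = -∇J(θ) with explicit Euler steps, so that the equilibrium of the ODE coincides with the minimizer of J. The quadratic objective, the matrix A, the vector b, the step size and the function names are all hypothetical choices made for illustration.

```python
import numpy as np

# Hypothetical least-squares objective J(theta) = 0.5 * ||A @ theta - b||^2.
A = np.array([[2.0, 0.0],
              [0.0, 1.0]])
b = np.array([2.0, 3.0])


def grad_J(theta):
    """Gradient of the hypothetical objective J at theta."""
    return A.T @ (A @ theta - b)


def gradient_flow(theta0, step=0.05, n_steps=2000):
    """Integrate d(theta)/dt = -grad_J(theta) with explicit Euler steps.

    An equilibrium of this ODE satisfies grad_J(theta) = 0, i.e. it is a
    stationary point of J (here the unique minimizer, since J is convex).
    """
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_steps):
        theta = theta - step * grad_J(theta)
    return theta


theta_star = gradient_flow([0.0, 0.0])
print(theta_star)  # approaches the minimizer of J
```

For this toy objective the flow settles at the point where A @ theta equals b. A fixed Euler step is used only to keep the sketch short; the abstract refers to adaptive numerical methods, which would replace this fixed-step loop.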
Affiliation(s)
- Anna Fiedler
- Institute of Computational Biology, Helmholtz Zentrum München, Ingolstädter Landstraße 1, Neuherberg, 85764 Germany
- Chair of Mathematical Modeling of Biological Systems, Center for Mathematics, Technische Universität München, Boltzmannstraße 3, Garching, 85748 Germany
- Sebastian Raeth
- Stuttgart Research Center Systems Biology (SRCSB), University of Stuttgart, Nobelstr. 15, Stuttgart, 70569 Germany
- Fabian J. Theis
- Institute of Computational Biology, Helmholtz Zentrum München, Ingolstädter Landstraße 1, Neuherberg, 85764 Germany
- Chair of Mathematical Modeling of Biological Systems, Center for Mathematics, Technische Universität München, Boltzmannstraße 3, Garching, 85748 Germany
- Angelika Hausser
- Stuttgart Research Center Systems Biology (SRCSB), University of Stuttgart, Nobelstr. 15, Stuttgart, 70569 Germany
| | - Jan Hasenauer
- Institute of Computational Biology, Helmholtz Zentrum München, Ingolstädter Landstraße 1, Neuherberg, 85764 Germany
- Chair of Mathematical Modeling of Biological Systems, Center for Mathematics, Technische Universität München, Boltzmannstraße 3, Garching, 85748 Germany
| |
|
25
|
Fröhlich F, Thomas P, Kazeroonian A, Theis FJ, Grima R, Hasenauer J. Inference for Stochastic Chemical Kinetics Using Moment Equations and System Size Expansion. PLoS Comput Biol 2016; 12:e1005030. [PMID: 27447730 PMCID: PMC4957800 DOI: 10.1371/journal.pcbi.1005030] [Citation(s) in RCA: 51] [Impact Index Per Article: 5.7] [Received: 09/28/2015] [Accepted: 06/23/2016] [Indexed: 11/18/2022] Open
Abstract
Quantitative mechanistic models are valuable tools for disentangling biochemical pathways and for achieving a comprehensive understanding of biological systems. However, to be quantitative the parameters of these models have to be estimated from experimental data. In the presence of significant stochastic fluctuations this is a challenging task as stochastic simulations are usually too time-consuming and a macroscopic description using reaction rate equations (RREs) is no longer accurate. In this manuscript, we therefore consider moment-closure approximation (MA) and the system size expansion (SSE), which approximate the statistical moments of stochastic processes and tend to be more precise than macroscopic descriptions. We introduce gradient-based parameter optimization methods and uncertainty analysis methods for MA and SSE. Efficiency and reliability of the methods are assessed using simulation examples as well as by an application to data for Epo-induced JAK/STAT signaling. The application revealed that even if merely population-average data are available, MA and SSE improve parameter identifiability in comparison to RRE. Furthermore, the simulation examples revealed that the resulting estimates are more reliable for an intermediate volume regime. In this regime the estimation error is reduced and we propose methods to determine the regime boundaries. These results illustrate that inference using MA and SSE is feasible and possesses a high sensitivity.
In this manuscript, we introduce efficient methods for parameter estimation for stochastic processes. The stochasticity of chemical reactions can influence the average behavior of the considered system. For some biological systems, a microscopic, stochastic description is computationally intractable but a macroscopic, deterministic description too inaccurate. This inaccuracy manifests itself in an error in parameter estimates, which impedes the predictive power of the proposed model. Until now, no rigorous analysis of the magnitude of the estimation error exists. We show by means of two simulation examples that using mesoscopic descriptions based on the system size expansions and moment-closure approximations can reduce this estimation error compared to inference using a macroscopic description. This reduction is most pronounced in an intermediate volume regime where the influence of stochasticity on the average behavior is moderately strong. For the JAK/STAT pathway where experimental data is available, we show that one parameter that was not structurally identifiable when using a macroscopic description becomes structurally identifiable when using a mesoscopic description for parameter estimation.
Affiliation(s)
- Fabian Fröhlich
- Helmholtz Zentrum München - German Research Center for Environmental Health, Institute of Computational Biology, Neuherberg, Germany
- Technische Universität München, Center for Mathematics, Chair of Mathematical Modeling of Biological Systems, Garching, Germany
| | - Philipp Thomas
- Department of Mathematics, Imperial College London, London, United Kingdom
| | - Atefeh Kazeroonian
- Helmholtz Zentrum München - German Research Center for Environmental Health, Institute of Computational Biology, Neuherberg, Germany
- Technische Universität München, Center for Mathematics, Chair of Mathematical Modeling of Biological Systems, Garching, Germany
| | - Fabian J. Theis
- Helmholtz Zentrum München - German Research Center for Environmental Health, Institute of Computational Biology, Neuherberg, Germany
- Technische Universität München, Center for Mathematics, Chair of Mathematical Modeling of Biological Systems, Garching, Germany
| | - Ramon Grima
- School of Biological Sciences, University of Edinburgh, Edinburgh, United Kingdom
- * E-mail: (RG); (JH)
| | - Jan Hasenauer
- Helmholtz Zentrum München - German Research Center for Environmental Health, Institute of Computational Biology, Neuherberg, Germany
- Technische Universität München, Center for Mathematics, Chair of Mathematical Modeling of Biological Systems, Garching, Germany
- * E-mail: (RG); (JH)
| |
|
26
|
Heinemann T, Raue A. Model calibration and uncertainty analysis in signaling networks. Curr Opin Biotechnol 2016; 39:143-149. [PMID: 27085224 DOI: 10.1016/j.copbio.2016.04.004] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 11/18/2015] [Revised: 03/27/2016] [Accepted: 04/01/2016] [Indexed: 10/22/2022]
Abstract
For a long time the biggest challenges in modeling cellular signal transduction networks have been the inference of crucial pathway components and the qualitative description of their interactions. As a result of the emergence of powerful high-throughput experiments, it is now possible to measure data of high temporal and spatial resolution and to analyze signaling dynamics quantitatively. In addition, this increase of high-quality data is the basis for a better understanding of model limitations and their influence on the predictive power of models. We review established approaches in signal transduction network modeling with a focus on ordinary differential equation models as well as related developments in model calibration. As central aspects of the calibration process we discuss possibilities of model adaptation based on data-driven parameter optimization and the concomitant objective of reducing model uncertainties.
Affiliation(s)
- Tim Heinemann
- Merrimack, One Kendall Sq., Suite B7201, Cambridge, MA 02139, USA
| | - Andreas Raue
- Merrimack, One Kendall Sq., Suite B7201, Cambridge, MA 02139, USA.
| |
|
27
|
Hross S, Hasenauer J. Analysis of CFSE time-series data using division-, age- and label-structured population models. Bioinformatics 2016; 32:2321-9. [PMID: 27153577 DOI: 10.1093/bioinformatics/btw131] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.4] [Received: 09/28/2015] [Accepted: 03/01/2016] [Indexed: 01/12/2023] Open
Abstract
MOTIVATION In vitro and in vivo cell proliferation is often studied using the dye carboxyfluorescein succinimidyl ester (CFSE). The CFSE time-series data provide information about the proliferation history of populations of cells. While the experimental procedures are well established and widely used, the analysis of CFSE time-series data is still challenging. Many available analysis tools do not account for cell age and employ optimization methods that are inefficient (or even unreliable). RESULTS We present a new model-based analysis method for CFSE time-series data. This method uses a flexible description of proliferating cell populations, namely, a division-, age- and label-structured population model. Efficient maximum likelihood and Bayesian estimation algorithms are introduced to infer the model parameters and their uncertainties. These methods exploit the forward sensitivity equations of the underlying partial differential equation model for efficient and accurate gradient calculation, thereby improving computational efficiency and reliability compared with alternative approaches and accelerating uncertainty analysis. The performance of the method is assessed by studying a dataset for immune cell proliferation. This revealed the importance of different factors on the proliferation rates of individual cells. Among others, the predominant effect of cell age on the division rate is found, which was not revealed by available computational methods. AVAILABILITY AND IMPLEMENTATION The MATLAB source code implementing the models and algorithms is available from http://janhasenauer.github.io/ShAPE-DALSP/ CONTACT jan.hasenauer@helmholtz-muenchen.de SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Sabrina Hross
- Helmholtz Zentrum München-German Research Center for Environmental Health, Institute of Computational Biology, Neuherberg 85764, Germany
- Department of Mathematical Modeling of Biological Systems, Center for Mathematics, Technische Universität München, Garching 85748, Germany
| | - Jan Hasenauer
- Helmholtz Zentrum München-German Research Center for Environmental Health, Institute of Computational Biology, Neuherberg 85764, Germany
- Department of Mathematical Modeling of Biological Systems, Center for Mathematics, Technische Universität München, Garching 85748, Germany
| |
|
28
|
Bayesian Model Selection Methods and Their Application to Biological ODE Systems. Uncertainty in Biology 2016. [DOI: 10.1007/978-3-319-21296-8_10] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Indexed: 01/22/2023]
|
29
|
Krauss M, Schuppert A. Assessing interindividual variability by Bayesian-PBPK modeling. Drug Discov Today Dis Models 2016. [DOI: 10.1016/j.ddmod.2017.08.001] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 02/06/2023]
|
30
|
Klinke DJ, Birtwistle MR. In silico model-based inference: an emerging approach for inverse problems in engineering better medicines. Curr Opin Chem Eng 2015; 10:14-24. [PMID: 26309811 PMCID: PMC4545575 DOI: 10.1016/j.coche.2015.07.006] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/28/2022]
Abstract
Identifying the network of biochemical interactions that underpin disease pathophysiology is a key hurdle in drug discovery. While many components involved in these biological processes are identified, how components organize differently in health and disease remains unclear. In chemical engineering, mechanistic modeling provides a quantitative framework to capture our understanding of a reactive system and test this knowledge against data. Here, we describe an emerging approach to test this knowledge against data that leverages concepts from probability, Bayesian statistics, and chemical kinetics by focusing on two related inverse problems. The first problem is to identify the causal structure of the reaction network, given uncertainty as to how the reactive components interact. The second problem is to identify the values of the model parameters, when a network is known a priori.
Affiliation(s)
- David J. Klinke
- Department of Chemical Engineering and Mary Babb Randolph Cancer Center, West Virginia University, Morgantown, WV
- Department of Microbiology, Immunology, & Cell Biology, West Virginia University, Morgantown, WV
| | - Marc R. Birtwistle
- Department of Pharmacology and Systems Therapeutics, Icahn School of Medicine at Mount Sinai, New York, NY
| |
|
31
|
Raue A, Steiert B, Schelker M, Kreutz C, Maiwald T, Hass H, Vanlier J, Tönsing C, Adlung L, Engesser R, Mader W, Heinemann T, Hasenauer J, Schilling M, Höfer T, Klipp E, Theis F, Klingmüller U, Schöberl B, Timmer J. Data2Dynamics: a modeling environment tailored to parameter estimation in dynamical systems. Bioinformatics 2015; 31:3558-60. [PMID: 26142188 DOI: 10.1093/bioinformatics/btv405] [Citation(s) in RCA: 134] [Impact Index Per Article: 13.4] [Received: 03/12/2015] [Accepted: 06/28/2015] [Indexed: 02/02/2023] Open
Abstract
Modeling of dynamical systems using ordinary differential equations is a popular approach in the field of systems biology. Two of the most critical steps in this approach are to construct dynamical models of biochemical reaction networks for large datasets and complex experimental conditions and to perform efficient and reliable parameter estimation for model fitting. We present a modeling environment for MATLAB that pioneers these challenges. The numerically expensive parts of the calculations such as the solving of the differential equations and of the associated sensitivity system are parallelized and automatically compiled into efficient C code. A variety of parameter estimation algorithms as well as frequentist and Bayesian methods for uncertainty analysis have been implemented and used on a range of applications that led to publications. AVAILABILITY AND IMPLEMENTATION The Data2Dynamics modeling environment is MATLAB based, open source and freely available at http://www.data2dynamics.org. CONTACT andreas.raue@fdm.uni-freiburg.de SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- A Raue
- Merrimack Pharmaceuticals Inc., Discovery Division, Cambridge, MA 02139, USA
| | - B Steiert
- University of Freiburg, Institute for Physics, 79104 Freiburg, Germany
| | - M Schelker
- Humboldt-Universität zu Berlin, Theoretical Biophysics, 10115 Berlin, Germany
| | - C Kreutz
- University of Freiburg, Institute for Physics, 79104 Freiburg, Germany
| | - T Maiwald
- University of Freiburg, Institute for Physics, 79104 Freiburg, Germany
| | - H Hass
- University of Freiburg, Institute for Physics, 79104 Freiburg, Germany
| | - J Vanlier
- University of Freiburg, Institute for Physics, 79104 Freiburg, Germany
| | - C Tönsing
- University of Freiburg, Institute for Physics, 79104 Freiburg, Germany
| | - L Adlung
- Systems Biology of Signal Transduction and
| | - R Engesser
- University of Freiburg, Institute for Physics, 79104 Freiburg, Germany
| | - W Mader
- University of Freiburg, Institute for Physics, 79104 Freiburg, Germany
| | - T Heinemann
- Division of Theoretical Systems Biology, German Cancer Research Center, 69120 Heidelberg, Germany, BioQuant, University of Heidelberg, 69120 Heidelberg, Germany
| | - J Hasenauer
- Helmholtz Center Munich, Institute of Computational Biology, 85764 Neuherberg, Germany, Technische Universität München, Department of Mathematics, 85748 Garching, Germany
| | | | - T Höfer
- Division of Theoretical Systems Biology, German Cancer Research Center, 69120 Heidelberg, Germany, BioQuant, University of Heidelberg, 69120 Heidelberg, Germany
| | - E Klipp
- Humboldt-Universität zu Berlin, Theoretical Biophysics, 10115 Berlin, Germany
| | - F Theis
- Helmholtz Center Munich, Institute of Computational Biology, 85764 Neuherberg, Germany, Technische Universität München, Department of Mathematics, 85748 Garching, Germany
| | | | - B Schöberl
- Merrimack Pharmaceuticals Inc., Discovery Division, Cambridge, MA 02139, USA
| | - J Timmer
- University of Freiburg, Institute for Physics, 79104 Freiburg, Germany, BIOSS Centre for Biological Signalling Studies, University of Freiburg, 79104 Freiburg, Germany
| |
|
32
|
A single-cell model of PIP3 dynamics using chemical dimerization. Bioorg Med Chem 2015; 23:2868-76. [DOI: 10.1016/j.bmc.2015.04.074] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 02/12/2015] [Revised: 04/23/2015] [Accepted: 04/24/2015] [Indexed: 11/22/2022]
|
33
|
Aitken S, Kilpatrick AM, Akman OE. Dizzy-Beats: a Bayesian evidence analysis tool for systems biology. Bioinformatics 2015; 31:1863-5. [PMID: 25637558 PMCID: PMC4443683 DOI: 10.1093/bioinformatics/btv062] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 12/04/2014] [Accepted: 01/26/2015] [Indexed: 11/13/2022] Open
Abstract
Motivation: Model selection and parameter inference are complex problems of long-standing interest in systems biology. Selecting between competing models arises commonly as underlying biochemical mechanisms are often not fully known, hence alternative models must be considered. Parameter inference yields important information on the extent to which the data and the model constrain parameter values. Results: We report Dizzy-Beats, a graphical Java Bayesian evidence analysis tool implementing nested sampling - an algorithm yielding an estimate of the log of the Bayesian evidence Z and the moments of model parameters, thus addressing two outstanding challenges in systems modelling. A likelihood function based on the L1-norm is adopted as it is generically applicable to replicated time series data. Availability and implementation: http://sourceforge.net/p/bayesevidence/home/Home/ Contact: s.aitken@ed.ac.uk
Affiliation(s)
- Stuart Aitken
- MRC Human Genetics Unit, IGMM, University of Edinburgh, Edinburgh EH4 2XU, UK, School of Informatics, University of Edinburgh, Edinburgh EH8 9AB, UK, Department of Pediatrics, University of California San Diego, La Jolla, CA 92093, USA, Centre for Systems, Dynamics and Control, College of Engineering, Mathematics & Physical Sciences, University of Exeter, Exeter EX4 4QF, UK
| | - Alastair M Kilpatrick
- MRC Human Genetics Unit, IGMM, University of Edinburgh, Edinburgh EH4 2XU, UK, School of Informatics, University of Edinburgh, Edinburgh EH8 9AB, UK, Department of Pediatrics, University of California San Diego, La Jolla, CA 92093, USA, Centre for Systems, Dynamics and Control, College of Engineering, Mathematics & Physical Sciences, University of Exeter, Exeter EX4 4QF, UK
| | - Ozgur E Akman
- MRC Human Genetics Unit, IGMM, University of Edinburgh, Edinburgh EH4 2XU, UK, School of Informatics, University of Edinburgh, Edinburgh EH8 9AB, UK, Department of Pediatrics, University of California San Diego, La Jolla, CA 92093, USA, Centre for Systems, Dynamics and Control, College of Engineering, Mathematics & Physical Sciences, University of Exeter, Exeter EX4 4QF, UK
| |
|
34
|
Klinke DJ. In silico model-based inference: a contemporary approach for hypothesis testing in network biology. Biotechnol Prog 2014; 30:1247-61. [PMID: 25139179 DOI: 10.1002/btpr.1982] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 03/03/2014] [Revised: 08/14/2014] [Indexed: 01/31/2023]
Abstract
Inductive inference plays a central role in the study of biological systems where one aims to increase their understanding of the system by reasoning backwards from uncertain observations to identify causal relationships among components of the system. These causal relationships are postulated from prior knowledge as a hypothesis or simply a model. Experiments are designed to test the model. Inferential statistics are used to establish a level of confidence in how well our postulated model explains the acquired data. This iterative process, commonly referred to as the scientific method, either improves our confidence in a model or suggests that we revisit our prior knowledge to develop a new model. Advances in technology impact how we use prior knowledge and data to formulate models of biological networks and how we observe cellular behavior. However, the approach for model-based inference has remained largely unchanged since Fisher, Neyman and Pearson developed the ideas in the early 1900s that gave rise to what is now known as classical statistical hypothesis (model) testing. Here, I will summarize conventional methods for model-based inference and suggest a contemporary approach to aid in our quest to discover how cells dynamically interpret and transmit information for therapeutic aims that integrates ideas drawn from high performance computing, Bayesian statistics, and chemical kinetics.
Affiliation(s)
- David J Klinke
- Dept. of Chemical Engineering, Mary Babb Randolph Cancer Center, West Virginia University, Morgantown, WV, 26506; Dept. of Microbiology, Immunology and Cell Biology, Mary Babb Randolph Cancer Center, West Virginia University, Morgantown, WV, 26506
| |
|
35
|
Hasenauer J, Hasenauer C, Hucho T, Theis FJ. ODE constrained mixture modelling: a method for unraveling subpopulation structures and dynamics. PLoS Comput Biol 2014; 10:e1003686. [PMID: 24992156 PMCID: PMC4081021 DOI: 10.1371/journal.pcbi.1003686] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.3] [Received: 12/18/2013] [Accepted: 05/09/2014] [Indexed: 12/02/2022] Open
Abstract
Functional cell-to-cell variability is ubiquitous in multicellular organisms as well as bacterial populations. Even genetically identical cells of the same cell type can respond differently to identical stimuli. Methods have been developed to analyse heterogeneous populations, e.g., mixture models and stochastic population models. The available methods are, however, either incapable of simultaneously analysing different experimental conditions or are computationally demanding and difficult to apply. Furthermore, they do not account for biological information available in the literature. To overcome disadvantages of existing methods, we combine mixture models and ordinary differential equation (ODE) models. The ODE models provide a mechanistic description of the underlying processes while mixture models provide an easy way to capture variability. In a simulation study, we show that the class of ODE constrained mixture models can unravel the subpopulation structure and determine the sources of cell-to-cell variability. In addition, the method provides reliable estimates for kinetic rates and subpopulation characteristics. We use ODE constrained mixture modelling to study NGF-induced Erk1/2 phosphorylation in primary sensory neurones, a process relevant in inflammatory and neuropathic pain. We propose a mechanistic pathway model for this process and reconstruct static and dynamical subpopulation characteristics across experimental conditions. We validate the model predictions experimentally, which verifies the capabilities of ODE constrained mixture models. These results illustrate that ODE constrained mixture models can reveal novel mechanistic insights and possess a high sensitivity.
In this manuscript, we introduce ODE constrained mixture models for the analysis of population snapshot data of kinetics and dose responses. Population snapshot data can for instance be derived from flow cytometry or single-cell microscopy and provide information about the population structure and the dynamics of subpopulations. Currently available methods enable, however, only the extraction of this information if the subpopulations are very different. By combining pathway-specific ODE and mixture models, a more sensitive method is obtained, which can simultaneously analyse a variety of experimental conditions. ODE constrained mixture models facilitate the reconstruction of subpopulation sizes and dynamics, even in situations where the subpopulations are hardly distinguishable. This is shown for a simulation example as well as for the process of NGF-induced Erk1/2 phosphorylation in primary sensory neurones. We find that the proposed method allows for a simple but pervasive analysis of heterogeneous cell systems and more profound, mechanistic insights.
Affiliation(s)
- Jan Hasenauer
- Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
- Division of Mathematical Modeling of Biological Systems, Department of Mathematics, University of Technology Munich, Munich, Germany
| | | | - Tim Hucho
- Max Planck Institute for Molecular Genetics, Berlin, Germany
- Division of Experimental Anesthesiology and Pain Research, Department of Anesthesiology and Intensive Care Medicine, University Hospital Cologne, Cologne, Germany
| | - Fabian J. Theis
- Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
- Division of Mathematical Modeling of Biological Systems, Department of Mathematics, University of Technology Munich, Munich, Germany
| |
|
36
|
Kanodia J, Chai D, Vollmer J, Kim J, Raue A, Finn G, Schoeberl B. Deciphering the mechanism behind Fibroblast Growth Factor (FGF) induced biphasic signal-response profiles. Cell Commun Signal 2014; 12:34. [PMID: 24885272 PMCID: PMC4036111 DOI: 10.1186/1478-811x-12-34] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Received: 12/21/2013] [Accepted: 04/28/2014] [Indexed: 12/22/2022] Open
Abstract
BACKGROUND The Fibroblast Growth Factor (FGF) pathway is driving various aspects of cellular responses in both normal and malignant cells. One interesting characteristic of this pathway is the biphasic nature of the cellular response to some FGF ligands like FGF2. Specifically, it has been shown that phenotypic behaviors controlled by FGF signaling, like migration and growth, reach maximal levels in response to intermediate concentrations, while high levels of FGF2 elicit weak responses. The mechanisms leading to the observed biphasic response remain unexplained. RESULTS A combination of experiments and computational modeling was used to understand the mechanism behind the observed biphasic signaling responses. FGF signaling involves a tertiary surface interaction that we captured with a computational model based on Ordinary Differential Equations (ODEs). It accounts for FGF2 binding to FGF receptors (FGFRs) and heparan sulfate glycosaminoglycans (HSGAGs), followed by receptor-phosphorylation, activation of the FRS2 adapter protein and the Ras-Raf signaling cascade. Quantitative protein assays were used to measure the dynamics of phosphorylated ERK (pERK) in response to a wide range of FGF2 ligand concentrations on a fine-grained time scale for the squamous cell lung cancer cell line H1703. We developed a novel approach combining Particle Swarm Optimization (PSO) and feature-based constraints in the objective function to calibrate the computational model to the experimental data. The model is validated using a series of extracellular and intracellular perturbation experiments. We demonstrate that in silico model predictions are in accordance with the observed in vitro results. CONCLUSIONS Using a combined approach of computational modeling and experiments we found that competition between binding of the ligand FGF2 to HSGAG and FGF receptor leads to the biphasic response.
At low to intermediate concentrations of FGF2 there are sufficient free FGF receptors available for the FGF2-HSGAG complex to enable the formation of the trimeric signaling unit. At high ligand concentrations the ligand binding sites of the receptor become saturated and the trimeric signaling unit cannot be formed. This insight into the pathway is an important consideration for the pharmacological inhibition of this pathway.
Affiliation(s)
- Jitendra Kanodia
- Merrimack Pharmaceuticals, Suite B7201, 1 Kendall Square, Cambridge, MA 02139, USA.
| | | | | | | | | | | | | |
|
37
|
Vanlier J, Tiemann CA, Hilbers PAJ, van Riel NAW. Optimal experiment design for model selection in biochemical networks. BMC Syst Biol 2014; 8:20. [PMID: 24555498 PMCID: PMC3946009 DOI: 10.1186/1752-0509-8-20] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Received: 04/11/2013] [Accepted: 02/13/2014] [Indexed: 01/06/2023]
Abstract
BACKGROUND Mathematical modeling is often used to formalize hypotheses on how a biochemical network operates by discriminating between competing models. Bayesian model selection offers a way to determine the amount of evidence that data provides to support one model over the other while favoring simple models. In practice, the amount of experimental data is often insufficient to make a clear distinction between competing models. Often one would like to perform a new experiment which would discriminate between competing hypotheses. RESULTS We developed a novel method to perform Optimal Experiment Design to predict which experiments would most effectively allow model selection. A Bayesian approach is applied to infer model parameter distributions. These distributions are sampled and used to simulate from multivariate predictive densities. The method is based on a k-Nearest Neighbor estimate of the Jensen Shannon divergence between the multivariate predictive densities of competing models. CONCLUSIONS We show that the method successfully uses predictive differences to enable model selection by applying it to several test cases. Because the design criterion is based on predictive distributions, which can be computed for a wide range of model quantities, the approach is very flexible. The method reveals specific combinations of experiments which improve discriminability even in cases where data is scarce. The proposed approach can be used in conjunction with existing Bayesian methodologies where (approximate) posteriors have been determined, making use of relations that exist within the inferred posteriors.
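The design criterion named in this abstract, the Jensen-Shannon divergence between the predictive densities of competing models, can be illustrated in a simplified discrete form. The sketch below is not the k-Nearest Neighbor estimator described in the abstract: it assumes the predictions have already been binned into discrete probability vectors, and the two candidate experiments and their probabilities are hypothetical values chosen for illustration.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in nats for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence in nats; symmetric and bounded by log(2)."""
    m = [0.5 * (pi + qi) for pi, qi in zip(p, q)]  # mixture of the two distributions
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical binned predictions of two competing models for two candidate experiments.
experiment_a = ([0.5, 0.3, 0.2], [0.45, 0.35, 0.2])  # models nearly agree
experiment_b = ([0.8, 0.1, 0.1], [0.1, 0.1, 0.8])    # models disagree strongly

jsd_a = jensen_shannon_divergence(*experiment_a)
jsd_b = jensen_shannon_divergence(*experiment_b)
print(jsd_a, jsd_b)
```

Under this criterion the experiment with the larger divergence would be preferred, since the competing models make more distinguishable predictions for it.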
Affiliation(s)
- Joep Vanlier
- Eindhoven University of Technology, Department of Biomedical Engineering, PO Box 513, Eindhoven, 5600 MB, The Netherlands.
| | | | | | | |
|
38
|
Villaverde AF, Banga JR. Reverse engineering and identification in systems biology: strategies, perspectives and challenges. J R Soc Interface 2014; 11:20130505. [PMID: 24307566 PMCID: PMC3869153 DOI: 10.1098/rsif.2013.0505] [Citation(s) in RCA: 133] [Impact Index Per Article: 12.1] [Received: 06/07/2013] [Accepted: 11/12/2013] [Indexed: 12/17/2022] Open
Abstract
The interplay of mathematical modelling with experiments is one of the central elements in systems biology. The aim of reverse engineering is to infer, analyse and understand, through this interplay, the functional and regulatory mechanisms of biological systems. Reverse engineering is not exclusive to systems biology and has been studied in different areas, such as inverse problem theory, machine learning, nonlinear physics, (bio)chemical kinetics, control theory and optimization, among others. However, it seems that many of these areas have been relatively closed to outsiders. In this contribution, we aim to compare and highlight the different perspectives and contributions from these fields, with emphasis on two key questions: (i) why are reverse engineering problems so hard to solve, and (ii) what methods are available for the particular problems arising from systems biology?
Affiliation(s)
- Julio R. Banga
- BioProcess Engineering Group, IIM-CSIC, Spanish National Research Council, Vigo 36208, Spain
|
39
|
Vehlow C, Hasenauer J, Kramer A, Raue A, Hug S, Timmer J, Radde N, Theis FJ, Weiskopf D. iVUN: interactive Visualization of Uncertain biochemical reaction Networks. BMC Bioinformatics 2013; 14 Suppl 19:S2. [PMID: 24564335 PMCID: PMC4067946 DOI: 10.1186/1471-2105-14-s19-s2] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 11/10/2022]
Abstract
BACKGROUND Mathematical models are nowadays widely used to describe biochemical reaction networks. One of the main reasons for this is that models facilitate the integration of a multitude of different data and data types using parameter estimation. Thereby, models allow for a holistic understanding of biological processes. However, due to measurement noise and the limited amount of data, uncertainties in the model parameters should be considered when conclusions are drawn from estimated model attributes, such as reaction fluxes or transient dynamics of biological species. METHODS AND RESULTS We developed the visual analytics system iVUN that supports uncertainty-aware analysis of static and dynamic attributes of biochemical reaction networks modeled by ordinary differential equations. The multivariate graph of the network is visualized as a node-link diagram, and statistics of the attributes are mapped to the color of nodes and links of the graph. In addition, the graph view is linked with several views, such as line plots, scatter plots, and correlation matrices, to support locating uncertainties and the analysis of their time dependencies. As a demonstration, we use iVUN to quantitatively analyze the dynamics of a model for Epo-induced JAK2/STAT5 signaling. CONCLUSION Our case study showed that iVUN can be used to perform an in-depth study of biochemical reaction networks, including attribute uncertainties, correlations between these attributes and their uncertainties, as well as the attribute dynamics. In particular, the linking of different visualization options turned out to be highly beneficial for the complex analysis tasks that come with the biological systems presented here.
|
40
|
Lessons learned from quantitative dynamical modeling in systems biology. PLoS One 2013; 8:e74335. [PMID: 24098642 PMCID: PMC3787051 DOI: 10.1371/journal.pone.0074335] [Citation(s) in RCA: 191] [Impact Index Per Article: 15.9] [Received: 05/08/2013] [Accepted: 07/31/2013] [Indexed: 11/19/2022]
Abstract
Due to the high complexity of biological data, it is difficult to disentangle cellular processes relying only on intuitive interpretation of measurements. A Systems Biology approach that combines quantitative experimental data with dynamic mathematical modeling promises to yield deeper insights into these processes. Nevertheless, with growing complexity and an increasing amount of quantitative experimental data, building realistic and reliable mathematical models can become a challenging task: the quality of experimental data has to be assessed objectively, unknown model parameters need to be estimated from the experimental data, and numerical calculations need to be precise and efficient. Here, we discuss, compare and characterize the performance of computational methods throughout the process of quantitative dynamic modeling using two previously established examples, for which quantitative, dose- and time-resolved experimental data are available. In particular, we present an approach that allows the quality of experimental data to be determined in an efficient, objective and automated manner. Using this approach, data generated by different measurement techniques, even in single replicates, can be reliably used for mathematical modeling. For the estimation of unknown model parameters, the performance of different optimization algorithms was compared systematically. Our results show that deterministic derivative-based optimization employing the sensitivity equations in combination with a multi-start strategy based on Latin hypercube sampling outperforms the other methods by orders of magnitude in accuracy and speed. Finally, we investigated transformations that yield a more efficient parameterization of the model and therefore lead to a further enhancement in optimization performance. We provide a freely available open source software package that implements the algorithms and examples compared here.
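The multi-start strategy named in this abstract can be illustrated with a minimal sketch. It is not the authors' software: the objective is a hypothetical two-parameter toy function with two local minima, plain fixed-step gradient descent with an analytic gradient stands in for the derivative-based optimizer with sensitivity equations, and the names `latin_hypercube`, `gradient_descent` and `multi_start` are illustrative assumptions.

```python
import random

def latin_hypercube(n_samples, bounds, rng):
    """Draw n_samples points. Each dimension is cut into n_samples equal
    strata, and every stratum is used exactly once per dimension."""
    columns = []
    for low, high in bounds:
        strata = list(range(n_samples))
        rng.shuffle(strata)
        width = (high - low) / n_samples
        columns.append([low + (s + rng.random()) * width for s in strata])
    return [tuple(col[i] for col in columns) for i in range(n_samples)]

def objective(p):
    # Hypothetical toy objective: local minimum near x = +0.96,
    # global minimum near x = -1.04, both at y = 0.
    x, y = p
    return (x * x - 1.0) ** 2 + 0.3 * x + y * y

def gradient(p):
    # Analytic gradient of the toy objective.
    x, y = p
    return (4.0 * x * (x * x - 1.0) + 0.3, 2.0 * y)

def gradient_descent(start, step=0.01, n_iter=2000):
    """Fixed-step gradient descent; a stand-in for a real local optimizer."""
    p = list(start)
    for _ in range(n_iter):
        g = gradient(p)
        p = [pi - step * gi for pi, gi in zip(p, g)]
    return tuple(p)

def multi_start(n_starts, bounds, seed=0):
    """Run one local fit per Latin hypercube start and keep the best."""
    rng = random.Random(seed)
    starts = latin_hypercube(n_starts, bounds, rng)
    fits = [gradient_descent(s) for s in starts]
    return min(fits, key=objective)

best = multi_start(20, [(-2.0, 2.0), (-2.0, 2.0)])
print(best, objective(best))
```

Because the strata cover each parameter range evenly, some starts fall into every basin of attraction of this toy problem, so the best of the local fits is the global minimum even though a single local run started at positive x would stop at the inferior one.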
|
41
|
Hock S, Hasenauer J, Theis FJ. Modeling of 2D diffusion processes based on microscopy data: parameter estimation and practical identifiability analysis. BMC Bioinformatics 2013; 14 Suppl 10:S7. [PMID: 24267545 PMCID: PMC3750519 DOI: 10.1186/1471-2105-14-s10-s7] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Indexed: 12/27/2022]
Abstract
Background Diffusion is a key component of many biological processes such as chemotaxis, developmental differentiation and tissue morphogenesis. Recently, it has become possible to assess the spatial gradients caused by diffusion in vitro and in vivo using microscopy-based imaging techniques. The resulting time series of two-dimensional, high-resolution images, in combination with mechanistic models, enable the quantitative analysis of the underlying mechanisms. However, such a model-based analysis is still challenging due to measurement noise and sparse observations, which result in uncertainties of the model parameters. Methods We introduce a likelihood function for image-based measurements with log-normally distributed noise. Based upon this likelihood function we formulate the maximum likelihood estimation problem, which is solved using PDE-constrained optimization methods. To assess the uncertainty and practical identifiability of the parameters we introduce profile likelihoods for diffusion processes. Results and conclusion As proof of concept, we model certain aspects of the guidance of dendritic cells towards lymphatic vessels, an example of haptotaxis. Using a realistic set of artificial measurement data, we estimate the five kinetic parameters of this model and compute profile likelihoods. Our novel approach for the estimation of model parameters from image data, as well as the proposed identifiability analysis approach, are widely applicable to diffusion processes. The profile likelihood based method provides more rigorous uncertainty bounds than local approximation methods.
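The profile likelihood idea used in this abstract can be sketched on a much simpler problem. The example below is an assumption-laden illustration, not the paper's method: it replaces the PDE model and log-normal noise with a hypothetical exponential decay y = A exp(-k t) and Gaussian noise of known standard deviation, and the data, grid and function names are invented for the sketch. For each fixed value of k, the nuisance parameter A is re-optimized, and the 95% interval for k collects all values whose profiled -2 log-likelihood lies within 3.84 (the chi-square quantile for one degree of freedom) of the minimum.

```python
import math

# Hypothetical toy data: y = A * exp(-k * t) with A = 2 and k = 0.5, plus
# fixed perturbations standing in for noise with known standard deviation.
SIGMA = 0.05
T = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
NOISE = [0.05, -0.03, 0.02, -0.04, 0.01, 0.03]
Y = [2.0 * math.exp(-0.5 * t) + e for t, e in zip(T, NOISE)]

def profile_chi2(k):
    """-2 log-likelihood (up to a constant) at fixed k, minimized over A."""
    basis = [math.exp(-k * t) for t in T]
    # For fixed k the model is linear in A, so the best A is closed-form.
    a_hat = sum(y * b for y, b in zip(Y, basis)) / sum(b * b for b in basis)
    return sum(((y - a_hat * b) / SIGMA) ** 2 for y, b in zip(Y, basis))

def profile_interval(k_grid, threshold=3.84):
    """Return (k_hat, lower, upper) from a grid scan of the profile.

    Assumes the accepted set is a single interval, which holds for this
    toy problem but not in general.
    """
    profile = [profile_chi2(k) for k in k_grid]
    best = min(profile)
    k_hat = k_grid[profile.index(best)]
    inside = [k for k, c in zip(k_grid, profile) if c - best <= threshold]
    return k_hat, min(inside), max(inside)

k_grid = [0.1 + 0.001 * i for i in range(901)]  # k from 0.1 to 1.0
k_hat, lower, upper = profile_interval(k_grid)
print(f"k_hat = {k_hat:.3f}, 95% interval = [{lower:.3f}, {upper:.3f}]")
```

A parameter whose profile never rises above the threshold on one or both sides of the scanned range would be flagged as practically non-identifiable, which is the diagnostic a purely local (curvature-based) approximation can miss.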
|