2
Stumpf MPH. Multi-model and network inference based on ensemble estimates: avoiding the madness of crowds. J R Soc Interface 2020; 17:20200419. [PMID: 33081645 PMCID: PMC7653378 DOI: 10.1098/rsif.2020.0419] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 12/20/2022]
Abstract
Recent progress in theoretical systems biology, applied mathematics and computational statistics allows us to compare quantitatively the performance of different candidate models at describing a particular biological system. Model selection has been applied with great success to problems where a small number (typically fewer than 10) of models are compared, but recent studies have started to consider thousands and even millions of candidate models. Often, however, we are left with sets of models that are compatible with the data, and then we can use ensembles of models to make predictions. These ensembles can have very desirable characteristics but, as I show here, are not guaranteed to improve on individual estimators or predictors. I will show in the cases of model selection and network inference when we can trust ensembles, and when we should be cautious. The analyses suggest that the careful construction of an ensemble (choosing good predictors) is of paramount importance, more than had perhaps been realized before: merely adding different methods does not suffice. The success of ensemble network inference methods is also shown to rest on their ability to suppress false-positive results. A Jupyter notebook which allows carrying out an assessment of ensemble estimators is provided.
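The claim that ensembles are not guaranteed to beat their members can be illustrated with a simple majority vote. The sketch below is not taken from the paper or its notebook; it assumes independent binary predictors that are each correct with the same probability `p` (the classic Condorcet jury setting), and the function name `majority_vote_accuracy` is hypothetical.

```python
from math import comb


def majority_vote_accuracy(p: float, n: int) -> float:
    """Probability that a strict majority of n independent predictors is correct.

    Assumes every predictor is correct with the same probability p and that
    n is odd, so ties cannot occur.
    """
    if n % 2 == 0:
        raise ValueError("n must be odd to avoid ties")
    k_min = n // 2 + 1  # smallest number of correct votes that forms a majority
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


# Predictors slightly better than chance: the ensemble improves on each member.
good = majority_vote_accuracy(0.6, 11)
# Predictors slightly worse than chance: the ensemble is worse than each member.
bad = majority_vote_accuracy(0.4, 11)
print(good > 0.6, bad < 0.4)
```

Under these assumptions, adding more predictors only helps when the individual predictors are better than chance, which is in line with the abstract's point that the choice of ensemble members matters more than their number.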
Affiliation(s)
- Michael P H Stumpf
- School of BioSciences and School of Mathematics and Statistics, University of Melbourne, Parkville, VIC 3010, Australia; Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London SW7 2AZ, UK
3
Reserve Flux Capacity in the Pentose Phosphate Pathway Enables Escherichia coli's Rapid Response to Oxidative Stress. Cell Syst 2018; 6:569-578.e7. [PMID: 29753645 DOI: 10.1016/j.cels.2018.04.009] [Citation(s) in RCA: 123] [Impact Index Per Article: 17.6] [Received: 08/01/2017] [Revised: 01/19/2018] [Accepted: 04/10/2018] [Indexed: 01/01/2023]
Abstract
To counteract oxidative stress and reactive oxygen species (ROS), bacteria evolved various mechanisms, primarily reducing ROS through antioxidant systems that utilize the cofactor NADPH. Cells must stabilize NADPH levels by increasing flux through replenishing metabolic pathways such as the pentose phosphate (PP) pathway. Here, we investigate the mechanism enabling the rapid increase in NADPH supply by exposing Escherichia coli to hydrogen peroxide and quantifying the immediate metabolite dynamics. To systematically infer active regulatory interactions governing this response, we evaluated ensembles of kinetic models of glycolysis and the PP pathway, each with different regulation mechanisms. Besides the known inactivation of glyceraldehyde 3-phosphate dehydrogenase by ROS, we reveal the important allosteric inhibition of the first PP pathway enzyme by NADPH. This NADPH feedback inhibition maintains a below-maximum-capacity PP pathway flux under non-stress conditions. Relieving this inhibition instantly increases PP pathway flux upon oxidative stress. We demonstrate that reducing cells' capacity to rapidly reroute their flux through the PP pathway increases their oxidative stress sensitivity.
4
Engelhardt B, Kschischo M, Fröhlich H. A Bayesian approach to estimating hidden variables as well as missing and wrong molecular interactions in ordinary differential equation-based mathematical models. J R Soc Interface 2018; 14:rsif.2017.0332. [PMID: 28615495 PMCID: PMC5493809 DOI: 10.1098/rsif.2017.0332] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 05/08/2017] [Accepted: 05/23/2017] [Indexed: 11/12/2022]
Abstract
Ordinary differential equations (ODEs) are a popular approach to quantitatively model molecular networks based on biological knowledge. However, such knowledge is typically restricted. Wrongly modelled biological mechanisms as well as relevant external influence factors that are not included in the model are likely to manifest in major discrepancies between model predictions and experimental data. Finding the exact reasons for such observed discrepancies can be quite challenging in practice. In order to address this issue, we suggest a Bayesian approach to estimate hidden influences in ODE-based models. The method can distinguish between exogenous and endogenous hidden influences. Thus, we can detect wrongly specified as well as missed molecular interactions in the model. We demonstrate the performance of our Bayesian dynamic elastic-net with several ordinary differential equation models from the literature, such as human JAK-STAT signalling, information processing at the erythropoietin receptor, isomerization of liquid α-Pinene, G protein cycling in yeast and UV-B triggered signalling in plants. Moreover, we investigate a set of commonly known network motifs and a gene-regulatory network. Altogether, our method supports the modeller in an algorithmic manner to identify possible sources of errors in ODE-based models on the basis of experimental data.
Affiliation(s)
- Benjamin Engelhardt
- Rheinische Friedrich-Wilhelms-Universität Bonn, Algorithmic Bioinformatics, Bonn, Germany; DFG Research Training Group 1873, Rheinische Friedrich-Wilhelms-Universität Bonn, Germany
- Maik Kschischo
- Department of Mathematics and Technology, University of Applied Sciences Koblenz, RheinAhrCampus, Remagen, Germany
- Holger Fröhlich
- Rheinische Friedrich-Wilhelms-Universität Bonn, Algorithmic Bioinformatics, Bonn, Germany; UCB Biosciences GmbH, Monheim, Germany
5
Formulation, construction and analysis of kinetic models of metabolism: A review of modelling frameworks. Biotechnol Adv 2017; 35:981-1003. [PMID: 28916392 DOI: 10.1016/j.biotechadv.2017.09.005] [Citation(s) in RCA: 78] [Impact Index Per Article: 9.8] [Received: 03/21/2017] [Revised: 08/30/2017] [Accepted: 09/10/2017] [Indexed: 12/13/2022]
Abstract
Kinetic models are critical to predict the dynamic behaviour of metabolic networks. Mechanistic kinetic models for large networks remain uncommon due to the difficulty of fitting their parameters. Recent modelling frameworks promise new ways to overcome this obstacle while retaining predictive capabilities. In this review, we present an overview of the relevant mathematical frameworks for kinetic formulation, construction and analysis. Starting with kinetic formalisms, we next review statistical methods for parameter inference, as well as recent computational frameworks applied to the construction and analysis of kinetic models. Finally, we discuss opportunities and limitations hindering the development of larger kinetic reconstructions.
6
Klimovskaia A, Ganscha S, Claassen M. Sparse Regression Based Structure Learning of Stochastic Reaction Networks from Single Cell Snapshot Time Series. PLoS Comput Biol 2016; 12:e1005234. [PMID: 27923064 PMCID: PMC5140059 DOI: 10.1371/journal.pcbi.1005234] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 06/08/2016] [Accepted: 11/02/2016] [Indexed: 11/29/2022]
Abstract
Stochastic chemical reaction networks constitute a model class to quantitatively describe dynamics and cell-to-cell variability in biological systems. The topology of these networks typically is only partially characterized due to experimental limitations. Current approaches for refining network topology are based on the explicit enumeration of alternative topologies and are therefore restricted to small problem instances with almost complete knowledge. We propose the reactionet lasso, a computational procedure that derives a stepwise sparse regression approach on the basis of the Chemical Master Equation, enabling large-scale structure learning for reaction networks by implicitly accounting for billions of topology variants. We have assessed the structure learning capabilities of the reactionet lasso on synthetic data for the complete TRAIL induced apoptosis signaling cascade comprising 70 reactions. We find that the reactionet lasso is able to efficiently recover the structure of these reaction systems, ab initio, with high sensitivity and specificity. With only < 1% false discoveries, the reactionet lasso is able to recover 45% of all true reactions ab initio among > 6000 possible reactions and over 10^2000 network topologies. In conjunction with information-rich single cell technologies such as single cell RNA sequencing or mass cytometry, the reactionet lasso will enable large-scale structure learning, particularly in areas with partial network structure knowledge, such as cancer biology, and thereby enable the detection of pathological alterations of reaction networks. We provide software to allow for wide applicability of the reactionet lasso.

Virtually all biological processes are driven by biochemical reactions. However, their quantitative description in terms of stochastic chemical reaction networks is often precluded by the computational difficulty of structure learning, i.e. the identification of biologically active reaction networks among the combinatorially many possible topologies. This work describes the reactionet lasso, a structure learning approach that takes advantage of novel, information-rich single cell data and a tractable problem formulation to achieve structure learning for problem instances hundreds of orders of magnitude larger than previously reported. This approach opens the prospect of obtaining quantitative and predictive reaction models in many areas of biology and medicine, and in particular areas such as cancer biology, which are characterized by significant system alterations and many unknown reactions.
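To get a feel for why explicit enumeration of topologies does not scale, the sketch below counts candidate topologies under a simplifying assumption that is ours, not necessarily the authors' counting scheme: every candidate reaction is independently either present or absent, so `n` candidates give `2**n` topologies. The function name `topology_orders_of_magnitude` is hypothetical.

```python
import math


def topology_orders_of_magnitude(n_candidate_reactions: int) -> float:
    """Return log10 of the number of topologies, assuming 2**n subsets of reactions.

    Working in log space avoids constructing the astronomically large integer.
    """
    return n_candidate_reactions * math.log10(2)


# With 6000 candidate reactions the count already has more than 1800 digits.
print(round(topology_orders_of_magnitude(6000)))
# A handful of candidates is still enumerable: 2**10 = 1024 topologies.
print(2**10)
```

Even a modest increase in the number of candidate reactions therefore moves the problem from enumerable to far beyond any explicit search, which is the motivation for an implicit, regression-based treatment.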
Affiliation(s)
- Anna Klimovskaia
- Institute for Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
- Swiss Institute of Bioinformatics, Zurich, Switzerland
- Life Science Zurich Graduate School, Zurich, Switzerland
- Stefan Ganscha
- Institute for Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
- Swiss Institute of Bioinformatics, Zurich, Switzerland
- Life Science Zurich Graduate School, Zurich, Switzerland
- Manfred Claassen
- Institute for Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
- Swiss Institute of Bioinformatics, Zurich, Switzerland
7
Milias-Argeitis A, Oliveira AP, Gerosa L, Falter L, Sauer U, Lygeros J. Elucidation of Genetic Interactions in the Yeast GATA-Factor Network Using Bayesian Model Selection. PLoS Comput Biol 2016; 12:e1004784. [PMID: 26967983 PMCID: PMC4788432 DOI: 10.1371/journal.pcbi.1004784] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 02/20/2015] [Accepted: 02/02/2016] [Indexed: 12/03/2022]
Abstract
Understanding the structure and function of complex gene regulatory networks using classical genetic assays is an error-prone procedure that frequently generates ambiguous outcomes. Even some of the best-characterized gene networks contain interactions whose validity is not conclusively proven. Founded on dynamic experimental data, mechanistic mathematical models are able to offer detailed insights that would otherwise require prohibitively large numbers of genetic experiments. Here we attempt mechanistic modeling of the transcriptional network formed by the four GATA-factor proteins, a well-studied system of central importance for nitrogen-source regulation of transcription in the yeast Saccharomyces cerevisiae. To resolve ambiguities in the network organization, we encoded a set of five interactions hypothesized in the literature into a set of 32 mathematical models, and employed Bayesian model selection to identify the most plausible set of interactions based on dynamic gene expression data. The top-ranking model was validated on newly generated GFP reporter dynamic data and was subsequently used to gain a better understanding of how yeast cells organize their transcriptional response to dynamic changes of nitrogen sources. Our work constitutes a necessary and important step towards obtaining a holistic view of the yeast nitrogen regulation mechanisms; on the computational side, it provides a demonstration of how powerful Monte Carlo techniques can be creatively combined and used to address the great challenges of large-scale dynamical system inference. Gene regulatory networks underlie all key processes that enable a cell to maintain long-term homeostasis in a changing environment. Understanding the structure and function of complex gene networks is an experimentally difficult and error-prone procedure. 
Mechanistic mathematical modeling promises to alleviate these problems, as we demonstrate here for the yeast GATA-factor network, the central controller of the cellular response to nitrogen source quality. Despite years of targeted studies, the interaction pattern of this network is still not known precisely. To resolve several still-remaining ambiguities, we generated a set of alternative mathematical models, and compared them against each other using Bayesian model selection based on dynamic gene expression data. The top-ranking model was then validated on a separate, newly generated dataset. Our work thus provides new insights to the mechanism of nitrogen regulation in yeast, while at the same time overcoming some key computational inference problems for large models in systems biology.
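The step from five hypothesized interactions to 32 candidate models follows from treating each interaction as either included or excluded (2^5 = 32). The sketch below shows that enumeration only; the interaction labels are hypothetical placeholders rather than the actual GATA-factor interactions, and `enumerate_models` is an illustrative name.

```python
from itertools import product

# Hypothetical placeholder labels; the real interactions are defined in the paper.
INTERACTIONS = [f"interaction_{i}" for i in range(1, 6)]


def enumerate_models(interactions):
    """Return one dict per candidate model, mapping each interaction to on/off."""
    return [
        dict(zip(interactions, flags))
        for flags in product([False, True], repeat=len(interactions))
    ]


models = enumerate_models(INTERACTIONS)
print(len(models))  # 32
```

Each dictionary would then correspond to one model structure to be fitted and scored, for example by its marginal likelihood in Bayesian model selection.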
Affiliation(s)
- Luca Gerosa
- Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
- Laura Falter
- Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
- Uwe Sauer
- Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
- John Lygeros
- Automatic Control Laboratory, ETH Zurich, Zurich, Switzerland
8
Engelhardt B, Fröhlich H, Kschischo M. Learning (from) the errors of a systems biology model. Sci Rep 2016; 6:20772. [PMID: 26865316 PMCID: PMC4749970 DOI: 10.1038/srep20772] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 10/13/2015] [Accepted: 01/07/2016] [Indexed: 01/15/2023]
Abstract
Mathematical modelling is a labour-intensive process involving several iterations of testing on real data and manual model modifications. In biology, the domain knowledge guiding model development is in many cases itself incomplete and uncertain. A major problem in this context is that biological systems are open. Missed or unknown external influences as well as erroneous interactions in the model could thus lead to severely misleading results. Here we introduce the dynamic elastic-net, a data-driven mathematical method which automatically detects such model errors in ordinary differential equation (ODE) models. We demonstrate for real and simulated data how the dynamic elastic-net approach can be used to automatically (i) reconstruct the error signal, (ii) identify the target variables of model error, and (iii) reconstruct the true system state even for incomplete or preliminary models. Our work provides a systematic computational method facilitating modelling of open biological systems under uncertain knowledge.
Affiliation(s)
- Benjamin Engelhardt
- Rheinische Friedrich-Wilhelms-Universität Bonn, Institute for Computer Science, Algorithmic Bioinformatics, c/o Bonn-Aachen International Center for IT, Dahlmannstr. 2, 53113, Bonn, Germany
- Holger Fröhlich
- Rheinische Friedrich-Wilhelms-Universität Bonn, Institute for Computer Science, Algorithmic Bioinformatics, c/o Bonn-Aachen International Center for IT, Dahlmannstr. 2, 53113, Bonn, Germany
- Maik Kschischo
- University of Applied Sciences Koblenz, RheinAhrCampus, Department of Mathematics and Technology, Joseph-Rovan-Allee 2, 53424 Remagen, Germany
9
Abstract
Mathematical models of natural systems are abstractions of much more complicated processes. Developing informative and realistic models of such systems typically involves suitable statistical inference methods, domain expertise, and a modicum of luck. Except for cases where physical principles provide sufficient guidance, it will also be generally possible to come up with a large number of potential models that are compatible with a given natural system and any finite amount of data generated from experiments on that system. Here we develop a computational framework to systematically evaluate potentially vast sets of candidate differential equation models in light of experimental and prior knowledge about biological systems. This topological sensitivity analysis enables us to evaluate quantitatively the dependence of model inferences and predictions on the assumed model structures. Failure to consider the impact of structural uncertainty introduces biases into the analysis and potentially gives rise to misleading conclusions.
10
Strelioff CC, Crutchfield JP. Bayesian structural inference for hidden processes. Phys Rev E Stat Nonlin Soft Matter Phys 2014; 89:042119. [PMID: 24827205 DOI: 10.1103/physreve.89.042119] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 09/10/2013] [Indexed: 06/03/2023]
Abstract
We introduce a Bayesian approach to discovering patterns in structurally complex processes. The proposed method of Bayesian structural inference (BSI) relies on a set of candidate unifilar hidden Markov model (uHMM) topologies for inference of process structure from a data series. We employ a recently developed exact enumeration of topological ε-machines. (A sequel then removes the topological restriction.) This subset of the uHMM topologies has the added benefit that inferred models are guaranteed to be ε-machines, irrespective of estimated transition probabilities. Properties of ε-machines and uHMMs allow for the derivation of analytic expressions for estimating transition probabilities, inferring start states, and comparing the posterior probability of candidate model topologies, despite process internal structure being only indirectly present in data. We demonstrate BSI's effectiveness in estimating a process's randomness, as reflected by the Shannon entropy rate, and its structure, as quantified by the statistical complexity. We also compare using the posterior distribution over candidate models and the single, maximum a posteriori model for point estimation and show that the former more accurately reflects uncertainty in estimated values. We apply BSI to in-class examples of finite- and infinite-order Markov processes, as well as to an out-of-class, infinite-state hidden process.
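The contrast between using the full posterior over candidate models and using only the maximum a posteriori (MAP) model can be sketched generically. The code below is plain Bayesian model averaging, not the analytic BSI expressions from the paper; the log evidences and per-model estimates are made-up illustrative numbers, and both function names are hypothetical.

```python
import math


def posterior_model_probs(log_evidences, log_priors=None):
    """Normalize log evidence (plus optional log prior) into model probabilities."""
    if log_priors is None:
        log_priors = [0.0] * len(log_evidences)  # uniform prior over models
    log_post = [le + lp for le, lp in zip(log_evidences, log_priors)]
    shift = max(log_post)  # subtract the maximum for numerical stability
    weights = [math.exp(v - shift) for v in log_post]
    total = sum(weights)
    return [w / total for w in weights]


def model_averaged_estimate(estimates, probs):
    """Posterior-weighted average of per-model point estimates."""
    return sum(e * p for e, p in zip(estimates, probs))


# Illustrative numbers only: three candidate models and an estimate from each.
log_evidences = [-102.3, -101.1, -105.8]
estimates = [0.61, 0.55, 0.70]

probs = posterior_model_probs(log_evidences)
map_index = max(range(len(probs)), key=probs.__getitem__)
map_estimate = estimates[map_index]  # uses only the single best model
averaged_estimate = model_averaged_estimate(estimates, probs)  # uses all models
print(map_estimate, averaged_estimate)
```

When the posterior mass is spread over several models, the averaged estimate and the spread of the per-model estimates carry structural uncertainty that the single MAP model discards.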
Affiliation(s)
- Christopher C Strelioff
- Complexity Sciences Center and Physics Department, University of California at Davis, One Shields Avenue, Davis, California 95616, USA
- James P Crutchfield
- Complexity Sciences Center and Physics Department, University of California at Davis, One Shields Avenue, Davis, California 95616, USA and Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, New Mexico 87501, USA
11
Link H, Christodoulou D, Sauer U. Advancing metabolic models with kinetic information. Curr Opin Biotechnol 2014; 29:8-14. [PMID: 24534671 DOI: 10.1016/j.copbio.2014.01.015] [Citation(s) in RCA: 69] [Impact Index Per Article: 6.3] [Received: 12/03/2013] [Revised: 01/18/2014] [Accepted: 01/23/2014] [Indexed: 12/21/2022]
Abstract
Kinetic models are crucial to quantitatively understand and predict how functional behavior emerges from dynamic concentration changes of cellular components. The current challenge lies in resolving uncertainties about parameter values of reaction kinetics. Additionally, there are major structural uncertainties due to unknown molecular interactions and only putatively assigned regulatory functions. What if one or a few key regulators of biochemical reactions are missing in a metabolic model? By reviewing current advances in building kinetic models of metabolism, we found that such models are undergoing a paradigm shift away from fitting parameters towards identifying key regulatory interactions.
Affiliation(s)
- Hannes Link
- Institute of Molecular Systems Biology, ETH Zurich, Auguste-Piccard-Hof 1, 8093 Zurich, Switzerland
- Dimitris Christodoulou
- Institute of Molecular Systems Biology, ETH Zurich, Auguste-Piccard-Hof 1, 8093 Zurich, Switzerland; Life Science Zurich PhD Program on Systems Biology, Zurich, Switzerland
- Uwe Sauer
- Institute of Molecular Systems Biology, ETH Zurich, Auguste-Piccard-Hof 1, 8093 Zurich, Switzerland.