1
Moimenta AR, Henriques D, Minebois R, Querol A, Balsa-Canto E. Modelling the physiological status of yeast during wine fermentation enables the prediction of secondary metabolism. Microb Biotechnol 2023; 16:847-861. [PMID: 36722662 PMCID: PMC10034642 DOI: 10.1111/1751-7915.14211] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2022] [Revised: 11/28/2022] [Accepted: 01/01/2023] [Indexed: 02/02/2023]
Abstract
Saccharomyces non-cerevisiae yeasts are gaining momentum in wine fermentation due to their potential to reduce ethanol content and achieve attractive aroma profiles. However, the design of the fermentation process for new species requires intensive experimentation. The use of mechanistic models could automate process design, yet to date, most fermentation models have focused on primary metabolism. Therefore, these models do not provide insight into the production of secondary metabolites essential for wine quality, such as aromas. In this work, we formulate a continuous model that accounts for the physiological status of yeast, that is, exponential growth, growth under nitrogen starvation and transition to stationary or decay phases. To do so, we assumed that nitrogen starvation is associated with carbohydrate accumulation and the induction of a set of transcriptional changes associated with the stationary phase. The model accurately described the dynamics of time series data for biomass and primary and secondary metabolites obtained for various yeast species in single culture fermentations. We also used the proposed model to explore different process designs, showing how the addition of nitrogen could affect the aromatic profile of wine. This study underlines the potential of incorporating yeast physiology into batch fermentation modelling and provides a new means of automating process design.
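As a rough illustration of how a single continuous model can pass through distinct physiological phases, the sketch below integrates a toy batch fermentation with forward Euler. It is not the model from this paper: the state variables, rate laws, parameter values and phase thresholds are hypothetical assumptions. Growth requires nitrogen, sugar uptake continues at a reduced rate once nitrogen is exhausted, and an optional nitrogen addition shows how a process-design choice changes the outcome.

```python
def simulate(n0=0.3, s0=200.0, x0=0.1, t_end=300.0, dt=0.01,
             n_add_time=None, n_add=0.0):
    """Toy batch fermentation integrated with forward Euler.

    States: biomass x, nitrogen n, sugar s, ethanol e (all g/L).
    Every parameter value below is a hypothetical placeholder.
    """
    mu_max, k_n, y_xn = 0.2, 0.01, 10.0    # 1/h, g/L, g biomass per g nitrogen
    q_grow, q_starve, k_s = 1.5, 0.3, 5.0  # sugar uptake (g/g/h), g/L
    y_es = 0.47                            # g ethanol per g sugar (assumed)
    x, n, s, e, t = x0, n0, s0, 0.0, 0.0
    added = False
    phases = []
    while t < t_end:
        if n_add_time is not None and not added and t >= n_add_time:
            n += n_add  # optional nitrogen addition during the batch
            added = True
        mu = mu_max * n / (k_n + n)
        # Specific sugar uptake falls from q_grow to q_starve as growth stops.
        q = (q_starve + (q_grow - q_starve) * mu / mu_max) * s / (k_s + s)
        used_n = min(mu * x / y_xn * dt, n)  # nitrogen consumed this step
        used_s = min(q * x * dt, s)          # sugar consumed this step
        x += y_xn * used_n
        n -= used_n
        s -= used_s
        e += y_es * used_s
        # Label the physiological status with simple, assumed thresholds.
        if mu > 0.5 * mu_max:
            phase = "exponential"
        elif s > 1.0:
            phase = "nitrogen starvation"
        else:
            phase = "stationary"
        if not phases or phases[-1] != phase:
            phases.append(phase)
        t += dt
    return {"x": x, "n": n, "s": s, "e": e, "phases": phases}
```

With the defaults, `simulate()` passes through exponential, nitrogen-starved and stationary phases; `simulate(n_add_time=30.0, n_add=0.2)` mimics a mid-fermentation nitrogen addition and ends with more biomass.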
Affiliation(s)
- Artai R Moimenta
- Bioprocess and Biosystems Engineering, IIM-CSIC, Vigo, Spain
- Applied Mathematics II, University of Vigo, Vigo, Spain
- David Henriques
- Bioprocess and Biosystems Engineering, IIM-CSIC, Vigo, Spain
- Romain Minebois
- Systems Biology of Yeasts of Biotechnological Interest, IATA-CSIC, Paterna, Spain
- Amparo Querol
- Systems Biology of Yeasts of Biotechnological Interest, IATA-CSIC, Paterna, Spain
- Eva Balsa-Canto
- Bioprocess and Biosystems Engineering, IIM-CSIC, Vigo, Spain
2
Daneker M, Zhang Z, Karniadakis GE, Lu L. Systems Biology: Identifiability Analysis and Parameter Identification via Systems-Biology-Informed Neural Networks. Methods Mol Biol 2023; 2634:87-105. [PMID: 37074575 DOI: 10.1007/978-1-0716-3008-2_4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/20/2023]
Abstract
The dynamics of systems biological processes are usually modeled by a system of ordinary differential equations (ODEs) with many unknown parameters that need to be inferred from noisy and sparse measurements. Here, we introduce systems-biology-informed neural networks for parameter estimation by incorporating the system of ODEs into the neural networks. To complete the workflow of system identification, we also describe structural and practical identifiability analysis to analyze the identifiability of parameters. We use the ultradian endocrine model for glucose-insulin interaction as the example to demonstrate all these methods and their implementation.
Affiliation(s)
- Mitchell Daneker
- Department of Chemical and Biomolecular Engineering, University of Pennsylvania, Philadelphia, PA, USA
- Zhen Zhang
- Division of Applied Mathematics, Brown University, Providence, RI, USA
- George Em Karniadakis
- Division of Applied Mathematics, Brown University, Providence, RI, USA
- School of Engineering, Brown University, Providence, RI, USA
- Lu Lu
- Department of Chemical and Biomolecular Engineering, University of Pennsylvania, Philadelphia, PA, USA.
3
Melikechi O, Young AL, Tang T, Bowman T, Dunson D, Johndrow J. Limits of epidemic prediction using SIR models. J Math Biol 2022; 85:36. [PMID: 36125562 PMCID: PMC9487859 DOI: 10.1007/s00285-022-01804-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/13/2021] [Revised: 08/12/2022] [Accepted: 08/30/2022] [Indexed: 11/27/2022]
Abstract
The Susceptible-Infectious-Recovered (SIR) equations and their extensions comprise a commonly utilized set of models for understanding and predicting the course of an epidemic. In practice, it is of substantial interest to estimate the model parameters based on noisy observations early in the outbreak, well before the epidemic reaches its peak. This allows prediction of the subsequent course of the epidemic and design of appropriate interventions. However, accurately inferring SIR model parameters in such scenarios is problematic. This article provides novel, theoretical insight on this issue of practical identifiability of the SIR model. Our theory provides new understanding of the inferential limits of routinely used epidemic models and provides a valuable addition to current simulate-and-check methods. We illustrate some practical implications through application to a real-world epidemic data set.
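The practical-identifiability problem can be seen in a small numerical experiment, sketched here under stated assumptions rather than reproducing the paper's theory. Early in an outbreak S ≈ N, so I(t) grows roughly like exp((β - γ)t) and early data constrain mainly the difference β - γ. The two hypothetical (β, γ) pairs below share β - γ = 0.25 per day; their first ten days are almost indistinguishable, yet their epidemic peaks differ by more than a factor of two. Population size, initial infections and the RK4 step are assumptions.

```python
def sir_daily_infectious(beta, gamma, n=1_000_000.0, i0=10.0, days=120, dt=0.01):
    """Integrate dS/dt = -beta*S*I/N, dI/dt = beta*S*I/N - gamma*I with classical RK4.

    Returns I at the end of each whole day (index 0 is the initial value).
    """
    def rhs(s, i):
        infection = beta * s * i / n
        return -infection, infection - gamma * i

    s, i = n - i0, i0
    steps_per_day = round(1 / dt)
    daily = [i]
    for step in range(1, days * steps_per_day + 1):
        k1 = rhs(s, i)
        k2 = rhs(s + 0.5 * dt * k1[0], i + 0.5 * dt * k1[1])
        k3 = rhs(s + 0.5 * dt * k2[0], i + 0.5 * dt * k2[1])
        k4 = rhs(s + dt * k3[0], i + dt * k3[1])
        s += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        i += dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        if step % steps_per_day == 0:
            daily.append(i)
    return daily

# Two hypothetical parameter sets with the same early growth rate beta - gamma.
a = sir_daily_infectious(beta=0.50, gamma=0.25)   # R0 = 2.0
b = sir_daily_infectious(beta=0.35, gamma=0.10)   # R0 = 3.5
early_gap = max(abs(x - y) / x for x, y in zip(a[:11], b[:11]))
print(f"max relative gap over days 0-10: {early_gap:.1e}")
print(f"peak infectious: {max(a):,.0f} vs {max(b):,.0f}")
```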
Affiliation(s)
- Omar Melikechi
- Department of Mathematics, Duke University, Durham, NC, USA.
- Tao Tang
- Department of Mathematics, Duke University, Durham, NC, USA
- Trevor Bowman
- Department of Mathematics, Duke University, Durham, NC, USA
- David Dunson
- Department of Mathematics, Duke University, Durham, NC, USA
- Department of Statistics, Duke University, Durham, NC, USA
- James Johndrow
- Department of Statistics, University of Pennsylvania, Philadelphia, PA, USA
4
Villaverde AF, Pathirana D, Fröhlich F, Hasenauer J, Banga JR. A protocol for dynamic model calibration. Brief Bioinform 2022; 23:bbab387. [PMID: 34619769 PMCID: PMC8769694 DOI: 10.1093/bib/bbab387] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 05/27/2021] [Revised: 08/06/2021] [Accepted: 08/29/2021] [Indexed: 12/23/2022]
Abstract
Ordinary differential equation models are nowadays widely used for the mechanistic description of biological processes and their temporal evolution. These models typically have many unknown and nonmeasurable parameters, which have to be determined by fitting the model to experimental data. In order to perform this task, known as parameter estimation or model calibration, the modeller faces challenges such as poor parameter identifiability, lack of sufficiently informative experimental data and the existence of local minima in the objective function landscape. These issues tend to worsen with larger model sizes, increasing the computational complexity and the number of unknown parameters. An incorrectly calibrated model is problematic because it may result in inaccurate predictions and misleading conclusions. For nonexpert users, there are a large number of potential pitfalls. Here, we provide a protocol that guides the user through all the steps involved in the calibration of dynamic models. We illustrate the methodology with two models and provide all the code required to reproduce the results and perform the same analysis on new models. Our protocol provides practitioners and researchers in biological modelling with a one-stop guide that is at the same time compact and sufficiently comprehensive to cover all aspects of the problem.
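One of the pitfalls listed above, local minima in the objective function landscape, can be reproduced with a toy one-parameter problem. The sketch is an illustration under assumptions, not the protocol's code: the model y(t) = sin(ωt), the synthetic data and the helper names `sse`, `local_search` and `multistart` are hypothetical. A single local search started at ω = 3.0 stalls in a local minimum, whereas a multistart strategy over a grid of initial guesses recovers ω = 1.3.

```python
import math

def sse(omega, ts, ys):
    """Least-squares objective for the toy model y(t) = sin(omega * t)."""
    return sum((y - math.sin(omega * t)) ** 2 for t, y in zip(ts, ys))

def local_search(f, x0, step=0.05, tol=1e-8, max_evals=100_000):
    """Derivative-free 1-D descent: step downhill, halve the step when no move helps."""
    x, fx = x0, f(x0)
    evals = 1
    while step > tol and evals < max_evals:
        for cand in (x - step, x + step):
            fc = f(cand)
            evals += 1
            if fc < fx:
                x, fx = cand, fc
                break
        else:
            step /= 2
    return x, fx

def multistart(f, starts):
    """Run the local search from every start and keep the lowest objective value."""
    return min((local_search(f, s) for s in starts), key=lambda r: r[1])

ts = [0.5 * k for k in range(40)]
ys = [math.sin(1.3 * t) for t in ts]   # synthetic, noise-free data (omega = 1.3)

def objective(w):
    return sse(w, ts, ys)

single = local_search(objective, 3.0)                     # one start
best = multistart(objective, [0.2 + 0.1 * k for k in range(40)])
print("single start:", single)
print("multistart:  ", best)
```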
Affiliation(s)
- Alejandro F Villaverde
- Universidade de Vigo, Department of Systems Engineering & Control, Vigo 36310, Galicia, Spain
- Dilan Pathirana
- Faculty of Mathematics and Natural Sciences, University of Bonn, Bonn 53115, Germany
- Fabian Fröhlich
- Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg 85764, Germany
- Jan Hasenauer
- Center for Mathematics, Technische Universität München, Garching 85748, Germany
- Harvard Medical School, Cambridge, MA 02115, USA
- Julio R Banga
- Bioprocess Engineering Group, IIM-CSIC, Vigo 36208, Galicia, Spain
5
Deneer A, Fleck C. Mathematical Modelling in Plant Synthetic Biology. Methods Mol Biol 2022; 2379:209-251. [PMID: 35188665 DOI: 10.1007/978-1-0716-1791-5_13] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Mathematical modelling techniques are integral to current research in plant synthetic biology. Modelling approaches can provide mechanistic understanding of a system, allowing predictions of behaviour and thus providing a tool to help design and analyse biological circuits. In this chapter, we provide an overview of mathematical modelling methods and their significance for plant synthetic biology. Starting with the basics of dynamics, we describe the process of constructing a model over both temporal and spatial scales and highlight crucial approaches, such as stochastic modelling and model-based design. Next, we focus on the model parameters and the techniques required in parameter analysis. We then describe the process of selecting a model based on tests and criteria and proceed to methods that allow closer analysis of the system's behaviour. Finally, we highlight the importance of uncertainty in modelling approaches and how to deal with a lack of knowledge, noisy data, and biological variability, all of which play a crucial role in the cooperation between the experimental and modelling components. Overall, this chapter aims to illustrate the importance of mathematical modelling in plant synthetic biology, providing an introduction for those researchers who are working with or working on modelling techniques.
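The chapter singles out stochastic modelling; a standard entry point is Gillespie's stochastic simulation algorithm. The sketch below applies it to a generic birth-death model of mRNA copy number (constant transcription, first-order degradation). The model, the rates and the function name are hypothetical and not taken from the chapter. For this model the stationary distribution is Poisson with mean k_tx/k_deg, so with the default rates the long-run time average should be close to 10.

```python
import random

def gillespie_birth_death(k_tx=10.0, k_deg=1.0, t_end=2000.0, seed=0):
    """Gillespie SSA for 0 -> mRNA (rate k_tx) and mRNA -> 0 (rate k_deg * m).

    Returns the time-averaged mRNA copy number over [0, t_end].
    """
    rng = random.Random(seed)
    t, m, area = 0.0, 0, 0.0
    while True:
        a_birth, a_death = k_tx, k_deg * m       # reaction propensities
        a_total = a_birth + a_death
        tau = rng.expovariate(a_total)          # waiting time to the next reaction
        if t + tau >= t_end:
            area += m * (t_end - t)
            return area / t_end
        area += m * tau
        t += tau
        if rng.random() * a_total < a_birth:    # choose which reaction fires
            m += 1
        else:
            m -= 1

print(gillespie_birth_death())   # should fluctuate around k_tx / k_deg
```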
Affiliation(s)
- Anna Deneer
- Biometris, Department of Mathematical and Statistical Methods, Wageningen University, Wageningen, The Netherlands
- Christian Fleck
- ETH Zurich, Department of Biosystems Science and Engineering, Basel, Switzerland.
- Freiburg Institute for Data Analysis and Mathematical Modelling, University of Freiburg, Freiburg im Breisgau, Germany.
6
Introducing Parameter Clustering to the OED Procedure for Model Calibration of a Synthetic Inducible Promoter in S. cerevisiae. Processes (Basel) 2021. [DOI: 10.3390/pr9061053] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
Abstract
In recent years, synthetic gene circuits for adding new cell features have become one of the most powerful tools in biological and pharmaceutical research and development. However, because of inherent non-linearity and noisy experimental data, experiment-based model calibration of these synthetic parts is perceived as a laborious and time-consuming procedure. Although optimal experimental design (OED) based on the Fisher information matrix (FIM) has been shown to be an effective means of improving calibration efficiency, the required computation increases dramatically with model size (number of parameters). To reduce OED complexity without losing calibration accuracy, this paper proposes two OED approaches with different parameter clustering methods and validates the accuracy of the calibrated models with in-silico experiments. A model of an inducible synthetic promoter in S. cerevisiae is adopted for benchmarking. Compared with the traditional off-line OED approach, the OED approaches with both clustering methods significantly reduce the complexity of OED problems (by at least 49.0%), while slightly improving calibration accuracy (11.8% and 19.6% lower estimation error on average for the FIM-based and sensitivity-based approaches, respectively). This study implies that, for calibrating non-linear models of biological pathways, cluster-based OED could be a beneficial approach to improve the efficiency of optimal experimental design.
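The abstract does not specify how parameters are clustered, so the following is only a plausible sketch of a sensitivity-based variant, not the paper's method: parameters whose relative sensitivity vectors are nearly collinear carry almost the same information about the output and can be grouped, which shrinks the OED problem. The promoter model, its parameter values, the sampling times and the 0.99 cosine threshold are hypothetical.

```python
import math

def promoter_output(p, t, u=2.0):
    """Hypothetical inducible-promoter readout (not the model used in the paper)."""
    return p["alpha"] * p["beta"] * u / (p["K"] + u) * (1.0 - math.exp(-p["k"] * t))

def relative_sensitivities(p, ts, h=1e-4):
    """p_j * dy/dp_j at each time, by central finite differences in p_j."""
    sens = {}
    for name, value in p.items():
        up, down = dict(p), dict(p)
        up[name], down[name] = value * (1 + h), value * (1 - h)
        sens[name] = [(promoter_output(up, t) - promoter_output(down, t)) / (2 * h)
                      for t in ts]
    return sens

def abs_cosine(a, b):
    """Absolute cosine similarity of two sensitivity vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return abs(dot) / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))

def cluster_parameters(sens, threshold=0.99):
    """Greedy grouping: a parameter joins the first cluster whose seed is nearly collinear."""
    clusters = []
    for name, vec in sens.items():
        for cluster in clusters:
            if abs_cosine(sens[cluster[0]], vec) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters

params = {"alpha": 2.0, "beta": 3.0, "k": 0.5, "K": 1.0}
times = [0.5 * i for i in range(1, 21)]
print(cluster_parameters(relative_sensitivities(params, times)))
```

In this toy model `alpha`, `beta` and `K` act only through a common time-independent factor, so they fall into one cluster, while `k` forms its own.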
7
Balsa-Canto E, Bandiera L, Menolascina F. Optimal Experimental Design for Systems and Synthetic Biology Using AMIGO2. Methods Mol Biol 2021; 2229:221-239. [PMID: 33405225 DOI: 10.1007/978-1-0716-1032-9_11] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/12/2022]
Abstract
Dynamic modeling in systems and synthetic biology is still quite a challenge: the complex nature of the interactions results in nonlinear models, which include unknown parameters (or functions). Ideally, time-series data support the estimation of model unknowns through data fitting. Goodness-of-fit measures would lead to the best model among a set of candidates. However, even when state-of-the-art measuring techniques allow for an unprecedented amount of data, not all data suit dynamic modeling. Model-based optimal experimental design (OED) is intended to improve model predictive capabilities. OED can be used to define the set of experiments that would (a) identify the best model or (b) improve the identifiability of unknown parameters. In this chapter, we present a detailed practical procedure to compute optimal experiments using the AMIGO2 toolbox.
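To make model-based OED for parameter identifiability concrete, here is a small Python sketch of D-optimal sampling-time selection. It does not use AMIGO2 (a MATLAB toolbox) or its API; the model y(t) = A(1 - exp(-kt)), the nominal values, the noise level and the candidate grid are assumptions, and exhaustive search stands in for the optimizers a real tool would use. The criterion is the determinant of the Fisher information matrix FIM = S^T S / sigma^2, where S holds the output sensitivities to (A, k).

```python
import itertools
import math

A_NOM, K_NOM, SIGMA = 10.0, 0.3, 0.5   # hypothetical nominal values and noise SD

def sensitivities(t):
    """Partial derivatives of y(t) = A * (1 - exp(-k t)) w.r.t. (A, k) at the nominal point."""
    e = math.exp(-K_NOM * t)
    return 1.0 - e, A_NOM * t * e

def fim_determinant(times):
    """Determinant of the 2x2 Fisher information matrix S^T S / sigma^2."""
    saa = sak = skk = 0.0
    for t in times:
        sa, sk = sensitivities(t)
        saa += sa * sa
        sak += sa * sk
        skk += sk * sk
    return (saa * skk - sak * sak) / SIGMA ** 4

candidates = [float(t) for t in range(1, 25)]   # admissible sampling times (h)
uniform = [6.0, 12.0, 18.0, 24.0]
d_optimal = max(itertools.combinations(candidates, 4), key=fim_determinant)
print("uniform:  ", uniform, fim_determinant(uniform))
print("D-optimal:", d_optimal, fim_determinant(d_optimal))
```

A larger determinant corresponds to a smaller joint confidence region for (A, k), which is why the D-optimal schedule is preferred to the uniform one under these assumptions.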
Affiliation(s)
- Eva Balsa-Canto
- (Bio)Process Engineering Group, IIM-CSIC (Spanish National Research Council), Vigo, Spain.
- Lucia Bandiera
- School of Engineering, Institute for Bioengineering, The University of Edinburgh, Edinburgh, UK
- SynthSys - Centre for Synthetic and Systems Biology, The University of Edinburgh, Edinburgh, UK
- Filippo Menolascina
- School of Engineering, Institute for Bioengineering, The University of Edinburgh, Edinburgh, UK
- SynthSys - Centre for Synthetic and Systems Biology, The University of Edinburgh, Edinburgh, UK
8
Otero-Muras I, Carbonell P. Automated engineering of synthetic metabolic pathways for efficient biomanufacturing. Metab Eng 2020; 63:61-80. [PMID: 33316374 DOI: 10.1016/j.ymben.2020.11.012] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 08/22/2020] [Revised: 11/15/2020] [Accepted: 11/20/2020] [Indexed: 12/19/2022]
Abstract
Metabolic engineering involves the engineering and optimization of processes from single cells to fermentation in order to increase production of valuable chemicals for health, food, energy, materials and other applications. A systems approach to metabolic engineering has gained traction in recent years thanks to advances in strain engineering, leading to an accelerated scaling from rapid prototyping to industrial production. Metabolic engineering is nowadays on track towards becoming a true manufacturing technology, with reduced times from conception to production enabled by automated protocols for DNA assembly of metabolic pathways in engineered producer strains. In this review, we discuss how the success of the metabolic engineering pipeline often relies on retrobiosynthetic protocols able to identify promising production routes and dynamic regulation strategies through automated biodesign algorithms, which are subsequently assembled as embedded integrated genetic circuits in the host strain. Those approaches are orchestrated by an experimental design strategy that provides optimal scheduling of the DNA assembly and rapid prototyping and, ultimately, brings forward an accelerated Design-Build-Test-Learn cycle and the overall optimization of the biomanufacturing process. Achieving such a vision will address the increasingly compelling demand in our society for delivering valuable biomolecules in an affordable, inclusive and sustainable bioeconomy.
Affiliation(s)
- Irene Otero-Muras
- BioProcess Engineering Group, IIM-CSIC, Spanish National Research Council, Vigo, 36208, Spain.
- Pablo Carbonell
- Institute of Industrial Control Systems and Computing (ai2), Universitat Politècnica de València, 46022, Spain.
9
Zwietering MH, Garre A, den Besten HMW. Incorporating strain variability in the design of heat treatments: A stochastic approach and a kinetic approach. Food Res Int 2020; 139:109973. [PMID: 33509519 DOI: 10.1016/j.foodres.2020.109973] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 02/29/2020] [Revised: 09/07/2020] [Accepted: 11/28/2020] [Indexed: 12/26/2022]
Abstract
For the design of thermal processes, the decimal reduction times (D-values) of target organisms can be used. However, many factors influence the D-value, such as the organism's inherent characteristics (strain variability), the history of the cells, and product and process factors. Strain variability is a very large contributor to the overall variation of the D-value. Hence, the overall reduction of microbial contaminants by a heat treatment is a combination of the occurrence of a strain with a certain heat resistance and its reduction given the prevailing conditions. This reduction can be determined using two approaches: a kinetic analysis based on integral equations or a stochastic approach based on Monte Carlo analysis. In this article, these two approaches are compared using as case studies the inactivation of two microorganisms: Listeria monocytogenes in a pasteurization process and the sporeformer Geobacillus stearothermophilus in a UHT process. Both approaches led to similar conclusions, highlighting that the strains with the highest heat resistance determine the overall inactivation, even if the probability of cells having such extreme heat resistance is very low.
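The two approaches can be sketched under assumptions that are ours rather than the paper's: log10 D-values are taken to be normally distributed across strains, each strain follows first-order (log-linear) inactivation, and the overall surviving fraction after a holding time t is the expectation E[10^(-t/D)] over strains. The Monte Carlo estimate samples strains; the integral estimate applies the trapezoid rule to the same expectation. All parameter values and function names are hypothetical.

```python
import math
import random

MU, SD = 0.0, 0.25   # mean and SD of log10 D (min) across strains: hypothetical
T = 3.0              # holding time (min): hypothetical

def survival(log10_d):
    """Surviving fraction 10**(-t/D) of one strain under log-linear inactivation."""
    return 10.0 ** (-T / 10.0 ** log10_d)

def normal_pdf(x):
    return math.exp(-0.5 * ((x - MU) / SD) ** 2) / (SD * math.sqrt(2.0 * math.pi))

def integral_estimate(lower=MU - 8 * SD, upper=MU + 8 * SD, n=4000):
    """Trapezoid-rule integral of pdf(x) * survival(x) over [lower, upper]."""
    h = (upper - lower) / n
    total = 0.0
    for i in range(n + 1):
        x = lower + i * h
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * normal_pdf(x) * survival(x)
    return total * h

def monte_carlo_estimate(n=100_000, seed=1):
    """Average surviving fraction over randomly sampled strains."""
    rng = random.Random(seed)
    return sum(survival(rng.gauss(MU, SD)) for _ in range(n)) / n

overall = integral_estimate()
print("integral:", overall, "Monte Carlo:", monte_carlo_estimate())
print("share from the 10% most resistant strains:",
      integral_estimate(lower=MU + 1.2816 * SD) / overall)
```

In this hypothetical setup the most heat-resistant tenth of strains accounts for most of the survivors, which mirrors the qualitative conclusion of the abstract.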
Affiliation(s)
- Marcel H Zwietering
- Food Microbiology, Wageningen University, PO Box 17, 6700 AA Wageningen, the Netherlands
- Alberto Garre
- Food Microbiology, Wageningen University, PO Box 17, 6700 AA Wageningen, the Netherlands
- Heidy M W den Besten
- Food Microbiology, Wageningen University, PO Box 17, 6700 AA Wageningen, the Netherlands.
10
Yazdani A, Lu L, Raissi M, Karniadakis GE. Systems biology informed deep learning for inferring parameters and hidden dynamics. PLoS Comput Biol 2020; 16:e1007575. [PMID: 33206658 PMCID: PMC7710119 DOI: 10.1371/journal.pcbi.1007575] [Citation(s) in RCA: 48] [Impact Index Per Article: 12.0] [Received: 11/28/2019] [Revised: 12/02/2020] [Accepted: 10/11/2020] [Indexed: 01/23/2023]
Abstract
Mathematical models of biological reactions at the system level lead to a set of ordinary differential equations with many unknown parameters that need to be inferred using relatively few experimental measurements. Having a reliable and robust algorithm for parameter inference and prediction of the hidden dynamics has been one of the core subjects in systems biology, and is the focus of this study. We have developed a new systems-biology-informed deep learning algorithm that incorporates the system of ordinary differential equations into the neural networks. Enforcing these equations effectively adds constraints to the optimization procedure that manifest themselves as an imposed structure on the observational data. Using a few scattered and noisy measurements, we are able to infer the dynamics of unobserved species, external forcing, and the unknown model parameters. We have successfully tested the algorithm on three different benchmark problems.
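The composite objective behind this kind of systems-biology-informed learning can be written down in a few lines: a data-misfit term plus an ODE-residual term evaluated at collocation points. In this simplified sketch the neural network is replaced by a fixed surrogate x_hat(t) that already matches the trajectory, so only the unknown parameter is scanned; in the actual method the network weights and the ODE parameters are trained jointly, and derivatives come from automatic differentiation rather than the finite differences used here. The logistic-growth ODE dx/dt = r*x*(1 - x/K), its parameter values and all function names are hypothetical.

```python
import math

R_TRUE, K_CAP, X0 = 0.8, 10.0, 0.5   # hypothetical logistic-growth parameters

def surrogate(t):
    """Stand-in for a trained network x_hat(t): here simply the exact logistic curve."""
    return K_CAP / (1.0 + (K_CAP / X0 - 1.0) * math.exp(-R_TRUE * t))

def composite_loss(r, data, collocation, h=1e-4, w_ode=1.0):
    """Mean squared data misfit plus weighted mean squared ODE residual."""
    data_term = sum((surrogate(t) - y) ** 2 for t, y in data) / len(data)
    residuals = []
    for t in collocation:
        x = surrogate(t)
        dxdt = (surrogate(t + h) - surrogate(t - h)) / (2 * h)   # central difference
        residuals.append(dxdt - r * x * (1.0 - x / K_CAP))
    ode_term = sum(e * e for e in residuals) / len(residuals)
    return data_term + w_ode * ode_term

data = [(t, surrogate(t)) for t in (0.0, 2.0, 4.0, 6.0)]   # sparse observations
collocation = [0.1 * k for k in range(101)]               # residual points on [0, 10]
grid = [0.5 + 0.01 * k for k in range(61)]                # candidate values of r
r_hat = min(grid, key=lambda r: composite_loss(r, data, collocation))
print(round(r_hat, 2))
```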
Affiliation(s)
- Alireza Yazdani
- Division of Applied Mathematics, Brown University, Providence, Rhode Island, USA
- Lu Lu
- Department of Mathematics, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Maziar Raissi
- Department of Applied Mathematics, University of Colorado, Boulder, Colorado, USA
11
Wang X, Rai N, Merchel Piovesan Pereira B, Eetemadi A, Tagkopoulos I. Accelerated knowledge discovery from omics data by optimal experimental design. Nat Commun 2020; 11:5026. [PMID: 33024104 PMCID: PMC7538421 DOI: 10.1038/s41467-020-18785-y] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 12/31/2019] [Accepted: 08/27/2020] [Indexed: 12/15/2022]
Abstract
How to design experiments that accelerate knowledge discovery on complex biological landscapes remains a tantalizing question. We present an optimal experimental design method (coined OPEX) to identify informative omics experiments using machine learning models for both experimental space exploration and model training. OPEX-guided exploration of Escherichia coli’s populations exposed to biocide and antibiotic combinations led to more accurate predictive models of gene expression with 44% less data. Analysis of the proposed experiments shows that broad exploration of the experimental space followed by fine-tuning emerges as the optimal strategy. Additionally, analysis of the experimental data reveals 29 cases of cross-stress protection and 4 cases of cross-stress vulnerability. Further validation reveals the central role of chaperones, stress response proteins and transport pumps in cross-stress exposure. This work demonstrates how active learning can be used to guide omics data collection for training predictive models, making evidence-driven decisions and accelerating knowledge discovery in life sciences.
Affiliation(s)
- Xiaokang Wang
- Department of Biomedical Engineering, University of California, Davis, CA, 95616, USA
- Genome Center, University of California, Davis, CA, 95616, USA
- Navneet Rai
- Genome Center, University of California, Davis, CA, 95616, USA
- Department of Computer Science, University of California, Davis, CA, 95616, USA
- Beatriz Merchel Piovesan Pereira
- Genome Center, University of California, Davis, CA, 95616, USA
- Microbiology Graduate Group, University of California, Davis, CA, 95616, USA
- Ameen Eetemadi
- Genome Center, University of California, Davis, CA, 95616, USA
- Department of Computer Science, University of California, Davis, CA, 95616, USA
- Ilias Tagkopoulos
- Genome Center, University of California, Davis, CA, 95616, USA
- Department of Computer Science, University of California, Davis, CA, 95616, USA
12
Optimal experiment design under parametric uncertainty: A comparison of a sensitivities based approach versus a polynomial chaos based stochastic approach. Chem Eng Sci 2020. [DOI: 10.1016/j.ces.2020.115651] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Indexed: 12/15/2022]
13
Srinivasan S, Cluett WR, Mahadevan R. A scalable method for parameter identification in kinetic models of metabolism using steady-state data. Bioinformatics 2020; 35:5216-5225. [PMID: 31197317 DOI: 10.1093/bioinformatics/btz445] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 11/29/2018] [Revised: 04/26/2019] [Accepted: 06/05/2019] [Indexed: 11/13/2022]
Abstract
MOTIVATION: In kinetic models of metabolism, the parameter values determine the dynamic behaviour predicted by these models. Estimating parameters from in vivo experimental data requires the parameters to be structurally identifiable and the data to be informative enough to estimate these parameters. Existing methods to determine the structural identifiability of parameters in kinetic models of metabolism can only be applied to models of small metabolic networks due to their computational complexity. Additionally, a priori experimental design, which is necessary to obtain informative data for parameter estimation, also does not account for using steady-state data to estimate parameters in kinetic models.
RESULTS: Here, we present a scalable methodology to structurally identify parameters for each flux in a kinetic model of metabolism based on the availability of steady-state data. In doing so, we also address the issue of determining the number and nature of experiments for generating steady-state data to estimate these parameters. By using a small metabolic network as an example, we show that most parameters in fluxes expressed by mechanistic enzyme kinetic rate laws can be identified using steady-state data, and that the steady-state data required for their estimation can be obtained from selective experiments involving both substrate and enzyme level perturbations. The methodology can be used in combination with other identifiability and experimental design algorithms that use dynamic data to determine the most informative experiments requiring the least resources to perform.
AVAILABILITY AND IMPLEMENTATION: https://github.com/LMSE/ident.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
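The core idea, that steady-state fluxes measured under different perturbations can pin down rate-law parameters, can be illustrated with the simplest case: an irreversible Michaelis-Menten flux. This is a hypothetical worked example, not the paper's algorithm, which handles general mechanistic rate laws and also uses enzyme-level perturbations. Rearranging v = Vmax*s/(Km + s) gives v*Km - s*Vmax = -v*s, which is linear in (Km, Vmax): one steady state yields one equation and leaves a one-parameter family of solutions, whereas two substrate levels give a unique solution.

```python
import math

def mm_flux(vmax, km, s):
    """Irreversible Michaelis-Menten rate law v = Vmax * s / (Km + s)."""
    return vmax * s / (km + s)

def fit_two_steady_states(s1, v1, s2, v2):
    """Solve v*Km - s*Vmax = -v*s for (Km, Vmax) from two steady states (Cramer's rule)."""
    det = s1 * v2 - s2 * v1
    if det == 0:
        raise ValueError("the two steady states do not determine (Km, Vmax)")
    km = s1 * s2 * (v1 - v2) / det
    vmax = v1 * v2 * (s1 - s2) / det
    return km, vmax

# One steady state (s = 1) cannot separate the parameters: both pairs give the same flux.
print(mm_flux(10.0, 2.0, 1.0), mm_flux(20.0, 5.0, 1.0))
# A second steady state at a perturbed substrate level makes them identifiable.
print(fit_two_steady_states(1.0, mm_flux(10.0, 2.0, 1.0), 4.0, mm_flux(10.0, 2.0, 4.0)))
```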
Affiliation(s)
- Shyam Srinivasan
- Department of Chemical Engineering and Applied Chemistry, 200 College Street, University of Toronto, Toronto, ON, M5S3E5, Canada
- William R Cluett
- Department of Chemical Engineering and Applied Chemistry, 200 College Street, University of Toronto, Toronto, ON, M5S3E5, Canada
- Radhakrishnan Mahadevan
- Department of Chemical Engineering and Applied Chemistry, 200 College Street, University of Toronto, Toronto, ON, M5S3E5, Canada
- Institute of Biomaterials and Biomedical Engineering, 164 College Street, University of Toronto, Toronto, ON, M5S 3G9, Canada
14
On the use of in-silico simulations to support experimental design: A case study in microbial inactivation of foods. PLoS One 2019; 14:e0220683. [PMID: 31454353 PMCID: PMC6711534 DOI: 10.1371/journal.pone.0220683] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 05/27/2019] [Accepted: 07/22/2019] [Indexed: 02/01/2023]
Abstract
The mathematical models used in predictive microbiology contain parameters that must be estimated from experimental data. Due to experimental uncertainty and variability, they cannot be known exactly and must be reported with a measure of uncertainty (usually a standard deviation). To increase precision (i.e. reduce the standard deviation), it is usual to add extra sampling points. However, recent studies have shown that precision can also be increased without adding extra sampling points by using Optimal Experiment Design, which applies optimization and information theory to identify the most informative experiment under a set of constraints. Nevertheless, to date, there have been few contributions on how to know a priori whether an experimental design is likely to provide the desired precision in the parameter estimates. In this article, two complementary methodologies to predict the parameter precision for a given experimental design are proposed. Both approaches are based on in silico simulations, so they can be performed before any experimental work. The first applies Monte Carlo simulations to estimate the standard deviation of the model parameters, whereas the second applies the properties of the Fisher Information Matrix to estimate the volume of the confidence ellipsoids. The methods are illustrated on a case study of dynamic microbial inactivation, showing how they can be used to compare experimental designs and assess their precision. The results show that, as expected, the optimal experimental design is more accurate than the uniform design with the same number of data points. Furthermore, it is demonstrated that, for some heating profiles, adding sampling points to a uniform design does not necessarily increase precision. Therefore, optimal experimental designs are highly recommended in predictive microbiology.
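Both precision-prediction approaches can be sketched for a much simpler setting than the paper's: an isothermal log-linear inactivation model log10 N(t) = log10 N0 - t/D with Gaussian measurement noise (the dynamic heating profiles of the case study are not reproduced). The Monte Carlo function refits simulated data many times and reports the empirical standard deviation of D; the FIM function reports the standard deviation implied by the inverse Fisher information matrix. Parameter values, sampling schedules and function names are hypothetical.

```python
import math
import random
import statistics

LOGN0, D_TRUE, SIGMA = 6.0, 2.0, 0.1   # hypothetical: log10 CFU/g, min, log10 CFU/g

def fim_sd_of_d(times):
    """SD of D from the inverse FIM of y = logN0 - t/D, parameters (logN0, D)."""
    rows = [(1.0, t / D_TRUE ** 2) for t in times]   # (dy/dlogN0, dy/dD)
    a = sum(u * u for u, _ in rows) / SIGMA ** 2
    b = sum(u * v for u, v in rows) / SIGMA ** 2
    c = sum(v * v for _, v in rows) / SIGMA ** 2
    return math.sqrt(a / (a * c - b * b))           # (D, D) entry of FIM^-1

def monte_carlo_sd_of_d(times, reps=2000, seed=0):
    """Simulate noisy data, refit by least squares, return the SD of the D estimates."""
    rng = random.Random(seed)
    t_mean = statistics.fmean(times)
    sxx = sum((t - t_mean) ** 2 for t in times)
    estimates = []
    for _ in range(reps):
        ys = [LOGN0 - t / D_TRUE + rng.gauss(0.0, SIGMA) for t in times]
        y_mean = statistics.fmean(ys)
        slope = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, ys)) / sxx
        estimates.append(-1.0 / slope)              # slope = -1/D
    return statistics.stdev(estimates)

uniform = [0, 2, 4, 6, 8, 10]
extremes = [0, 0, 0, 10, 10, 10]
print(monte_carlo_sd_of_d(uniform), fim_sd_of_d(uniform), fim_sd_of_d(extremes))
```

For the uniform schedule the two estimates should agree closely; concentrating samples at the extremes lowers the FIM-based standard deviation of D for this model, without adding points.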
15
Lee D, Jayaraman A, Sang-Il Kwon J. Identification of a time-varying intracellular signalling model through data clustering and parameter selection: application to NF-κB signalling pathway induced by LPS in the presence of BFA. IET Syst Biol 2019; 13:169-179. [PMID: 31318334 PMCID: PMC8687386 DOI: 10.1049/iet-syb.2018.5079] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 09/05/2018] [Revised: 02/07/2019] [Accepted: 02/14/2019] [Indexed: 01/02/2023]
Abstract
Developing a model for a signalling pathway requires several iterations of experimentation and model refinement to obtain an accurate model. However, the implementation of such an approach to model a signalling pathway induced by a poorly-known stimulus can become labour intensive because only limited information on the pathway is available beforehand to formulate an initial model. Therefore, a large number of iterations are required since the initial model is likely to be erroneous. In this work, a numerical scheme is proposed to construct a time-varying model for a signalling pathway induced by a poorly-known stimulus when its nominal model is available in the literature. Here, the nominal model refers to one that describes the signalling dynamics under a well-characterised stimulus. First, global sensitivity analysis is implemented on the nominal model to identify the most important parameters, which are assumed to be piecewise constant. Second, measurement data are clustered to determine temporal subdomains where the parameters take different values. Finally, a least-squares problem is solved to estimate the parameter values in each temporal subdomain. The effectiveness of this approach is illustrated by developing a time-varying model for NF-κB signalling dynamics induced by lipopolysaccharide in the presence of brefeldin A.
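The clustering-then-least-squares steps can be sketched on a much simpler problem. In this hypothetical example a signal changes its rate at an unknown time; finite-difference slopes are grouped with a two-cluster 1-D k-means, contiguous runs of equal labels become temporal subdomains, and a least-squares slope is fitted in each. This is an illustration under assumptions only: the paper fits an NF-κB ODE model after a global sensitivity analysis, and its clustering criterion is not specified in the abstract.

```python
def kmeans_1d(values, iters=100):
    """Two-cluster k-means on scalars; centroids start at the min and max."""
    centroids = [min(values), max(values)]
    labels = None
    for _ in range(iters):
        new = [0 if abs(v - centroids[0]) <= abs(v - centroids[1]) else 1 for v in values]
        if new == labels:
            break
        labels = new
        for k in (0, 1):
            members = [v for v, lab in zip(values, labels) if lab == k]
            if members:
                centroids[k] = sum(members) / len(members)
    return labels

def runs(labels):
    """Maximal runs of equal labels as (first, last + 1) index pairs."""
    out, start = [], 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            out.append((start, i))
            start = i
    return out

def ls_slope(ts, ys):
    """Ordinary least-squares slope of ys against ts."""
    t_mean, y_mean = sum(ts) / len(ts), sum(ys) / len(ys)
    num = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
    return num / sum((t - t_mean) ** 2 for t in ts)

# Hypothetical signal whose rate drops from 2.0 to 0.5 at t = 5.
ts = [0.5 * i for i in range(21)]
ys = [2.0 * t if t < 5 else 10.0 + 0.5 * (t - 5) for t in ts]
slopes = [(ys[i + 1] - ys[i]) / (ts[i + 1] - ts[i]) for i in range(len(ts) - 1)]
# Interval i joins points i and i + 1, so a run (a, b) of intervals covers points a..b.
fits = [(ts[a], ts[b], ls_slope(ts[a:b + 1], ys[a:b + 1]))
        for a, b in runs(kmeans_1d(slopes))]
print(fits)
```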
Affiliation(s)
- Dongheon Lee
- Texas A&M Energy Institute, Texas A&M University, College Station, TX 77843, USA
- Arul Jayaraman
- Department of Biomedical Engineering, Texas A&M University, College Station, TX 77843, USA
- Joseph Sang-Il Kwon
- Texas A&M Energy Institute, Texas A&M University, College Station, TX 77843, USA.
16
Manheim DC, Detwiler RL. Accurate and reliable estimation of kinetic parameters for environmental engineering applications: A global, multi objective, Bayesian optimization approach. MethodsX 2019; 6:1398-1414. [PMID: 31245280 PMCID: PMC6582191 DOI: 10.1016/j.mex.2019.05.035] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 07/06/2018] [Accepted: 05/30/2019] [Indexed: 11/16/2022]
Abstract
Accurate and reliable predictions of bacterial growth and metabolism from unstructured kinetic models are critical to the proper operation and design of engineered biological treatment and remediation systems. Parameter estimation has therefore become a routine challenge in Environmental Engineering. Among the main issues identified with parameter estimation, model-data calibration is a crucial, yet often overlooked and difficult, optimization problem. Here, a novel and rigorous global, multi-objective, and fully Bayesian optimization approach is presented that overcomes challenges associated with multivariate, sparse, and noisy data, as well as the highly nonlinear model structures commonly encountered in Environmental Engineering practice. The approach allows improved definition and targeting of the compromise solution space for all multivariate problems, enabling efficient convergence, and includes a Bayesian component to thoroughly explore parameter and model prediction uncertainty. In terms of parameter accuracy and precision, this global optimization approach outperformed standard local nonlinear regression routines; it also overcomes premature convergence and addresses overfitting of individual variables during calibration.
- A sequential single-objective, multi-objective, and Bayesian optimization workflow was developed to accurately and reliably estimate unstructured kinetic model parameters.
- The global, single-objective approach defines the global optimum (the best compromise solution) and "extreme" parameter solutions for each variable, while the global, multi-objective approach confirms the "best" compromise solution space for the Bayesian search to target; convergence is assessed using the single-objective results.
- The Approximate Bayesian Computation approach fully explores parameter and model prediction uncertainty, targeting the previously identified compromise solution space.
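The Approximate Bayesian Computation stage can be sketched in its simplest (rejection) form. This is only an illustration under stated assumptions, not the authors' workflow: a one-parameter first-order decay model C(t) = c0·exp(-k t) stands in for an unstructured kinetic model, the prior on k is uniform on [0, 1], the distance is the root-mean-square error, and the tolerance `eps` is hand-picked. The preceding global single- and multi-objective stages, which the abstract uses to target the search, are not shown.

```python
import numpy as np


def first_order(t, k, c0=1.0):
    """Illustrative unstructured kinetic model: C(t) = c0 * exp(-k t)."""
    return c0 * np.exp(-k * t)


def abc_rejection(t, observed, n_draws=20000, eps=0.03, prior=(0.0, 1.0), seed=0):
    """Approximate Bayesian Computation by rejection sampling.

    Draws k from a uniform prior, simulates the model for every draw and
    keeps the draws whose root-mean-square error to the data is below eps.
    """
    rng = np.random.default_rng(seed)
    draws = rng.uniform(prior[0], prior[1], size=n_draws)
    sims = first_order(t[None, :], draws[:, None])          # shape (n_draws, n_times)
    rmse = np.sqrt(np.mean((sims - observed[None, :]) ** 2, axis=1))
    return draws[rmse < eps]


# Synthetic, noisy data generated with k = 0.3 (assumed for illustration).
t = np.linspace(0.0, 10.0, 11)
rng = np.random.default_rng(1)
observed = first_order(t, 0.3) + rng.normal(0.0, 0.01, size=t.size)

posterior = abc_rejection(t, observed)
print(posterior.size, posterior.mean(), np.percentile(posterior, [2.5, 97.5]))
```

The accepted draws approximate the posterior of k; shrinking `eps` tightens the approximation at the cost of fewer accepted samples, which is one reason targeting a pre-identified region of parameter space helps.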
Affiliation(s)
- Derek C Manheim
- Department of Civil and Environmental Engineering, University of California Irvine, United States
- Russell L Detwiler
- Department of Civil and Environmental Engineering, University of California Irvine, United States
17
Component Characterization in a Growth-Dependent Physiological Context: Optimal Experimental Design. Processes (Basel) 2019. [DOI: 10.3390/pr7010052] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 12/22/2022]
Abstract
Synthetic biology design challenges have driven the use of mathematical models to characterize genetic components and to explore complex design spaces. Traditional approaches to characterization have largely ignored the effect of strain and growth conditions on the dynamics of synthetic genetic circuits, and have thus confounded intrinsic features of the circuit components with cell-level context effects. We present a model that distinguishes an activated gene’s intrinsic kinetics from its physiological context. We then demonstrate an optimal experimental design approach to identify dynamic induction experiments for efficient estimation of the component’s intrinsic parameters. Maximally informative experiments are chosen by formulating the design as an optimal control problem; direct multiple-shooting is used to identify the optimum. Our numerical results suggest that the intrinsic parameters of a genetic component can be more accurately estimated using optimal experimental designs, and that the choice of growth rates, sampling schedule, and input profile each play an important role. The proposed approach to coupled component–host modelling can support gene circuit design across a range of physiological conditions.
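A common way to make "maximally informative" concrete is D-optimality: choose the experiment that maximises the determinant of the Fisher information matrix (FIM). The sketch below assumes a deliberately simple induction model, p(t) = (a/d)(1 - exp(-d t)), with synthesis rate a and an effective loss rate d (degradation plus growth dilution, which is where a growth-rate dependence would enter), independent Gaussian measurement noise, and nominal parameter guesses. It only optimises the sampling schedule, by brute force over a grid, rather than the input profile via direct multiple shooting as described in the abstract; the model and all names are hypothetical.

```python
import itertools
import numpy as np


def sensitivities(t, a, d):
    """Partial derivatives of p(t) = (a/d)(1 - exp(-d t)) w.r.t. (a, d)."""
    e = np.exp(-d * t)
    dp_da = (1.0 - e) / d
    dp_dd = -a * (1.0 - e) / d**2 + (a / d) * t * e
    return np.stack([dp_da, dp_dd], axis=-1)   # shape (n_times, 2)


def fisher_information(times, a, d, sigma=1.0):
    """FIM for independent Gaussian measurement noise with std sigma."""
    s = sensitivities(np.asarray(times, dtype=float), a, d)
    return s.T @ s / sigma**2


def d_optimal_schedule(candidates, n_samples, a, d):
    """Exhaustive search for the sampling times maximising det(FIM)."""
    best_times, best_det = None, -np.inf
    for times in itertools.combinations(candidates, n_samples):
        det = np.linalg.det(fisher_information(times, a, d))
        if det > best_det:
            best_times, best_det = times, det
    return tuple(float(x) for x in best_times), float(best_det)


candidates = np.arange(0.5, 10.5, 0.5)   # candidate sampling times (hypothetical units: h)
a_guess, d_guess = 2.0, 0.4               # nominal parameter guesses (hypothetical)
times, det = d_optimal_schedule(candidates, 3, a_guess, d_guess)
print(times, det)
```

Because the FIM is evaluated at guessed parameter values, the resulting schedule is only locally optimal; this is why such designs are often refined sequentially as estimates improve.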
18
Comprehensive experimental design for chemical engineering processes: A two-layer iterative design approach. Chem Eng Sci 2018. [DOI: 10.1016/j.ces.2018.05.047] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 11/19/2022]
19
Nöh K, Niedenführ S, Beyß M, Wiechert W. A Pareto approach to resolve the conflict between information gain and experimental costs: Multiple-criteria design of carbon labeling experiments. PLoS Comput Biol 2018; 14:e1006533. [PMID: 30379837 PMCID: PMC6209137 DOI: 10.1371/journal.pcbi.1006533] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 05/16/2018] [Accepted: 09/27/2018] [Indexed: 01/23/2023]
Abstract
Science revolves around finding the best way to conduct an experiment to obtain insightful results. Experiments with maximal information content can be found by computational experimental design (ED) strategies that identify optimal conditions under which to perform the experiment. Several criteria have been proposed to measure information content, each emphasizing different aspects of the design goal, i.e., reduction of uncertainty. Where experiments are complex or expensive, on closer inspection the budget governs the achievable amount of information. In this context, the design objectives cost and information gain are often incommensurable, though dependent. By casting the ED task as a multiple-criteria optimization problem, a set of trade-off designs is derived that approximates the Pareto frontier, which is instrumental for exploring preferable designs. In this work, we present a computational methodology for multiple-criteria ED of information-rich experiments that accounts for virtually any set of design criteria. The methodology is implemented for 13C metabolic flux analysis (MFA), which is arguably the most expensive of the ‘omics’ technologies, featuring dozens of design parameters (tracer composition, analytical platform, measurement selection, etc.). Supported by an innovative visualization scheme, we demonstrate with two realistic showcases that the use of multiple criteria reveals deep insights into the conflicting interplay between information carriers and cost factors that are not amenable to single-objective ED. For instance, tandem mass spectrometry turns out to be best-in-class with respect to information gain, while delivering this information more cheaply than the other routinely applied analytical technologies. Our Pareto approach to ED thus offers the investigator great flexibility in the conception phase of a study to balance costs and benefits.
Designing experiments is essential in the biosciences to make the most of their scientific outcome. When experiments are expensive, however, costs often turn out to be showstoppers in practice. This raises the question: how can one get the most out of an experiment for the time and money invested? We approach this question by formulating the design task as a multiple-criteria optimization problem. Its solution produces a set of Pareto-optimal design proposals that feature the trade-off between information gain, as measured by different metrics, and costs. Exploring these design proposals then allows us to make the best decision on information-economic experiments under given circumstances. Implemented in the field of isotope-based metabolic flux analysis, the Pareto approach provides detailed insight into the tight interplay of the many information carriers and cost factors. A tailored visual representation scheme enables the investigator to explore the options before conducting the experiment. With a practical showcase at hand, our computational study highlights the benefits of incorporating multiple information criteria in addition to costs, which compensates for the shortcomings of conventional single-objective experimental design strategies.
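At its core, deriving a Pareto frontier between cost and information reduces to a dominance check between candidate designs. The sketch below assumes each candidate has already been scored with one scalar cost (lower is better) and one scalar information measure (higher is better, for example a log-determinant of the Fisher information); the design names and numbers are hypothetical and do not come from the paper, which handles many criteria and a 13C-MFA-specific design space.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Design:
    name: str
    cost: float          # lower is better
    information: float   # higher is better


def dominates(p: Design, q: Design) -> bool:
    """p dominates q if it is no worse in both objectives and strictly better in one."""
    no_worse = p.cost <= q.cost and p.information >= q.information
    better = p.cost < q.cost or p.information > q.information
    return no_worse and better


def pareto_front(designs):
    """Return the non-dominated designs, sorted by cost."""
    front = [p for p in designs if not any(dominates(q, p) for q in designs)]
    return sorted(front, key=lambda d: d.cost)


# Hypothetical candidate designs (e.g. tracer mixture plus analytical platform).
candidates = [
    Design("A", cost=100.0, information=2.0),
    Design("B", cost=150.0, information=3.5),
    Design("C", cost=160.0, information=3.0),   # dominated by B
    Design("D", cost=300.0, information=5.0),
    Design("E", cost=320.0, information=4.8),   # dominated by D
]
print([d.name for d in pareto_front(candidates)])  # ['A', 'B', 'D']
```

The pairwise check is quadratic in the number of candidates, which is fine for exploring a shortlist; large design spaces are usually searched with a multi-objective optimizer that approximates the frontier directly.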
Affiliation(s)
- Katharina Nöh
- Institute of Bio- and Geosciences, IBG-1: Biotechnology, Forschungszentrum Jülich GmbH, Jülich, Germany
- Sebastian Niedenführ
- Institute of Bio- and Geosciences, IBG-1: Biotechnology, Forschungszentrum Jülich GmbH, Jülich, Germany
- Martin Beyß
- Institute of Bio- and Geosciences, IBG-1: Biotechnology, Forschungszentrum Jülich GmbH, Jülich, Germany
- Wolfgang Wiechert
- Institute of Bio- and Geosciences, IBG-1: Biotechnology, Forschungszentrum Jülich GmbH, Jülich, Germany
- Computational Systems Biotechnology, RWTH Aachen University, Aachen, Germany
20
On-Line Optimal Input Design Increases the Efficiency and Accuracy of the Modelling of an Inducible Synthetic Promoter. Processes (Basel) 2018. [DOI: 10.3390/pr6090148] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Indexed: 02/05/2023]
Abstract
Synthetic biology seeks to design biological parts and circuits that implement new functions in cells. Major accomplishments have been reported in this field, yet predicting a priori the in vivo behaviour of synthetic gene circuits is a major challenge. Mathematical models offer a means to address this bottleneck. However, in biology, modelling is perceived as an expensive, time-consuming task. Indeed, the quality of predictions depends on the accuracy of parameters, which are traditionally inferred from poorly informative data. How much can parameter accuracy be improved by using model-based optimal experimental design (MBOED)? To tackle this question, we considered an inducible promoter in the yeast S. cerevisiae. Using in vivo data, we re-fit a dynamic model for this component and then compared the performance of standard (e.g., step inputs) and optimally designed experiments for parameter inference. We found that MBOED improves the quality of model calibration by ∼60%. Results improve further, by up to 84%, when considering on-line optimal experimental design (OED). Our in silico results suggest that MBOED provides a significant advantage in the identification of models of biological parts and should thus be integrated into their characterisation.
21
New opportunities for optimal design of dynamic experiments in systems and synthetic biology. Curr Opin Syst Biol 2018. [DOI: 10.1016/j.coisb.2018.02.005] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Indexed: 11/20/2022]
22
Madi MK, Karameh FN. Adaptive optimal input design and parametric estimation of nonlinear dynamical systems: application to neuronal modeling. J Neural Eng 2018; 15:046028. [PMID: 29749350 DOI: 10.1088/1741-2552/aac3f7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/11/2022]
Abstract
OBJECTIVE Many physical models of biological processes, including neural systems, are characterized by parametric nonlinear dynamical relations between driving inputs, internal states, and measured outputs of the process. Fitting such models to experimental data (data assimilation) is a challenging task, since the physical process often operates in a noisy, possibly non-stationary environment; moreover, conducting multiple experiments under controlled and repeatable conditions can be impractical, time-consuming, or costly. The accuracy of model identification is therefore dictated principally by the quality and dynamic richness of the data collected over a single or a few experimental sessions. Accordingly, it is highly desirable to design efficient experiments that, by exciting the physical process with smart inputs, yield fast convergence and increased accuracy of the model. APPROACH We herein introduce an adaptive framework in which optimal input design is integrated with square root cubature Kalman filters (OID-SCKF) to develop an online estimation procedure that, first, converges significantly faster, thereby permitting model fitting over shorter time windows, and, second, enhances model accuracy when only a few process outputs are accessible. The methodology is demonstrated on common nonlinear models and on a four-area neural mass model with noisy and limited measurements. Estimation quality (speed and accuracy) is benchmarked against high-performance SCKF-based methods that commonly employ dynamically rich informed inputs for accurate model identification. MAIN RESULTS For all the tested models, simulated single-trial and ensemble averages showed that OID-SCKF exhibited (i) faster convergence of parameter estimates and (ii) lower dependence on inter-trial noise variability, with gains of up to around 1000 ms in speed and 81% increase in variability for the neural mass models.
In terms of accuracy, OID-SCKF estimation was superior, and exhibited considerably less variability across experiments, in identifying model parameters of (a) systems with challenging model inversion dynamics and (b) systems with fewer measurable outputs that directly relate to the underlying processes. SIGNIFICANCE Fast and accurate identification therefore carries particular promise for modeling of transient (short-lived) neuronal network dynamics using a spatially under-sampled set of noisy measurements, as is commonly encountered in neural engineering applications.
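The filter named in this abstract builds on the cubature Kalman filter, whose core step propagates a Gaussian belief through a nonlinear model using 2n deterministically placed points (the third-degree spherical-radial rule). The sketch below shows that step in its plain covariance form; the square-root variant instead propagates a Cholesky factor, typically via QR decompositions, for numerical robustness, and the adaptive input-design layer is not shown. The dynamics `f` and all numbers are hypothetical.

```python
import numpy as np


def cubature_points(mean, cov):
    """Third-degree spherical-radial cubature points: m ± sqrt(n) S e_i, with cov = S S^T."""
    n = mean.size
    s = np.linalg.cholesky(cov)                   # lower-triangular factor
    offsets = np.sqrt(n) * np.hstack([s, -s])     # columns are the 2n offsets
    return mean[:, None] + offsets                # shape (n, 2n)


def cubature_transform(f, mean, cov, noise_cov):
    """Propagate a Gaussian through f using equal cubature weights 1/(2n)."""
    pts = cubature_points(mean, cov)
    fx = np.column_stack([f(pts[:, i]) for i in range(pts.shape[1])])
    m_new = fx.mean(axis=1)
    dev = fx - m_new[:, None]
    p_new = dev @ dev.T / pts.shape[1] + noise_cov
    return m_new, p_new


# Hypothetical 2-state nonlinear map (a discretised oscillator with a tanh coupling).
def f(x):
    return np.array([x[0] + 0.1 * x[1], x[1] - 0.1 * np.tanh(x[0])])


m0 = np.array([0.5, -0.2])
p0 = np.diag([0.04, 0.01])
m1, p1 = cubature_transform(f, m0, p0, noise_cov=1e-4 * np.eye(2))
print(m1, p1)
```

For a linear map the rule reproduces the exact mean and covariance; for nonlinear maps it is exact for polynomials up to degree three, which is what makes it cheaper than sampling-based filters at comparable accuracy in moderate dimensions.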
Affiliation(s)
- Mahmoud K Madi
- Department of Electrical and Computer Engineering, American University of Beirut, Beirut, Lebanon
23
Thiele S, Heise S, Hessenkemper W, Bongartz H, Fensky M, Schaper F, Klamt S. Designing optimal experiments to discriminate interaction graph models. IEEE/ACM Trans Comput Biol Bioinform 2018; 16:925-935. [PMID: 29993657 DOI: 10.1109/tcbb.2018.2812184] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
Modern methods for inferring cellular networks from experimental data often express nondeterminism through an ensemble of candidate models. To discriminate among these candidates, new experiments need to be carried out. Theoretically, the number of possible experiments is exponential in the number of possible perturbations. In practice, experiments are expensive and several constraints apply: they limit which combinations of perturbations are technically possible, which components can be measured, and how many experiments are affordable. Further, not all experiments are equally well suited to discriminate model candidates. The goal of optimal experiment design is to determine those experiments that discriminate most of the candidates while minimizing the costs. We present an approach for experiment planning with interaction graph models and sign consistency methods. This new approach can be used in combination with methods for network inference and consistency checking. We applied our method to study erythropoietin signal transduction in human kidney cells (HEK293). We first used simulated experimental data from an ODE model to demonstrate in silico that our experimental design results in the inference of the gold standard model. Finally, we used the approach to plan in vivo experiments that discriminate model candidates for erythropoietin signal transduction in this cell line.
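The discrimination goal can be illustrated with a small greedy heuristic over sign predictions. The sketch assumes that each candidate model has already been reduced to a predicted sign (+1 up, -1 down, 0 no change) of one readout under each feasible experiment, and that the budget is a fixed number of experiments. It greedily picks the experiment that separates the most still-indistinguishable model pairs; this set-cover style heuristic is a stand-in, not the paper's sign-consistency planning method, and the models and experiment names are hypothetical.

```python
from itertools import combinations

# Hypothetical predicted signs of one readout for each candidate model
# under each feasible perturbation experiment.
predictions = {
    "M1": {"inhibit_A": -1, "stimulate_B": +1, "knockdown_C": 0},
    "M2": {"inhibit_A": -1, "stimulate_B": 0, "knockdown_C": -1},
    "M3": {"inhibit_A": 0, "stimulate_B": +1, "knockdown_C": -1},
    "M4": {"inhibit_A": -1, "stimulate_B": +1, "knockdown_C": -1},
}


def separated_pairs(experiment, models):
    """Pairs of candidate models whose predicted signs differ for this experiment."""
    return {
        (m, n)
        for m, n in combinations(sorted(models), 2)
        if predictions[m][experiment] != predictions[n][experiment]
    }


def greedy_design(models, experiments, budget):
    """Greedily pick experiments that separate the most still-unseparated pairs."""
    unseparated = set(combinations(sorted(models), 2))
    chosen = []
    for _ in range(budget):
        best = max(experiments, key=lambda e: len(separated_pairs(e, models) & unseparated))
        gain = separated_pairs(best, models) & unseparated
        if not gain:
            break
        chosen.append(best)
        unseparated -= gain
    return chosen, unseparated


models = sorted(predictions)
experiments = list(predictions["M1"])
chosen, unresolved = greedy_design(models, experiments, budget=2)
print(chosen, unresolved)  # ['inhibit_A', 'stimulate_B'] {('M1', 'M4')}
```

With a budget of two experiments, models M1 and M4 remain indistinguishable; a third experiment (`knockdown_C`) would separate them, which shows how the budget constraint directly limits discrimination.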
24
Mohsenizadeh DN, Dehghannasiri R, Dougherty ER. Optimal Objective-Based Experimental Design for Uncertain Dynamical Gene Networks with Experimental Error. IEEE/ACM Trans Comput Biol Bioinform 2018; 15:218-230. [PMID: 27576263 PMCID: PMC5845823 DOI: 10.1109/tcbb.2016.2602873] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 05/18/2023]
Abstract
In systems biology, network models are often used to study interactions among cellular components, a salient aim being to develop drugs and therapeutic mechanisms to change the dynamical behavior of the network to avoid undesirable phenotypes. Owing to limited knowledge, model uncertainty is commonplace and network dynamics can be updated in different ways, thereby giving multiple dynamic trajectories, that is, dynamics uncertainty. In this manuscript, we propose an experimental design method that can effectively reduce the dynamics uncertainty and improve performance in an interaction-based network. Both dynamics uncertainty and experimental error are quantified with respect to the modeling objective, herein, therapeutic intervention. The aim of experimental design is to select among a set of candidate experiments the experiment whose outcome, when applied to the network model, maximally reduces the dynamics uncertainty pertinent to the intervention objective.
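One way to quantify uncertainty "with respect to the modeling objective" is the mean objective cost of uncertainty (MOCU): the expected extra cost of applying the intervention that is best on average instead of the one that would be best if the true model were known. The sketch below uses MOCU as an assumed criterion for this line of work and keeps everything small: a discrete uncertainty class of four models with prior probabilities, a cost table for three interventions, binary experiment outcomes, and a symmetric experimental error rate. All numbers and names are hypothetical. The chosen experiment is the one with the lowest expected MOCU after its outcome is observed.

```python
import numpy as np

# Hypothetical uncertainty class: 4 candidate network models with prior probabilities.
prior = np.array([0.4, 0.3, 0.2, 0.1])

# cost[i, j]: undesirable-phenotype cost of intervention j if model i is true.
cost = np.array([
    [0.1, 0.5, 0.4],
    [0.6, 0.1, 0.4],
    [0.5, 0.6, 0.2],
    [0.3, 0.4, 0.1],
])

# Each experiment has a binary outcome; outcome[e][i] is the outcome model i predicts.
outcome = {
    "exp1": np.array([0, 1, 1, 1]),
    "exp2": np.array([0, 0, 1, 0]),
}


def mocu(p):
    """Mean objective cost of uncertainty under model probabilities p."""
    robust = np.argmin(p @ cost)                  # intervention best on average
    return float(p @ (cost[:, robust] - cost.min(axis=1)))


def expected_remaining_mocu(exp, error=0.1):
    """Average MOCU after observing exp's outcome, which is flipped with probability error."""
    total = 0.0
    for y in (0, 1):
        likelihood = np.where(outcome[exp] == y, 1.0 - error, error)
        joint = prior * likelihood
        p_y = joint.sum()
        total += p_y * mocu(joint / p_y)
    return float(total)


print(round(mocu(prior), 3))  # 0.21
scores = {e: expected_remaining_mocu(e) for e in outcome}
best = min(scores, key=scores.get)
print(best, {e: round(s, 3) for e, s in scores.items()})  # exp1 {'exp1': 0.116, 'exp2': 0.18}
```

Here `exp1` is preferred because its outcome mostly separates the model for which the robust intervention is costly (the first model) from the others, even though `exp2` splits the prior mass differently; the criterion rewards information that changes the intervention decision, not information in general.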
25
Wang X, Sun B, Liu B, Fu Y, Zheng P. A novel method for multifactorial bio-chemical experiments design based on combinational design theory. PLoS One 2017; 12:e0186853. [PMID: 29095845 PMCID: PMC5667848 DOI: 10.1371/journal.pone.0186853] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 07/20/2017] [Accepted: 10/09/2017] [Indexed: 11/19/2022]
Abstract
Experimental design focuses on describing or explaining the multifactorial interactions that are hypothesized to reflect the variation. The design introduces conditions that may directly affect the variation, where particular conditions are purposely selected for observation. Combinatorial design theory deals with the existence, construction and properties of systems of finite sets whose arrangements satisfy generalized concepts of balance and/or symmetry. In this work, borrowing the concept of "balance" from combinatorial design theory, a novel method for designing multifactorial biochemical experiments is proposed, in which balanced templates from combinatorial design are used to select the conditions for observation. The resulting balanced experimental data cover all the influencing factors of the experiments and can be used for further processing, for example as a training set for machine learning models. Finally, software based on the proposed method was developed for designing experiments that cover each influencing factor a specified number of times.
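The notion of "balance" can be illustrated with an orthogonal array, a standard balanced template from combinatorial design (not necessarily one of the templates the authors use). In the L9(3^4) array below, every pair of factors sees every combination of levels exactly once, so 9 runs cover four three-level factors instead of the 81 runs of a full factorial. The construction from arithmetic modulo 3 is a textbook one; the function names are illustrative.

```python
from collections import Counter
from itertools import combinations


def orthogonal_array_l9():
    """L9(3^4): 9 runs, 4 factors at 3 levels, built from arithmetic mod 3.

    Columns are i, j, (i + j) mod 3 and (i + 2j) mod 3 for i, j in {0, 1, 2}.
    """
    return [(i, j, (i + j) % 3, (i + 2 * j) % 3) for i in range(3) for j in range(3)]


def is_pairwise_balanced(rows):
    """Every pair of columns shows each level combination equally often."""
    n_cols = len(rows[0])
    for a, b in combinations(range(n_cols), 2):
        counts = Counter((r[a], r[b]) for r in rows)
        expected_cells = len({r[a] for r in rows}) * len({r[b] for r in rows})
        if len(counts) != expected_cells or len(set(counts.values())) != 1:
            return False
    return True


design = orthogonal_array_l9()
for run in design:
    print(run)
print(is_pairwise_balanced(design))  # True
```

Dropping any run breaks the balance for some pair of factors, which is why such templates are used as a whole rather than trimmed.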
Affiliation(s)
- Xun Wang
- College of Computer and Communication Engineering, China University of Petroleum, Qingdao 266580, Shandong, China
- Beibei Sun
- College of Computer and Communication Engineering, China University of Petroleum, Qingdao 266580, Shandong, China
- Boyang Liu
- State-owned Asset and Laboratory Management Department, China University of Petroleum, Qingdao 266580, Shandong, China
- Yaping Fu
- Institute of Complexity Science, Qingdao University, Qingdao 266071, Shandong, China
- Pan Zheng
- Faculty of Engineering, Computing and Science, Swinburne University of Technology Sarawak Campus, Kuching 93350, Malaysia
26
Multi-Objective Optimization of Experiments Using Curvature and Fisher Information Matrix. Processes (Basel) 2017. [DOI: 10.3390/pr5040063] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Indexed: 11/16/2022]
27
Optimal Experimental Design for Parameter Estimation of an IL-6 Signaling Model. Processes (Basel) 2017. [DOI: 10.3390/pr5030049] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Indexed: 11/16/2022]
28
Oguz C, Watson LT, Baumann WT, Tyson JJ. Predicting network modules of cell cycle regulators using relative protein abundance statistics. BMC Syst Biol 2017; 11:30. [PMID: 28241833 PMCID: PMC5329933 DOI: 10.1186/s12918-017-0409-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 09/23/2016] [Accepted: 02/17/2017] [Indexed: 11/10/2022]
Abstract
BACKGROUND Parameter estimation in systems biology is typically done by enforcing experimental observations through an objective function as the parameter space of a model is explored by numerical simulations. Past studies have shown that one usually finds a set of "feasible" parameter vectors that fit the available experimental data equally well, and that these alternative vectors can make different predictions under novel experimental conditions. In this study, we characterize the feasible region of a complex model of the budding yeast cell cycle under a large set of discrete experimental constraints in order to test whether the statistical features of relative protein abundance predictions are influenced by the topology of the cell cycle regulatory network. RESULTS Using differential evolution, we generate an ensemble of feasible parameter vectors that reproduce the phenotypes (viable or inviable) of wild-type yeast cells and 110 mutant strains. We use this ensemble to predict the phenotypes of 129 mutant strains for which experimental data is not available. We identify 86 novel mutants that are predicted to be viable and then rank the cell cycle proteins in terms of their contributions to cumulative variability of relative protein abundance predictions. Proteins involved in "regulation of cell size" and "regulation of G1/S transition" contribute most to predictive variability, whereas proteins involved in "positive regulation of transcription involved in exit from mitosis," "mitotic spindle assembly checkpoint" and "negative regulation of cyclin-dependent protein kinase by cyclin degradation" contribute the least. These results suggest that the statistics of these predictions may be generating patterns specific to individual network modules (START, S/G2/M, and EXIT). To test this hypothesis, we develop random forest models for predicting the network modules of cell cycle regulators using relative abundance statistics as model inputs. 
Predictive performance is assessed by the area under the receiver operating characteristic curve (AUC). Our models achieve AUC values of 0.83-0.87, compared with values around 0.50 for randomized models. CONCLUSIONS By using differential evolution and random forest modeling, we show that the model prediction statistics generate distinct network module-specific patterns within the cell cycle network.
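The AUC values quoted above have a direct probabilistic reading: the AUC equals the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one (the Mann-Whitney statistic), which is why a random classifier sits near 0.5. The sketch below computes it by pairwise comparison for a binary task; the labels and scores are hypothetical, and a three-module problem like START, S/G2/M and EXIT would need, for example, one-vs-rest AUCs, which is an assumption about how such results are typically reported.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney statistic.

    Equals the probability that a randomly chosen positive is scored higher
    than a randomly chosen negative; ties count one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores for "belongs to the START module" (1) versus not (0).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
print(roc_auc(labels, scores))  # 8 of 9 pairs ranked correctly, about 0.889
```

The pairwise loop is quadratic in the number of examples; for large data sets a sort-based rank computation gives the same value in O(n log n).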
Affiliation(s)
- Cihan Oguz
- Department of Biological Sciences, Virginia Tech, Blacksburg VA, 24061, USA
- Layne T Watson
- Department of Computer Science, Virginia Tech, Blacksburg VA, 24061, USA
- Department of Mathematics, Virginia Tech, Blacksburg VA, 24061, USA
- Department of Aerospace and Ocean Engineering, Virginia Tech, Blacksburg VA, 24061, USA
- William T Baumann
- Department of Electrical and Computer Engineering, Virginia Tech, Blacksburg VA, 24061, USA
- John J Tyson
- Department of Biological Sciences, Virginia Tech, Blacksburg VA, 24061, USA
29
Cao HT, Gibson TE, Bashan A, Liu YY. Inferring human microbial dynamics from temporal metagenomics data: Pitfalls and lessons. Bioessays 2016; 39. [PMID: 28000336 DOI: 10.1002/bies.201600188] [Citation(s) in RCA: 38] [Impact Index Per Article: 4.8] [Indexed: 01/01/2023]
Abstract
The human gut microbiota is a very complex and dynamic ecosystem that plays a crucial role in health and well-being. Inferring microbial community structure and dynamics directly from time-resolved metagenomics data is key to understanding the community ecology and predicting its temporal behavior. Many methods have been proposed to perform the inference. Yet, as we point out in this review, there are several pitfalls along the way. Indeed, the uninformative temporal measurements and the compositional nature of the relative abundance data raise serious challenges in inference. Moreover, the inference results can be largely distorted when only focusing on highly abundant species by ignoring or grouping low-abundance species. Finally, the implicit assumptions in various regularization methods may not reflect reality. Those issues have to be seriously considered in ecological modeling of human gut microbiota.
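The compositional pitfall can be made concrete with three taxa. In the hypothetical example below only taxon A grows in absolute terms, yet after conversion to relative abundances (closure) taxa B and C appear to halve. The centred log-ratio (CLR) transform, a standard tool of compositional data analysis shown here as an illustration rather than as the review's prescription, depends only on ratios between components and therefore gives the same result for absolute abundances and their proportions.

```python
import math


def closure(abundances):
    """Convert absolute abundances into relative abundances (proportions)."""
    total = sum(abundances)
    return [a / total for a in abundances]


def clr(values):
    """Centred log-ratio transform: log(x_i) minus the mean of the logs.

    It depends only on ratios between components, so it gives the same
    result for absolute abundances and for their proportions.
    """
    logs = [math.log(v) for v in values]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]


# Hypothetical absolute abundances of taxa A, B, C at two time points:
# only A grows; B and C are unchanged in absolute terms.
before = [100.0, 50.0, 50.0]
after = [300.0, 50.0, 50.0]

print([round(x, 3) for x in closure(before)])  # [0.5, 0.25, 0.25]
print([round(x, 3) for x in closure(after)])   # [0.75, 0.125, 0.125]
# B and C appear to halve in relative terms although they did not change.

# The CLR of the proportions equals the CLR of the absolute abundances.
print(all(math.isclose(u, v) for u, v in zip(clr(after), clr(closure(after)))))  # True
```

CLR removes the closure artefact from ratio-based comparisons, but it cannot by itself tell whether A grew or B and C shrank; resolving that requires absolute measurements such as total microbial load.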
Affiliation(s)
- Hong-Tai Cao
- Channing Division of Network Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Department of Electrical Engineering, University of Southern California, Los Angeles, CA, USA
- Chu Kochen Honors College, College of Electrical Engineering, Zhejiang University, Hangzhou, Zhejiang, China
- Travis E Gibson
- Channing Division of Network Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Amir Bashan
- Channing Division of Network Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Department of Physics, Bar-Ilan University, Ramat-Gan, Israel
- Yang-Yu Liu
- Channing Division of Network Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Center for Cancer Systems Biology, Dana-Farber Cancer Institute, Boston, MA, USA
30
White A, Tolman M, Thames HD, Withers HR, Mason KA, Transtrum MK. The Limitations of Model-Based Experimental Design and Parameter Estimation in Sloppy Systems. PLoS Comput Biol 2016; 12:e1005227. [PMID: 27923060 PMCID: PMC5140062 DOI: 10.1371/journal.pcbi.1005227] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.6] [Received: 02/16/2016] [Accepted: 10/27/2016] [Indexed: 12/15/2022]
Abstract
We explore the relationship among experimental design, parameter estimation, and systematic error in sloppy models. We show that the approximate nature of mathematical models poses challenges for experimental design in sloppy models. In many models of complex biological processes, it is unknown which physical mechanisms must be included to explain system behaviors. As a consequence, models are often overly complex, with many practically unidentifiable parameters. Furthermore, which mechanisms are relevant or irrelevant varies among experiments. By selecting complementary experiments, experimental design may inadvertently make details that were omitted from the model become relevant. When this occurs, the model will have a large systematic error and fail to give a good fit to the data. We use a simple hyper-model of model error to quantify a model’s discrepancy and apply it to two models of complex biological processes (EGFR signaling and DNA repair) with optimally selected experiments. We find that although parameters may be accurately estimated, the discrepancy in the model renders it less predictive than it was in the sloppy regime where systematic error is small. We introduce the concept of a sloppy system: a sequence of models of increasing complexity that become sloppy in the limit of microscopic accuracy. We explore the limits of accurate parameter estimation in sloppy systems and argue that identifying underlying mechanisms controlling system behavior is better approached by considering a hierarchy of models of varying detail rather than focusing on parameter estimation in a single model. Sloppy models are often unidentifiable, i.e., characterized by many parameters that are poorly constrained by experimental data. Many models of complex biological systems are sloppy, which has prompted considerable debate about the identifiability of parameters and methods of selecting optimal experiments to infer parameter values.
We explore how the approximate nature of models affects the prospect for accurate parameter estimates and model predictivity in sloppy models when using optimal experimental design. We find that sloppy models may no longer give a good fit to data generated from “optimal” experiments. In this case, the model has much less predictive power than it did before optimal experimental selection. We use a simple hyper-model of model error to quantify the model’s discrepancy from the physical system and discuss the potential limits of accurate parameter estimation in sloppy systems.
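Sloppiness is usually diagnosed from the eigenvalue spectrum of the Fisher information matrix J^T J computed with respect to log-parameters: in a sloppy model the eigenvalues spread over many decades, so a few parameter combinations are tightly constrained and many are not. The sketch below assumes the classic toy example of a sum of exponential decays, y(t) = Σ exp(-k_i t), with unit Gaussian measurement noise. It is not one of the EGFR or DNA-repair models from the paper and does not implement the hyper-model of model error.

```python
import numpy as np


def log_param_jacobian(t, rates):
    """Jacobian of y(t) = sum_i exp(-k_i t) w.r.t. log k_i (one column per rate)."""
    t = np.asarray(t, dtype=float)[:, None]
    k = np.asarray(rates, dtype=float)[None, :]
    # d/d(log k) exp(-k t) = k * d/dk exp(-k t) = -k t exp(-k t)
    return -k * t * np.exp(-k * t)


t = np.linspace(0.0, 10.0, 101)
rates = [0.5, 1.0, 2.0]
J = log_param_jacobian(t, rates)
fim = J.T @ J                                   # unit measurement noise assumed
eigenvalues = np.linalg.eigvalsh(fim)[::-1]     # descending order
print(eigenvalues / eigenvalues[0])             # relative eigenvalues, stiff to sloppy
```

The eigenvectors belonging to the smallest eigenvalues are the sloppy parameter combinations; optimal experimental design tries to raise exactly these eigenvalues, which, as the abstract argues, can expose model details that were omitted.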
Affiliation(s)
- Andrew White
- Department of Physics & Astronomy, Brigham Young University, Provo, Utah, United States of America
- Malachi Tolman
- Department of Physics & Astronomy, Brigham Young University, Provo, Utah, United States of America
- Howard D. Thames
- Department of Biostatistics, UT MD Anderson Cancer Center, Houston, Texas, United States of America
- Department of Experimental Radiation Oncology, UT MD Anderson Cancer Center, Houston, Texas, United States of America
- Hubert Rodney Withers
- Department of Experimental Radiation Oncology, UT MD Anderson Cancer Center, Houston, Texas, United States of America
- Kathy A. Mason
- Department of Experimental Radiation Oncology, UT MD Anderson Cancer Center, Houston, Texas, United States of America
- Mark K. Transtrum
- Department of Physics & Astronomy, Brigham Young University, Provo, Utah, United States of America
31
On the relationship between sloppiness and identifiability. Math Biosci 2016; 282:147-161. [DOI: 10.1016/j.mbs.2016.10.009] [Citation(s) in RCA: 50] [Impact Index Per Article: 6.3] [Received: 08/05/2016] [Revised: 10/21/2016] [Accepted: 10/23/2016] [Indexed: 01/15/2023]
32
Transtrum MK, Qiu P. Bridging Mechanistic and Phenomenological Models of Complex Biological Systems. PLoS Comput Biol 2016; 12:e1004915. [PMID: 27187545 PMCID: PMC4871498 DOI: 10.1371/journal.pcbi.1004915] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.8] [Received: 09/22/2015] [Accepted: 04/13/2016] [Indexed: 01/12/2023]
Abstract
The inherent complexity of biological systems gives rise to complicated mechanistic models with a large number of parameters. On the other hand, the collective behavior of these systems can often be characterized by a relatively small number of phenomenological parameters. We use the Manifold Boundary Approximation Method (MBAM) as a tool for deriving simple phenomenological models from complicated mechanistic models. The resulting models are not black boxes, but remain expressed in terms of the microscopic parameters. In this way, we explicitly connect the macroscopic and microscopic descriptions, characterize the equivalence class of distinct systems exhibiting the same range of collective behavior, and identify the combinations of components that function as tunable control knobs for the behavior. We demonstrate the procedure for adaptation behavior exhibited by the EGFR pathway. From a 48 parameter mechanistic model, the system can be effectively described by a single adaptation parameter τ characterizing the ratio of time scales for the initial response and recovery time of the system which can in turn be expressed as a combination of microscopic reaction rates, Michaelis-Menten constants, and biochemical concentrations. The situation is not unlike modeling in physics in which microscopically complex processes can often be renormalized into simple phenomenological models with only a few effective parameters. The proposed method additionally provides a mechanistic explanation for non-universal features of the behavior.
Affiliation(s)
- Mark K. Transtrum
- Department of Physics and Astronomy, Brigham Young University, Provo, Utah, United States of America
- Peng Qiu
- Department of Biomedical Engineering, Georgia Tech and Emory University, Atlanta, Georgia, United States of America
33
Webb JM, Smucker BJ, Bailer AJ. Selecting the best design for nonstandard toxicology experiments. Environ Toxicol Chem 2014; 33:2399-2406. [PMID: 24943385 DOI: 10.1002/etc.2671] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/20/2014] [Revised: 03/05/2014] [Accepted: 06/16/2014] [Indexed: 06/03/2023]
Abstract
Although many experiments in environmental toxicology use standard statistical experimental designs, there are situations that arise where no such standard design is natural or applicable because of logistical constraints. For example, the layout of a laboratory may suggest that each shelf serve as a block, with the number of experimental units per shelf either greater than or less than the number of treatments in a way that precludes the use of a typical block design. In such cases, an effective and powerful alternative is to employ optimal experimental design principles, a strategy that produces designs with precise statistical estimates. Here, a D-optimal design was generated for an experiment in environmental toxicology that has 2 factors, 16 treatments, and constraints similar to those described above. After initial consideration of a randomized complete block design and an intuitive cyclic design, it was decided to compare a D-optimal design and a slightly more complicated version of the cyclic design. Simulations were conducted generating random responses under a variety of scenarios that reflect conditions motivated by a similar toxicology study, and the designs were evaluated via D-efficiency as well as by a power analysis. The cyclic design performed well compared to the D-optimal design.
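The designs above are compared by D-efficiency. As a minimal sketch, assuming a linear model with a known model matrix X, the relative D-efficiency of one design against another can be taken as the p-th root of the ratio of the per-run information determinants det(X'X/n), where p is the number of model parameters. The quadratic model and the two 6-run designs below are hypothetical illustrations, not the blocked 16-treatment design from the study.

```python
import numpy as np

def model_matrix(x):
    """Model matrix for a hypothetical quadratic model y = b0 + b1*x + b2*x^2."""
    x = np.asarray(x, dtype=float)
    return np.column_stack([np.ones_like(x), x, x**2])

def d_criterion(X):
    """Per-run D-criterion det(X'X / n); larger means a smaller joint confidence region."""
    n = X.shape[0]
    return np.linalg.det(X.T @ X / n)

def relative_d_efficiency(X_a, X_b):
    """D-efficiency of design A relative to design B, on a per-parameter scale."""
    p = X_a.shape[1]
    return (d_criterion(X_a) / d_criterion(X_b)) ** (1.0 / p)

# Two hypothetical 6-run designs on the interval [-1, 1]
design_a = [-1, -1, 0, 0, 1, 1]      # replicated points at -1, 0, 1
design_b = np.linspace(-1, 1, 6)     # evenly spaced points

eff = relative_d_efficiency(model_matrix(design_b), model_matrix(design_a))
print(f"D-efficiency of evenly spaced design relative to design A: {eff:.3f}")
```

Here the evenly spaced design comes out just under 0.8, i.e. somewhat less informative for this model. Normalizing by the number of runs lets designs of different sizes be compared; some texts define D-efficiency without the per-run normalization or the p-th root.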
Affiliation(s)
- Jennifer M Webb
- Department of Statistics, Miami University, Oxford, Ohio, USA
34
Pauwels E, Lajaunie C, Vert JP. A Bayesian active learning strategy for sequential experimental design in systems biology. BMC Syst Biol 2014; 8:102. [PMID: 25256134 PMCID: PMC4181721 DOI: 10.1186/s12918-014-0102-6] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 02/10/2014] [Accepted: 08/14/2014] [Indexed: 11/23/2022]
Abstract
Background: Dynamical models used in systems biology involve unknown kinetic parameters. Setting these parameters is a bottleneck in many modeling projects, which motivates estimating them from empirical data. However, this estimation problem has its own difficulties, the most important being strong ill-conditioning. In this context, optimizing the experiments to be conducted so as to better estimate a system's parameters is a promising way to alleviate the difficulty of the task. Results: Borrowing ideas from Bayesian experimental design and active learning, we propose a new strategy for optimal experimental design in the context of kinetic parameter estimation in systems biology. We describe algorithmic choices that allow this method to be implemented in a computationally tractable way and make it fully automatic. Based on simulation, we show that it outperforms alternative baseline strategies, and demonstrate the benefit of considering multiple posterior modes of the likelihood landscape, as opposed to traditional schemes based on local and Gaussian approximations. Conclusion: This analysis demonstrates that our new, fully automatic Bayesian optimal experimental design strategy has the potential to support the design of experiments for kinetic parameter estimation in systems biology.
Affiliation(s)
- Edouard Pauwels
- CNRS, LAAS, 7 Avenue du Colonel Roche, Toulouse, F-31400 France
- Université de Toulouse LAAS, Toulouse, F-31400 France
- Christian Lajaunie
- MINES ParisTech, PSL-Research University, CBIO-Centre for Computational Biology, 35 rue Saint-Honoré, Fontainebleau, 77300 France
- Institut Curie, 26 rue d’Ulm, F-75248, Paris, France
- INSERM U900, Paris, F-75248 France
- Jean-Philippe Vert
- MINES ParisTech, PSL-Research University, CBIO-Centre for Computational Biology, 35 rue Saint-Honoré, Fontainebleau, 77300 France
- Institut Curie, 26 rue d’Ulm, F-75248, Paris, France
- INSERM U900, Paris, F-75248 France
35
Abstract
MOTIVATION A holy grail of biological research is a working model of the cell. Current modeling frameworks, especially in the protein-protein interaction domain, are mostly topological in nature, calling for stronger and more expressive network models. One promising alternative is logic-based or Boolean network modeling, which has been successfully applied to model signaling regulatory circuits in humans. Learning such models requires observing the system under a sufficient number of different conditions. To date, the amount of measured data is the main bottleneck in learning informative Boolean models, underscoring the need for efficient experimental design strategies. RESULTS We developed novel design approaches that greedily select an experiment to be performed so as to maximize the difference or the entropy in the results it induces with respect to current best-fit models. Unique to our maximum difference approach is the ability to account for all (possibly exponentially many) Boolean models displaying high fit to the available data. We applied both approaches to simulated and real data from the EGFR and IL1 signaling systems in humans. We demonstrate that the developed strategies substantially improve on a random selection approach. Our design schemes highlight the redundancy in these datasets, leading to up to 11-fold savings in the number of experiments to be performed. AVAILABILITY AND IMPLEMENTATION Source code will be made available upon acceptance of the manuscript.
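The entropy criterion can be illustrated with a minimal sketch: given an ensemble of equally well-fitting Boolean models, score each candidate experiment by the Shannon entropy of the readouts the models predict, and pick the highest-scoring one. The four two-input models and the single readout are hypothetical, and this one-step score is neither the authors' full greedy procedure nor their maximum-difference criterion.

```python
from collections import Counter
from itertools import product
from math import log2

# Hypothetical ensemble of Boolean models that fit the existing data equally well.
# Each maps two stimuli (a, b) to a single binary readout.
models = {
    "AND": lambda a, b: a and b,
    "OR": lambda a, b: a or b,
    "A_ONLY": lambda a, b: a,
    "B_ONLY": lambda a, b: b,
}

def outcome_entropy(experiment):
    """Shannon entropy (bits) of the readouts the ensemble predicts for one experiment."""
    counts = Counter(bool(model(*experiment)) for model in models.values())
    total = sum(counts.values())
    return sum((c / total) * log2(total / c) for c in counts.values())

# Candidate experiments: every on/off combination of the two stimuli.
candidates = list(product([False, True], repeat=2))
best = max(candidates, key=outcome_entropy)

for exp in candidates:
    print(exp, round(outcome_entropy(exp), 3))
print("most informative experiment:", best)
```

Experiments on which all models agree have zero entropy and cannot discriminate between them; the selected experiment splits the ensemble as evenly as possible.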
Affiliation(s)
- Nir Atias
- Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel
- Michal Gershenzon
- Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel
- Katia Labazin
- Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel
- Roded Sharan
- Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel
36
Tönsing C, Timmer J, Kreutz C. Cause and cure of sloppiness in ordinary differential equation models. Phys Rev E Stat Nonlin Soft Matter Phys 2014; 90:023303. [PMID: 25215847 DOI: 10.1103/physreve.90.023303] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 04/12/2014] [Indexed: 06/03/2023]
Abstract
Data-based mathematical modeling of biochemical reaction networks, e.g., by nonlinear ordinary differential equation (ODE) models, has been applied successfully. In this context, parameter estimation and uncertainty analysis are major tasks for assessing how well the model describes the system. Recently, an eigenvalue spectrum of the Hessian matrix of the objective function spanning many orders of magnitude has been observed and termed sloppiness. In this work, we investigate how sloppiness originates from structures in the sensitivity matrix that arise from the model topology and the experimental design. Furthermore, we present strategies that use optimal experimental design methods to circumvent sloppiness, and we present nonsloppy designs for a benchmark model.
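To illustrate what such a broadened eigenvalue spectrum looks like, the sketch below uses the Gauss-Newton approximation H ≈ J^T J of the least-squares Hessian, with sensitivities taken with respect to log-parameters, for a hypothetical sum-of-exponentials model, a common toy example of sloppiness. The model, rates, and time grid are assumptions for illustration, not the benchmark model from the paper.

```python
import numpy as np

# Hypothetical toy model: y(t) = sum_i exp(-k_i * t), observed on a time grid.
rates = np.array([1.0, 2.0, 3.0, 4.0])
t = np.linspace(0.0, 10.0, 201)

# Sensitivities with respect to log-parameters: dy/dlog(k_i) = -k_i * t * exp(-k_i * t)
kt = rates[None, :] * t[:, None]          # shape (len(t), len(rates))
J = -kt * np.exp(-kt)

# Gauss-Newton approximation of the Hessian of a least-squares objective
H = J.T @ J
eigvals = np.linalg.eigvalsh(H)           # ascending order for a symmetric matrix

for lam in eigvals[::-1]:
    print(f"{lam:.3e}")
print("orders of magnitude spanned:", np.log10(eigvals[-1] / eigvals[0]))
```

Large-eigenvalue (stiff) parameter combinations are well constrained by the data, while small-eigenvalue (sloppy) combinations are not. Changing the time grid or the observed quantities changes J, and hence the spectrum, which is the lever experimental design acts on.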
Affiliation(s)
- Christian Tönsing
- Institute of Physics, University of Freiburg, 79104 Freiburg, Germany
- Jens Timmer
- Institute of Physics, University of Freiburg, 79104 Freiburg, Germany and BIOSS Centre for Biological Signalling Studies, University of Freiburg, 79104 Freiburg, Germany
- Clemens Kreutz
- Institute of Physics, University of Freiburg, 79104 Freiburg, Germany
37
Villaverde AF, Banga JR. Reverse engineering and identification in systems biology: strategies, perspectives and challenges. J R Soc Interface 2014; 11:20130505. [PMID: 24307566 PMCID: PMC3869153 DOI: 10.1098/rsif.2013.0505] [Citation(s) in RCA: 163] [Impact Index Per Article: 16.3] [Received: 06/07/2013] [Accepted: 11/12/2013] [Indexed: 12/17/2022]
Abstract
The interplay of mathematical modelling with experiments is one of the central elements in systems biology. The aim of reverse engineering is to infer, analyse and understand, through this interplay, the functional and regulatory mechanisms of biological systems. Reverse engineering is not exclusive to systems biology and has been studied in different areas, such as inverse problem theory, machine learning, nonlinear physics, (bio)chemical kinetics, control theory and optimization, among others. However, many of these areas appear to have been relatively closed to outsiders. In this contribution, we aim to compare and highlight the different perspectives and contributions from these fields, with emphasis on two key questions: (i) why are reverse engineering problems so hard to solve, and (ii) what methods are available for the particular problems arising from systems biology?
Affiliation(s)
- Julio R. Banga
- BioProcess Engineering Group, IIM-CSIC, Spanish National Research Council, Vigo 36208, Spain
38
Tummler K, Lubitz T, Schelker M, Klipp E. New types of experimental data shape the use of enzyme kinetics for dynamic network modeling. FEBS J 2013; 281:549-71. [PMID: 24034816 DOI: 10.1111/febs.12525] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.0] [Received: 06/21/2013] [Revised: 08/27/2013] [Accepted: 09/10/2013] [Indexed: 01/21/2023]
Abstract
Since the publication of Leonor Michaelis and Maude Menten's paper on the reaction kinetics of the enzyme invertase in 1913, molecular biology has evolved tremendously. New measurement techniques allow in vivo characterization of the whole genome, proteome or transcriptome of cells, whereas the classical enzyme assay only allows determination of the two Michaelis-Menten parameters V and K(m). Nevertheless, Michaelis-Menten kinetics are still commonly used, not only in the in vitro context of enzyme characterization but also as a rate law for enzymatic reactions in larger biochemical reaction networks. In this review, we give an overview of the historical development of kinetic rate laws originating from Michaelis-Menten kinetics over the past 100 years. Furthermore, we briefly summarize the experimental techniques used for the characterization of enzymes, and discuss web resources that systematically store kinetic parameters and related information. Finally, we describe the novel opportunities that arise from using these data in dynamic mathematical modeling. In this framework, traditional in vitro approaches may be combined with modern genome-scale measurements to foster a thorough understanding of the underlying complex mechanisms.
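As a minimal sketch of how the Michaelis-Menten rate law with parameters V and K(m) enters a dynamic model, the code below evaluates v = V·S/(K_m + S) and uses it as the rate term of a single substrate-consumption ODE. The explicit Euler integration, step size and parameter values are illustrative assumptions; a real network model would typically use a dedicated ODE solver.

```python
def michaelis_menten_rate(s: float, vmax: float, km: float) -> float:
    """Rate v = Vmax * S / (Km + S); equals Vmax / 2 when S == Km."""
    if km <= 0:
        raise ValueError("Km must be positive")
    return vmax * s / (km + s)


def simulate_substrate(s0: float, vmax: float, km: float,
                       dt: float = 0.01, t_end: float = 10.0) -> float:
    """Integrate dS/dt = -v(S) with explicit Euler steps and return S(t_end)."""
    s = s0
    for _ in range(int(round(t_end / dt))):
        s = max(s - dt * michaelis_menten_rate(s, vmax, km), 0.0)
    return s


print(michaelis_menten_rate(2.0, vmax=10.0, km=2.0))  # 5.0: half-maximal rate at S = Km
print(simulate_substrate(5.0, vmax=1.0, km=2.0))      # substrate remaining after t = 10
```

The rate saturates towards V at high substrate concentration and is roughly linear (V/K_m · S) at low concentration, which is why the same two parameters describe both regimes in a network model.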
Affiliation(s)
- Katja Tummler
- Theoretical Biophysics, Humboldt-Universität zu Berlin, Germany
39
Becker K, Balsa-Canto E, Cicin-Sain D, Hoermann A, Janssens H, Banga JR, Jaeger J. Reverse-engineering post-transcriptional regulation of gap genes in Drosophila melanogaster. PLoS Comput Biol 2013; 9:e1003281. [PMID: 24204230 PMCID: PMC3814631 DOI: 10.1371/journal.pcbi.1003281] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.1] [Received: 06/18/2013] [Accepted: 09/02/2013] [Indexed: 12/19/2022]
Abstract
Systems biology proceeds through repeated cycles of experiment and modeling. One way to implement this is reverse engineering, where models are fit to data to infer and analyse regulatory mechanisms. This requires rigorous methods to determine whether model parameters can be properly identified. Applying such methods in a complex biological context remains challenging. We use reverse engineering to study post-transcriptional regulation in pattern formation. As a case study, we analyse expression of the gap genes Krüppel, knirps, and giant in Drosophila melanogaster. We use detailed, quantitative datasets of gap gene mRNA and protein expression to solve and fit a model of post-transcriptional regulation, and establish its structural and practical identifiability. Our results demonstrate that post-transcriptional regulation is not required for patterning in this system, but is necessary for proper control of protein levels. Our work demonstrates that the uniqueness and specificity of a fitted model can be rigorously determined in the context of spatio-temporal pattern formation. This greatly increases the potential of reverse engineering for the study of development and other, similarly complex, biological processes. The analysis of pattern-forming gene networks is largely focussed on transcriptional regulation. However, post-transcriptional events, such as translation and regulation of protein stability also play important roles in the establishment of protein expression patterns and levels. In this study, we use a reverse-engineering approach—fitting mathematical models to quantitative expression data—to analyse post-transcriptional regulation of the Drosophila gap genes Krüppel, knirps and giant, involved in segment determination during early embryogenesis. Rigorous fitting requires us to establish whether our models provide a robust and unique solution. 
We demonstrate, for the first time, that this can be done in the context of a complex spatio-temporal regulatory system. This is an important methodological advance for reverse-engineering developmental processes. Our results indicate that post-transcriptional regulation is not required for pattern formation, but is necessary for proper regulation of gap protein levels. Specifically, we predict that translation rates must be tuned for rapid early accumulation, and protein stability must be increased for persistence of high protein levels at late stages of gap gene expression.
Affiliation(s)
- Kolja Becker
- EMBL/CRG Research Unit in Systems Biology, Centre de Regulació Genòmica, and Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Institute of Genetics, Johannes Gutenberg University, Mainz, Germany
- Damjan Cicin-Sain
- EMBL/CRG Research Unit in Systems Biology, Centre de Regulació Genòmica, and Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Astrid Hoermann
- EMBL/CRG Research Unit in Systems Biology, Centre de Regulació Genòmica, and Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Hilde Janssens
- EMBL/CRG Research Unit in Systems Biology, Centre de Regulació Genòmica, and Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Johannes Jaeger
- EMBL/CRG Research Unit in Systems Biology, Centre de Regulació Genòmica, and Universitat Pompeu Fabra (UPF), Barcelona, Spain
40
Busetto AG, Hauser A, Krummenacher G, Sunnåker M, Dimopoulos S, Ong CS, Stelling J, Buhmann JM. Near-optimal experimental design for model selection in systems biology. Bioinformatics 2013; 29:2625-32. [PMID: 23900189 PMCID: PMC3789540 DOI: 10.1093/bioinformatics/btt436] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.0] [Received: 05/06/2013] [Revised: 07/10/2013] [Accepted: 07/24/2013] [Indexed: 12/02/2022]
Abstract
MOTIVATION Biological systems are understood through iterations of modeling and experimentation. Not all experiments, however, are equally valuable for predictive modeling. This study introduces an efficient method for experimental design aimed at selecting dynamical models from data. Motivated by biological applications, the method enables the design of crucial experiments: it determines a highly informative selection of measurement readouts and time points. RESULTS We demonstrate formal guarantees of design efficiency on the basis of previous results. By reducing our task to the setting of graphical models, we prove that the method finds a near-optimal design selection with a polynomial number of evaluations. Moreover, the method exhibits the best polynomial-complexity constant approximation factor, unless P = NP. We measure the performance of the method in comparison with established alternatives, such as ensemble non-centrality, on example models of different complexity. Efficient design accelerates the loop between modeling and experimentation: it enables the inference of complex mechanisms, such as those controlling central metabolic operation. AVAILABILITY Toolbox 'NearOED' available with source code under GPL on the Machine Learning Open Source Software Web site (mloss.org).
Affiliation(s)
- Alberto Giovanni Busetto
- Department of Computer Science, ETH Zurich, Competence Center for Systems Physiology and Metabolic Diseases, Department of Mathematics, ETH Zurich, Department of Biosystems Science and Engineering, ETH Zurich, Swiss Institute of Bioinformatics, Zurich, Switzerland and National ICT Australia, Melbourne, Australia
41
Abdullah A, Deris S, Mohamad MS, Anwar S. An improved swarm optimization for parameter estimation and biological model selection. PLoS One 2013; 8:e61258. [PMID: 23593445 PMCID: PMC3623867 DOI: 10.1371/journal.pone.0061258] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 07/13/2012] [Accepted: 03/11/2013] [Indexed: 11/19/2022]
Abstract
One of the key aspects of computational systems biology is the investigation of dynamic biological processes within cells. Because of their nonlinearity and complexity, computational models are often required to elucidate the mechanisms and principles driving these processes. The models usually incorporate a set of parameters that represent the physical properties of the actual biological systems. In most cases, these parameters are estimated by fitting the model outputs to the corresponding experimental data. However, this is a challenging task because the available experimental data are frequently noisy and incomplete. In this paper, a new hybrid optimization method is proposed to estimate these parameters from noisy and incomplete experimental data. The proposed method, called Swarm-based Chemical Reaction Optimization, integrates the evolutionary search strategy employed by Chemical Reaction Optimization into the neighbourhood search strategy of the Firefly Algorithm. The effectiveness of the method was evaluated using a simulated nonlinear model and two biological models: synthetic transcriptional oscillators and extracellular protease production. The results showed that the proposed method was more accurate and computationally faster than the existing Differential Evolution, Firefly Algorithm and Chemical Reaction Optimization methods. The reliability of the estimated parameters was statistically validated, which suggests that the model outputs produced with these parameters were valid even when noisy and incomplete experimental data were used. Additionally, the Akaike Information Criterion was employed for model selection, which highlighted the capability of the proposed method to choose a plausible model based on the experimental data. In conclusion, this paper demonstrates the effectiveness of the proposed method for parameter estimation and model selection using noisy and incomplete experimental data.
This study is intended to provide new insight into developing more accurate and reliable biological models from limited and low-quality experimental data.
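Model selection with the Akaike Information Criterion can be sketched as follows, assuming least-squares fits with independent Gaussian errors, where AIC reduces (up to an additive constant) to n ln(RSS/n) + 2k for n data points and k fitted parameters. The polynomial candidate models and the synthetic data are hypothetical, and the fits use ordinary least squares (`numpy.polyfit`) rather than the Swarm-based Chemical Reaction Optimization method described above.

```python
import numpy as np

def aic_least_squares(rss: float, n: int, k: int) -> float:
    """AIC for a least-squares fit with i.i.d. Gaussian errors (additive constants dropped)."""
    return n * np.log(rss / n) + 2 * k

# Synthetic data from a hypothetical quadratic "true" model plus noise.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 30)
y = 1.0 + 0.5 * x + 2.0 * x**2 + rng.normal(scale=0.1, size=x.size)

# Fit candidate models of increasing complexity and score each by AIC.
scores = {}
for degree in (1, 2, 3):
    coeffs = np.polyfit(x, y, degree)
    rss = float(np.sum((y - np.polyval(coeffs, x)) ** 2))
    scores[degree] = aic_least_squares(rss, x.size, k=degree + 1)

best = min(scores, key=scores.get)  # lowest AIC wins
print(scores, "-> selected degree", best)
```

Counting the noise variance as an extra parameter would add 2 to every score and leave the ranking unchanged. The 2k term is what stops the criterion from always preferring the most flexible model.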
Affiliation(s)
- Afnizanfaizal Abdullah
- Artificial Intelligence and Bioinformatics Group (AIBIG), Faculty of Computing, Universiti Teknologi Malaysia, UTM, Johor, Malaysia.
42
Chakrabarty A, Buzzard GT, Rundell AE. Model-based design of experiments for cellular processes. Wiley Interdiscip Rev Syst Biol Med 2013; 5:181-203. [PMID: 23293047 DOI: 10.1002/wsbm.1204] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Indexed: 01/07/2023]
Affiliation(s)
- Ankush Chakrabarty
- School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN, USA
43
Flassig RJ, Sundmacher K. Optimal design of stimulus experiments for robust discrimination of biochemical reaction networks. Bioinformatics 2012; 28:3089-96. [PMID: 23047554 PMCID: PMC3516143 DOI: 10.1093/bioinformatics/bts585] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.9] [Indexed: 11/14/2022]
Abstract
MOTIVATION Biochemical reaction networks in the form of coupled ordinary differential equations (ODEs) provide a powerful modeling tool for understanding the dynamics of biochemical processes. During the early phase of modeling, scientists have to deal with a large pool of competing nonlinear models. At this point, discrimination experiments can be designed and conducted to obtain optimal data for selecting the most plausible model. Since the parameters of biological ODE models are widely distributed due to, e.g., biological variability or experimental variations, the model responses are distributed as well. Therefore, a robust optimal experimental design (OED) for model discrimination can be used to discriminate models based on their response probability distribution functions (PDFs). RESULTS In this work, we present an optimal control-based methodology for designing optimal stimulus experiments aimed at robust model discrimination. For estimating the time-varying model response PDF, which results from the nonlinear propagation of the parameter PDF under the ODE dynamics, we suggest using the sigma-point approach. Using the model overlap (expected likelihood) as a robust discrimination criterion to measure dissimilarities between expected model response PDFs, we benchmark the proposed nonlinear design approach against linearization with respect to prediction accuracy and design quality for two nonlinear biological reaction networks. As shown, the sigma-point approach outperforms linearization in the case of widely distributed parameter sets and/or multiple steady states. Since the sigma-point approach scales linearly with the number of model parameters, it can be applied to large systems for robust experimental planning. AVAILABILITY An implementation of the method in MATLAB/AMPL is available at http://www.uni-magdeburg.de/ivt/svt/person/rf/roed.html.
CONTACT flassig@mpi-magdeburg.mpg.de SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
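The sigma-point approach propagates a parameter distribution through a nonlinear model by evaluating the model at a small, deterministic set of points. A minimal sketch of the basic symmetric unscented transform is shown below: it uses 2n + 1 points for n parameters, which is where the linear scaling comes from. The weighting parameter `kappa`, the exponential-decay response, and the parameter statistics are assumptions for illustration. The sketch returns only means and covariances rather than full response PDFs, and it implements neither the authors' model-overlap criterion nor their MATLAB/AMPL code.

```python
import numpy as np

def sigma_points(mean, cov, kappa=1.0):
    """Symmetric sigma-point set of the basic unscented transform: 2n + 1 points."""
    mean = np.asarray(mean, dtype=float)
    n = mean.size
    # Columns of the Cholesky factor give the spread directions (cov must be positive definite).
    L = np.linalg.cholesky((n + kappa) * np.asarray(cov, dtype=float))
    points = [mean] + [mean + L[:, i] for i in range(n)] + [mean - L[:, i] for i in range(n)]
    weights = np.full(2 * n + 1, 1.0 / (2.0 * (n + kappa)))
    weights[0] = kappa / (n + kappa)
    return np.array(points), weights

def unscented_moments(f, mean, cov, kappa=1.0):
    """Approximate mean and covariance of f(theta) for theta with the given mean and covariance."""
    points, w = sigma_points(mean, cov, kappa)
    ys = np.array([np.atleast_1d(f(p)) for p in points])  # one model evaluation per sigma point
    y_mean = w @ ys
    diff = ys - y_mean
    y_cov = (w[:, None] * diff).T @ diff
    return y_mean, y_cov

# Hypothetical model response: x0 * exp(-k t) observed at three times,
# with uncertain parameters theta = (k, x0).
t_obs = np.array([0.5, 1.0, 2.0])

def response(theta):
    k, x0 = theta
    return x0 * np.exp(-k * t_obs)

theta_mean = np.array([1.0, 10.0])
theta_cov = np.diag([0.01, 0.25])
mean_y, cov_y = unscented_moments(response, theta_mean, theta_cov)
print("model evaluations:", 2 * theta_mean.size + 1)
print("response mean:", mean_y)
print("response std:", np.sqrt(np.diag(cov_y)))
```

For a linear model the transform reproduces the exact mean and covariance; for nonlinear models it captures curvature effects that a first-order linearization misses, at the cost of only 2n + 1 model evaluations.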
Affiliation(s)
- R J Flassig
- Otto-von-Guericke University, Process Systems Engineering, Universitätsplatz 2, D-39106 Magdeburg, Germany.
44
Efficient reverse-engineering of a developmental gene regulatory network. PLoS Comput Biol 2012; 8:e1002589. [PMID: 22807664 PMCID: PMC3395622 DOI: 10.1371/journal.pcbi.1002589] [Citation(s) in RCA: 72] [Impact Index Per Article: 6.0] [Received: 02/14/2012] [Accepted: 04/27/2012] [Indexed: 11/19/2022]
Abstract
Understanding the complex regulatory networks underlying development and evolution of multi-cellular organisms is a major problem in biology. Computational models can be used as tools to extract the regulatory structure and dynamics of such networks from gene expression data. This approach is called reverse engineering. It has been successfully applied to many gene networks in various biological systems. However, to reconstitute the structure and non-linear dynamics of a developmental gene network in its spatial context remains a considerable challenge. Here, we address this challenge using a case study: the gap gene network involved in segment determination during early development of Drosophila melanogaster. A major problem for reverse-engineering pattern-forming networks is the significant amount of time and effort required to acquire and quantify spatial gene expression data. We have developed a simplified data processing pipeline that considerably increases the throughput of the method, but results in data of reduced accuracy compared to those previously used for gap gene network inference. We demonstrate that we can infer the correct network structure using our reduced data set, and investigate minimal data requirements for successful reverse engineering. Our results show that timing and position of expression domain boundaries are the crucial features for determining regulatory network structure from data, while it is less important to precisely measure expression levels. Based on this, we define minimal data requirements for gap gene network inference. Our results demonstrate the feasibility of reverse-engineering with much reduced experimental effort. This enables more widespread use of the method in different developmental contexts and organisms. Such systematic application of data-driven models to real-world networks has enormous potential. 
Only the quantitative investigation of a large number of developmental gene regulatory networks will allow us to discover whether there are rules or regularities governing development and evolution of complex multi-cellular organisms.
45
Marvel SW, Williams CM. Set membership experimental design for biological systems. BMC Syst Biol 2012; 6:21. [PMID: 22436240 PMCID: PMC3393616 DOI: 10.1186/1752-0509-6-21] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 08/02/2011] [Accepted: 03/21/2012] [Indexed: 11/10/2022]
Abstract
BACKGROUND Experimental design approaches for biological systems are needed to help conserve the limited resources that are allocated for performing experiments. The assumptions used when assigning probability density functions to characterize uncertainty in biological systems are unwarranted when only a small number of measurements can be obtained. In these situations, the uncertainty in biological systems is more appropriately characterized in a bounded-error context. Additionally, effort must be made to improve the connection between modelers and experimentalists by relating design metrics to biologically relevant information. Bounded-error experimental design approaches that can assess the impact of additional measurements on model uncertainty are needed to identify the most appropriate balance between the collection of data and the availability of resources. RESULTS In this work we develop a bounded-error experimental design framework for nonlinear continuous-time systems when few data measurements are available. This approach leverages many of the recent advances in bounded-error parameter and state estimation methods that use interval analysis to generate parameter sets and state bounds consistent with uncertain data measurements. We devise a novel approach using set-based uncertainty propagation to estimate measurement ranges at candidate time points. We then use these estimated measurements at the candidate time points to evaluate which candidate measurements furthest reduce model uncertainty. A method for quickly combining multiple candidate time points is presented and allows for determining the effect of adding multiple measurements. Biologically relevant metrics are developed and used to predict when new data measurements should be acquired, which system components should be measured and how many additional measurements should be obtained. CONCLUSIONS The practicability of our approach is illustrated with a case study. 
This study shows that our approach is able to 1) identify candidate measurement time points that maximize information corresponding to biologically relevant metrics and 2) determine the number of measurements beyond which additional ones provide insignificant information. This framework can be used to balance the availability of resources against the addition of one or more measurement time points to improve the predictability of resulting models.
Affiliation(s)
- Skylar W Marvel
- Department of Electrical and Computer Engineering, North Carolina State University, Raleigh, NC 27695, USA
46
Tam JS, Barbeschi M, Shapovalova N, Briand S, Memish ZA, Kieny MP. Research agenda for mass gatherings: a call to action. Lancet Infect Dis 2012; 12:231-9. [PMID: 22252148 PMCID: PMC7106416 DOI: 10.1016/s1473-3099(11)70353-x] [Citation(s) in RCA: 57] [Impact Index Per Article: 4.8] [Indexed: 12/14/2022]
Abstract
Public health research is essential for the development of effective policies and planning to address health security and risks associated with mass gatherings (MGs). Crucial research topics related to MGs and their effects on global health security are discussed in this review. The research agenda for MGs consists of a framework of five major public health research directions that address issues related to reducing the risk of public health emergencies during MGs; restricting the occurrence of non-communicable and communicable diseases; minimisation of the effect of public health events associated with MGs; optimisation of the medical services and treatment of diseases during MGs; and development and application of modern public health measures. Implementation of the proposed research topics would be expected to provide benefits over the medium to long term in planning for MGs.
Affiliation(s)
- John S Tam
- Initiative for Vaccine Research, Family, Women's and Children's Health Cluster, WHO, Geneva, Switzerland.
47
Sun J, Garibaldi JM, Hodgman C. Parameter estimation using meta-heuristics in systems biology: a comprehensive review. IEEE/ACM Trans Comput Biol Bioinform 2012; 9:185-202. [PMID: 21464505 DOI: 10.1109/tcbb.2011.63] [Citation(s) in RCA: 68] [Impact Index Per Article: 5.7] [Indexed: 05/13/2023]
Abstract
This paper gives a comprehensive review of the application of meta-heuristics to optimization problems in systems biology, focusing mainly on the parameter estimation problem (also called the inverse problem or model calibration). It is intended both for the systems biologist who wishes to learn more about the various optimization techniques available and for the meta-heuristic optimizer who is interested in applying such techniques to problems in systems biology. First, the parameter estimation problems emerging from different areas of systems biology are described from the point of view of machine learning. Brief descriptions of various meta-heuristics developed for these problems follow, along with outlines of their advantages and disadvantages. Several important issues in applying meta-heuristics to the systems biology modelling problem are addressed, including the reliability and identifiability of model parameters and the optimal design of experiments. Finally, we highlight some possible future research directions in this field.
48
Balsa-Canto E, Banga JR, Egea JA, Fernandez-Villaverde A, de Hijas-Liste GM. Global optimization in systems biology: stochastic methods and their applications. Adv Exp Med Biol 2012; 736:409-24. [PMID: 22161343 DOI: 10.1007/978-1-4419-7210-1_24] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Indexed: 01/09/2023]
Abstract
Mathematical optimization is at the core of many problems in systems biology: (1) as the underlying hypothesis for model development, (2) in model identification, and (3) in the computation of optimal stimulation procedures to synthetically achieve a desired biological behavior. These problems are usually formulated as nonlinear programming problems (NLPs) with dynamic and algebraic constraints. However, the nonlinear and highly constrained nature of systems biology models, together with the usually large number of decision variables, can make their solution a daunting task, calling for efficient and robust optimization techniques. Here, we present novel global optimization methods and software tools, such as cooperative enhanced scatter search (eSS), AMIGO and DOTcvpSB, and illustrate their possibilities in modeling tasks, including model identification and stimulation design in systems biology.
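For orientation, a generic way to write model identification as an NLP with dynamic and algebraic constraints is sketched below. The notation is ours, not the chapter's: $\mathbf{p}$ are the unknown parameters, $\tilde{\mathbf{y}}_i$ the measurements at sampling times $t_i$, $\mathbf{W}_i$ assumed weighting matrices (for example, inverse measurement variances), and $\mathbf{h}_{\mathrm{eq}}$, $\mathbf{h}_{\mathrm{in}}$ optional algebraic constraints.

```latex
\begin{aligned}
\min_{\mathbf{p}}\; & J(\mathbf{p}) = \sum_{i=1}^{n_d}
  \left(\mathbf{y}(t_i,\mathbf{p}) - \tilde{\mathbf{y}}_i\right)^{\top}
  \mathbf{W}_i
  \left(\mathbf{y}(t_i,\mathbf{p}) - \tilde{\mathbf{y}}_i\right) \\
\text{s.t.}\; & \dot{\mathbf{x}}(t) = \mathbf{f}(\mathbf{x}(t), \mathbf{p}, t),
  \quad \mathbf{x}(t_0) = \mathbf{x}_0, \\
& \mathbf{y}(t,\mathbf{p}) = \mathbf{g}(\mathbf{x}(t), \mathbf{p}, t), \\
& \mathbf{h}_{\mathrm{eq}}(\mathbf{x}, \mathbf{p}) = \mathbf{0},
  \quad \mathbf{h}_{\mathrm{in}}(\mathbf{x}, \mathbf{p}) \le \mathbf{0}, \\
& \mathbf{p}^{L} \le \mathbf{p} \le \mathbf{p}^{U}.
\end{aligned}
```

Because $\mathbf{y}$ depends on $\mathbf{p}$ through the solution of a nonlinear ODE system, $J$ is generally nonconvex, so local solvers may stop at suboptimal points; this is one reason robust global methods are of interest here.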
Affiliation(s)
- Eva Balsa-Canto
- (Bio)Process Engineering Group, IIM-CSIC, C/Eduardo Cabello 6, 36208 Vigo, Spain.
49
Lai X, Wolkenhauer O, Vera J. Modeling miRNA regulation in cancer signaling systems: miR-34a regulation of the p53/Sirt1 signaling module. Methods Mol Biol 2012; 880:87-108. [PMID: 23361983 DOI: 10.1007/978-1-61779-833-7_6] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.8] [Indexed: 02/06/2023]
Abstract
MicroRNAs (miRNAs) are a family of small regulatory RNAs that regulate the activity and stability of specific messenger RNA targets through post-transcriptional mechanisms. In most cases, signaling systems involving miRNA modulation are not linear pathways in which a transcription factor activates the expression of miRNAs that post-transcriptionally repress target proteins, but complex regulatory structures involving a variety of feedback-loop architectures. In this book chapter, we define, discuss and apply a systems biology approach to investigate the dynamical features of miRNA regulation, based on the integration of experimental evidence, hypotheses and quantitative data through mathematical modeling. We further illustrate the approach using as a case study the signaling module composed of the proteins p53 and Sirt1 and the regulatory miRNA miR-34a. The model was used not only to investigate different possible designs of the silencing mechanism exerted by miR-34a on Sirt1, but also to simulate the dynamics of the system under conditions of (pathological) deregulation of its components.
Affiliation(s)
- Xin Lai
- Systems Biology and Bioinformatics Group, Department of Computer Science, University of Rostock, Rostock, Germany.
50
Chis OT, Banga JR, Balsa-Canto E. Structural identifiability of systems biology models: a critical comparison of methods. PLoS One 2011; 6:e27755. [PMID: 22132135 PMCID: PMC3222653 DOI: 10.1371/journal.pone.0027755] [Citation(s) in RCA: 207] [Impact Index Per Article: 15.9] [Received: 04/14/2011] [Accepted: 10/24/2011] [Indexed: 12/15/2022] Open
Abstract
Analysing the properties of a biological system through in silico experimentation requires a satisfactory mathematical representation of the system, including accurate values of the model parameters. Fortunately, modern experimental techniques provide time-series data of appropriate quality, which may then be used to estimate unknown parameters. However, in many cases a subset of those parameters cannot be uniquely estimated, regardless of the experimental data available or the numerical techniques used for estimation. This lack of identifiability is related to the structure of the model, i.e. the system dynamics plus the observation function. Despite the interest in knowing a priori whether all unknown model parameters can, in principle, be uniquely estimated, structural identifiability analysis of general nonlinear dynamic models remains an open question. No method is applicable to every model, so at some point one has to choose among the available approaches. This work presents a critical comparison of the currently available techniques. To this end, we perform structural identifiability analysis on a collection of biological models. The results reveal that the generating series approach, in combination with identifiability tableaus, offers the best compromise among range of applicability, computational complexity and information provided.
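As a minimal illustration of structural non-identifiability (a hypothetical toy model, not one of the models analysed in the paper), consider dx/dt = -(k1 + k2) x with observation y = x and known x(0) = x0. The parameters enter the output only through their sum, so any two pairs with equal sums give identical outputs, whatever the data or estimation method. The sketch below also compares the Taylor coefficients of y at t = 0, the quantities used by a Taylor series approach to identifiability; they too depend only on k1 + k2. The function names `output` and `taylor_coefficients` are illustrative assumptions, and the dyadic parameter values are chosen so both sums equal 0.75 exactly in floating point.

```python
import math


def output(k1, k2, x0, times):
    """y(t) = x(t) for dx/dt = -(k1 + k2) * x, x(0) = x0 (closed-form solution)."""
    return [x0 * math.exp(-(k1 + k2) * t) for t in times]


def taylor_coefficients(k1, k2, x0, order):
    """Derivatives y^(n)(0) = (-(k1 + k2))**n * x0 for n = 0..order."""
    return [(-(k1 + k2)) ** n * x0 for n in range(order + 1)]


times = [0.1 * i for i in range(50)]
set_a = (0.25, 0.5)     # k1 + k2 = 0.75
set_b = (0.625, 0.125)  # k1 + k2 = 0.75, different individual values

y_a = output(*set_a, 1.0, times)
y_b = output(*set_b, 1.0, times)

# Identical outputs: no experiment of this form can separate k1 from k2.
print(max(abs(a - b) for a, b in zip(y_a, y_b)))  # prints 0.0
print(taylor_coefficients(*set_a, 1.0, 4) == taylor_coefficients(*set_b, 1.0, 4))  # prints True
```

Only the combination k1 + k2 is structurally identifiable here; reparameterising the model with k = k1 + k2 would remove the redundancy.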
Affiliation(s)
- Eva Balsa-Canto
- Bioprocess Engineering Group, IIM-CSIC, Vigo, Spain