1. Corrigendum to "Controllability and accessibility analysis of nonlinear biosystems" [Computer Methods and Programs in Biomedicine 242 (2023) 107837]. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2024; 245:108015. [PMID: 38219338 DOI: 10.1016/j.cmpb.2024.108015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/16/2024]
2. Controllability and accessibility analysis of nonlinear biosystems. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2023; 242:107837. [PMID: 37837888 DOI: 10.1016/j.cmpb.2023.107837] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2023] [Revised: 09/28/2023] [Accepted: 09/30/2023] [Indexed: 10/16/2023]
Abstract
BACKGROUND We address the problem of determining the controllability and accessibility of nonlinear biosystems. We consider models described by affine-in-inputs ordinary differential equations, which are adequate for a wide array of biological processes. Roughly speaking, the controllability of a dynamical system determines the possibility of steering it from an initial state to any point in its neighbourhood; accessibility is a weaker form of controllability. METHODS While the methodology for analysing the controllability of linear systems is well established, its generalization to the nonlinear case has proven elusive. Thus, a number of related but different properties - including different versions of accessibility, reachability or weak local controllability - have been defined to approach its study, and several partial results exist in lieu of a general test. Here, leveraging the applicable results from differential geometric control theory, we source sufficient conditions to assess nonlinear controllability, as well as a necessary and sufficient condition for accessibility. RESULTS We develop an algorithmic procedure to evaluate these conditions efficiently, and we provide its open source implementation. Using this software tool, we analyse the accessibility and controllability of a number of models of biomedical interest. While some of them are fully controllable, we find others that are not, as is the case of some models of EGF and NFκB signalling networks. CONCLUSIONS The contributions in this paper facilitate the accessibility and controllability analysis of nonlinear models, not only in biomedicine but also in other areas in which they have been rarely performed to date.
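For orientation, the linear case that the abstract calls well established can be sketched with the Kalman rank condition: a system dx/dt = A x + B u is controllable when [B, AB, ..., A^(n-1)B] has full rank. This is only the linear baseline, not the paper's nonlinear procedure, and the two-compartment matrices below are hypothetical examples.

```python
from fractions import Fraction


def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def rank(m):
    """Rank via Gaussian elimination with exact rational arithmetic."""
    m = [[Fraction(v) for v in row] for row in m]
    n_rows, n_cols = len(m), len(m[0])
    r = 0
    for c in range(n_cols):
        pivot = next((i for i in range(r, n_rows) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, n_rows):
            factor = m[i][c] / m[r][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == n_rows:
            break
    return r


def controllability_matrix(A, B):
    """Build [B, AB, ..., A^(n-1)B] for dx/dt = A x + B u."""
    n = len(A)
    blocks = [B]
    for _ in range(n - 1):
        blocks.append(mat_mul(A, blocks[-1]))
    # Concatenate the blocks column-wise, row by row.
    return [sum((blk[i] for blk in blocks), []) for i in range(n)]


def is_controllable(A, B):
    """Kalman rank condition for a linear time-invariant system."""
    return rank(controllability_matrix(A, B)) == len(A)


# Hypothetical chain: the input enters compartment 1, which feeds compartment 2.
A_chain = [[-1, 0], [1, -2]]
B_chain = [[1], [0]]
# Same degradation rates without coupling: the input never reaches compartment 2.
A_decoupled = [[-1, 0], [0, -2]]

print(is_controllable(A_chain, B_chain))
print(is_controllable(A_decoupled, B_chain))
```

Exact rational arithmetic is used so that the rank decision does not depend on a floating-point tolerance; for nonlinear models the matrix powers would have to be replaced by Lie brackets of the vector fields, which this sketch does not attempt.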
3. Distilling identifiable and interpretable dynamic models from biological data. PLoS Comput Biol 2023; 19:e1011014. [PMID: 37851682 PMCID: PMC10615316 DOI: 10.1371/journal.pcbi.1011014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2023] [Revised: 10/30/2023] [Accepted: 10/03/2023] [Indexed: 10/20/2023] Open
Abstract
Mechanistic dynamical models allow us to study the behavior of complex biological systems. They can provide an objective and quantitative understanding that would be difficult to achieve through other means. However, the systematic development of these models is a non-trivial exercise and an open problem in computational biology. Currently, many research efforts are focused on model discovery, i.e. automating the development of interpretable models from data. One of the main frameworks is sparse regression, where the sparse identification of nonlinear dynamics (SINDy) algorithm and its variants have enjoyed great success. SINDy-PI is an extension which allows the discovery of rational nonlinear terms, thus enabling the identification of kinetic functions common in biochemical networks, such as Michaelis-Menten. SINDy-PI also pays special attention to the recovery of parsimonious models (Occam's razor). Here we focus on biological models composed of sets of deterministic nonlinear ordinary differential equations. We present a methodology that, combined with SINDy-PI, allows the automatic discovery of structurally identifiable and observable models which are also mechanistically interpretable. The lack of structural identifiability and observability makes it impossible to uniquely infer parameter and state variables, which can compromise the usefulness of a model by distorting its mechanistic significance and hampering its ability to produce biological insights. We illustrate the performance of our method with six case studies. We find that, despite enforcing sparsity, SINDy-PI sometimes yields models that are unidentifiable. In these cases we show how our method transforms their equations in order to obtain a structurally identifiable and observable model which is also interpretable.
4. Assessment of Prediction Uncertainty Quantification Methods in Systems Biology. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:1725-1736. [PMID: 36223355 DOI: 10.1109/tcbb.2022.3213914] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 06/06/2023]
Abstract
Biological processes are often modelled using ordinary differential equations. The unknown parameters of these models are estimated by optimizing the fit of model simulation and experimental data. The resulting parameter estimates inevitably possess some degree of uncertainty. In practical applications it is important to quantify these parameter uncertainties as well as the resulting prediction uncertainty, which are uncertainties of potentially time-dependent model characteristics. Unfortunately, estimating prediction uncertainties accurately is nontrivial, due to the nonlinear dependence of model characteristics on parameters. While a number of numerical approaches have been proposed for this task, their strengths and weaknesses have not been systematically assessed yet. To fill this knowledge gap, we apply four state of the art methods for uncertainty quantification to four case studies of different computational complexities. This reveals the trade-offs between their applicability and their statistical interpretability. Our results provide guidelines for choosing the most appropriate technique for a given problem and applying it successfully.
5. Structural Identifiability and Observability of Microbial Community Models. Bioengineering (Basel) 2023; 10:bioengineering10040483. [PMID: 37106670 PMCID: PMC10135947 DOI: 10.3390/bioengineering10040483] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2023] [Revised: 04/10/2023] [Accepted: 04/14/2023] [Indexed: 04/29/2023] Open
Abstract
Biological communities are populations of various species interacting in a common location. Microbial communities, which are formed by microorganisms, are ubiquitous in nature and are increasingly used in biotechnological and biomedical applications. They are nonlinear systems whose dynamics can be accurately described by models of ordinary differential equations (ODEs). A number of ODE models have been proposed to describe microbial communities. However, the structural identifiability and observability of most of them (that is, the theoretical possibility of inferring their parameters and internal states by observing their output) have not been determined yet. It is important to establish whether a model possesses these properties because, in their absence, the ability of a model to make reliable predictions may be compromised. Hence, in this paper, we analyse these properties for the main families of microbial community models. We consider several dimensions and measurements; overall, we analyse more than a hundred different configurations. We find that some of them are fully identifiable and observable, but a number of cases are structurally unidentifiable and/or unobservable under typical experimental conditions. Our results help in deciding which modelling frameworks may be used for a given purpose in this emerging area, and which ones should be avoided.
6. Benchmarking tools for a priori identifiability analysis. Bioinformatics 2023; 39:7017524. [PMID: 36721336 PMCID: PMC9913045 DOI: 10.1093/bioinformatics/btad065] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2022] [Revised: 01/17/2023] [Accepted: 01/30/2023] [Indexed: 02/02/2023] Open
Abstract
MOTIVATION The theoretical possibility of determining the state and parameters of a dynamic model by measuring its outputs is given by its structural identifiability and its observability. These properties should be analysed before attempting to calibrate a model, but their a priori analysis can be challenging, requiring symbolic calculations that often have a high computational cost. In recent years, a number of software tools have been developed for this task, mostly in the systems biology community. These tools have vastly different features and capabilities, and a critical assessment of their performance is still lacking. RESULTS Here, we present a comprehensive study of the computational resources available for analysing structural identifiability. We consider 13 software tools developed in 7 programming languages and evaluate their performance using a set of 25 case studies created from 21 models. Our results reveal their strengths and weaknesses, provide guidelines for choosing the most appropriate tool for a given problem and highlight opportunities for future developments. AVAILABILITY AND IMPLEMENTATION https://github.com/Xabo-RB/Benchmarking_files.
7. STRIKE-GOLDD 4.0: user-friendly, efficient analysis of structural identifiability and observability. Bioinformatics 2023; 39:6833126. [PMID: 36398887 PMCID: PMC9805590 DOI: 10.1093/bioinformatics/btac748] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/19/2022] [Revised: 11/15/2022] [Accepted: 11/17/2022] [Indexed: 11/19/2022] Open
Abstract
MOTIVATION STRIKE-GOLDD is a toolbox that analyses the structural identifiability and observability of possibly non-linear, non-rational ODE models that may have known and unknown inputs. Its broad applicability comes at the expense of a lower computational efficiency than other tools. RESULTS STRIKE-GOLDD 4.0 includes a new algorithm, ProbObsTest, specifically designed for the analysis of rational models. ProbObsTest is significantly faster than the previously available FISPO algorithm when applied to computationally expensive models. Providing both algorithms in the same toolbox allows combining generality and computational efficiency. STRIKE-GOLDD 4.0 is implemented as a Matlab toolbox with a user-friendly graphical interface. AVAILABILITY AND IMPLEMENTATION STRIKE-GOLDD 4.0 is a free and open-source tool available under a GPLv3 license. It can be downloaded from GitHub at https://github.com/afvillaverde/strike-goldd. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
8. Improving dynamic predictions with ensembles of observable models. Bioinformatics 2022; 39:6842325. [PMID: 36416122 PMCID: PMC9805594 DOI: 10.1093/bioinformatics/btac755] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2022] [Revised: 10/20/2022] [Accepted: 11/22/2022] [Indexed: 11/24/2022] Open
Abstract
MOTIVATION Dynamic mechanistic modelling in systems biology has been hampered by the complexity and variability associated with the underlying interactions, and by uncertain and sparse experimental measurements. Ensemble modelling, a concept initially developed in statistical mechanics, has been introduced in biological applications with the aim of mitigating those issues. Ensemble modelling uses a collection of different models compatible with the observed data to describe the phenomena of interest. However, since systems biology models often suffer from a lack of identifiability and observability, ensembles of models are particularly unreliable when predicting non-observable states. RESULTS We present a strategy to assess and improve the reliability of a class of model ensembles. In particular, we consider kinetic models described using ordinary differential equations with a fixed structure. Our approach builds an ensemble with a selection of the parameter vectors found when performing parameter estimation with a global optimization metaheuristic. This technique enforces diversity during the sampling of parameter space and it can quantify the uncertainty in the predictions of state trajectories. We couple this strategy with structural identifiability and observability analysis, and when these tests detect possible prediction issues we obtain model reparameterizations that surmount them. The end result is an ensemble of models with the ability to predict the internal dynamics of a biological process. We demonstrate our approach with models of glucose regulation, cell division, circadian oscillations and the JAK-STAT signalling pathway. AVAILABILITY AND IMPLEMENTATION The code that implements the methodology and reproduces the results is available at https://doi.org/10.5281/zenodo.6782638. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
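The idea of turning a set of parameter vectors into an ensemble prediction with an uncertainty band can be sketched as follows. This is a toy illustration, not the published code: the one-state decay model, the parameter vectors and the function names are all hypothetical, and a closed-form solution stands in for an ODE solver.

```python
import math
import statistics


def simulate(params, times):
    """Closed-form solution of dx/dt = -k*x with x(0) = x0 (toy stand-in for an ODE solver)."""
    x0, k = params
    return [x0 * math.exp(-k * t) for t in times]


def ensemble_prediction(ensemble, times):
    """Return per-time-point (median, lowest, highest) across all ensemble members."""
    trajectories = [simulate(p, times) for p in ensemble]
    summary = []
    for values in zip(*trajectories):
        summary.append((statistics.median(values), min(values), max(values)))
    return summary


# Hypothetical parameter vectors (x0, k), e.g. good fits kept from an optimisation run.
ensemble = [(1.0, 0.50), (1.1, 0.55), (0.9, 0.45)]
times = [0.0, 1.0, 2.0]

for t, (med, lo, hi) in zip(times, ensemble_prediction(ensemble, times)):
    print(f"t={t:.1f}  median={med:.3f}  spread={hi - lo:.3f}")
```

The spread between the lowest and highest trajectory is a crude proxy for prediction uncertainty; for a non-observable state that spread can be arbitrarily wide, which is the situation the abstract addresses through reparameterization.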
9. A protocol for dynamic model calibration. Brief Bioinform 2022; 23:bbab387. [PMID: 34619769 PMCID: PMC8769694 DOI: 10.1093/bib/bbab387] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 05/27/2021] [Revised: 08/06/2021] [Accepted: 08/29/2021] [Indexed: 12/23/2022] Open
Abstract
Ordinary differential equation models are nowadays widely used for the mechanistic description of biological processes and their temporal evolution. These models typically have many unknown and nonmeasurable parameters, which have to be determined by fitting the model to experimental data. In order to perform this task, known as parameter estimation or model calibration, the modeller faces challenges such as poor parameter identifiability, lack of sufficiently informative experimental data and the existence of local minima in the objective function landscape. These issues tend to worsen with larger model sizes, increasing the computational complexity and the number of unknown parameters. An incorrectly calibrated model is problematic because it may result in inaccurate predictions and misleading conclusions. For nonexpert users, there are a large number of potential pitfalls. Here, we provide a protocol that guides the user through all the steps involved in the calibration of dynamic models. We illustrate the methodology with two models and provide all the code required to reproduce the results and perform the same analysis on new models. Our protocol provides practitioners and researchers in biological modelling with a one-stop guide that is at the same time compact and sufficiently comprehensive to cover all aspects of the problem.
10. On testing structural identifiability by a simple scaling method: Relying on scaling symmetries can be misleading. PLoS Comput Biol 2021; 17:e1009032. [PMID: 34648496 PMCID: PMC8516234 DOI: 10.1371/journal.pcbi.1009032] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/28/2021] [Accepted: 05/03/2021] [Indexed: 11/19/2022] Open
Abstract
A recent paper published in PLOS Computational Biology [1] introduces the Scaling Invariance Method (SIM) for analysing structural local identifiability and observability. These two properties define mathematically the possibility of determining the values of the parameters (identifiability) and states (observability) of a dynamic model by observing its output. In this note we warn that SIM considers scaling symmetries as the only possible cause of non-identifiability and non-observability. We show that other types of symmetries can cause the same problems without being detected by SIM, and that in those cases the method may lead one to conclude that the model is identifiable and observable when it is actually not.
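The distinction between scaling and non-scaling symmetries can be made concrete with two toy models. Neither is taken from the note and the code does not implement SIM; it only checks numerically that a scaling and a translation of the unknowns each leave the output unchanged, so a test that looks only for scalings would miss the second case. All model names and values are hypothetical.

```python
import math


def output_scaling_model(x0, a, b, times):
    """y = b*x for dx/dt = -a*x, x(0) = x0, i.e. y(t) = b*x0*exp(-a*t)."""
    return [b * x0 * math.exp(-a * t) for t in times]


def output_translation_model(x0, k, c, times):
    """y = x + c for dx/dt = k, x(0) = x0, i.e. y(t) = x0 + k*t + c."""
    return [x0 + k * t + c for t in times]


def same_output(u, v, tol=1e-12):
    """True when two output trajectories agree point by point."""
    return all(math.isclose(p, q, rel_tol=tol, abs_tol=tol) for p, q in zip(u, v))


times = [0.0, 0.5, 1.0, 2.0]

# Scaling symmetry: (x0, b) -> (lam * x0, b / lam) leaves y unchanged.
lam = 4.0
y_ref = output_scaling_model(2.0, 0.3, 1.5, times)
y_scaled = output_scaling_model(2.0 * lam, 0.3, 1.5 / lam, times)

# Translation symmetry: (x0, c) -> (x0 + s, c - s) leaves y unchanged,
# although no parameter or state has been rescaled.
s = 0.25
z_ref = output_translation_model(1.0, 0.5, 2.0, times)
z_shifted = output_translation_model(1.0 + s, 0.5, 2.0 - s, times)

print(same_output(y_ref, y_scaled), same_output(z_ref, z_shifted))
```

In both models the pair of unknowns cannot be recovered from the output alone, but only the first case is explained by a scaling.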
11. PEtab-Interoperable specification of parameter estimation problems in systems biology. PLoS Comput Biol 2021; 17:e1008646. [PMID: 33497393 PMCID: PMC7864467 DOI: 10.1371/journal.pcbi.1008646] [Citation(s) in RCA: 37] [Impact Index Per Article: 12.3] [Received: 08/07/2020] [Revised: 02/05/2021] [Accepted: 12/18/2020] [Indexed: 01/24/2023] Open
Abstract
Reproducibility and reusability of the results of data-based modeling studies are essential. Yet there has so far been no broadly supported format for the specification of parameter estimation problems in systems biology. Here, we introduce PEtab, a format which facilitates the specification of parameter estimation problems using Systems Biology Markup Language (SBML) models and a set of tab-separated value files describing the observation model and experimental data as well as parameters to be estimated. We have already implemented PEtab support in eight well-established model simulation and parameter estimation toolboxes with hundreds of users in total. We provide a Python library for validation and modification of a PEtab problem and currently 20 example parameter estimation problems based on recent studies.
12. Structural identifiability and observability of compartmental models of the COVID-19 pandemic. ANNUAL REVIEWS IN CONTROL 2021; 51:441-459. [PMID: 33362427 PMCID: PMC7752088 DOI: 10.1016/j.arcontrol.2020.12.001] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 06/26/2020] [Revised: 09/24/2020] [Accepted: 12/01/2020] [Indexed: 05/18/2023]
Abstract
The recent coronavirus disease (COVID-19) outbreak has dramatically increased the public awareness and appreciation of the utility of dynamic models. At the same time, the dissemination of contradictory model predictions has highlighted their limitations. If some parameters and/or state variables of a model cannot be determined from output measurements, its ability to yield correct insights - as well as the possibility of controlling the system - may be compromised. Epidemic dynamics are commonly analysed using compartmental models, and many variations of such models have been used for analysing and predicting the evolution of the COVID-19 pandemic. In this paper we survey the different models proposed in the literature, assembling a list of 36 model structures and assessing their ability to provide reliable information. We address the problem using the control theoretic concepts of structural identifiability and observability. Since some parameters can vary during the course of an epidemic, we consider both the constant and time-varying parameter assumptions. We analyse the structural identifiability and observability of all of the models, considering all plausible choices of outputs and time-varying parameters, which leads us to analyse 255 different model versions. We classify the models according to their structural identifiability and observability under the different assumptions and discuss the implications of the results. We also illustrate with an example several alternative ways of remedying the lack of observability of a model. Our analyses provide guidelines for choosing the most informative model for each purpose, taking into account the available knowledge and measurements.
13. Benchmarking optimization methods for parameter estimation in large kinetic models. Bioinformatics 2019; 35:830-838. [PMID: 30816929 PMCID: PMC6394396 DOI: 10.1093/bioinformatics/bty736] [Citation(s) in RCA: 52] [Impact Index Per Article: 10.4] [Received: 04/05/2018] [Revised: 07/04/2018] [Accepted: 08/21/2018] [Indexed: 11/18/2022] Open
Abstract
Motivation Kinetic models contain unknown parameters that are estimated by optimizing the fit to experimental data. This task can be computationally challenging due to the presence of local optima and ill-conditioning. While a variety of optimization methods have been suggested to surmount these issues, it is difficult to choose the best one for a given problem a priori. A systematic comparison of parameter estimation methods for problems with tens to hundreds of optimization variables is currently missing, and smaller studies provided contradictory findings. Results We use a collection of benchmarks to evaluate the performance of two families of optimization methods: (i) multi-starts of deterministic local searches and (ii) stochastic global optimization metaheuristics; the latter may be combined with deterministic local searches, leading to hybrid methods. A fair comparison is ensured through a collaborative evaluation and a consideration of multiple performance metrics. We discuss possible evaluation criteria to assess the trade-off between computational efficiency and robustness. Our results show that, thanks to recent advances in the calculation of parametric sensitivities, a multi-start of gradient-based local methods is often a successful strategy, but a better performance can be obtained with a hybrid metaheuristic. The best performer combines a global scatter search metaheuristic with an interior point local method, provided with gradients estimated with adjoint-based sensitivities. We provide an implementation of this method to render it available to the scientific community. Availability and implementation The code to reproduce the results is provided as Supplementary Material and is available at Zenodo https://doi.org/10.5281/zenodo.1304034. Supplementary information Supplementary data are available at Bioinformatics online.
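The first family of methods mentioned above, a multi-start of deterministic local searches, can be sketched in a few lines. This is a minimal illustration under stated assumptions: the one-dimensional cost function is a made-up stand-in for a parameter estimation objective, the local method is plain gradient descent rather than the interior point solver used in the study, and the starting points are fixed instead of sampled.

```python
def objective(x):
    """Toy cost with two local minima; the global one lies near x = -1.04."""
    return (x * x - 1.0) ** 2 + 0.3 * x


def gradient(x):
    """Analytical derivative of the toy cost."""
    return 4.0 * x * (x * x - 1.0) + 0.3


def local_search(x, step=0.01, iterations=2000):
    """Plain gradient descent from one starting point."""
    for _ in range(iterations):
        x -= step * gradient(x)
    return x


def multi_start(starts):
    """Run a local search from every start and keep the best result."""
    candidates = [local_search(x0) for x0 in starts]
    return min(candidates, key=objective)


starts = [-2.0, -1.0, 0.0, 1.0, 2.0]
best = multi_start(starts)
print(best, objective(best))
```

A single run started at x = 1.0 or x = 2.0 settles in the shallower minimum on the positive side, which is why several starts are needed; a hybrid method would instead let a global metaheuristic choose where the local searches begin.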
14. Full observability and estimation of unknown inputs, states and parameters of nonlinear biological models. J R Soc Interface 2019; 16:20190043. [PMID: 31266417 PMCID: PMC6685009 DOI: 10.1098/rsif.2019.0043] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Indexed: 01/29/2023] Open
Abstract
In this paper, we address the system identification problem in the context of biological modelling. We present and demonstrate a methodology for (i) assessing the possibility of inferring the unknown quantities in a dynamic model and (ii) effectively estimating them from output data. We introduce the term Full Input-State-Parameter Observability (FISPO) analysis to refer to the simultaneous assessment of state, input and parameter observability (note that parameter observability is also known as identifiability). This type of analysis has often remained elusive in the presence of unmeasured inputs. The method proposed in this paper can be applied to a general class of nonlinear ordinary differential equations models. We apply this approach to three models from the recent literature. First, we determine whether it is theoretically possible to infer the states, parameters and inputs, taking only the model equations into account. When this analysis detects deficiencies, we reformulate the model to make it fully observable. Then we move to numerical scenarios and apply an optimization-based technique to estimate the states, parameters and inputs. The results demonstrate the feasibility of an integrated strategy for (i) analysing the theoretical possibility of determining the states, parameters and inputs to a system and (ii) solving the practical problem of actually estimating their values.
15. PREMER: A Tool to Infer Biological Networks. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2018; 15:1193-1202. [PMID: 28981423 DOI: 10.1109/tcbb.2017.2758786] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 06/07/2023]
Abstract
Inferring the structure of unknown cellular networks is a main challenge in computational biology. Data-driven approaches based on information theory can determine the existence of interactions among network nodes automatically. However, the elucidation of certain features, such as distinguishing between direct and indirect interactions or determining the direction of a causal link, requires estimating information-theoretic quantities in a multidimensional space. This can be a computationally demanding task, which acts as a bottleneck for the application of elaborate algorithms to large-scale network inference problems. The computational cost of such calculations can be alleviated by the use of compiled programs and parallelization. To this end, we have developed PREMER (Parallel Reverse Engineering with Mutual information & Entropy Reduction), a software toolbox that can run in parallel and sequential environments. It uses information theoretic criteria to recover network topology and determine the strength and causality of interactions, and allows incorporating prior knowledge, imputing missing data, and correcting outliers. PREMER is a free, open source software tool that does not require any commercial software. Its core algorithms are programmed in FORTRAN 90 and implement OpenMP directives. It has user interfaces in Python and MATLAB/Octave, and runs on Windows, Linux, and OSX (https://sites.google.com/site/premertoolbox/).
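The basic information-theoretic quantity behind this kind of network inference is the mutual information between two nodes. The sketch below is a naive plug-in estimator for already discretised data; it is not PREMER's algorithm (the abstract does not describe its estimators), and the sample vectors are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), count in count_xy.items():
        # p(x,y) * log2(p(x,y) / (p(x) * p(y))), written with raw counts.
        mi += (count / n) * math.log2(count * n / (count_x[x] * count_y[y]))
    return mi


# Hypothetical discretised expression levels of three nodes.
node_a = [0, 0, 1, 1]
node_b = [0, 0, 1, 1]   # copies node_a exactly
node_c = [0, 1, 0, 1]   # statistically independent of node_a in this sample

print(mutual_information(node_a, node_b))
print(mutual_information(node_a, node_c))
```

A pairwise score of this kind only flags that two nodes are related; separating direct from indirect links or assigning a direction needs the higher-dimensional quantities mentioned in the abstract, which is where the computational cost grows.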
16. Dynamical compensation and structural identifiability of biological models: Analysis, implications, and reconciliation. PLoS Comput Biol 2017; 13:e1005878. [PMID: 29186132 PMCID: PMC5724898 DOI: 10.1371/journal.pcbi.1005878] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 09/11/2017] [Revised: 12/11/2017] [Accepted: 11/13/2017] [Indexed: 01/15/2023] Open
Abstract
The concept of dynamical compensation has been recently introduced to describe the ability of a biological system to keep its output dynamics unchanged in the face of varying parameters. However, the original definition of dynamical compensation amounts to lack of structural identifiability. This is relevant if model parameters need to be estimated, as is often the case in biological modelling. Care should be taken when using an unidentifiable model to extract biological insight: the estimated values of structurally unidentifiable parameters are meaningless, and model predictions about unmeasured state variables can be wrong. Taking this into account, we explore alternative definitions of dynamical compensation that do not necessarily imply structural unidentifiability. Accordingly, we show different ways in which a model can be made identifiable while exhibiting dynamical compensation. Our analyses enable the use of the new concept of dynamical compensation in the context of parameter identification, and reconcile it with the desirable property of structural identifiability.
Robust behaviour is a desirable feature in many biological systems. The study of mechanisms capable of maintaining the transient response unchanged despite environmental disturbances has recently motivated the introduction of a new concept: Dynamical Compensation (DC). However, the original definition of DC with respect to a parameter amounts to structural unidentifiability of that parameter, which means that it cannot be estimated by measuring the model output. Since most biological models have unknown parameters that need to be estimated, DC can be considered a negative property for the purpose of model identification. In this paper we reconcile these two conflicting views by proposing a new definition of DC that captures its intended biological meaning (i.e. robustness, which should be a systemic property, intrinsic to the dynamics) while making it distinct from structural unidentifiability (which is a modelling property that depends on decisions made by the modeller, such as the choice of model outputs or unknown parameters, and on experimental constraints). Our definition enables a model to have DC with respect to a structurally identifiable parameter, thus increasing the applicability of the concept.
17. Parameter identifiability analysis and visualization in large-scale kinetic models of biosystems. BMC SYSTEMS BIOLOGY 2017; 11:54. [PMID: 28476119 PMCID: PMC5420165 DOI: 10.1186/s12918-017-0428-y] [Citation(s) in RCA: 69] [Impact Index Per Article: 9.9] [Received: 01/04/2017] [Accepted: 04/25/2017] [Indexed: 01/13/2023]
Abstract
Background Kinetic models of biochemical systems usually consist of ordinary differential equations that have many unknown parameters. Some of these parameters are often practically unidentifiable, that is, their values cannot be uniquely determined from the available data. Possible causes are lack of influence on the measured outputs, interdependence among parameters, and poor data quality. Uncorrelated parameters can be seen as the key tuning knobs of a predictive model. Therefore, before attempting to perform parameter estimation (model calibration) it is important to characterize the subset(s) of identifiable parameters and their interplay. Once this is achieved, it is still necessary to perform parameter estimation, which poses additional challenges. Methods We present a methodology that (i) detects high-order relationships among parameters, and (ii) visualizes the results to facilitate further analysis. We use a collinearity index to quantify the correlation between parameters in a group in a computationally efficient way. Then we apply integer optimization to find the largest groups of uncorrelated parameters. We also use the collinearity index to identify small groups of highly correlated parameters. The results files can be visualized using Cytoscape, showing the identifiable and non-identifiable groups of parameters together with the model structure in the same graph. Results Our contributions alleviate the difficulties that appear at different stages of the identifiability analysis and parameter estimation process. We show how to combine global optimization and regularization techniques for calibrating medium and large scale biological models with moderate computation times. Then we evaluate the practical identifiability of the estimated parameters using the proposed methodology. The identifiability analysis techniques are implemented as a MATLAB toolbox called VisId, which is freely available as open source from GitHub (https://github.com/gabora/visid). 
Conclusions Our approach is geared towards scalability. It enables the practical identifiability analysis of dynamic models of large size, and accelerates their calibration. The visualization tool allows modellers to detect parts that are problematic and need refinement or reformulation, and provides experimentalists with information that can be helpful in the design of new experiments. Electronic supplementary material The online version of this article (doi:10.1186/s12918-017-0428-y) contains supplementary material, which is available to authorized users.
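The collinearity index mentioned in the Methods can be illustrated for the simplest case of two parameters. The sketch assumes the commonly used definition, 1 / sqrt(smallest eigenvalue of S^T S) with S holding the unit-norm sensitivity columns; the abstract does not spell out the formula, so this is an assumption, and the code is not taken from VisId. The sensitivity vectors are hypothetical.

```python
import math


def normalise(column):
    """Scale a sensitivity column to unit Euclidean norm."""
    norm = math.sqrt(sum(v * v for v in column))
    return [v / norm for v in column]


def pairwise_collinearity(col_a, col_b):
    """Collinearity index of two sensitivity columns.

    Assumed definition: 1 / sqrt(smallest eigenvalue of S^T S), where S holds the
    unit-norm columns. For two columns the eigenvalues are 1 + c and 1 - c, with c
    the cosine of the angle between them, so the smallest one is 1 - |c|.
    """
    a, b = normalise(col_a), normalise(col_b)
    c = sum(p * q for p, q in zip(a, b))
    smallest = 1.0 - abs(c)
    return math.inf if smallest <= 0.0 else 1.0 / math.sqrt(smallest)


# Hypothetical output sensitivities at three time points.
s_k1 = [1.0, 2.0, 3.0]
s_k2 = [2.0, 4.0, 6.0]    # proportional to s_k1: the two effects cannot be separated
s_k3 = [3.0, 0.0, -1.0]   # orthogonal to s_k1

print(pairwise_collinearity(s_k1, s_k2))
print(pairwise_collinearity(s_k1, s_k3))
```

An index of 1 means the two parameters act on the outputs independently, while a very large value signals a nearly linearly dependent pair; for groups of more than two parameters the smallest eigenvalue would have to be computed numerically.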
|
18
|
Data-driven reverse engineering of signaling pathways using ensembles of dynamic models. PLoS Comput Biol 2017; 13:e1005379. [PMID: 28166222 PMCID: PMC5319798 DOI: 10.1371/journal.pcbi.1005379] [Citation(s) in RCA: 36] [Impact Index Per Article: 5.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/11/2016] [Revised: 02/21/2017] [Accepted: 01/24/2017] [Indexed: 11/19/2022] Open
Abstract
Despite significant efforts and remarkable progress, the inference of signaling networks from experimental data remains very challenging. The problem is particularly difficult when the objective is to obtain a dynamic model capable of predicting the effect of novel perturbations not considered during model training. The problem is ill-posed due to the nonlinear nature of these systems, the fact that only a fraction of the involved proteins and their post-translational modifications can be measured, and limitations on the technologies used for growing cells in vitro, perturbing them, and measuring their variations. As a consequence, there is a pervasive lack of identifiability. To overcome these issues, we present a methodology called SELDOM (enSEmbLe of Dynamic lOgic-based Models), which builds an ensemble of logic-based dynamic models, trains them to experimental data, and combines their individual simulations into an ensemble prediction. It also includes a model reduction step to prune spurious interactions and mitigate overfitting. SELDOM is a data-driven method, in the sense that it does not require any prior knowledge of the system: the interaction networks that act as scaffolds for the dynamic models are inferred from data using mutual information. We have tested SELDOM on a number of experimental and in silico signal transduction case-studies, including the recent HPN-DREAM breast cancer challenge. We found that its performance is highly competitive compared to state-of-the-art methods for the purpose of recovering network topology. More importantly, the utility of SELDOM goes beyond basic network inference (i.e. uncovering static interaction networks): it builds dynamic (based on ordinary differential equation) models, which can be used for mechanistic interpretations and reliable dynamic predictions in new experimental conditions (i.e. not used in the training). 
For this task, SELDOM's ensemble prediction is not only consistently better than predictions from individual models, but also often outperforms the state of the art represented by the methods used in the HPN-DREAM challenge.
|
19
|
Abstract
A powerful way of gaining insight into biological systems is by creating a nonlinear differential equation model, which usually contains many unknown parameters. Such a model is called structurally identifiable if it is possible to determine the values of its parameters from measurements of the model outputs. Structural identifiability is a prerequisite for parameter estimation, and should be assessed before exploiting a model. However, this analysis is seldom performed due to the high computational cost involved in the necessary symbolic calculations, which quickly becomes prohibitive as the problem size increases. In this paper we show how to analyse the structural identifiability of a very general class of nonlinear models by extending methods originally developed for studying observability. We present results about models whose identifiability had not been previously determined, report unidentifiabilities that had not been found before, and show how to modify those unidentifiable models to make them identifiable. This method helps prevent problems caused by lack of identifiability analysis, which can compromise the success of tasks such as experiment design, parameter estimation, and model-based optimization. The procedure is called STRIKE-GOLDD (STRuctural Identifiability taKen as Extended-Generalized Observability with Lie Derivatives and Decomposition), and it is implemented in a MATLAB toolbox which is available as open source software. The broad applicability of this approach facilitates the analysis of the increasingly complex models used in systems biology and other areas.
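A small worked example may help to see how observability-based identifiability analysis operates. The toy model below is an assumption chosen for illustration (it does not come from the paper), and it has no time-varying input, so ordinary Lie derivatives suffice. Parameters are treated as constant states, and the rank of the resulting observability-identifiability matrix is compared with the number of states plus parameters.

```latex
% Assumed toy model: \dot{x} = -a\,b\,x, \quad y = h(x) = x, with unknown parameters a and b.
% Augmented state and dynamics (parameters as constant states):
\tilde{x} = (x,\ a,\ b)^{T}, \qquad \tilde{f}(\tilde{x}) = (-abx,\ 0,\ 0)^{T}

% Lie derivatives of the output along \tilde{f}:
L_{\tilde{f}} h = \frac{\partial h}{\partial \tilde{x}}\,\tilde{f} = -abx, \qquad
L_{\tilde{f}}^{2} h = \frac{\partial (L_{\tilde{f}} h)}{\partial \tilde{x}}\,\tilde{f} = a^{2}b^{2}x

% Observability-identifiability matrix (gradients of h, L h, L^2 h):
\mathcal{O}_{I} =
\begin{pmatrix}
1 & 0 & 0 \\
-ab & -bx & -ax \\
a^{2}b^{2} & 2ab^{2}x & 2a^{2}bx
\end{pmatrix}

% The third column equals (a/b) times the second, so
\operatorname{rank}\,\mathcal{O}_{I} = 2 < 3 :
\quad a \text{ and } b \text{ are not individually identifiable.}

% Reparameterising with k = ab gives \tilde{x} = (x,\ k)^{T} and
\mathcal{O}_{I} =
\begin{pmatrix}
1 & 0 \\
-k & -x
\end{pmatrix},
\qquad \det \mathcal{O}_{I} = -x \neq 0 \ \text{for } x \neq 0 ,
% so the reformulated model is locally structurally identifiable.
```

The reparameterisation step mirrors, in miniature, the idea of modifying an unidentifiable model to make it identifiable.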
|
20
|
Metabolic engineering with multi-objective optimization of kinetic models. J Biotechnol 2016; 222:1-8. [PMID: 26826510 DOI: 10.1016/j.jbiotec.2016.01.005] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/07/2015] [Revised: 12/30/2015] [Accepted: 01/11/2016] [Indexed: 10/22/2022]
Abstract
Kinetic models have great potential for metabolic engineering applications. They can be used for testing which genetic and regulatory modifications can increase the production of metabolites of interest, while simultaneously monitoring other key functions of the host organism. This work presents a methodology for increasing productivity in biotechnological processes by exploiting dynamic models. It uses multi-objective dynamic optimization to identify the combination of targets (enzymatic modifications) and the degree of up- or down-regulation that must be performed in order to optimize a set of pre-defined performance metrics subject to process constraints. The capabilities of the approach are demonstrated on a realistic and computationally challenging application: a large-scale metabolic model of Chinese Hamster Ovary cells (CHO), which are used for antibody production in a fed-batch process. The proposed methodology provides sustained and robust growth in CHO cells, increasing productivity while simultaneously increasing biomass production and product titer and keeping the concentrations of lactate and ammonia low. The approach presented here can be used for optimizing metabolic models by finding the best combination of targets and their optimal level of up/down-regulation. Furthermore, it can accommodate additional trade-offs and constraints with great flexibility.
|
21
|
PREMER: Parallel Reverse Engineering of Biological Networks with Information Theory. COMPUTATIONAL METHODS IN SYSTEMS BIOLOGY 2016. [DOI: 10.1007/978-3-319-45177-0_21] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/03/2023]
|
22
|
Enabling network inference methods to handle missing data and outliers. BMC Bioinformatics 2015; 16:283. [PMID: 26335628 PMCID: PMC4559359 DOI: 10.1186/s12859-015-0717-7] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/03/2015] [Accepted: 08/24/2015] [Indexed: 12/20/2022] Open
Abstract
Background The inference of complex networks from data is a challenging problem in biological sciences, as well as in a wide range of disciplines such as chemistry, technology, economics, or sociology. The quantity and quality of the data greatly affect the results. While many methodologies have been developed for this task, they seldom take into account issues such as missing data or outlier detection and correction, which need to be properly addressed before network inference. Results Here we present an approach to (i) handle missing data and (ii) detect and correct outliers based on multivariate projection to latent structures. The method, called trimmed scores regression (TSR), enables network inference methods to analyse incomplete datasets by imputing the missing values coherently with the latent data structure. Furthermore, it substitutes the faulty values in a dataset by proper estimations. We provide an implementation of this approach, and show how it can be integrated with any network inference method as a preliminary data curation step. This functionality is demonstrated with a state of the art network inference method based on mutual information distance and entropy reduction, MIDER. Conclusion The methodology presented here enables network inference methods to analyse a large number of incomplete and faulty datasets that could not be reliably analysed so far. Our comparative studies show the superiority of TSR over other missing data approaches used by practitioners. Furthermore, the method allows for outlier detection and correction. Electronic supplementary material The online version of this article (doi:10.1186/s12859-015-0717-7) contains supplementary material, which is available to authorized users.
|
23
|
A consensus approach for estimating the predictive accuracy of dynamic models in biology. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2015; 119:17-28. [PMID: 25716416 DOI: 10.1016/j.cmpb.2015.02.001] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/16/2014] [Revised: 12/19/2014] [Accepted: 02/02/2015] [Indexed: 06/04/2023]
Abstract
Mathematical models that predict the complex dynamic behaviour of cellular networks are fundamental in systems biology, and provide an important basis for biomedical and biotechnological applications. However, obtaining reliable predictions from large-scale dynamic models is commonly a challenging task due to lack of identifiability. The present work addresses this challenge by presenting a methodology for obtaining high-confidence predictions from dynamic models using time-series data. First, to preserve the complex behaviour of the network while reducing the number of estimated parameters, model parameters are combined in sets of meta-parameters, which are obtained from correlations between biochemical reaction rates and between concentrations of the chemical species. Next, an ensemble of models with different parameterizations is constructed and calibrated. Finally, the ensemble is used for assessing the reliability of model predictions by defining a measure of convergence of model outputs (consensus) that is used as an indicator of confidence. We report results of computational tests carried out on a metabolic model of Chinese Hamster Ovary (CHO) cells, which are used for recombinant protein production. Using noisy simulated data, we find that the aggregated ensemble predictions are on average more accurate than the predictions of individual ensemble models. Furthermore, ensemble predictions with high consensus are statistically more accurate than ensemble predictions with large variance. The procedure provides quantitative estimates of the confidence in model predictions and enables the analysis of sufficiently complex networks as required for practical applications.
|
24
|
MEIGO: an open-source software suite based on metaheuristics for global optimization in systems biology and bioinformatics. BMC Bioinformatics 2014; 15:136. [PMID: 24885957 PMCID: PMC4025564 DOI: 10.1186/1471-2105-15-136] [Citation(s) in RCA: 71] [Impact Index Per Article: 7.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/23/2013] [Accepted: 04/24/2014] [Indexed: 11/28/2022] Open
Abstract
Background Optimization is the key to solving many problems in computational biology. Global optimization methods, which provide a robust methodology, and metaheuristics in particular have proven to be the most efficient methods for many applications. Despite their utility, there is a limited availability of metaheuristic tools. Results We present MEIGO, an R and Matlab optimization toolbox (also available in Python via a wrapper of the R version), that implements metaheuristics capable of solving diverse problems arising in systems biology and bioinformatics. The toolbox includes the enhanced scatter search method (eSS) for continuous nonlinear programming (cNLP) and mixed-integer programming (MINLP) problems, and variable neighborhood search (VNS) for Integer Programming (IP) problems. Additionally, the R version includes BayesFit for parameter estimation by Bayesian inference. The eSS and VNS methods can be run on a single-thread or in parallel using a cooperative strategy. The code is supplied under GPLv3 and is available at http://www.iim.csic.es/~gingproc/meigo.html. Documentation and examples are included. The R package has been submitted to BioConductor. We evaluate MEIGO against optimization benchmarks, and illustrate its applicability to a series of case studies in bioinformatics and systems biology where it outperforms other state-of-the-art methods. Conclusions MEIGO provides a free, open-source platform for optimization that can be applied to multiple domains of systems biology and bioinformatics. It includes efficient state of the art metaheuristics, and its open and modular structure allows the addition of further methods.
|
25
|
MIDER: network inference with mutual information distance and entropy reduction. PLoS One 2014; 9:e96732. [PMID: 24806471 PMCID: PMC4013075 DOI: 10.1371/journal.pone.0096732] [Citation(s) in RCA: 91] [Impact Index Per Article: 9.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/05/2013] [Accepted: 04/09/2014] [Indexed: 01/14/2023] Open
Abstract
The prediction of links among variables from a given dataset is a task referred to as network inference or reverse engineering. It is an open problem in bioinformatics and systems biology, as well as in other areas of science. Information theory, which uses concepts such as mutual information, provides a rigorous framework for addressing it. While a number of information-theoretic methods are already available, most of them focus on a particular type of problem, introducing assumptions that limit their generality. Furthermore, many of these methods lack a publicly available implementation. Here we present MIDER, a method for inferring network structures with information-theoretic concepts. It consists of two steps: first, it provides a representation of the network in which the distance among nodes indicates their statistical closeness. Second, it refines the prediction of the existing links to distinguish between direct and indirect interactions and to assign directionality. The method accepts as input time-series data related to some quantitative features of the network nodes (such as concentrations, if the nodes are chemical species). It takes into account time delays between variables, and allows choosing among several definitions and normalizations of mutual information. It is general purpose: it may be applied to any type of network, cellular or otherwise. A Matlab implementation including source code and data is freely available (http://www.iim.csic.es/~gingproc/mider.html). The performance of MIDER has been evaluated on seven different benchmark problems that cover the main types of cellular networks, including metabolic, gene regulatory, and signaling. Comparisons with state-of-the-art information-theoretic methods have demonstrated the competitive performance of MIDER, as well as its versatility. 
Its use does not demand any a priori knowledge from the user; the default settings and the adaptive nature of the method provide good results for a wide range of problems without requiring tuning.
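As a rough illustration of how mutual information can be combined with time delays, the following Python sketch estimates mutual information between two time series at several lags. It uses a plain histogram (plug-in) estimator with equal-width bins, which is an assumption made here for brevity; it is not the MIDER implementation and does not cover its alternative definitions, normalizations, or the entropy reduction step. All function names are hypothetical.

```python
import math
import random
from collections import Counter


def discretize(series, bins):
    """Map each value to an equal-width bin index in [0, bins - 1]."""
    lo, hi = min(series), max(series)
    width = (hi - lo) / bins or 1.0  # avoid division by zero for constant series
    return [min(int((v - lo) / width), bins - 1) for v in series]


def mutual_information(x, y, bins=4):
    """Plug-in estimate (in nats) of the mutual information between x and y."""
    dx, dy = discretize(x, bins), discretize(y, bins)
    n = len(dx)
    joint = Counter(zip(dx, dy))
    px, py = Counter(dx), Counter(dy)
    return sum((c / n) * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in joint.items())


def lagged_mutual_information(x, y, max_lag, bins=4):
    """Mutual information between x(t) and y(t + lag) for lag = 0..max_lag."""
    scores = {}
    for lag in range(max_lag + 1):
        xs = x[:len(x) - lag]  # x(t)
        ys = y[lag:]           # y(t + lag)
        scores[lag] = mutual_information(xs, ys, bins)
    return scores


# Synthetic data (assumption): y is a copy of x delayed by two samples.
rng = random.Random(0)
x = [rng.random() for _ in range(500)]
y = [0.0, 0.0] + x[:-2]
scores = lagged_mutual_information(x, y, max_lag=4)
best_lag = max(scores, key=scores.get)
```

In this synthetic case the score peaks at the lag that matches the imposed delay, which is the kind of signal a delay-aware inference method can use when ranking candidate links.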
|
26
|
Reverse engineering and identification in systems biology: strategies, perspectives and challenges. J R Soc Interface 2014; 11:20130505. [PMID: 24307566 PMCID: PMC3869153 DOI: 10.1098/rsif.2013.0505] [Citation(s) in RCA: 163] [Impact Index Per Article: 16.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/07/2013] [Accepted: 11/12/2013] [Indexed: 12/17/2022] Open
Abstract
The interplay of mathematical modelling with experiments is one of the central elements in systems biology. The aim of reverse engineering is to infer, analyse and understand, through this interplay, the functional and regulatory mechanisms of biological systems. Reverse engineering is not exclusive to systems biology and has been studied in different areas, such as inverse problem theory, machine learning, nonlinear physics, (bio)chemical kinetics, control theory and optimization, among others. However, it seems that many of these areas have been relatively closed to outsiders. In this contribution, we aim to compare and highlight the different perspectives and contributions from these fields, with emphasis on two key questions: (i) why are reverse engineering problems so hard to solve, and (ii) what methods are available for the particular problems arising from systems biology?
|
27
|
Reverse engineering cellular networks with information theoretic methods. Cells 2013; 2:306-29. [PMID: 24709703 PMCID: PMC3972682 DOI: 10.3390/cells2020306] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/26/2013] [Revised: 04/22/2013] [Accepted: 04/27/2013] [Indexed: 11/16/2022] Open
Abstract
Building mathematical models of cellular networks lies at the core of systems biology. It involves, among other tasks, the reconstruction of the structure of interactions between molecular components, which is known as network inference or reverse engineering. Information theory can help in the goal of extracting as much information as possible from the available data. A large number of methods founded on these concepts have been proposed in the literature, not only in biology journals, but in a wide range of areas. Their critical comparison is difficult due to the different focuses and the adoption of different terminologies. Here we attempt to review some of the existing information theoretic methodologies for network inference, and clarify their differences. While some of these methods have achieved notable success, many challenges remain, among which we can mention dealing with incomplete measurements, noisy data, counterintuitive behaviour emerging from nonlinear relations or feedback loops, and computational burden of dealing with large data sets.
|
28
|
Abstract
Determining the regulation of metabolic networks at genome scale is a hard task. It has been hypothesized that biochemical pathways and metabolic networks might have undergone an evolutionary process of optimization with respect to several criteria over time. In this contribution, a multi-criteria approach has been used to optimize parameters for the allosteric regulation of enzymes in a model of a metabolic substrate-cycle. This has been carried out by calculating the Pareto set of optimal solutions according to two objectives: the proper direction of flux in a metabolic cycle and the energetic cost of applying the set of parameters. Different Pareto fronts have been calculated for eight different "environments" (specific time courses of end product concentrations). For each resulting front the so-called knee point is identified, which can be considered a preferred trade-off solution. Interestingly, the optimal control parameters corresponding to each of these points also lead to optimal behaviour in all the other environments. By calculating the average of the different parameter sets for the knee solutions more frequently found, a final and optimal consensus set of parameters can be obtained, which is an indication of the existence of a universal regulation mechanism for this system. The implications of such a universal regulatory switch are discussed in the framework of large metabolic networks.
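The notions of Pareto front and knee point can be made concrete with a short Python sketch for two objectives that are both minimised. The abstract does not state which knee definition was used, so the one below (the front point farthest below the straight line joining the two extreme solutions, after normalising both objectives) is an assumption, as are the function names and the toy data.

```python
def pareto_front(points):
    """Return the non-dominated points (both objectives minimised), sorted by objective 1."""
    front = []
    for p in points:
        dominated = any(
            q != p and all(qi <= pi for qi, pi in zip(q, p))
            for q in points
        )
        if not dominated:
            front.append(p)
    return sorted(set(front))


def knee_point(front):
    """Front point with the largest normalised distance below the line joining the extremes."""
    ordered = sorted(front)
    if len(ordered) < 3:
        return ordered[0]
    (x0, y0), (x1, y1) = ordered[0], ordered[-1]

    def gain(p):
        u = (p[0] - x0) / (x1 - x0)  # 0 at the best value of objective 1
        v = (p[1] - y1) / (y0 - y1)  # 0 at the best value of objective 2
        return 1.0 - u - v           # proportional to the distance from the line u + v = 1

    return max(ordered, key=gain)


# Toy trade-off data (assumption): (flux error, energetic cost) pairs.
candidates = [(1, 10), (2, 4), (3, 3.5), (6, 3), (10, 2.8), (5, 8), (7, 7)]
front = pareto_front(candidates)
knee = knee_point(front)
```

The quadratic dominance check is adequate for small candidate sets; larger problems would normally rely on a dedicated multi-objective solver.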
|
29
|
A cooperative strategy for parameter estimation in large scale systems biology models. BMC SYSTEMS BIOLOGY 2012; 6:75. [PMID: 22727112 PMCID: PMC3512509 DOI: 10.1186/1752-0509-6-75] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 02/21/2012] [Accepted: 06/11/2012] [Indexed: 01/03/2023]
Abstract
Background Mathematical models play a key role in systems biology: they summarize the currently available knowledge in a way that allows experimentally verifiable predictions to be made. Model calibration consists of finding the parameters that give the best fit to a set of experimental data, which entails minimizing a cost function that measures the goodness of this fit. Most mathematical models in systems biology present three characteristics which make this problem very difficult to solve: they are highly non-linear, they have a large number of parameters to be estimated, and the information content of the available experimental data is frequently scarce. Hence, there is a need for global optimization methods capable of solving this problem efficiently. Results A new approach for parameter estimation of large scale models, called Cooperative Enhanced Scatter Search (CeSS), is presented. Its key feature is the cooperation between different programs (“threads”) that run in parallel in different processors. Each thread implements a state of the art metaheuristic, the enhanced Scatter Search algorithm (eSS). Cooperation, meaning information sharing between threads, modifies the systemic properties of the algorithm and speeds up performance. Two parameter estimation problems involving models related to the central carbon metabolism of E. coli which include different regulatory levels (metabolic and transcriptional) are used as case studies. The performance and capabilities of the method are also evaluated using benchmark problems of large-scale global optimization, with excellent results. Conclusions The cooperative CeSS strategy is a general purpose technique that can be applied to any model calibration problem. Its capability has been demonstrated by calibrating two large-scale models of different characteristics, improving the performance of previously existing methods in both cases. 
The cooperative metaheuristic presented here can be easily extended to incorporate other global and local search solvers and specific structural information for particular classes of problems.
|
30
|
Use of a generalized Fisher equation for global optimization in chemical kinetics. J Phys Chem A 2011; 115:8426-36. [PMID: 21711023 DOI: 10.1021/jp203158r] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/13/2023]
Abstract
A new approach for parameter estimation in chemical kinetics has been recently proposed (Ross et al. Proc. Natl. Acad. Sci. U.S.A. 2010, 107, 12777). It makes use of an optimization criterion based on a Generalized Fisher Equation (GFE). Its utility has been demonstrated with two reaction mechanisms, the chlorite-iodide and Oregonator, which are computationally stiff systems. In this Article, the performance of the GFE-based algorithm is compared to that obtained from minimization of the squared distances between the observed and predicted concentrations obtained by solving the corresponding initial value problem (we call this latter approach "traditional" for simplicity). Comparison of the proposed GFE-based optimization method with the "traditional" one has revealed their differences in performance. This difference can be seen as a trade-off between speed (which favors GFE) and accuracy (which favors the traditional method). The chlorite-iodide and Oregonator systems are again chosen as case studies. An identifiability analysis is performed for both of them, followed by an optimal experimental design based on the Fisher Information Matrix (FIM). This makes it possible to identify and overcome most of the previously encountered identifiability issues, improving the estimation accuracy. With the new data, obtained from optimally designed experiments, it is now possible to effectively estimate more parameters than with the previous data. This result, which holds for both GFE-based and traditional methods, stresses the importance of an appropriate experimental design. Finally, a new hybrid method that combines advantages from the GFE and traditional approaches is presented.
|