1
Oguz C, Watson LT, Baumann WT, Tyson JJ. Predicting network modules of cell cycle regulators using relative protein abundance statistics. BMC SYSTEMS BIOLOGY 2017; 11:30. [PMID: 28241833 PMCID: PMC5329933 DOI: 10.1186/s12918-017-0409-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 09/23/2016] [Accepted: 02/17/2017] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Parameter estimation in systems biology is typically done by enforcing experimental observations through an objective function as the parameter space of a model is explored by numerical simulations. Past studies have shown that one usually finds a set of "feasible" parameter vectors that fit the available experimental data equally well, and that these alternative vectors can make different predictions under novel experimental conditions. In this study, we characterize the feasible region of a complex model of the budding yeast cell cycle under a large set of discrete experimental constraints in order to test whether the statistical features of relative protein abundance predictions are influenced by the topology of the cell cycle regulatory network.
RESULTS: Using differential evolution, we generate an ensemble of feasible parameter vectors that reproduce the phenotypes (viable or inviable) of wild-type yeast cells and 110 mutant strains. We use this ensemble to predict the phenotypes of 129 mutant strains for which experimental data is not available. We identify 86 novel mutants that are predicted to be viable and then rank the cell cycle proteins in terms of their contributions to cumulative variability of relative protein abundance predictions. Proteins involved in "regulation of cell size" and "regulation of G1/S transition" contribute most to predictive variability, whereas proteins involved in "positive regulation of transcription involved in exit from mitosis," "mitotic spindle assembly checkpoint" and "negative regulation of cyclin-dependent protein kinase by cyclin degradation" contribute the least. These results suggest that the statistics of these predictions may be generating patterns specific to individual network modules (START, S/G2/M, and EXIT). To test this hypothesis, we develop random forest models for predicting the network modules of cell cycle regulators using relative abundance statistics as model inputs. Predictive performance is assessed by the areas under receiver operating characteristics curves (AUC). Our models generate an AUC range of 0.83-0.87 as opposed to randomized models with AUC values around 0.50.
CONCLUSIONS: By using differential evolution and random forest modeling, we show that the model prediction statistics generate distinct network module-specific patterns within the cell cycle network.
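As a reading aid for the AUC figures quoted in this abstract, the following is a minimal sketch of how an area under the ROC curve can be computed for a binary classifier from pairwise comparisons. It is not the authors' code: the function name `roc_auc` and the reduction to a two-class (one module versus the rest) setting are assumptions made for illustration. An AUC near 0.50 corresponds to scores that rank positives above negatives no better than chance.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparisons (hypothetical helper).

    The AUC equals the probability that a randomly chosen positive example
    receives a higher score than a randomly chosen negative one, with ties
    counted as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half a win
    return wins / (len(pos) * len(neg))
```

The quadratic loop is fine for a few hundred examples; a rank-based formula would be the usual choice for larger sets.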
Affiliation(s)
- Cihan Oguz
- Department of Biological Sciences, Virginia Tech, Blacksburg VA, 24061, USA
- Layne T Watson
- Department of Computer Science, Virginia Tech, Blacksburg VA, 24061, USA; Department of Mathematics, Virginia Tech, Blacksburg VA, 24061, USA; Department of Aerospace and Ocean Engineering, Virginia Tech, Blacksburg VA, 24061, USA
- William T Baumann
- Department of Electrical and Computer Engineering, Virginia Tech, Blacksburg VA, 24061, USA
- John J Tyson
- Department of Biological Sciences, Virginia Tech, Blacksburg VA, 24061, USA
2
Dinh V, Rundell AE, Buzzard GT. Convergence of Griddy Gibbs sampling and other perturbed Markov chains. J STAT COMPUT SIM 2016. [DOI: 10.1080/00949655.2016.1264399] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/20/2022]
Affiliation(s)
- Vu Dinh
- Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Ann E. Rundell
- Weldon School of Biomedical Engineering, Purdue University, West Lafayette, IN, USA
3
Efficient Optimization of Stimuli for Model-Based Design of Experiments to Resolve Dynamical Uncertainty. PLoS Comput Biol 2015; 11:e1004488. [PMID: 26379275 PMCID: PMC4574939 DOI: 10.1371/journal.pcbi.1004488] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 10/24/2014] [Accepted: 08/05/2015] [Indexed: 11/19/2022] Open
Abstract
This model-based design of experiments (MBDOE) method determines the input magnitudes of the experimental stimuli to apply and the associated measurements that should be taken to optimally constrain the uncertain dynamics of a biological system under study. The ideal global solution for this experiment design problem is generally computationally intractable because of parametric uncertainties in the mathematical model of the biological system. Others have addressed this issue by limiting the solution to a local estimate of the model parameters. Here we present an approach that is independent of the local parameter constraint. This approach is made computationally efficient and tractable by the use of: (1) sparse grid interpolation that approximates the biological system dynamics, (2) representative parameters that uniformly represent the data-consistent dynamical space, and (3) probability weights of the represented experimentally distinguishable dynamics. Our approach identifies data-consistent representative parameters using sparse grid interpolants, constructs the optimal input sequence from a greedy search, and defines the associated optimal measurements using a scenario tree. We explore the optimality of this MBDOE algorithm using a 3-dimensional Hes1 model and a 19-dimensional T-cell receptor model. The 19-dimensional T-cell model also demonstrates the MBDOE algorithm’s scalability to higher dimensions. In both cases, the dynamical uncertainty region that bounds the trajectories of the target system states was reduced by as much as 86% and 99%, respectively, after completing the designed experiments in silico. Our results suggest that for resolving dynamical uncertainty, the ability to design an input sequence paired with its associated measurements is particularly important when limited by the number of measurements.
Many mathematical models that have been developed for biological systems are limited because the complex systems are not well understood, the parameters are not known, and available data is limited and noisy. On the other hand, experiments to support model development are limited in terms of costs and time, feasible inputs and feasible measurements. MBDOE combines the mathematical models with experiment design to strategically design optimal experiments to obtain data that will contribute to the understanding of the systems. Our approach extends current capabilities of existing MBDOE techniques to make them more useful for scientists to resolve the trajectories of the system under study. It identifies the optimal conditions for stimuli and measurements that yield the most information about the system given the practical limitations. Exploration of the input space is not a trivial extension to MBDOE methods used for determining optimal measurements due to the nonlinear nature of many biological system models. The exploration of the system dynamics elicited by different inputs requires a computationally efficient and tractable approach. Our approach plans optimal experiments to reduce dynamical uncertainty in the output of selected target states of the biological system.
4
Umulis DM, Othmer HG. The role of mathematical models in understanding pattern formation in developmental biology. Bull Math Biol 2015; 77:817-45. [PMID: 25280665 PMCID: PMC4819020 DOI: 10.1007/s11538-014-0019-7] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 06/18/2014] [Accepted: 09/02/2014] [Indexed: 12/11/2022]
Abstract
In a Wall Street Journal article published on April 5, 2013, E. O. Wilson attempted to make the case that biologists do not really need to learn any mathematics: whenever they run into difficulty with numerical issues, they can find a technician (aka mathematician) to help them out of their difficulty. He formalizes this in Wilson's Principle No. 1: "It is far easier for scientists to acquire needed collaboration from mathematicians and statisticians than it is for mathematicians and statisticians to find scientists able to make use of their equations." This reflects a complete misunderstanding of the role of mathematics in all sciences throughout history. To Wilson, mathematics is mere number crunching, but as Galileo said long ago, "The laws of Nature are written in the language of mathematics ... the symbols are triangles, circles and other geometrical figures, without whose help it is impossible to comprehend a single word." Mathematics has moved beyond the geometry-based model of Galileo's time, and in a rebuttal to Wilson, E. Frenkel has pointed out the role of mathematics in synthesizing the general principles in science (both point and counter-point are available in Wilson and Frenkel in Notices Am Math Soc 60(7):837-838, 2013). We will take this a step further and show how mathematics has been used to make new and experimentally verified discoveries in developmental biology and how mathematics is essential for understanding a problem that has puzzled experimentalists for decades: that of how organisms can scale in size. Mathematical analysis alone cannot "solve" these problems since the validation lies at the molecular level, but conversely, a growing number of questions in biology cannot be solved without mathematical analysis and modeling. Herein, we discuss a few examples of the productive intercourse between mathematics and biology.
Affiliation(s)
- David M. Umulis
- Agricultural and Biological Engineering, Weldon School of Biomedical Engineering, Purdue University, West Lafayette, IN 47907, USA
- Hans G. Othmer
- School of Mathematics and Digital Technology Center, University of Minnesota, Minneapolis, MN 55455, USA
5
Bouffier AM, Arnold J, Schüttler HB. A MINE alternative to D-optimal designs for the linear model. PLoS One 2014; 9:e110234. [PMID: 25356931 PMCID: PMC4214713 DOI: 10.1371/journal.pone.0110234] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 04/14/2014] [Accepted: 09/16/2014] [Indexed: 12/04/2022] Open
Abstract
Doing large-scale genomics experiments can be expensive, and so experimenters want to get the most information out of each experiment. To this end the Maximally Informative Next Experiment (MINE) criterion for experimental design was developed. Here we explore this idea in a simplified context, the linear model. Four variations of the MINE method for the linear model were created: MINE-like, MINE, MINE with random orthonormal basis, and MINE with random rotation. Each method varies in how it maximizes the MINE criterion. Theorem 1 establishes sufficient conditions for the maximization of the MINE criterion under the linear model. Theorem 2 establishes when the MINE criterion is equivalent to the classic design criterion of D-optimality. By simulation under the linear model, we establish that the MINE with random orthonormal basis and MINE with random rotation are faster to discover the true linear relation with regression coefficients and observations when . We also establish in simulations with , , and 1000 replicates that these two variations of MINE also display a lower false positive rate than the MINE-like method and additionally, for a majority of the experiments, for the MINE method.
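For readers unfamiliar with the classic D-optimality criterion mentioned in this abstract, here is a minimal sketch under stated assumptions. It illustrates D-optimality only, not the MINE criterion, which the abstract does not define. The two-parameter model y = b0 + b1*x, the function names, and the greedy one-point-at-a-time search are all hypothetical choices for illustration.

```python
def information_det(xs):
    """det(X^T X) for the straight-line model y = b0 + b1*x.

    With design-matrix rows [1, x], X^T X = [[n, sum x], [sum x, sum x^2]],
    so the determinant is n * sum(x^2) - (sum x)^2.
    """
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    return n * sxx - sx * sx


def next_d_optimal_point(current_xs, candidates):
    """Greedy step: pick the candidate that maximizes det(X^T X)."""
    return max(candidates, key=lambda c: information_det(current_xs + [c]))
```

For example, starting from design points 0.0 and 1.0 with candidates 0.5, 1.0 and 2.0, the greedy step selects 2.0, the point that spreads the design furthest and so enlarges the determinant most.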
Affiliation(s)
- Amanda M. Bouffier
- Institute of Bioinformatics, University of Georgia, Athens, Georgia, United States of America
- Jonathan Arnold
- Genetics Department, University of Georgia, Athens, Georgia, United States of America
- H. Bernd Schüttler
- Physics and Astronomy Department, University of Georgia, Athens, Georgia, United States of America
6
Bazil JN, Stamm KD, Li X, Thiagarajan R, Nelson TJ, Tomita-Mitchell A, Beard DA. The inferred cardiogenic gene regulatory network in the mammalian heart. PLoS One 2014; 9:e100842. [PMID: 24971943 PMCID: PMC4074065 DOI: 10.1371/journal.pone.0100842] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 11/07/2013] [Accepted: 05/31/2014] [Indexed: 12/22/2022] Open
Abstract
Cardiac development is a complex, multiscale process encompassing cell fate adoption, differentiation and morphogenesis. To elucidate pathways underlying this process, a recently developed algorithm to reverse engineer gene regulatory networks was applied to time-course microarray data obtained from the developing mouse heart. Approximately 200 genes of interest were input into the algorithm to generate putative network topologies that are capable of explaining the experimental data via model simulation. To cull specious network interactions, thousands of putative networks are merged and filtered to generate scale-free, hierarchical networks that are statistically significant and biologically relevant. The networks are validated with known gene interactions and used to predict regulatory pathways important for the developing mammalian heart. Area under the precision-recall curve and receiver operator characteristic curve are 9% and 58%, respectively. Of the top 10 ranked predicted interactions, 4 have already been validated. The algorithm is further tested using a network enriched with known interactions and another depleted of them. The inferred networks contained more interactions for the enriched network versus the depleted network. In all test cases, maximum performance of the algorithm was achieved when the purely data-driven method of network inference was combined with a data-independent, functional-based association method. Lastly, the network generated from the list of approximately 200 genes of interest was expanded using gene-profile uniqueness metrics to include approximately 900 additional known mouse genes and to form the most likely cardiogenic gene regulatory network. The resultant network supports known regulatory interactions and contains several novel cardiogenic regulatory interactions. The method outlined herein provides an informative approach to network inference and leads to clear testable hypotheses related to gene regulation.
Affiliation(s)
- Jason N. Bazil
- Department of Molecular and Integrative Physiology, University of Michigan, Ann Arbor, Michigan, United States of America
- Karl D. Stamm
- Biotechnology and Bioengineering Center, Medical College of Wisconsin, Milwaukee, Wisconsin, United States of America
- Xing Li
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, United States of America
- Raghuram Thiagarajan
- Department of Molecular and Integrative Physiology, University of Michigan, Ann Arbor, Michigan, United States of America
- Timothy J. Nelson
- Departments of Medicine, Molecular Pharmacology and Experimental Therapeutics, and Mayo Clinic Center for Regenerative Medicine, Rochester, Minnesota, United States of America
- Aoy Tomita-Mitchell
- Biotechnology and Bioengineering Center, Medical College of Wisconsin, Milwaukee, Wisconsin, United States of America
- Daniel A. Beard
- Department of Molecular and Integrative Physiology, University of Michigan, Ann Arbor, Michigan, United States of America
7
Model-based analysis for qualitative data: an application in Drosophila germline stem cell regulation. PLoS Comput Biol 2014; 10:e1003498. [PMID: 24626201 PMCID: PMC3952817 DOI: 10.1371/journal.pcbi.1003498] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Received: 08/20/2013] [Accepted: 01/16/2014] [Indexed: 01/17/2023] Open
Abstract
Discovery in developmental biology is often driven by intuition that relies on the integration of multiple types of data such as fluorescent images, phenotypes, and the outcomes of biochemical assays. Mathematical modeling helps elucidate the biological mechanisms at play as the networks become increasingly large and complex. However, the available data is frequently under-utilized due to incompatibility with quantitative model tuning techniques. This is the case for stem cell regulation mechanisms explored in the Drosophila germarium through fluorescent immunohistochemistry. To enable better integration of biological data with modeling in this and similar situations, we have developed a general parameter estimation process to quantitatively optimize models with qualitative data. The process employs a modified version of the Optimal Scaling method from social and behavioral sciences, and multi-objective optimization to evaluate the trade-off between fitting different datasets (e.g. wild type vs. mutant). Using only published imaging data in the germarium, we first evaluated support for a published intracellular regulatory network by considering alternative connections of the same regulatory players. Simply screening networks against wild type data identified hundreds of feasible alternatives. Of these, five parsimonious variants were found and compared by multi-objective analysis including mutant data and dynamic constraints. With these data, the current model is supported over the alternatives, but support for a biochemically observed feedback element is weak (i.e. these data do not measure the feedback effect well). When also comparing new hypothetical models, the available data do not discriminate. To begin addressing the limitations in data, we performed a model-based experiment design and provide recommendations for experiments to refine model parameters and discriminate increasingly complex hypotheses. 
We developed a process to quantitatively fit mathematical models using qualitative data, and applied it in the study of how stem cells are regulated in the fruit fly ovary. The available published data we collected are fluorescent images of protein and mRNA expression from genetic experiments. Despite lacking quantitative data, the new process makes available quantitative model analysis techniques to reliably compare different models and guide future experiments. We found that the current consensus regulatory model is supported, but that the data are indeed insufficient to address more complex hypotheses. With the quantitatively fit models, we evaluated hypothetical experiments and estimated which future measurements should best refine or test models. The model fitting process we have developed is applicable to many biological studies where qualitative data are common, and can accelerate progress through more efficient experimentation.
8
Experimental design for dynamics identification of cellular processes. Bull Math Biol 2014; 76:597-626. [PMID: 24522560 DOI: 10.1007/s11538-014-9935-9] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 04/08/2013] [Accepted: 01/29/2014] [Indexed: 10/25/2022]
Abstract
We address the problem of using nonlinear models to design experiments to characterize the dynamics of cellular processes by using the approach of the Maximally Informative Next Experiment (MINE), which was introduced in W. Dong et al. (PLoS ONE 3(8):e3105, 2008) and independently in M.M. Donahue et al. (IET Syst. Biol. 4:249-262, 2010). In this approach, existing data is used to define a probability distribution on the parameters; the next measurement point is the one that yields the largest model output variance with this distribution. Building upon this approach, we introduce the Expected Dynamics Estimator (EDE), which is the expected value using this distribution of the output as a function of time. We prove the consistency of this estimator (uniform convergence to true dynamics) even when the chosen experiments cluster in a finite set of points. We extend this proof of consistency to various practical assumptions on noisy data and moderate levels of model mismatch. Through the derivation and proof, we develop a relaxed version of MINE that is more computationally tractable and robust than the original formulation. The results are illustrated with numerical examples on two nonlinear ordinary differential equation models of biomolecular and cellular processes.
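The two quantities described in this abstract can be sketched in a few lines. This is an illustrative reading of the abstract, not the authors' implementation: the exponential-decay `model`, the discrete set of weighted parameter samples standing in for the probability distribution on the parameters, and all function names are assumptions. The weights are assumed to sum to one.

```python
import math


def model(t, k):
    """Hypothetical one-parameter model output: exponential decay."""
    return math.exp(-k * t)


def expected_dynamics(t, params, weights):
    """EDE-style estimate: weighted mean of the model output at time t."""
    return sum(w * model(t, k) for k, w in zip(params, weights))


def output_variance(t, params, weights):
    """Weighted variance of the model output at time t across parameters."""
    mean = expected_dynamics(t, params, weights)
    return sum(w * (model(t, k) - mean) ** 2 for k, w in zip(params, weights))


def mine_next_time(times, params, weights):
    """MINE-style choice: the candidate time with the largest output variance."""
    return max(times, key=lambda t: output_variance(t, params, weights))
```

With two equally weighted decay rates 0.5 and 2.0 and candidate times 0, 1 and 5, the variance is zero at t = 0 (every parameter gives output 1) and small at t = 5 (both outputs have decayed), so the sketch selects t = 1, where the candidate dynamics disagree most.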
9
Mdluli T, Pargett M, Buzzard GT, Rundell AE. Specifying informative experiment stimulation conditions for resolving dynamical uncertainty in biological systems. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2014; 2014:298-301. [PMID: 25569956 DOI: 10.1109/embc.2014.6943588] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 06/04/2023]
Abstract
A computationally efficient model-based design of experiments (MBDOE) strategy is developed to plan an optimal experiment by specifying the experimental stimulation magnitudes and measurement points. The strategy is extended from previous work which optimized the experimental design over a space of measurable species and time points. We include system inputs (stimulation conditions) in the experiment design search to investigate if the addition of perturbations enhances the ability of the MBDOE method to resolve uncertainties in system dynamics. The MBDOE problem is made computationally tractable by using a sparse-grid approximation of the model output dynamics, pre-specifying the time points at which the input or experimental perturbations can be applied, and creating scenario trees to explore the endogenous uncertainty. Consecutive scenario trees are used to determine the best input magnitudes and select the optimal associated measurement species and time points. We demonstrate the effectiveness of this strategy on a T-Cell Receptor (TCR) signaling pathway model.
10
Chakrabarty A, Buzzard GT, Rundell AE. Model-based design of experiments for cellular processes. WILEY INTERDISCIPLINARY REVIEWS-SYSTEMS BIOLOGY AND MEDICINE 2013; 5:181-203. [PMID: 23293047 DOI: 10.1002/wsbm.1204] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Indexed: 01/07/2023]
Affiliation(s)
- Ankush Chakrabarty
- School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN, USA
11
Noble SL, Wendel LE, Donahue MM, Buzzard GT, Rundell AE. Sparse-grid-based adaptive model predictive control of HL60 cellular differentiation. IEEE Trans Biomed Eng 2011; 59:456-63. [PMID: 22057041 DOI: 10.1109/tbme.2011.2174361] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/06/2022]
Abstract
Quantitative methods such as model-based predictive control are known to facilitate the design of strategies to manipulate biological systems. This study develops a sparse-grid-based adaptive model predictive control (MPC) strategy to direct HL60 cellular differentiation. Sparse-grid sampling and interpolation support a computationally efficient adaptive MPC scheme in which multiple data-consistent regions of the model parameter space are identified and used to calculate a control compromise. The algorithm is evaluated in silico with structural model mismatch. Simulations demonstrate how the multiscenario control strategy more effectively manages the mismatch compared to a single scenario approach. Furthermore, the controller is evaluated in vitro to differentiate HL60 cells in both normal and perturbed environments. The controller-derived input sequence successfully achieves and sustains the specified target level of granulocytes when implemented in the laboratory. The results and analysis given here imply that adoption of this experiment planning technique to direct cell differentiation within more complex tissue engineered constructs will require the use of a reasonably accurate mathematical model and an extension of this algorithm to multiobjective controller design.
Affiliation(s)
- Sarah L Noble
- Weapons and Systems Engineering Department, United States Naval Academy, Annapolis, MD 21401, USA
12
Bazil JN, Buzzard GT, Rundell AE. A Global Parallel Model Based Design of Experiments Method to Minimize Model Output Uncertainty. Bull Math Biol 2011; 74:688-716. [DOI: 10.1007/s11538-011-9686-9] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.6] [Received: 02/17/2011] [Accepted: 08/04/2011] [Indexed: 01/14/2023]