51
Abstract
Biochemical systems theory (BST) is the foundation for a set of analytical and modeling tools that facilitate the analysis of dynamic biological systems. This paper depicts major developments in BST up to the current state of the art in 2012. It discusses its rationale, describes the typical strategies and methods of designing, diagnosing, analyzing, and utilizing BST models, and reviews areas of application. The paper is intended as a guide for investigators entering the fascinating field of biological systems analysis and as a resource for practitioners and experts.
52
Intervention in Biological Phenomena via Feedback Linearization. Adv Bioinformatics 2012; 2012:534810. [PMID: 23209459 PMCID: PMC3502753 DOI: 10.1155/2012/534810] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2012] [Accepted: 10/10/2012] [Indexed: 11/17/2022]
Abstract
The problems of modeling and intervention of biological phenomena have captured the interest of many researchers in the past few decades. The aim of the therapeutic intervention strategies is to move an undesirable state of a diseased network towards a more desirable one. Such an objective can be achieved by the application of drugs to act on some genes/metabolites that experience the undesirable behavior. For the purpose of design and analysis of intervention strategies, mathematical models that can capture the complex dynamics of the biological systems are needed. S-systems, which offer a good compromise between accuracy and mathematical flexibility, are a promising framework for modeling the dynamical behavior of biological phenomena. Due to the complex nonlinear dynamics of the biological phenomena represented by S-systems, nonlinear intervention schemes are needed to cope with the complexity of the nonlinear S-system models. Here, we present an intervention technique based on feedback linearization for biological phenomena modeled by S-systems. This technique is based on perfect knowledge of the S-system model. The proposed intervention technique is applied to the glycolytic-glycogenolytic pathway, and simulation results presented demonstrate the effectiveness of the proposed technique.
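For orientation, the S-system formalism that this and several later entries rely on is usually written in the canonical power-law form below. The symbols are the conventional ones from the BST literature and are not taken from this abstract: $X_i$ are the system variables, $\alpha_i, \beta_i \ge 0$ are rate constants, and $g_{ij}, h_{ij}$ are kinetic orders for production and degradation.

```latex
\frac{dX_i}{dt} \;=\; \alpha_i \prod_{j=1}^{n} X_j^{\,g_{ij}} \;-\; \beta_i \prod_{j=1}^{n} X_j^{\,h_{ij}},
\qquad i = 1, \dots, n
```

Each equation has exactly one aggregated production term and one aggregated degradation term, which is what gives the model the regular structure that control and estimation methods can exploit.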
53
Nounou MN, Nounou HN, Meskin N, Datta A, Dougherty ER. Multiscale denoising of biological data: a comparative analysis. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2012; 9:1539-1544. [PMID: 22566476 DOI: 10.1109/tcbb.2012.67] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 05/31/2023]
Abstract
Measured microarray genomic and metabolic data are a rich source of information about the biological systems they represent. For example, time-series biological data can be used to construct dynamic genetic regulatory network models, which can be used to design intervention strategies to cure or manage major diseases. Also, copy number data can be used to determine the locations and extent of aberrations in chromosome sequences. Unfortunately, measured biological data are usually contaminated with errors that mask the important features in the data. Therefore, these noisy measurements need to be filtered to enhance their usefulness in practice. Wavelet-based multiscale filtering has been shown to be a powerful denoising tool. In this work, different batch as well as online multiscale filtering techniques are used to denoise biological data contaminated with white or colored noise. The performances of these techniques are demonstrated and compared to those of some conventional low-pass filters using two case studies. The first case study uses simulated dynamic metabolic data, while the second case study uses real copy number data. Simulation results show that significant improvement can be achieved using multiscale filtering over conventional filtering techniques.
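The idea of wavelet-based filtering can be illustrated with a minimal sketch. It is not the authors' method: it assumes a single-level Haar transform with hard thresholding (a true multiscale filter repeats the decomposition over several levels), and the function names, test signal, and threshold value are all hypothetical.

```python
import math

def haar_forward(signal):
    """One-level Haar transform of an even-length list.

    Returns (approx, detail): scaled pairwise sums and differences.
    """
    r = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / r for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / r for i in range(0, len(signal), 2)]
    return approx, detail

def haar_inverse(approx, detail):
    """Invert haar_forward, rebuilding the original sample pairs."""
    r = math.sqrt(2.0)
    out = []
    for s, d in zip(approx, detail):
        out.extend([(s + d) / r, (s - d) / r])
    return out

def denoise(signal, threshold):
    """Hard thresholding: zero every detail coefficient with |d| <= threshold."""
    approx, detail = haar_forward(signal)
    kept = [d if abs(d) > threshold else 0.0 for d in detail]
    return haar_inverse(approx, kept)

# Hypothetical test signal: a step from 1 to 5 plus small alternating noise.
clean = [1.0] * 4 + [5.0] * 4
noisy = [c + (0.1 if i % 2 == 0 else -0.1) for i, c in enumerate(clean)]
denoised = denoise(noisy, threshold=0.5)
```

In this toy case the noise lives entirely in the detail coefficients, so thresholding removes it while the step, which falls on a pair boundary, is preserved. Real data would need several decomposition levels and a data-driven threshold.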
Affiliation(s)
- M N Nounou
- Chemical Engineering Program, Texas A&M University at Qatar, Doha, Qatar.
54
Abstract
A metabolism is a complex network of chemical reactions that converts sources of energy and chemical elements into biomass and other molecules. To design a metabolism from scratch and to implement it in a synthetic genome is almost within technological reach. Ideally, a synthetic metabolism should be able to synthesize a desired spectrum of molecules at a high rate, from multiple different nutrients, while using few chemical reactions, and producing little or no waste. Not all of these properties are achievable simultaneously. We here use a recently developed technique to create random metabolic networks with pre-specified properties to quantify trade-offs between these and other properties. We find that for every additional molecule to be synthesized a network needs on average three additional reactions. For every additional carbon source to be utilized, it needs on average two additional reactions. Networks able to synthesize 20 biomass molecules from each of 20 alternative sole carbon sources need to have at least 260 reactions. This number increases to 518 reactions for networks that can synthesize more than 60 molecules from each of 80 carbon sources. The maximally achievable rate of biosynthesis decreases by approximately 5 percent for every additional molecule to be synthesized. Biochemically related molecules can be synthesized at higher rates, because their synthesis produces less waste. Overall, the variables we study can explain 87 percent of variation in network size and 84 percent of the variation in synthesis rate. The constraints we identify prescribe broad boundary conditions that can help to guide synthetic metabolism design.
Affiliation(s)
- Tugce Bilgin
- Institute of Evolutionary Biology and Environmental Sciences, University of Zurich, Zürich, Switzerland.
55
Yang X, Dent JE, Nardini C. An S-System Parameter Estimation Method (SPEM) for biological networks. J Comput Biol 2012; 19:175-87. [PMID: 22300319 DOI: 10.1089/cmb.2011.0269] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Indexed: 11/12/2022]
Abstract
Advances in experimental biology, coupled with advances in computational power, bring new challenges to the interdisciplinary field of computational biology. One such broad challenge lies in the reverse engineering of gene networks, and goes from determining the structure of static networks, to reconstructing the dynamics of interactions from time series data. Here, we focus our attention on the latter area, and in particular, on parameterizing a dynamic network of oriented interactions between genes. By basing the parameterizing approach on a known power-law relationship model between connected genes (S-system), we are able to account for non-linearity in the network, without compromising the ability to analyze network characteristics. In this article, we introduce the S-System Parameter Estimation Method (SPEM). SPEM, a freely available R software package (http://www.picb.ac.cn/ClinicalGenomicNTW/temp3.html), takes gene expression data in time series and returns the network of interactions as a set of differential equations. The methods, which are presented and tested here, are shown to provide accurate results not only on synthetic data, but more importantly on real and therefore noisy by nature, biological data. In summary, SPEM shows high sensitivity and positive predicted values, as well as free availability and expansibility (because based on open source software). We expect these characteristics to make it a useful and broadly applicable software in the challenging reconstruction of dynamic gene networks.
Affiliation(s)
- Xinyi Yang
- Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, PR China
56
Nabavi S, Williams CM. A novel cost function to estimate parameters of oscillatory biochemical systems. EURASIP JOURNAL ON BIOINFORMATICS & SYSTEMS BIOLOGY 2012; 2012:3. [PMID: 22587221 PMCID: PMC3384360 DOI: 10.1186/1687-4153-2012-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 10/21/2011] [Accepted: 05/16/2012] [Indexed: 11/23/2022]
Abstract
Oscillatory pathways are among the most important classes of biochemical systems, with examples ranging from circadian rhythms to cell cycle maintenance. Mathematical modeling of these highly interconnected biochemical networks is needed to meet numerous objectives such as investigating, predicting and controlling the dynamics of these systems. Identifying the kinetic rate parameters is essential for fully modeling these and other biological processes. These kinetic parameters, however, are not usually available from measurements and most of them have to be estimated by parameter fitting techniques. One of the issues with estimating kinetic parameters in oscillatory systems is the irregularity of the least squares (LS) cost function surface used to estimate these parameters, which is caused by the periodicity of the measurements. These irregularities result in numerous local minima, which limit the performance of even some of the most robust global optimization algorithms. We proposed a parameter estimation framework to address these issues that integrates temporal information with periodic information embedded in the measurements used to estimate these parameters. This periodic information is used to build a proposed cost function with better surface properties, leading to fewer local minima and better performance of global optimization algorithms. We verified for three oscillatory biochemical systems that our proposed cost function results in an increased ability to estimate accurate kinetic parameters as compared to the traditional LS cost function. We combine this cost function with an improved noise removal approach that leverages periodic characteristics embedded in the measurements to effectively reduce noise. The results provide strong evidence of the efficacy of this noise removal approach over the previous commonly used wavelet hard-thresholding noise removal methods. This proposed optimization framework results in more accurate kinetic parameters that will eventually lead to biochemical models that are more precise, predictable, and controllable.
Affiliation(s)
- Seyedbehzad Nabavi
- Department of Electrical and Computer Engineering, North Carolina State University, Raleigh, NC, USA.
57
Acharya LR, Judeh T, Wang G, Zhu D. Optimal structural inference of signaling pathways from unordered and overlapping gene sets. Bioinformatics 2012; 28:546-56. [PMID: 22199386 PMCID: PMC3278757 DOI: 10.1093/bioinformatics/btr696] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 04/13/2011] [Revised: 11/16/2011] [Accepted: 12/18/2011] [Indexed: 11/14/2022]
Abstract
MOTIVATION A plethora of bioinformatics analysis has led to the discovery of numerous gene sets, which can be interpreted as discrete measurements emitted from latent signaling pathways. Their potential to infer signaling pathway structures, however, has not been sufficiently exploited. Existing methods accommodating discrete data do not explicitly consider signal cascading mechanisms that characterize a signaling pathway. Novel computational methods are thus needed to fully utilize gene sets and broaden the scope from focusing only on pairwise interactions to the more general cascading events in the inference of signaling pathway structures. RESULTS We propose a gene set based simulated annealing (SA) algorithm for the reconstruction of signaling pathway structures. A signaling pathway structure is a directed graph containing up to a few hundred nodes and many overlapping signal cascades, where each cascade represents a chain of molecular interactions from the cell surface to the nucleus. Gene sets in our context refer to discrete sets of genes participating in signal cascades, the basic building blocks of a signaling pathway, with no prior information about gene orderings in the cascades. From a compendium of gene sets related to a pathway, SA aims to search for signal cascades that characterize the optimal signaling pathway structure. In the search process, the extent of overlap among signal cascades is used to measure the optimality of a structure. Throughout, we treat gene sets as random samples from a first-order Markov chain model. We evaluated the performance of SA in three case studies. In the first study conducted on 83 KEGG pathways, SA demonstrated a significantly better performance than Bayesian network methods. Since both SA and Bayesian network methods accommodate discrete data, use a 'search and score' network learning strategy and output a directed network, they can be compared in terms of performance and computational time. 
In the second study, we compared SA and Bayesian network methods using four benchmark datasets from DREAM. In our final study, we showcased two context-specific signaling pathways activated in breast cancer. AVAILABILITY Source codes are available from http://dl.dropbox.com/u/16000775/sa_sc.zip.
Affiliation(s)
- Lipi R Acharya
- Department of Computer Science, University of New Orleans, New Orleans, LA 70148, USA
58
Kimura S, Araki D, Matsumura K, Okada-Hatakeyama M. Inference of S-system models of genetic networks by solving one-dimensional function optimization problems. Math Biosci 2012; 235:161-70. [PMID: 22155075 DOI: 10.1016/j.mbs.2011.11.008] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Received: 07/21/2011] [Revised: 10/21/2011] [Accepted: 11/22/2011] [Indexed: 11/17/2022]
Affiliation(s)
- S Kimura
- Graduate School of Engineering, Tottori University, 4-101, Koyama-minami, Tottori 680-8552, Japan.
59
Sun J, Garibaldi JM, Hodgman C. Parameter estimation using meta-heuristics in systems biology: a comprehensive review. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2012; 9:185-202. [PMID: 21464505 DOI: 10.1109/tcbb.2011.63] [Citation(s) in RCA: 68] [Impact Index Per Article: 5.2] [Indexed: 05/13/2023]
Abstract
This paper gives a comprehensive review of the application of meta-heuristics to optimization problems in systems biology, mainly focussing on the parameter estimation problem (also called the inverse problem or model calibration). It is intended for either the system biologist who wishes to learn more about the various optimization techniques available and/or the meta-heuristic optimizer who is interested in applying such techniques to problems in systems biology. First, the parameter estimation problems emerging from different areas of systems biology are described from the point of view of machine learning. Brief descriptions of various meta-heuristics developed for these problems follow, along with outlines of their advantages and disadvantages. Several important issues in applying meta-heuristics to the systems biology modelling problem are addressed, including the reliability and identifiability of model parameters, optimal design of experiments, and so on. Finally, we highlight some possible future research directions in this field.
60
Kimura S, Shiraishi Y, Okada M. Inference of genetic networks using LPMs: assessment of confidence values of regulations. J Bioinform Comput Biol 2011. [DOI: 10.1142/s0219720010004859] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 11/18/2022]
Abstract
When we apply inference methods based on a set of differential equations into actual genetic network inference problems, we often end up with a large number of false-positive regulations. However, as we must check the inferred regulations through biochemical experiments, fewer false-positive regulations are preferable. In order to reduce the number of regulations checked, this study proposes a new method that assigns confidence values to all of the regulations contained in the target network. For this purpose, we combine a residual bootstrap method with the existing method, i.e. the inference method using linear programming machines (LPMs). Through numerical experiments on an artificial genetic network inference problem, we confirmed that most of the regulations with high confidence values are actually present in the target networks. We then used the proposed method to analyze the bacterial SOS DNA repair system, and succeeded in assigning reasonable confidence values to its regulations. Although this study combined the bootstrap method with the inference method using the LPMs, the proposed bootstrap approach could be combined with any method that has an ability to infer a genetic network from time-series of gene expression levels.
Affiliation(s)
- Shuhei Kimura
- Graduate School of Engineering, Tottori University, 4-101, Koyama-minami, Tottori 680-8552, Japan
- Yuichi Shiraishi
- Research Center for Allergy and Immunology, RIKEN, 1-7-22 Suehiro, Tsurumi, Yokohama 230-0045, Japan
- Mariko Okada
- Research Center for Allergy and Immunology, RIKEN, 1-7-22 Suehiro, Tsurumi, Yokohama 230-0045, Japan
61
Aitken S, Alexander RD, Beggs JD. Modelling reveals kinetic advantages of co-transcriptional splicing. PLoS Comput Biol 2011; 7:e1002215. [PMID: 22022255 PMCID: PMC3192812 DOI: 10.1371/journal.pcbi.1002215] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.8] [Received: 03/23/2011] [Accepted: 08/16/2011] [Indexed: 01/21/2023]
Abstract
Messenger RNA splicing is an essential and complex process for the removal of intron sequences. Whereas the composition of the splicing machinery is mostly known, the kinetics of splicing, the catalytic activity of splicing factors and the interdependency of transcription, splicing and mRNA 3′ end formation are less well understood. We propose a stochastic model of splicing kinetics that explains data obtained from high-resolution kinetic analyses of transcription, splicing and 3′ end formation during induction of an intron-containing reporter gene in budding yeast. Modelling reveals co-transcriptional splicing to be the most probable and most efficient splicing pathway for the reporter transcripts, due in part to a positive feedback mechanism for co-transcriptional second step splicing. Model comparison is used to assess the alternative representations of reactions. Modelling also indicates the functional coupling of transcription and splicing, because both the rate of initiation of transcription and the probability that step one of splicing occurs co-transcriptionally are reduced, when the second step of splicing is abolished in a mutant reporter. The coding information for the synthesis of proteins in mammalian cells is first transcribed from DNA to messenger RNA (mRNA), before being translated from mRNA to protein. Each step is complex, and subject to regulation. Certain sequences of DNA must be skipped in order to generate a functional protein, and these sequences, known as introns, are removed from the mRNA by the process of splicing. Splicing is well understood in terms of the proteins and complexes that are involved, but the rates of reactions, and models for the splicing pathways, have not yet been established. We present a model of splicing in yeast that accounts for the possibilities that splicing may take place while the mRNA is in the process of being created, as well as the possibility that splicing takes place once mRNA transcription is complete. 
We assign rates to the reactions in the pathway, and show that co-transcriptional splicing is the preferred pathway. In order to reach these conclusions, we compare a number of alternative models by a quantitative computational method. Our analysis relies on the quantitative measurement of messenger RNA in live cells - this is a major challenge in itself that has only recently been addressed.
Affiliation(s)
- Stuart Aitken
- Centre for Systems Biology, University of Edinburgh, Edinburgh, United Kingdom.
62
Kentzoglanakis K, Poole M. A swarm intelligence framework for reconstructing gene networks: searching for biologically plausible architectures. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2011; 9:358-371. [PMID: 21576756 DOI: 10.1109/tcbb.2011.87] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Indexed: 05/30/2023]
Abstract
In this paper, we investigate the problem of reverse engineering the topology of gene regulatory networks from temporal gene expression data. We adopt a computational intelligence approach comprising swarm intelligence techniques, namely particle swarm optimization (PSO) and ant colony optimization (ACO). In addition, the recurrent neural network (RNN) formalism is employed for modeling the dynamical behavior of gene regulatory systems. More specifically, ACO is used for searching the discrete space of network architectures and PSO for searching the corresponding continuous space of RNN model parameters. We propose a novel solution construction process in the context of ACO for generating biologically plausible candidate architectures. The objective is to concentrate the search effort into areas of the structure space that contain architectures which are feasible in terms of their topological resemblance to real-world networks. The proposed framework is initially applied to the reconstruction of a small artificial network that has previously been studied in the context of gene network reverse engineering. Subsequently, we consider an artificial data set with added noise for reconstructing a subnetwork of the genetic interaction network of S. cerevisiae (yeast). Finally, the framework is applied to a real-world data set for reverse engineering the SOS response system of the bacterium Escherichia coli. Results demonstrate the relative advantage of utilizing problem-specific knowledge regarding biologically plausible structural properties of gene networks over conducting a problem-agnostic search in the vast space of network architectures.
63
Zhan C, Yeung LF. Parameter estimation in systems biology models using spline approximation. BMC SYSTEMS BIOLOGY 2011; 5:14. [PMID: 21255466 PMCID: PMC3750107 DOI: 10.1186/1752-0509-5-14] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.8] [Received: 03/18/2010] [Accepted: 01/24/2011] [Indexed: 11/24/2022]
Abstract
Background: Mathematical models for revealing the dynamics and interaction properties of biological systems play an important role in computational systems biology. The inference of model parameter values from time-course data can be considered a "reverse engineering" process and is still one of the most challenging tasks. Many parameter estimation methods have been developed, but none is effective in all cases or outperforms all other approaches; each has its advantages and disadvantages. It is therefore worthwhile to develop parameter estimation methods that are robust against noise, computationally efficient and flexible enough to meet different constraints. Results: Two parameter estimation methods that combine spline theory with Linear Programming (LP) and Nonlinear Programming (NLP) are developed. These methods remove the need for ODE solvers during the identification process. Our analysis shows that the augmented cost function surfaces used in the two proposed methods are smoother, which can ease the search for optima and hence enhance the robustness and speed of the search algorithm. Moreover, the cores of our algorithms are LP and NLP based, which are flexible, so additional constraints can be embedded or removed easily. Eight systems biology models are used for testing the proposed approaches. Our results confirm that the proposed methods are both efficient and robust. Conclusions: The proposed approaches are generally applicable to identifying unknown parameter values of a wide range of systems biology models.
Affiliation(s)
- Choujun Zhan
- Department of Electronic Engineering, City University of Hong Kong, PR China.
64
Meskin N, Nounou HN, Nounou M, Datta A, Dougherty ER. Intervention in biological phenomena modeled by S-systems. IEEE Trans Biomed Eng 2010; 58:1260-7. [PMID: 21172748 DOI: 10.1109/tbme.2010.2099658] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 11/07/2022]
Abstract
Recent years have witnessed extensive research activity in modeling biological phenomena as well as in developing intervention strategies for such phenomena. S-systems, which offer a good compromise between accuracy and mathematical flexibility, are a promising framework for modeling the dynamical behavior of biological phenomena. In this paper, two different intervention strategies, namely direct and indirect, are proposed for the S-system model. In the indirect approach, the prespecified desired values for the target variables are used to compute the reference values for the control inputs, and two control algorithms, namely simple sampled-data control and model predictive control (MPC), are developed for transferring the control variables from their initial values to the computed reference ones. In the direct approach, a MPC algorithm is developed that directly guides the target variables to their desired values. The proposed intervention strategies are applied to the glycolytic-glycogenolytic pathway and the simulation results presented demonstrate the effectiveness of the proposed schemes.
Affiliation(s)
- Nader Meskin
- Department of Electrical Engineering, Qatar University, Doha 2713, Qatar.
65
Liu L, Agren R, Bordel S, Nielsen J. Use of genome-scale metabolic models for understanding microbial physiology. FEBS Lett 2010; 584:2556-64. [PMID: 20420838 DOI: 10.1016/j.febslet.2010.04.052] [Citation(s) in RCA: 69] [Impact Index Per Article: 4.6] [Received: 04/14/2010] [Revised: 04/18/2010] [Accepted: 04/20/2010] [Indexed: 11/17/2022]
Abstract
The exploitation of microorganisms in industrial, medical, food and environmental biotechnology requires a comprehensive understanding of their physiology. The availability of genome sequences and accumulation of high-throughput data allows gaining understanding of microbial physiology at the systems level, and genome-scale metabolic models represent a valuable framework for integrative analysis of metabolism of microorganisms. Genome-scale metabolic models are reconstructed based on a combination of genome sequence information and detailed biochemical information, and these reconstructed models can be used for analyzing and simulating the operation of metabolism in response to different stimuli. Here we discuss the requirement for having detailed physiological insight in order to exploit microorganisms for production of fuels, chemicals and pharmaceuticals. We further describe the reconstruction process of genome-scale metabolic models and different algorithms that can be used to apply these models to gain improved insight into microbial physiology.
Affiliation(s)
- Liming Liu
- Department of Chemical and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
66
Boykin ER, Ogle WO. Using heterogeneous data sources in a systems biology approach to modeling the Sonic Hedgehog signaling pathway. MOLECULAR BIOSYSTEMS 2010; 6:1993-2003. [DOI: 10.1039/c0mb00006j] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 01/25/2023]
67
Liu PK, Wang FS. Hybrid differential evolution with geometric mean mutation in parameter estimation of bioreaction systems with large parameter search space. Comput Chem Eng 2009. [DOI: 10.1016/j.compchemeng.2009.05.008] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Indexed: 10/20/2022]
68
Chou IC, Voit EO. Recent developments in parameter estimation and structure identification of biochemical and genomic systems. Math Biosci 2009; 219:57-83. [PMID: 19327372 PMCID: PMC2693292 DOI: 10.1016/j.mbs.2009.03.002] [Citation(s) in RCA: 298] [Impact Index Per Article: 18.6] [Received: 10/02/2008] [Revised: 03/06/2009] [Accepted: 03/15/2009] [Indexed: 01/16/2023]
Abstract
The organization, regulation and dynamical responses of biological systems are in many cases too complex to allow intuitive predictions and require the support of mathematical modeling for quantitative assessments and a reliable understanding of system functioning. All steps of constructing mathematical models for biological systems are challenging, but arguably the most difficult task among them is the estimation of model parameters and the identification of the structure and regulation of the underlying biological networks. Recent advancements in modern high-throughput techniques have been allowing the generation of time series data that characterize the dynamics of genomic, proteomic, metabolic, and physiological responses and enable us, at least in principle, to tackle estimation and identification tasks using 'top-down' or 'inverse' approaches. While the rewards of a successful inverse estimation or identification are great, the process of extracting structural and regulatory information is technically difficult. The challenges can generally be categorized into four areas, namely, issues related to the data, the model, the mathematical structure of the system, and the optimization and support algorithms. Many recent articles have addressed inverse problems within the modeling framework of Biochemical Systems Theory (BST). BST was chosen for these tasks because of its unique structural flexibility and the fact that the structure and regulation of a biological system are mapped essentially one-to-one onto the parameters of the describing model. The proposed methods mainly focused on various optimization algorithms, but also on support techniques, including methods for circumventing the time consuming numerical integration of systems of differential equations, smoothing overly noisy data, estimating slopes of time series, reducing the complexity of the inference task, and constraining the parameter search space. 
Other methods targeted issues of data preprocessing, detection and amelioration of model redundancy, and model-free or model-based structure identification. The total number of proposed methods and their applications has by now exceeded one hundred, which makes it difficult for the newcomer, as well as the expert, to gain a comprehensive overview of available algorithmic options and limitations. To facilitate the entry into the field of inverse modeling within BST and related modeling areas, the article presented here reviews the field and proposes an operational 'work-flow' that guides the user through the estimation process, identifies possibly problematic steps, and suggests corresponding solutions based on the specific characteristics of the various available algorithms. The article concludes with a discussion of the present state of the art and with a description of open questions.
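The abstract's remark that structure and regulation map essentially one-to-one onto model parameters can be illustrated with a minimal sketch. It assumes the S-system form, in which each variable has one production and one degradation power-law term, and reads a nonzero kinetic order as an influence (positive: activation, negative: inhibition). The matrices `g` and `h`, their values, and the helper `edges` are hypothetical and do not come from the article.

```python
# Hypothetical kinetic orders for a 3-variable S-system (illustrative values only).
# g[i][j]: effect of X_j on the production of X_i; h[i][j]: effect on its degradation.
g = [[0.0, 0.0, -0.8],
     [0.5, 0.0, 0.0],
     [0.0, 0.75, 0.0]]
h = [[0.5, 0.0, 0.0],
     [0.0, 0.75, 0.0],
     [0.0, 0.0, 0.5]]


def edges(orders, term):
    """Return (source, target, term, kind) for every nonzero kinetic order."""
    found = []
    for i, row in enumerate(orders):
        for j, value in enumerate(row):
            if value != 0.0:
                kind = "activates" if value > 0 else "inhibits"
                found.append((f"X{j + 1}", f"X{i + 1}", term, kind))
    return found


network = edges(g, "production") + edges(h, "degradation")
for source, target, term, kind in network:
    print(f"{source} {kind} {term} of {target}")
```

Under this reading, structure identification amounts to deciding which kinetic orders are nonzero, and parameter estimation to finding their values.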
Affiliation(s)
- I-Chun Chou
- Integrative BioSystems Institute and The Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, 313 Ferst Drive, Atlanta, GA 30332, USA.
|
69
|
Ko CL, Voit EO, Wang FS. Estimating parameters for generalized mass action models with connectivity information. BMC Bioinformatics 2009; 10:140. [PMID: 19432964 PMCID: PMC2694188 DOI: 10.1186/1471-2105-10-140] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Received: 02/03/2009] [Accepted: 05/11/2009] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Determining the parameters of a mathematical model from quantitative measurements is the main bottleneck of modelling biological systems. Parameter values can be estimated from steady-state data or from dynamic data. The nature of suitable data for these two types of estimation is rather different. For instance, estimations of parameter values in pathway models, such as kinetic orders, rate constants, flux control coefficients or elasticities, from steady-state data are generally based on experiments that measure how a biochemical system responds to small perturbations around the steady state. In contrast, parameter estimation from dynamic data requires time series measurements for all dependent variables. Almost no literature has so far discussed the combined use of both steady-state and transient data for estimating parameter values of biochemical systems. RESULTS In this study we introduce a constrained optimization method for estimating parameter values of biochemical pathway models using steady-state information and transient measurements. The constraints are derived from the flux connectivity relationships of the system at the steady state. Two case studies demonstrate the estimation results with and without flux connectivity constraints. The unconstrained optimal estimates from dynamic data may fit the experiments well, but they do not necessarily maintain the connectivity relationships. As a consequence, individual fluxes may be misrepresented, which may cause problems in later extrapolations. By contrast, the constrained estimation accounting for flux connectivity information reduces this misrepresentation and thereby yields improved model parameters. CONCLUSION The method combines transient metabolic profiles and steady-state information and leads to the formulation of an inverse parameter estimation task as a constrained optimization problem. 
Parameter estimation and model selection are simultaneously carried out on the constrained optimization problem and yield realistic model parameters that are more likely to hold up in extrapolations with the model.
Affiliation(s)
- Chih-Lung Ko
- Department of Chemical Engineering, National Chung Cheng University, Chiayi, Taiwan.
|
71
|
Vilela M, Vinga S, Maia MAGM, Voit EO, Almeida JS. Identification of neutral biochemical network models from time series data. BMC Systems Biology 2009; 3:47. [PMID: 19416537 PMCID: PMC2694766 DOI: 10.1186/1752-0509-3-47] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.3] [Received: 01/18/2009] [Accepted: 05/05/2009] [Indexed: 12/02/2022]
Abstract
Background The major difficulty in modeling biological systems from multivariate time series is the identification of parameter sets that endow a model with dynamical behaviors sufficiently similar to the experimental data. Directly related to this parameter estimation issue is the task of identifying the structure and regulation of ill-characterized systems. Both tasks are simplified if the mathematical model is canonical, i.e., if it is constructed according to strict guidelines. Results In this report, we propose a method for the identification of admissible parameter sets of canonical S-systems from biological time series. The method is based on a Monte Carlo process that is combined with an improved version of our previous parameter optimization algorithm. The method maps the parameter space into the network space, which characterizes the connectivity among components, by creating an ensemble of decoupled S-system models that imitate the dynamical behavior of the time series with sufficient accuracy. The concept of sloppiness is revisited in the context of these S-system models with an exploration not only of different parameter sets that produce similar dynamical behaviors but also different network topologies that yield dynamical similarity. Conclusion The proposed parameter estimation methodology was applied to actual time series data from the glycolytic pathway of the bacterium Lactococcus lactis and led to ensembles of models with different network topologies. In parallel, the parameter optimization algorithm was applied to the same dynamical data upon imposing a pre-specified network topology derived from prior biological knowledge, and the results from both strategies were compared. The results suggest that the proposed method may serve as a powerful exploration tool for testing hypotheses and the design of new experiments.
Affiliation(s)
- Marco Vilela
- Instituto de Tecnologia Química e Biológica, Universidade Nova de Lisboa, Apartado, Oeiras, Portugal.
|
72
|
Kimura S, Nakayama S, Hatakeyama M. Genetic network inference as a series of discrimination tasks. Bioinformatics 2009; 25:918-25. [PMID: 19189976 DOI: 10.1093/bioinformatics/btp072] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.5] [Indexed: 11/14/2022]
Abstract
MOTIVATION Genetic network inference methods based on sets of differential equations generally require a great deal of time, as the equations must be solved many times. To reduce the computational cost, researchers have proposed other methods for inferring genetic networks by solving sets of differential equations only a few times, or even without solving them at all. When we try to obtain reasonable network models using these methods, however, we must estimate the time derivatives of the gene expression levels with great precision. In this study, we propose a new method to overcome the drawbacks of inference methods based on sets of differential equations. RESULTS Our method infers genetic networks by obtaining classifiers capable of predicting the signs of the derivatives of the gene expression levels. For this purpose, we defined a genetic network inference problem as a series of discrimination tasks, then solved the defined series of discrimination tasks with a linear programming machine. Our experimental results demonstrated that the proposed method is capable of correctly inferring genetic networks, and doing so more than 500 times faster than the other inference methods based on sets of differential equations. Next, we applied our method to actual expression data of the bacterial SOS DNA repair system. And finally, we demonstrated that our approach relates to the inference method based on the S-system model. Though our method provides no estimation of the kinetic parameters, it should be useful for researchers interested only in the network structure of a target system. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
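One way to see why predicting the sign of a derivative can relate to the S-system model is the following sketch. It assumes a single S-system equation dX/dt = alpha * prod(X_j^g_j) - beta * prod(X_j^h_j); taking logarithms shows that the derivative is positive exactly when log(alpha/beta) + sum((g_j - h_j) * log X_j) is positive, which is a linear discriminant in log-transformed expression levels. The parameter values below are hypothetical, and the sketch only checks this identity numerically; it does not implement the linear programming machine used by the authors.

```python
import math
import random


def s_system_rate(x, alpha, g, beta, h):
    """Right-hand side of one S-system equation: production minus degradation."""
    production = alpha * math.prod(xj ** gj for xj, gj in zip(x, g))
    degradation = beta * math.prod(xj ** hj for xj, hj in zip(x, h))
    return production - degradation


def linear_discriminant(x, alpha, g, beta, h):
    """Linear score in log-space whose sign matches the sign of the rate."""
    return math.log(alpha / beta) + sum(
        (gj - hj) * math.log(xj) for xj, gj, hj in zip(x, g, h)
    )


random.seed(0)
alpha, beta = 2.0, 1.5   # hypothetical rate constants
g = [0.0, -0.6, 0.4]     # hypothetical kinetic orders (production)
h = [0.5, 0.0, 0.0]      # hypothetical kinetic orders (degradation)

samples = 1000
agreements = 0
for _ in range(samples):
    x = [random.uniform(0.1, 5.0) for _ in range(3)]
    rate_positive = s_system_rate(x, alpha, g, beta, h) > 0
    score_positive = linear_discriminant(x, alpha, g, beta, h) > 0
    agreements += rate_positive == score_positive

print(f"{agreements} of {samples} sign predictions agree")
```

The weights of such a discriminant are the differences g_j - h_j, so a classifier of this kind can indicate which genes influence a target without yielding the individual kinetic parameters, consistent with the abstract's caveat.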
Affiliation(s)
- Shuhei Kimura
- Graduate School of Engineering, Tottori University, Koyama-minami, Tottori, Japan.
|
73
|
Gennemark P, Wedelin D. Benchmarks for identification of ordinary differential equations from time series data. Bioinformatics 2009; 25:780-6. [PMID: 19176548 PMCID: PMC2654804 DOI: 10.1093/bioinformatics/btp050] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.8] [Indexed: 11/14/2022]
Abstract
Motivation: In recent years, the biological literature has seen a significant increase of reported methods for identifying both structure and parameters of ordinary differential equations (ODEs) from time series data. A natural way to evaluate the performance of such methods is to try them on a sufficient number of realistic test cases. However, weak practices in specifying identification problems and the lack of commonly accepted benchmark problems make it difficult to evaluate and compare different methods. Results: To enable better evaluation and comparisons between different methods, we propose how to specify identification problems as optimization problems with a model space of allowed reactions (e.g. reaction kinetics like Michaelis–Menten or S-systems), ranges for the parameters, time series data and an error function. We also define a file format for such problems. We then present a collection of more than 40 benchmark problems for ODE model identification of cellular systems. The collection includes realistic problems of different levels of difficulty w.r.t. size and quality of data. We consider both problems with simulated data from known systems, and problems with real data. Finally, we present results based on our identification algorithm for all benchmark problems. In comparison with publications on which we have based some of the benchmark problems, our approach allows all problems to be solved without the use of supercomputing. Availability: The benchmark problems are available at www.odeidentification.org. Contact: peterg@chalmers.se. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Peter Gennemark
- Department of Mathematical Sciences, University of Göteborg, SE-412 96 Göteborg, Sweden.
|
74
|
Chen WW, Schoeberl B, Jasper PJ, Niepel M, Nielsen UB, Lauffenburger DA, Sorger PK. Input-output behavior of ErbB signaling pathways as revealed by a mass action model trained against dynamic data. Mol Syst Biol 2009; 5:239. [PMID: 19156131 PMCID: PMC2644173 DOI: 10.1038/msb.2008.74] [Citation(s) in RCA: 260] [Impact Index Per Article: 16.3] [Received: 01/25/2008] [Accepted: 12/03/2008] [Indexed: 01/23/2023] Open
Abstract
The ErbB signaling pathways, which regulate diverse physiological responses such as cell survival, proliferation and motility, have been subjected to extensive molecular analysis. Nonetheless, it remains poorly understood how different ligands induce different responses and how this is affected by oncogenic mutations. To quantify signal flow through ErbB-activated pathways we have constructed, trained and analyzed a mass action model of immediate-early signaling involving ErbB1-4 receptors (EGFR, HER2/Neu2, ErbB3 and ErbB4), and the MAPK and PI3K/Akt cascades. We find that parameter sensitivity is strongly dependent on the feature (e.g. ERK or Akt activation) or condition (e.g. EGF or heregulin stimulation) under examination and that this context dependence is informative with respect to mechanisms of signal propagation. Modeling predicts log-linear amplification so that significant ERK and Akt activation is observed at ligand concentrations far below the K(d) for receptor binding. However, MAPK and Akt modules isolated from the ErbB model continue to exhibit switch-like responses. Thus, key system-wide features of ErbB signaling arise from nonlinear interaction among signaling elements, the properties of which appear quite different in context and in isolation.
Affiliation(s)
- William W Chen
- Department of Systems Biology, Center for Cell Decision Processes, Harvard Medical School, Boston, MA 02115, USA
|
75
|
Goel G, Chou IC, Voit EO. System estimation from metabolic time-series data. Bioinformatics 2008; 24:2505-11. [PMID: 18772153 PMCID: PMC2732280 DOI: 10.1093/bioinformatics/btn470] [Citation(s) in RCA: 72] [Impact Index Per Article: 4.2] [Received: 06/26/2008] [Revised: 08/27/2008] [Accepted: 08/29/2008] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION At the center of computational systems biology are mathematical models that capture the dynamics of biological systems and offer novel insights. The bottleneck in the construction of these models is presently the identification of model parameters that make the model consistent with observed data. Dynamic flux estimation (DFE) is a novel methodological framework for estimating parameters for models of metabolic systems from time-series data. DFE consists of two distinct phases, an entirely model-free and assumption-free data analysis and a model-based mathematical characterization of process representations. The model-free phase reveals inconsistencies within the data, and between data and the alleged system topology, while the model-based phase allows quantitative diagnostics of whether--or to what degree--the assumed mathematical formulations are appropriate or in need of improvement. Hallmarks of DFE are the facility to: diagnose data and model consistency; circumvent undue compensation of errors; determine functional representations of fluxes uncontaminated by errors in other fluxes and pinpoint sources of remaining errors. Our results suggest that the proposed approach is more effective and robust than presently available methods for deriving metabolic models from time-series data. Its avoidance of error compensation among process descriptions promises significantly improved extrapolability toward new data or experimental conditions.
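A rough sketch of what a model-free flux calculation can look like is given below. It is an illustration under strong assumptions, not the DFE procedure itself: the pathway is assumed to be linear (input, then X1, then X2, then output), the input flux is assumed known, and the slopes dX_i/dt are assumed to have been estimated from smoothed time-series data beforehand. Under these assumptions the mass balances dX_i/dt = v_(i-1) - v_i can be solved for the fluxes without choosing any rate law. The function name and the numbers are hypothetical.

```python
def fluxes_linear_pathway(input_flux, slopes):
    """Solve dX_i/dt = v_(i-1) - v_i for v_1..v_n at every time point.

    input_flux: known influx v_0 at each time point.
    slopes: per time point, the estimated slopes [dX_1/dt, ..., dX_n/dt].
    """
    result = []
    for v_in, slope_row in zip(input_flux, slopes):
        fluxes = []
        upstream = v_in
        for slope in slope_row:
            v_out = upstream - slope  # rearranged mass balance
            fluxes.append(v_out)
            upstream = v_out
        result.append(fluxes)
    return result


# Hypothetical data: constant influx, slopes decaying towards a steady state.
input_flux = [1.0, 1.0, 1.0]
slopes = [[0.6, 0.3], [0.2, 0.1], [0.0, 0.0]]
fluxes = fluxes_linear_pathway(input_flux, slopes)
for k, row in enumerate(fluxes):
    print(f"time point {k}: {row}")
```

Plotting each computed flux against the concentrations it depends on would then indicate whether a candidate functional form (for example a power law) is appropriate, which corresponds to the model-based phase the abstract describes. Pathways with branches generally have more fluxes than balance equations and would need additional information.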
Affiliation(s)
- Gautam Goel
- Integrative BioSystems Institute and The Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, 313 Ferst Drive, Atlanta, GA 30332, USA
|
76
|
Nakatsui M, Ueda T, Maki Y, Ono I, Okamoto M. Method for inferring and extracting reliable genetic interactions from time-series profile of gene expression. Math Biosci 2008; 215:105-14. [PMID: 18638491 DOI: 10.1016/j.mbs.2008.06.007] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Received: 11/09/2007] [Revised: 06/11/2008] [Accepted: 06/13/2008] [Indexed: 11/16/2022]
Affiliation(s)
- Masahiko Nakatsui
- Laboratory for Bioinformatics, Graduate School of Systems Life Sciences, Kyushu University, Fukuoka 812-8581, Japan
|
77
|
Vilela M, Chou IC, Vinga S, Vasconcelos ATR, Voit EO, Almeida JS. Parameter optimization in S-system models. BMC Systems Biology 2008; 2:35. [PMID: 18416837 PMCID: PMC2333970 DOI: 10.1186/1752-0509-2-35] [Citation(s) in RCA: 73] [Impact Index Per Article: 4.3] [Received: 01/14/2008] [Accepted: 04/16/2008] [Indexed: 11/19/2022]
Abstract
BACKGROUND The inverse problem of identifying the topology of biological networks from their time series responses is a cornerstone challenge in systems biology. We tackle this challenge here through the parameterization of S-system models. It was previously shown that parameter identification can be performed as an optimization based on the decoupling of the differential S-system equations, which results in a set of algebraic equations. RESULTS A novel parameterization solution is proposed for the identification of S-system models from time series when no information about the network topology is known. The method is based on eigenvector optimization of a matrix formed from multiple regression equations of the linearized decoupled S-system. Furthermore, the algorithm is extended to the optimization of network topologies with constraints on metabolites and fluxes. These constraints rejoin the system in cases where it had been fragmented by decoupling. We demonstrate with synthetic time series why the algorithm can be expected to converge in most cases. CONCLUSION A procedure was developed that facilitates automated reverse engineering tasks for biological networks using S-systems. The proposed method of eigenvector optimization constitutes an advancement over S-system parameter identification from time series using a recent method called Alternating Regression. The proposed method overcomes convergence issues encountered in alternate regression by identifying nonlinear constraints that restrict the search space to computationally feasible solutions. Because the parameter identification is still performed for each metabolite separately, the modularity and linear time characteristics of the alternating regression method are preserved. Simulation studies illustrate how the proposed algorithm identifies the correct network topology out of a collection of models which all fit the dynamical time series essentially equally well.
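The decoupling idea mentioned in the abstract can be sketched for a single variable. Assuming the equation dX/dt = alpha * X^g - beta * X^h and slopes S that have already been estimated from the time series, fixing the degradation parameters turns the production term into a linear regression in log-space: log(S + beta * X^h) = log(alpha) + g * log(X). The code below performs this one half-step on noise-free, hypothetical data. It is meant to illustrate the kind of regression used in alternating regression, not the eigenvector optimization proposed in the article.

```python
import math


def fit_production_term(xs, slopes, beta, h):
    """Regress log(S + beta * X^h) on log(X); return estimates of (alpha, g).

    Assumes S + beta * X^h > 0 at every point, which holds for exact data
    but may fail for noisy slopes or poor guesses of beta and h.
    """
    u = [math.log(x) for x in xs]
    y = [math.log(s + beta * x ** h) for x, s in zip(xs, slopes)]
    n = len(xs)
    u_mean = sum(u) / n
    y_mean = sum(y) / n
    sxx = sum((ui - u_mean) ** 2 for ui in u)
    sxy = sum((ui - u_mean) * (yi - y_mean) for ui, yi in zip(u, y))
    g = sxy / sxx                          # slope of the regression line
    alpha = math.exp(y_mean - g * u_mean)  # intercept is log(alpha)
    return alpha, g


# Hypothetical "true" parameters used only to generate exact test data.
true_alpha, true_g, true_beta, true_h = 2.0, 0.3, 1.0, 0.8
xs = [0.5, 1.0, 1.5, 2.0, 3.0]
slopes = [true_alpha * x ** true_g - true_beta * x ** true_h for x in xs]

alpha_hat, g_hat = fit_production_term(xs, slopes, true_beta, true_h)
print(f"alpha = {alpha_hat:.6f}, g = {g_hat:.6f}")
```

In an alternating scheme, the analogous regression for the degradation term would follow, and the two steps would repeat. Because each variable is treated separately, the work grows with the number of variables rather than with the size of a coupled ODE system, which is the modularity the abstract refers to.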
Affiliation(s)
- Marco Vilela
- Dept. Bioinformatics and Computational Biology, University of Texas M.D. Anderson Cancer Center, 1515 Holcombe Blvd, Houston, TX 77030, USA
- Instituto de Tecnologia Química e Biológica, Universidade Nova de Lisboa, Rua da Quinta Grande 6, Apartado 127, 2780-156 Oeiras, Portugal
- I-Chun Chou
- Dept. Biomedical Engineering, Georgia Institute of Technology and Emory University, 313 Ferst Drive, Atlanta, GA 30332, USA
- Susana Vinga
- Instituto de Engenharia de Sistemas e Computadores: Investigação e Desenvolvimento (INESC-ID), R. Alves Redol 9, 1000-029 Lisboa, Portugal
- Ana Tereza R Vasconcelos
- Dept. Computational and Applied Mathematics, Laboratório Nacional de Computação Científica, Petrópolis, Rio de Janeiro, Brazil
- Eberhard O Voit
- Dept. Biomedical Engineering, Georgia Institute of Technology and Emory University, 313 Ferst Drive, Atlanta, GA 30332, USA
- Jonas S Almeida
- Dept. Bioinformatics and Computational Biology, University of Texas M.D. Anderson Cancer Center, 1515 Holcombe Blvd, Houston, TX 77030, USA
- Instituto de Tecnologia Química e Biológica, Universidade Nova de Lisboa, Rua da Quinta Grande 6, Apartado 127, 2780-156 Oeiras, Portugal
|
78
|
Liu PK, Wang FS. Inference of biochemical network models in S-system using multiobjective optimization approach. Bioinformatics 2008; 24:1085-92. [PMID: 18321886 DOI: 10.1093/bioinformatics/btn075] [Citation(s) in RCA: 74] [Impact Index Per Article: 4.4] [Indexed: 11/15/2022]
Abstract
MOTIVATION The inference of biochemical networks, such as gene regulatory networks, protein-protein interaction networks, and metabolic pathway networks, from time-course data is one of the main challenges in systems biology. The ultimate goal of inferred modeling is to obtain expressions that quantitatively understand every detail and principle of biological systems. To infer a realizable S-system structure, most articles have applied sums of magnitude of kinetic orders as a penalty term in the fitness evaluation. How to tune a penalty weight to yield a realizable model structure is the main issue for the inverse problem. No guideline has been published for tuning a suitable penalty weight to infer a suitable model structure of biochemical networks. RESULTS We introduce an interactive inference algorithm to infer a realizable S-system structure for biochemical networks. The inference problem is formulated as a multiobjective optimization problem to minimize simultaneously the concentration error, slope error and interaction measure in order to find a suitable S-system model structure and its corresponding model parameters. The multiobjective optimization problem is solved by the epsilon-constraint method to minimize the interaction measure subject to the expectation constraints for the concentration and slope error criteria. The theorems serve to guarantee the minimum solution for the epsilon-constrained problem to achieve the minimum interaction network for the inference problem. The approach could avoid assigning a penalty weight for sums of magnitude of kinetic orders.
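The epsilon-constraint idea described in the abstract can be sketched in a simplified, discrete form. Instead of folding a weighted penalty into one fitness value, two objectives (concentration error and slope error) become constraints with user-chosen bounds, and the third (the interaction measure) is minimized among the models that satisfy them. The candidate models, their error values, and the bounds below are hypothetical; the article solves a continuous optimization over S-system parameters rather than choosing from a fixed list.

```python
# Hypothetical candidate models with pre-computed objective values.
candidates = [
    {"name": "dense", "conc_err": 0.01, "slope_err": 0.02, "interaction": 9.5},
    {"name": "medium", "conc_err": 0.03, "slope_err": 0.04, "interaction": 4.0},
    {"name": "sparse", "conc_err": 0.20, "slope_err": 0.30, "interaction": 1.5},
]


def epsilon_constraint(models, eps_conc, eps_slope):
    """Minimize the interaction measure subject to bounds on both errors.

    Returns None when no model satisfies the constraints.
    """
    feasible = [
        m for m in models
        if m["conc_err"] <= eps_conc and m["slope_err"] <= eps_slope
    ]
    if not feasible:
        return None
    return min(feasible, key=lambda m: m["interaction"])


chosen = epsilon_constraint(candidates, eps_conc=0.05, eps_slope=0.05)
print(chosen["name"] if chosen else "no feasible model")
```

The bounds `eps_conc` and `eps_slope` express how much misfit is acceptable, which tends to be easier to reason about than a penalty weight. Loosening them admits sparser models; tightening them favors fit over simplicity.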
Affiliation(s)
- Pang-Kai Liu
- Department of Chemical Engineering, National Chung Cheng University, Chiayi 621-02, Taiwan, ROC
|
79
|
Alves R, Vilaprinyo E, Hernández-Bermejo B, Sorribas A. Mathematical formalisms based on approximated kinetic representations for modeling genetic and metabolic pathways. Biotechnol Genet Eng Rev 2008; 25:1-40. [DOI: 10.5661/bger-25-1] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.4] [Indexed: 01/19/2023]
|
80
|
Froehlich H, Fellmann M, Sueltmann H, Poustka A, Beissbarth T. Large scale statistical inference of signaling pathways from RNAi and microarray data. BMC Bioinformatics 2007; 8:386. [PMID: 17937790 PMCID: PMC2241646 DOI: 10.1186/1471-2105-8-386] [Citation(s) in RCA: 34] [Impact Index Per Article: 1.9] [Received: 02/27/2007] [Accepted: 10/15/2007] [Indexed: 11/30/2022] Open
Abstract
Background The advent of RNA interference techniques enables the selective silencing of biologically interesting genes in an efficient way. In combination with DNA microarray technology this enables researchers to gain insights into signaling pathways by observing downstream effects of individual knock-downs on gene expression. These secondary effects can be used to computationally reverse engineer features of the upstream signaling pathway. Results In this paper we address this challenging problem by extending previous work by Markowetz et al., who proposed a statistical framework to score networks hypotheses in a Bayesian manner. Our extensions go in three directions: First, we introduce a way to omit the data discretization step needed in the original framework via a calculation based on p-values instead. Second, we show how prior assumptions on the network structure can be incorporated into the scoring scheme using regularization techniques. Third and most important, we propose methods to scale up the original approach, which is limited to around 5 genes, to large scale networks. Conclusion Comparisons of these methods on artificial data are conducted. Our proposed module network is employed to infer the signaling network between 13 genes in the ER-α pathway in human MCF-7 breast cancer cells. Using a bootstrapping approach this reconstruction can be found with good statistical stability. The code for the module network inference method is available in the latest version of the R-package nem, which can be obtained from the Bioconductor homepage.
Affiliation(s)
- Holger Froehlich
- German Cancer Research Center (DKFZ), Im Neuenheimer Feld 580, 69120 Heidelberg, Germany.
|