1
Loskot P. A query-response causal analysis of reaction events in biochemical reaction networks. Comput Biol Chem 2024; 108:107995. [PMID: 38039799 DOI: 10.1016/j.compbiolchem.2023.107995] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2023] [Revised: 11/16/2023] [Accepted: 11/27/2023] [Indexed: 12/03/2023]
Abstract
The stochastic kinetics of biochemical reaction networks is described by a chemical master equation (CME) and the underlying laws of mass action. Assuming network-free simulations of the rule-based models of biochemical reaction networks (BRNs), this paper departs from the usual analysis of network dynamics as the time-dependent distributions of chemical species counts, and instead considers statistically evaluating the sequences of reaction events generated from the stochastic simulations. The reaction event time-series can be used for reaction clustering, identifying rare events, and recognizing the periods of increased or steady-state activity. However, the main aim of this paper is to devise an effective method for identifying causally and anti-causally related sub-sequences of reaction events using their empirical probabilities. This allows discovering some of the causal dynamics of BRNs as well as uncovering their short-term deterministic behaviors. In particular, it is proposed that the reaction sub-sequences that are conditionally nearly certain or nearly uncertain can be considered as being causally related. Moreover, since the time-ordering of reaction events is locally irrelevant, the reaction sub-sequences can be transformed into the reaction sets or multi-sets. The distance metrics can then be used to define the equivalences among the reaction events. The proposed method for identifying the causally related reaction sub-sequences has been implemented as a computationally efficient query-response mechanism. The method was evaluated for five models of genetic networks in seven defined numerical experiments. The models were simulated in BioNetGen using the open-source network-free simulator NFsim. This simulator had to be modified first to allow recording the traces of reaction events, and it is available in the GitHub repository, ploskot/nfsim_1.20. The generated event time-series were analyzed with Python and Matlab scripts. The whole process of data generation, analysis and visualization has been nearly fully automated using shell scripts. This demonstrates the opportunities for substantially increasing the research productivity by creating automated data generation and processing pipelines.
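As an illustration of the query-response idea described in this abstract, the following minimal Python sketch estimates the empirical conditional probability of the next reaction given a preceding sub-sequence, and returns the continuations that are nearly certain or nearly impossible. It is not the paper's implementation: the function names, the `eps` threshold, the `as_multiset` switch, the reading of "nearly uncertain" as "probability close to zero", and the toy trace are all assumptions made for this example.

```python
from collections import Counter

def conditional_probs(events, prefix_len=2, as_multiset=False):
    """Empirical P(next reaction | preceding prefix) from one event trace.

    With as_multiset=True the prefix is sorted, so the local ordering of
    reactions inside the prefix is ignored (hypothetical option).
    """
    prefix_counts = Counter()
    joint_counts = Counter()
    for i in range(len(events) - prefix_len):
        prefix = events[i:i + prefix_len]
        key = tuple(sorted(prefix)) if as_multiset else tuple(prefix)
        nxt = events[i + prefix_len]
        prefix_counts[key] += 1
        joint_counts[(key, nxt)] += 1
    return {pair: n / prefix_counts[pair[0]] for pair, n in joint_counts.items()}

def query(probs, eps=0.05):
    """Return (prefix, next, p) triples with p >= 1 - eps or p <= eps.

    Only continuations observed at least once appear in probs, so
    never-observed continuations (p = 0) are not listed.
    """
    return sorted(
        (prefix, nxt, p)
        for (prefix, nxt), p in probs.items()
        if p >= 1 - eps or p <= eps
    )

# Toy trace of reaction labels (made up for illustration).
trace = ["R1", "R2", "R3", "R1", "R2", "R3", "R1", "R2", "R4"]
for prefix, nxt, p in query(conditional_probs(trace, prefix_len=2)):
    print(prefix, "->", nxt, f"p = {p:.2f}")
```

A dictionary keyed by the prefix keeps each query to a single lookup, which is one simple way to obtain a fast query-response mechanism; a real trace from a network-free simulator would be far longer than this toy example.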
Affiliation(s)
- Pavel Loskot
- ZJU-UIUC Institute, 314400, Haining, Zhejiang, China.
2
Foo M, Dony L, He F. Data-driven dynamical modelling of a pathogen-infected plant gene regulatory network: A comparative analysis. Biosystems 2022; 219:104732. [PMID: 35781035 DOI: 10.1016/j.biosystems.2022.104732] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/25/2022] [Revised: 05/30/2022] [Accepted: 06/22/2022] [Indexed: 11/02/2022]
Abstract
Recent advances in synthetic biology have enabled the design of genetic feedback control circuits that could be implemented to build resilient plants against pathogen attacks. To facilitate the proper design of these genetic feedback control circuits, an accurate model that is able to capture the vital dynamical behaviour of the pathogen-infected plant is required. In this study, using a data-driven modelling approach, we develop and compare four dynamical models (i.e. linear, Michaelis-Menten with Hill coefficient (Hill Function), standard S-System and extended S-System) of a pathogen-infected plant gene regulatory network (GRN). These models are then assessed across several criteria, i.e. ease of identifying the type of gene regulation, the predictive capability, Akaike Information Criterion (AIC) and the robustness to parameter uncertainty, to determine their viability in balancing biological complexity and accuracy when modelling the pathogen-infected plant GRN. Using our defined ranking score, we obtain the following insights into the modelling of GRNs. Our analyses show that, despite being commonly used and providing biological relevance, the Hill Function model ranks the lowest while the extended S-System model ranks highest in the overall comparison. Interestingly, the performance of the linear model is more consistent throughout the comparison, making it the preferred model for this pathogen-infected plant GRN when considering a data-driven modelling approach.
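One of the criteria named in this abstract, the Akaike Information Criterion, can be illustrated with a short sketch. It assumes a least-squares fit with Gaussian errors, for which AIC reduces (up to an additive constant) to n·ln(RSS/n) + 2k. The model names, predictions and parameter counts below are invented for illustration and are not taken from the study.

```python
import math

def aic_least_squares(observed, predicted, n_params):
    """AIC for a least-squares fit with Gaussian errors, up to a constant.

    Requires a non-zero residual sum of squares (RSS).
    """
    n = len(observed)
    rss = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return n * math.log(rss / n) + 2 * n_params

def rank_by_aic(observed, candidates):
    """candidates maps a model name to (predictions, number of fitted parameters).

    Returns (name, AIC) pairs sorted from best (lowest AIC) to worst.
    """
    scores = {
        name: aic_least_squares(observed, pred, k)
        for name, (pred, k) in candidates.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])

# Hypothetical data and two hypothetical candidate models.
observed = [1.0, 2.1, 2.9, 4.2]
candidates = {
    "simple": ([1.0, 2.0, 3.0, 4.0], 2),    # larger residuals, 2 parameters
    "flexible": ([1.0, 2.0, 3.0, 4.1], 4),  # smaller residuals, 4 parameters
}
for name, score in rank_by_aic(observed, candidates):
    print(f"{name}: AIC = {score:.2f}")
```

In this toy case the more flexible model halves the residual sum of squares but still ranks below the simpler one, because the 2k penalty outweighs the gain in fit. This is the kind of complexity-versus-accuracy trade-off the criterion is meant to expose.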
Affiliation(s)
- Mathias Foo
- School of Engineering, University of Warwick, CV4 7AL, Coventry, UK.
- Leander Dony
- Institute of Computational Biology, Helmholtz Munich, 85764, Neuherberg, Germany; Department of Translational Psychiatry, Max Planck Institute of Psychiatry, International Max Planck Research School for Translational Psychiatry (IMPRS-TP), 80804, Munich, Germany; TUM School of Life Sciences Weihenstephan, Technical University of Munich, 85354, Freising, Germany.
- Fei He
- Centre for Computational Science and Mathematical Modelling, Coventry University, CV1 2JH, Coventry, UK.
3
Mahmoodi SH, Aghdam R, Eslahchi C. An order independent algorithm for inferring gene regulatory network using quantile value for conditional independence tests. Sci Rep 2021; 11:7605. [PMID: 33828122 PMCID: PMC8027014 DOI: 10.1038/s41598-021-87074-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/24/2020] [Accepted: 03/24/2021] [Indexed: 10/31/2022]
Abstract
In recent years, due to the difficulty and inefficiency of experimental methods, numerous computational methods have been introduced for inferring the structure of Gene Regulatory Networks (GRNs). The Path Consistency (PC) algorithm is one of the popular methods to infer the structure of GRNs. However, this group of methods still has limitations and there is a potential for improvements in this field. First, the PC-based algorithms are still sensitive to the ordering of nodes, i.e. different node orders result in different network structures. Second, the networks inferred by these methods are highly dependent on the threshold used for independence testing. Also, it is still a challenge to select the set of conditional genes in an optimal way, which affects the performance and computational complexity of the PC-based algorithm. We introduce a novel algorithm, namely Order Independent PC-based algorithm using Quantile value (OIPCQ), which improves the accuracy of the learning process of GRNs and solves the order dependency issue. The quantile-based thresholds are considered for different orders of CMI tests. For conditional gene selection, we consider the paths between genes with length equal to or greater than 2, while other well-known PC-based methods only consider the paths of length 2. We applied OIPCQ on the various networks of the DREAM3 and DREAM4 in silico challenges. As a real-world case study, we used OIPCQ to reconstruct the SOS DNA network obtained from Escherichia coli and the GRN for acute myeloid leukemia based on the RNA sequencing data from The Cancer Genome Atlas. The results show that OIPCQ produces the same network structure for all the permutations of the genes and improves the resulting GRN through accurately quantifying the causal regulation strength in comparison with other well-known PC-based methods. According to the GRN constructed by OIPCQ for acute myeloid leukemia, two previously reported regulators, BCLAF1 and NRSF, are significantly important. However, the highest degree nodes in this GRN are ZBTB7A and PU1, which play a significant role in cancer, especially in leukemia. OIPCQ is freely accessible at https://github.com/haammim/OIPCQ-and-OIPCQ2.
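The quantile-based thresholding of conditional mutual information (CMI) tests mentioned in this abstract can be sketched as follows. This is not the OIPCQ code: it assumes jointly Gaussian expression values (so that mutual information follows from Pearson and partial correlations), uses a simple nearest-rank quantile, and runs on a simulated three-gene chain g1 → g2 → g3. All function names, quantile levels and data are assumptions made for this example.

```python
import math
import random

def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def gaussian_mi(x, y):
    """Zero-order test: I(X;Y) = -0.5 * ln(1 - r^2) under a Gaussian assumption."""
    r = pearson(x, y)
    return -0.5 * math.log(1.0 - r * r)

def gaussian_cmi(x, y, z):
    """First-order test: I(X;Y|Z) computed from the partial correlation."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    partial = (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))
    return -0.5 * math.log(1.0 - partial * partial)

def keep_edges(scores, q):
    """Keep edges whose score reaches the q-quantile (nearest rank) of all scores."""
    ordered = sorted(scores.values())
    threshold = ordered[min(len(ordered) - 1, int(q * len(ordered)))]
    return {edge for edge, s in scores.items() if s >= threshold}

# Simulated chain g1 -> g2 -> g3 (hypothetical data).
random.seed(0)
g1 = [random.gauss(0, 1) for _ in range(2000)]
g2 = [a + random.gauss(0, 1) for a in g1]
g3 = [b + random.gauss(0, 1) for b in g2]
data = {"g1": g1, "g2": g2, "g3": g3}
pairs = [("g1", "g2"), ("g1", "g3"), ("g2", "g3")]

# Order 0: mutual information for every pair, thresholded at a low quantile.
mi = {(a, b): gaussian_mi(data[a], data[b]) for a, b in pairs}
edges0 = keep_edges(mi, q=0.3)

# Order 1: condition each surviving edge on the remaining gene.
cmi = {}
for a, b in sorted(edges0):
    (c,) = [g for g in data if g not in (a, b)]
    cmi[(a, b)] = gaussian_cmi(data[a], data[b], data[c])
edges1 = keep_edges(cmi, q=0.5)
print(sorted(edges1))
```

Because every pair is scored before any edge is removed, the outcome of this sketch does not depend on the order in which genes are listed, which is the property the abstract emphasises. The indirect g1–g3 link has a near-zero CMI once g2 is conditioned on, so it falls below the first-order quantile threshold.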
Affiliation(s)
- Sayyed Hadi Mahmoodi
- Department of Computer and Data Sciences, Faculty of Mathematical Sciences, Shahid Beheshti University, Tehran, Iran
- Rosa Aghdam
- Department of Computer and Data Sciences, Faculty of Mathematical Sciences, Shahid Beheshti University, Tehran, Iran; School of Biological Science, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran.
- Changiz Eslahchi
- Department of Computer and Data Sciences, Faculty of Mathematical Sciences, Shahid Beheshti University, Tehran, Iran; School of Biological Science, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran.
5
Investigation of Precise Molecular Mechanistic Action of Tobacco-Associated Carcinogen 'NNK' Induced Carcinogenesis: A System Biology Approach. Genes (Basel) 2019; 10:genes10080564. [PMID: 31357510 PMCID: PMC6723528 DOI: 10.3390/genes10080564] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 06/14/2019] [Revised: 07/22/2019] [Accepted: 07/24/2019] [Indexed: 12/21/2022]
Abstract
Cancer is the second deadliest disease listed by the WHO. One of the major causes of cancer is tobacco consumption, possibly due to its main component, 4-(methylnitrosamino)-1-(3-pyridyl)-1-butanone (NNK). A plethora of studies have been conducted in the past aiming to decipher the association of NNK with other diseases. However, it is strongly linked with cancer development. Despite these studies, a clear molecular mechanism and the impact of NNK on various system-level networks are not known. In the present study, system biology tools were employed to understand the key regulatory mechanisms and the perturbations that will happen in the cellular processes due to NNK. To investigate the system-level influence of the carcinogen, an NNK-rewired protein–protein interaction network (PPIN) was generated from 544 reported proteins drawn from 1317 articles retrieved from PubMed. The noise was removed from the PPIN by the method of modulation. Gene ontology (GO) enrichment was performed on the seed proteins extracted from various modules to find the pathways most affected by the genes/proteins. For the modulation, Molecular COmplex DEtection (MCODE) was used to generate 19 modules containing 115 seed proteins. Further scrutiny of the targeted biomolecules was done by graph theory and molecular docking. GO enrichment analysis revealed that mostly cell cycle regulatory proteins were affected by NNK.
6
Chan TE, Stumpf MPH, Babtie AC. Gene Regulatory Network Inference from Single-Cell Data Using Multivariate Information Measures. Cell Syst 2017; 5:251-267.e3. [PMID: 28957658 PMCID: PMC5624513 DOI: 10.1016/j.cels.2017.08.014] [Citation(s) in RCA: 258] [Impact Index Per Article: 51.6] [Received: 10/10/2016] [Revised: 04/26/2017] [Accepted: 08/24/2017] [Indexed: 12/03/2022]
Abstract
While single-cell gene expression experiments present new challenges for data processing, the cell-to-cell variability observed also reveals statistical relationships that can be used by information theory. Here, we use multivariate information theory to explore the statistical dependencies between triplets of genes in single-cell gene expression datasets. We develop PIDC, a fast, efficient algorithm that uses partial information decomposition (PID) to identify regulatory relationships between genes. We thoroughly evaluate the performance of our algorithm and demonstrate that the higher-order information captured by PIDC allows it to outperform pairwise mutual information-based algorithms when recovering true relationships present in simulated data. We also infer gene regulatory networks from three experimental single-cell datasets and illustrate how network context, choices made during analysis, and sources of variability affect network inference. PIDC tutorials and open-source software for estimating PID are available. PIDC should facilitate the identification of putative functional relationships and mechanistic hypotheses from single-cell transcriptomic data.
- PIDC infers gene regulatory networks from single-cell transcriptomic data
- Multivariate information measures and context in PIDC improve network inference
- Heterogeneity in single-cell data carries information about gene-gene interactions
- Fast, efficient, open-source software is made freely available
Affiliation(s)
- Thalia E Chan
- Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London SW7 2AZ, UK
- Michael P H Stumpf
- Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London SW7 2AZ, UK; MRC London Institute of Medical Sciences, Hammersmith Campus, Imperial College London, London W12 0NN, UK.
- Ann C Babtie
- Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London SW7 2AZ, UK.
7
Glymour C, Zhang K, Spirtes P. Review of Causal Discovery Methods Based on Graphical Models. Front Genet 2019; 10:524. [PMID: 31214249 PMCID: PMC6558187 DOI: 10.3389/fgene.2019.00524] [Citation(s) in RCA: 130] [Impact Index Per Article: 26.0] [Received: 08/07/2018] [Accepted: 05/13/2019] [Indexed: 12/11/2022]
Abstract
A fundamental task in various disciplines of science, including biology, is to find underlying causal relations and make use of them. Causal relations can be seen if interventions are properly applied; however, in many cases they are difficult or even impossible to conduct. It is then necessary to discover causal relations by analyzing statistical properties of purely observational data, which is known as causal discovery or causal structure search. This paper aims to give an introduction to and a brief review of the computational methods for causal discovery that were developed in the past three decades, including constraint-based and score-based methods and those based on functional causal models, supplemented by some illustrations and applications.
Affiliation(s)
- Clark Glymour
- Department of Philosophy, Carnegie Mellon University, Pittsburgh, PA, United States
- Kun Zhang
- Department of Philosophy, Carnegie Mellon University, Pittsburgh, PA, United States
- Peter Spirtes
- Department of Philosophy, Carnegie Mellon University, Pittsburgh, PA, United States
8
Chan TE, Stumpf MPH, Babtie AC. Gene Regulatory Networks from Single Cell Data for Exploring Cell Fate Decisions. Methods Mol Biol 2019; 1975:211-238. [PMID: 31062312 DOI: 10.1007/978-1-4939-9224-9_10] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 06/09/2023]
Abstract
Single cell experimental techniques now allow us to quantify gene expression in up to thousands of individual cells. These data reveal the changes in transcriptional state that occur as cells progress through development and adopt specialized cell fates. In this chapter we describe in detail how to use our network inference algorithm (PIDC), and the associated software package NetworkInference.jl, to infer functional interactions between genes from the observed gene expression patterns. We exploit the large sample sizes and inherent variability of single cell data to detect statistical dependencies between genes that indicate putative (co-)regulatory relationships, using multivariate information measures that can capture complex statistical relationships. We provide guidelines on how best to combine this analysis with other complementary methods designed to explore single cell data, and how to interpret the resulting gene regulatory network models to gain insight into the processes regulating cell differentiation.
Affiliation(s)
- Thalia E Chan
- Department of Life Sciences, Centre for Integrative Systems Biology and Bioinformatics, Imperial College London, London, UK
- Michael P H Stumpf
- Department of Life Sciences, Centre for Integrative Systems Biology and Bioinformatics, Imperial College London, London, UK
- Ann C Babtie
- Department of Life Sciences, Centre for Integrative Systems Biology and Bioinformatics, Imperial College London, London, UK.
9
Tapia M, Baudot P, Formisano-Tréziny C, Dufour MA, Temporal S, Lasserre M, Marquèze-Pouey B, Gabert J, Kobayashi K, Goaillard JM. Neurotransmitter identity and electrophysiological phenotype are genetically coupled in midbrain dopaminergic neurons. Sci Rep 2018; 8:13637. [PMID: 30206240 PMCID: PMC6134142 DOI: 10.1038/s41598-018-31765-z] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 06/25/2018] [Accepted: 08/22/2018] [Indexed: 01/04/2023]
Abstract
Most neuronal types have a well-identified electrical phenotype. It is now accepted that the same phenotype can be produced using multiple biophysical solutions defined by ion channel expression levels. This argues that systems-level approaches are necessary to understand electrical phenotype genesis and stability. Midbrain dopaminergic (DA) neurons, although quite heterogeneous, exhibit a characteristic electrical phenotype. However, the quantitative genetic principles underlying this conserved phenotype remain unknown. Here we investigated the quantitative relationships between ion channels’ gene expression levels in midbrain DA neurons using single-cell microfluidic qPCR. Using multivariate mutual information analysis to decipher high-dimensional statistical dependences, we unravel co-varying gene modules that link neurotransmitter identity and electrical phenotype. We also identify new segregating gene modules underlying the diversity of this neuronal population. We propose that the newly identified genetic coupling between neurotransmitter identity and ion channels may play a homeostatic role in maintaining the electrophysiological phenotype of midbrain DA neurons.
Affiliation(s)
- Mónica Tapia
- Unité de Neurobiologie des Canaux Ioniques et de la Synapse, INSERM UMR 1072, Aix Marseille Université, 13015, Marseille, France
- Pierre Baudot
- Unité de Neurobiologie des Canaux Ioniques et de la Synapse, INSERM UMR 1072, Aix Marseille Université, 13015, Marseille, France
- Christine Formisano-Tréziny
- Unité de Neurobiologie des Canaux Ioniques et de la Synapse, INSERM UMR 1072, Aix Marseille Université, 13015, Marseille, France
- Martial A Dufour
- Unité de Neurobiologie des Canaux Ioniques et de la Synapse, INSERM UMR 1072, Aix Marseille Université, 13015, Marseille, France
- Simone Temporal
- Unité de Neurobiologie des Canaux Ioniques et de la Synapse, INSERM UMR 1072, Aix Marseille Université, 13015, Marseille, France
- Manon Lasserre
- Unité de Neurobiologie des Canaux Ioniques et de la Synapse, INSERM UMR 1072, Aix Marseille Université, 13015, Marseille, France
- Béatrice Marquèze-Pouey
- Unité de Neurobiologie des Canaux Ioniques et de la Synapse, INSERM UMR 1072, Aix Marseille Université, 13015, Marseille, France
- Jean Gabert
- Unité de Neurobiologie des Canaux Ioniques et de la Synapse, INSERM UMR 1072, Aix Marseille Université, 13015, Marseille, France; Département de Biochimie et Biologie Moléculaire, Hôpital Nord, Marseille, France
- Kazuto Kobayashi
- Department of Molecular Genetics, Institute of Biomedical Sciences, Fukushima Medical University, Fukushima, 960-1295, Japan
- Jean-Marc Goaillard
- Unité de Neurobiologie des Canaux Ioniques et de la Synapse, INSERM UMR 1072, Aix Marseille Université, 13015, Marseille, France.
10
Villaverde AF, Becker K, Banga JR. PREMER: A Tool to Infer Biological Networks. IEEE/ACM Trans Comput Biol Bioinform 2018; 15:1193-1202. [PMID: 28981423 DOI: 10.1109/tcbb.2017.2758786] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 06/07/2023]
Abstract
Inferring the structure of unknown cellular networks is a main challenge in computational biology. Data-driven approaches based on information theory can determine the existence of interactions among network nodes automatically. However, the elucidation of certain features, such as distinguishing between direct and indirect interactions or determining the direction of a causal link, requires estimating information-theoretic quantities in a multidimensional space. This can be a computationally demanding task, which acts as a bottleneck for the application of elaborate algorithms to large-scale network inference problems. The computational cost of such calculations can be alleviated by the use of compiled programs and parallelization. To this end, we have developed PREMER (Parallel Reverse Engineering with Mutual information & Entropy Reduction), a software toolbox that can run in parallel and sequential environments. It uses information theoretic criteria to recover network topology and determine the strength and causality of interactions, and allows incorporating prior knowledge, imputing missing data, and correcting outliers. PREMER is a free, open source software tool that does not require any commercial software. Its core algorithms are programmed in FORTRAN 90 and implement OpenMP directives. It has user interfaces in Python and MATLAB/Octave, and runs on Windows, Linux, and OSX (https://sites.google.com/site/premertoolbox/).
11
Chase JG, Preiser JC, Dickson JL, Pironet A, Chiew YS, Pretty CG, Shaw GM, Benyo B, Moeller K, Safaei S, Tawhai M, Hunter P, Desaive T. Next-generation, personalised, model-based critical care medicine: a state-of-the art review of in silico virtual patient models, methods, and cohorts, and how to validation them. Biomed Eng Online 2018; 17:24. [PMID: 29463246 PMCID: PMC5819676 DOI: 10.1186/s12938-018-0455-y] [Citation(s) in RCA: 82] [Impact Index Per Article: 13.7] [Received: 10/09/2017] [Accepted: 02/12/2018] [Indexed: 01/17/2023]
Abstract
Critical care, like many healthcare areas, is under a dual assault from significantly increasing demographic and economic pressures. Intensive care unit (ICU) patients are highly variable in response to treatment, and increasingly aging populations mean ICUs are under increasing demand and their cohorts are increasingly ill. Equally, patient expectations are growing, while the economic ability to deliver care to all is declining. Better, more productive care is thus the big challenge. One means to that end is personalised care designed to manage the significant inter- and intra-patient variability that makes the ICU patient difficult. Thus, moving from current "one size fits all" protocolised care to adaptive, model-based "one method fits all" personalised care could deliver the required step change in the quality, and simultaneously the productivity and cost, of care. Computer models of human physiology are a unique tool to personalise care, as they can couple clinical data with mathematical methods to create subject-specific models and virtual patients to design new, personalised and more optimal protocols, as well as to guide care in real-time. They rely on identifying time varying patient-specific parameters in the model that capture inter- and intra-patient variability, the difference between patients and the evolution of patient condition. Properly validated, virtual patients represent the real patients, and can be used in silico to test different protocols or interventions, or in real-time to guide care. Hence, the underlying models and methods create the foundation for next generation care, as well as a tool for safely and rapidly developing personalised treatment protocols over large virtual cohorts using virtual trials. This review examines the models and methods used to create virtual patients. Specifically, it presents the model types and structures used and the data required. It then covers how to validate the resulting virtual patients and trials, and how these virtual trials can help design and optimise clinical trials. Links between these models and higher order, more complex physiome models are also discussed. In each section, it explores the progress reported to date, especially on core ICU therapies in glycemic, circulatory and mechanical ventilation management, where high cost and frequency of occurrence provide a significant opportunity for model-based methods to have measurable clinical and economic impact. The outcomes are readily generalised to other areas of medical care.
Affiliation(s)
- J. Geoffrey Chase
- Department of Mechanical Engineering, Centre for Bio-Engineering, University of Canterbury, Private Bag 4800, Christchurch, New Zealand
- Jean-Charles Preiser
- Department of Intensive Care, Erasme University of Hospital, 1070 Brussels, Belgium
- Jennifer L. Dickson
- Department of Mechanical Engineering, Centre for Bio-Engineering, University of Canterbury, Private Bag 4800, Christchurch, New Zealand
- Antoine Pironet
- GIGA In Silico Medicine, University of Liege, 4000 Liege, Belgium
- Yeong Shiong Chiew
- Department of Mechanical Engineering, School of Engineering, Monash University Malaysia, 47500 Selangor, Malaysia
- Christopher G. Pretty
- Department of Mechanical Engineering, Centre for Bio-Engineering, University of Canterbury, Private Bag 4800, Christchurch, New Zealand
- Geoffrey M. Shaw
- Department of Intensive Care, Christchurch Hospital, Christchurch, New Zealand
- Balazs Benyo
- Department of Control Engineering and Information Technology, Budapest University of Technology and Economics, Budapest, Hungary
- Knut Moeller
- Department of Biomedical Engineering, Institute of Technical Medicine, Furtwangen University, Villingen-Schwenningen, Germany
- Soroush Safaei
- Auckland Bioengineering Institute, University of Auckland, Auckland, New Zealand
- Merryn Tawhai
- Auckland Bioengineering Institute, University of Auckland, Auckland, New Zealand
- Peter Hunter
- Auckland Bioengineering Institute, University of Auckland, Auckland, New Zealand
- Thomas Desaive
- GIGA In Silico Medicine, University of Liege, 4000 Liege, Belgium
12
Voit EO. The best models of metabolism. Wiley Interdiscip Rev Syst Biol Med 2017; 9:10.1002/wsbm.1391. [PMID: 28544810 PMCID: PMC5643013 DOI: 10.1002/wsbm.1391] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Received: 02/16/2017] [Revised: 03/31/2017] [Accepted: 04/01/2017] [Indexed: 12/25/2022]
Abstract
Biochemical systems are among the oldest application areas of mathematical modeling. Spanning a time period of over one hundred years, the repertoire of options for structuring a model and for formulating reactions has been constantly growing, and yet, it is still unclear whether or to what degree some models are better than others and how the modeler is to choose among them. In fact, the variety of options has become overwhelming and difficult to maneuver for novices and experts alike. This review outlines the metabolic model design process and discusses the numerous choices for modeling frameworks and mathematical representations. It tries to be inclusive, even though it cannot be complete, and introduces the various modeling options in a manner that is as unbiased as is feasible. However, the review does end with personal recommendations for the choices of default models. WIREs Syst Biol Med 2017, 9:e1391. doi: 10.1002/wsbm.1391
Affiliation(s)
- Eberhard O Voit
- Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, GA, USA
13
Simak M, Yeang CH, Lu HHS. Exploring candidate biological functions by Boolean Function Networks for Saccharomyces cerevisiae. PLoS One 2017; 12:e0185475. [PMID: 28981547 PMCID: PMC5628832 DOI: 10.1371/journal.pone.0185475] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 04/01/2017] [Accepted: 09/13/2017] [Indexed: 01/26/2023]
Abstract
The great amount of gene expression data has brought a big challenge for the discovery of Gene Regulatory Networks (GRNs). For network reconstruction and the investigation of regulatory relations, it is desirable to ensure directness of links between genes on a map, infer their directionality and explore candidate biological functions from high-throughput transcriptomic data. To address these problems, we introduce a Boolean Function Network (BFN) model based on techniques of hidden Markov model (HMM), likelihood ratio test and Boolean logic functions. BFN consists of two consecutive tests to establish links between pairs of genes and check their directness. We evaluate the performance of BFN through the application to S. cerevisiae time course data. BFN produces regulatory relations which show consistency with the succession of cell cycle phases. Furthermore, it also improves sensitivity and specificity when compared with alternative methods of genetic network reverse engineering. Moreover, we demonstrate that BFN can provide proper resolution for GO enrichment of gene sets. Finally, the Boolean functions discovered by BFN can provide useful insights for the identification of control mechanisms of regulatory processes, which is the special advantage of the proposed approach. In combination with low computational complexity, BFN can serve as an efficient screening tool to reconstruct gene relations on the whole genome level. In addition, the BFN approach is also applicable to a wide range of time course datasets.
Affiliation(s)
- Maria Simak
- Bioinformatics Program, Taiwan International Graduate Program, Institute of Information Science, Academia Sinica, Taipei, Taiwan
- Institute of Statistics, National Chiao Tung University, Hsinchu, Taiwan
- Henry Horng-Shing Lu
- Institute of Statistics, National Chiao Tung University, Hsinchu, Taiwan
- Big Data Research Center, National Chiao Tung University, Hsinchu, Taiwan
16
Hu X, Wei H, Zheng H. Identification of perturbed signaling pathways from gene expression data using information divergence. Mol Biosyst 2017; 13:1797-1804. [PMID: 28702621 DOI: 10.1039/c7mb00285h] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/02/2023]
Abstract
Abnormal regulation of signaling pathways is the key causative factor in several diseases. Although many methods have been proposed to identify significantly differential pathways between two conditions via microarray gene expression datasets, most of them concentrate on differences in the pathway components, either the differential expression or the correlation of genes in a given pathway. However, as biological functional units, signaling pathways may have diverse activity patterns across different biological contexts. In order to detect overall changes in pathways, we propose an analysis model called SPAID (Signaling Pathway Analysis based on Information Divergence). SPAID is based on the concept of information divergence, which can be used to compare two conditions by computing the differential probability distribution of the regulation capacity. We compared SPAID with several classical algorithms using different datasets, and the results indicate that SPAID produces higher repeatability, has better performance and universality, and extracts more comprehensive information regarding the underlying biological processes. In conclusion, by introducing the idea of information divergence, our study measures differences in pathways from an overall perspective and will provide a complementary analysis framework for pathway analysis.
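To make the notion of information divergence concrete, the sketch below computes the Kullback-Leibler and Jensen-Shannon divergences between two discrete probability distributions. The abstract does not state which divergence SPAID uses or how the "regulation capacity" distribution is built, so the choice of Jensen-Shannon and the example distributions are assumptions for illustration only.

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in nats.

    Terms with p_i == 0 contribute zero; q_i must be positive wherever p_i is.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, and bounded by ln(2)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Hypothetical distributions of a pathway-level quantity under two conditions.
control = [0.7, 0.2, 0.1]
disease = [0.2, 0.3, 0.5]
print(f"JS divergence = {js_divergence(control, disease):.4f}")
```

The Jensen-Shannon form is used here because it stays finite when one condition assigns zero probability to a bin, which the plain Kullback-Leibler divergence does not.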
Affiliation(s)
- Xinying Hu
- School of Computer Science and Technology, University of Science and Technology of China, Hefei, People's Republic of China.
17
18
Yang G, Wang L, Wang X. Reconstruction of Complex Directional Networks with Group Lasso Nonlinear Conditional Granger Causality. Sci Rep 2017; 7:2991. [PMID: 28592807 PMCID: PMC5462833 DOI: 10.1038/s41598-017-02762-5] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 02/08/2017] [Accepted: 04/18/2017] [Indexed: 12/19/2022] Open
Abstract
Reconstruction of networks underlying complex systems is one of the most crucial problems in many areas of engineering and science. In this paper, rather than identifying parameters of complex systems governed by pre-defined models or taking some polynomial and rational functions as a prior information for subsequent model selection, we put forward a general framework for nonlinear causal network reconstruction from time-series with limited observations. With obtaining multi-source datasets based on the data-fusion strategy, we propose a novel method to handle nonlinearity and directionality of complex networked systems, namely group lasso nonlinear conditional granger causality. Specially, our method can exploit different sets of radial basis functions to approximate the nonlinear interactions between each pair of nodes and integrate sparsity into grouped variables selection. The performance characteristic of our approach is firstly assessed with two types of simulated datasets from nonlinear vector autoregressive model and nonlinear dynamic models, and then verified based on the benchmark datasets from DREAM3 Challenge4. Effects of data size and noise intensity are also discussed. All of the results demonstrate that the proposed method performs better in terms of higher area under precision-recall curve.
Affiliation(s)
- Guanxue Yang
- Department of Automation, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, P. R. China
- Lin Wang
- Department of Automation, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, P. R. China
- Xiaofan Wang
- Department of Automation, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, P. R. China.
19
Moran B, Rahman A, Palonen K, Lanigan FT, Gallagher WM. Master Transcriptional Regulators in Cancer: Discovery via Reverse Engineering Approaches and Subsequent Validation. Cancer Res 2017; 77:2186-2190. [PMID: 28428271 DOI: 10.1158/0008-5472.can-16-1813] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 07/11/2016] [Revised: 09/08/2016] [Accepted: 02/22/2017] [Indexed: 11/16/2022]
Abstract
Reverse engineering of transcriptional networks using gene expression data enables identification of genes that underpin the development and progression of different cancers. Methods to this end have been available for over a decade and, with a critical mass of transcriptomic data in the oncology arena having been reached, they are ever more applicable. Extensive and complex networks can be distilled into a small set of key master transcriptional regulators (MTR), genes that are very highly connected and have been shown to be involved in processes of known importance in disease. Interpreting and validating the results of standardized bioinformatic methods is of crucial importance in determining the inherent value of MTRs. In this review, we briefly describe how MTRs are identified and focus on providing an overview of how MTRs can and have been validated for use in clinical decision making in malignant diseases, along with serving as tractable therapeutic targets. Cancer Res; 77(9); 2186-90. ©2017 AACR.
Affiliation(s)
- Bruce Moran
- Cancer Biology and Therapeutics Laboratory, UCD School of Biomolecular and Biomedical Research, UCD Conway Institute, University College Dublin, Dublin, Ireland; OncoMark Limited, NovaUCD, Belfield Innovation Park, Belfield, Dublin, Ireland
- Arman Rahman
- Cancer Biology and Therapeutics Laboratory, UCD School of Biomolecular and Biomedical Research, UCD Conway Institute, University College Dublin, Dublin, Ireland; OncoMark Limited, NovaUCD, Belfield Innovation Park, Belfield, Dublin, Ireland
- Katja Palonen
- Cancer Biology and Therapeutics Laboratory, UCD School of Biomolecular and Biomedical Research, UCD Conway Institute, University College Dublin, Dublin, Ireland; OncoMark Limited, NovaUCD, Belfield Innovation Park, Belfield, Dublin, Ireland
- Fiona T Lanigan
- Cancer Biology and Therapeutics Laboratory, UCD School of Biomolecular and Biomedical Research, UCD Conway Institute, University College Dublin, Dublin, Ireland
- William M Gallagher
- Cancer Biology and Therapeutics Laboratory, UCD School of Biomolecular and Biomedical Research, UCD Conway Institute, University College Dublin, Dublin, Ireland; OncoMark Limited, NovaUCD, Belfield Innovation Park, Belfield, Dublin, Ireland
20
Henriques D, Villaverde AF, Rocha M, Saez-Rodriguez J, Banga JR. Data-driven reverse engineering of signaling pathways using ensembles of dynamic models. PLoS Comput Biol 2017; 13:e1005379. [PMID: 28166222 PMCID: PMC5319798 DOI: 10.1371/journal.pcbi.1005379] [Citation(s) in RCA: 36] [Impact Index Per Article: 5.1] [Received: 05/11/2016] [Revised: 02/21/2017] [Accepted: 01/24/2017] [Indexed: 11/19/2022] Open
Abstract
Despite significant efforts and remarkable progress, the inference of signaling networks from experimental data remains very challenging. The problem is particularly difficult when the objective is to obtain a dynamic model capable of predicting the effect of novel perturbations not considered during model training. The problem is ill-posed due to the nonlinear nature of these systems, the fact that only a fraction of the involved proteins and their post-translational modifications can be measured, and limitations on the technologies used for growing cells in vitro, perturbing them, and measuring their variations. As a consequence, there is a pervasive lack of identifiability. To overcome these issues, we present a methodology called SELDOM (enSEmbLe of Dynamic lOgic-based Models), which builds an ensemble of logic-based dynamic models, trains them to experimental data, and combines their individual simulations into an ensemble prediction. It also includes a model reduction step to prune spurious interactions and mitigate overfitting. SELDOM is a data-driven method, in the sense that it does not require any prior knowledge of the system: the interaction networks that act as scaffolds for the dynamic models are inferred from data using mutual information. We have tested SELDOM on a number of experimental and in silico signal transduction case-studies, including the recent HPN-DREAM breast cancer challenge. We found that its performance is highly competitive compared to state-of-the-art methods for the purpose of recovering network topology. More importantly, the utility of SELDOM goes beyond basic network inference (i.e. uncovering static interaction networks): it builds dynamic (based on ordinary differential equation) models, which can be used for mechanistic interpretations and reliable dynamic predictions in new experimental conditions (i.e. not used in the training). 
For this task, SELDOM's ensemble prediction is not only consistently better than predictions from individual models, but also often outperforms the state of the art represented by the methods used in the HPN-DREAM challenge.
Affiliation(s)
- David Henriques
- Bioprocess Engineering Group, Spanish National Research Council, IIM-CSIC, Vigo, Spain
- Alejandro F. Villaverde
- Bioprocess Engineering Group, Spanish National Research Council, IIM-CSIC, Vigo, Spain
- Centre of Biological Engineering, University of Minho, Braga, Portugal
- Miguel Rocha
- Centre of Biological Engineering, University of Minho, Braga, Portugal
- Julio Saez-Rodriguez
- Joint Research Center for Computational Biomedicine, RWTH-Aachen University, Aachen, Germany
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, United Kingdom
- Julio R. Banga
- Bioprocess Engineering Group, Spanish National Research Council, IIM-CSIC, Vigo, Spain
21
Hu Y, Zhao H, Ai X. Inferring Weighted Directed Association Network from Multivariate Time Series with a Synthetic Method of Partial Symbolic Transfer Entropy Spectrum and Granger Causality. PLoS One 2016; 11:e0166084. [PMID: 27832153 PMCID: PMC5104482 DOI: 10.1371/journal.pone.0166084] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 07/12/2016] [Accepted: 10/21/2016] [Indexed: 11/18/2022] Open
Abstract
Complex network methodology is very useful for complex system explorer. However, the relationships among variables in complex system are usually not clear. Therefore, inferring association networks among variables from their observed data has been a popular research topic. We propose a synthetic method, named small-shuffle partial symbolic transfer entropy spectrum (SSPSTES), for inferring association network from multivariate time series. The method synthesizes surrogate data, partial symbolic transfer entropy (PSTE) and Granger causality. A proper threshold selection is crucial for common correlation identification methods and it is not easy for users. The proposed method can not only identify the strong correlation without selecting a threshold but also has the ability of correlation quantification, direction identification and temporal relation identification. The method can be divided into three layers, i.e. data layer, model layer and network layer. In the model layer, the method identifies all the possible pair-wise correlation. In the network layer, we introduce a filter algorithm to remove the indirect weak correlation and retain strong correlation. Finally, we build a weighted adjacency matrix, the value of each entry representing the correlation level between pair-wise variables, and then get the weighted directed association network. Two numerical simulated data from linear system and nonlinear system are illustrated to show the steps and performance of the proposed approach. The ability of the proposed method is approved by an application finally.
Affiliation(s)
- Yanzhu Hu
- Beijing Key Laboratory of Work Safety Intelligent Monitoring, Beijing University of Posts and Telecommunications, Beijing, 100876, China
- Huiyang Zhao
- Beijing Key Laboratory of Work Safety Intelligent Monitoring, Beijing University of Posts and Telecommunications, Beijing, 100876, China
- School of Information Engineering, Xuchang University, Xuchang, 461000, China
- Xinbo Ai
- Beijing Key Laboratory of Work Safety Intelligent Monitoring, Beijing University of Posts and Telecommunications, Beijing, 100876, China
22
Gene Regulatory Network Inferences Using a Maximum-Relevance and Maximum-Significance Strategy. PLoS One 2016; 11:e0166115. [PMID: 27829000 PMCID: PMC5102470 DOI: 10.1371/journal.pone.0166115] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 01/07/2016] [Accepted: 10/24/2016] [Indexed: 12/18/2022] Open
Abstract
Recovering gene regulatory networks from expression data is a challenging problem in systems biology that provides valuable information on the regulatory mechanisms of cells. A number of algorithms based on computational models are currently used to recover network topology. However, most of these algorithms have limitations. For example, many models tend to be complicated because of the "large p, small n" problem. In this paper, we propose a novel regulatory network inference method called the maximum-relevance and maximum-significance network (MRMSn) method, which converts the problem of recovering networks into a problem of how to select the regulator genes for each gene. To solve the latter problem, we present an algorithm that is based on information theory and selects the regulator genes for a specific gene by maximizing the relevance and significance. A first-order incremental search algorithm is used to search for regulator genes. Eventually, a strict constraint is adopted to adjust all of the regulatory relationships according to the obtained regulator genes and thus obtain the complete network structure. We performed our method on five different datasets and compared our method to five state-of-the-art methods for network inference based on information theory. The results confirm the effectiveness of our method.
23
Inferring Weighted Directed Association Networks from Multivariate Time Series with the Small-Shuffle Symbolic Transfer Entropy Spectrum Method. ENTROPY 2016. [DOI: 10.3390/e18090328] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 01/05/2023]
24
Liu F, Zhang SW, Guo WF, Wei ZG, Chen L. Inference of Gene Regulatory Network Based on Local Bayesian Networks. PLoS Comput Biol 2016; 12:e1005024. [PMID: 27479082 PMCID: PMC4968793 DOI: 10.1371/journal.pcbi.1005024] [Citation(s) in RCA: 75] [Impact Index Per Article: 9.4] [Received: 10/27/2015] [Accepted: 06/20/2016] [Indexed: 11/18/2022] Open
Abstract
The inference of gene regulatory networks (GRNs) from expression data can mine the direct regulations among genes and gain deep insights into biological processes at a network level. During past decades, numerous computational approaches have been introduced for inferring the GRNs. However, many of them still suffer from various problems, e.g., Bayesian network (BN) methods cannot handle large-scale networks due to their high computational complexity, while information theory-based methods cannot identify the directions of regulatory interactions and also suffer from false positive/negative problems. To overcome the limitations, in this work we present a novel algorithm, namely local Bayesian network (LBN), to infer GRNs from gene expression data by using the network decomposition strategy and false-positive edge elimination scheme. Specifically, LBN algorithm first uses conditional mutual information (CMI) to construct an initial network or GRN, which is decomposed into a number of local networks or GRNs. Then, BN method is employed to generate a series of local BNs by selecting the k-nearest neighbors of each gene as its candidate regulatory genes, which significantly reduces the exponential search space from all possible GRN structures. Integrating these local BNs forms a tentative network or GRN by performing CMI, which reduces redundant regulations in the GRN and thus alleviates the false positive problem. The final network or GRN can be obtained by iteratively performing CMI and local BN on the tentative network. In the iterative process, the false or redundant regulations are gradually removed. When tested on the benchmark GRN datasets from DREAM challenge as well as the SOS DNA repair network in E.coli, our results suggest that LBN outperforms other state-of-the-art methods (ARACNE, GENIE3 and NARROMI) significantly, with more accurate and robust performance. 
In particular, the decomposition strategy with local Bayesian networks not only effectively reduce the computational cost of BN due to much smaller sizes of local GRNs, but also identify the directions of the regulations.
Affiliation(s)
- Fei Liu
- Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi’an, China
- Institute of Physics and Optoelectronics Technology, Baoji University of Arts and Science, Baoji, China
- Shao-Wu Zhang
- Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi’an, China
- Wei-Feng Guo
- Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi’an, China
- Ze-Gang Wei
- Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi’an, China
- Luonan Chen
- Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi’an, China
- Key Laboratory of Systems Biology, Innovation Center for Cell Signaling Network, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
- School of Life Science and Technology, ShanghaiTech University, Shanghai, China
25
Songhorzadeh M, Ansari-Asl K, Mahmoudi A. Inferring time-varying brain connectivity graph based on a new method for link estimation. NETWORK (BRISTOL, ENGLAND) 2016; 27:1-28. [PMID: 27136295 DOI: 10.3109/0954898x.2016.1173246] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/05/2023]
Abstract
Causal interaction estimation among neuronal groups plays an important role in the assessment of brain functions. These directional relations can be best illustrated by means of graphical modeling which is a mathematical representation of a network. Here, we propose an efficient framework to derive a graphical model for the statistical analysis of multivariate processes from observed time series in a data-driven pipeline to explore the interregional brain interactions. A major part of this analysis is devoted to the graph link estimation, which is a measure capable of dealing with the multivariate analysis obstacles. In this paper, we use the Transfer Entropy (TE) measure and focus on its calculation that requires efficient estimation of high dimensional conditional probability distributions. Our method is based on the simplification of high dimensional parts of the conventional TE definition and especially devoted to the reduction of estimation dimension through searching for the most informative contents of the high dimensional parts. To this end, we exploit the causal Markov properties for time series graphs and prove that only a specified subset of involved variables plays an important role in multivariate TE estimation. We demonstrate the performance of our method for stationary processes using some numerical simulated examples as well as real neurophysiological data.
Affiliation(s)
- Maryam Songhorzadeh
- Department of Electrical Engineering, Faculty of Engineering, Shahid Chamran University of Ahvaz, Ahvaz, Iran
- Karim Ansari-Asl
- Department of Electrical Engineering, Faculty of Engineering, Shahid Chamran University of Ahvaz, Ahvaz, Iran
- Alimorad Mahmoudi
- Department of Electrical Engineering, Faculty of Engineering, Shahid Chamran University of Ahvaz, Ahvaz, Iran
26
Information theory in systems biology. Part I: Gene regulatory and metabolic networks. Semin Cell Dev Biol 2015; 51:3-13. [PMID: 26701126 DOI: 10.1016/j.semcdb.2015.12.007] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.2] [Received: 12/07/2015] [Accepted: 12/07/2015] [Indexed: 11/22/2022]
Abstract
"A Mathematical Theory of Communication", was published in 1948 by Claude Shannon to establish a framework that is now known as information theory. In recent decades, information theory has gained much attention in the area of systems biology. The aim of this paper is to provide a systematic review of those contributions that have applied information theory in inferring or understanding of biological systems. Based on the type of system components and the interactions between them, we classify the biological systems into 4 main classes: gene regulatory, metabolic, protein-protein interaction and signaling networks. In the first part of this review, we attempt to introduce most of the existing studies on two types of biological networks, including gene regulatory and metabolic networks, which are founded on the concepts of information theory.
27
Mousavian Z, Díaz J, Masoudi-Nejad A. Information theory in systems biology. Part II: protein-protein interaction and signaling networks. Semin Cell Dev Biol 2015; 51:14-23. [PMID: 26691180 DOI: 10.1016/j.semcdb.2015.12.006] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.3] [Received: 12/07/2015] [Accepted: 12/07/2015] [Indexed: 12/25/2022]
Abstract
By the development of information theory in 1948 by Claude Shannon to address the problems in the field of data storage and data communication over (noisy) communication channel, it has been successfully applied in many other research areas such as bioinformatics and systems biology. In this manuscript, we attempt to review some of the existing literatures in systems biology, which are using the information theory measures in their calculations. As we have reviewed most of the existing information-theoretic methods in gene regulatory and metabolic networks in the first part of the review, so in the second part of our study, the application of information theory in other types of biological networks including protein-protein interaction and signaling networks will be surveyed.
Affiliation(s)
- Zaynab Mousavian
- Laboratory of Systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran.
- José Díaz
- Grupo de Biología Teórica y Computacional, Centro de Investigación en Dinámica Celular, Universidad Autónoma del Estado de Morelos, Cuernavaca, Morelos, Mexico
- Ali Masoudi-Nejad
- Laboratory of Systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran.
28
Folch-Fortuny A, Villaverde AF, Ferrer A, Banga JR. Enabling network inference methods to handle missing data and outliers. BMC Bioinformatics 2015; 16:283. [PMID: 26335628 PMCID: PMC4559359 DOI: 10.1186/s12859-015-0717-7] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 02/03/2015] [Accepted: 08/24/2015] [Indexed: 12/20/2022] Open
Abstract
Background: The inference of complex networks from data is a challenging problem in biological sciences, as well as in a wide range of disciplines such as chemistry, technology, economics, or sociology. The quantity and quality of the data greatly affect the results. While many methodologies have been developed for this task, they seldom take into account issues such as missing data or outlier detection and correction, which need to be properly addressed before network inference. Results: Here we present an approach to (i) handle missing data and (ii) detect and correct outliers based on multivariate projection to latent structures. The method, called trimmed scores regression (TSR), enables network inference methods to analyse incomplete datasets by imputing the missing values coherently with the latent data structure. Furthermore, it substitutes the faulty values in a dataset by proper estimations. We provide an implementation of this approach, and show how it can be integrated with any network inference method as a preliminary data curation step. This functionality is demonstrated with a state of the art network inference method based on mutual information distance and entropy reduction, MIDER. Conclusion: The methodology presented here enables network inference methods to analyse a large number of incomplete and faulty datasets that could not be reliably analysed so far. Our comparative studies show the superiority of TSR over other missing data approaches used by practitioners. Furthermore, the method allows for outlier detection and correction. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-015-0717-7) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Abel Folch-Fortuny
- Departamento de Estadística e Investigación Operativa Aplicadas y Calidad, Universitat Politècnica de València, Camino de Vera s/n, Valencia, 46022, Spain.
- Alejandro F Villaverde
- BioProcess Engineering Group, IIM-CSIC, Eduardo Cabello 6, Vigo, 36208, Spain; Centre of Biological Engineering, Universidade do Minho, Campus de Gualtar, Braga, 4710-057, Portugal; Department of Systems and Control Engineering, Universidade de Vigo, Rua Maxwell, Vigo, 36310, Spain
- Alberto Ferrer
- Departamento de Estadística e Investigación Operativa Aplicadas y Calidad, Universitat Politècnica de València, Camino de Vera s/n, Valencia, 46022, Spain
- Julio R Banga
- BioProcess Engineering Group, IIM-CSIC, Eduardo Cabello 6, Vigo, 36208, Spain
29
Kickoff to conflict: a sequence analysis of intra-state conflict-preceding event structures. PLoS One 2015; 10:e0122472. [PMID: 25951105 PMCID: PMC4424002 DOI: 10.1371/journal.pone.0122472] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 08/14/2014] [Accepted: 02/12/2015] [Indexed: 11/19/2022] Open
Abstract
While many studies have suggested or assumed that the periods preceding the onset of intra-state conflict are similar across time and space, few have empirically tested this proposition. Using the Integrated Crisis Early Warning System's domestic event data in Asia from 1998-2010, we subject this proposition to empirical analysis. We code the similarity of government-rebel interactions in sequences preceding the onset of intra-state conflict to those preceding further periods of peace using three different metrics: Euclidean, Levenshtein, and mutual information. These scores are then used as predictors in a bivariate logistic regression to forecast whether we are likely to observe conflict in neither, one, or both of the states. We find that our model accurately classifies cases where both sequences precede peace, but struggles to distinguish between cases in which one sequence escalates to conflict and where both sequences escalate to conflict. These findings empirically suggest that generalizable patterns exist between event sequences that precede peace.
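Of the three similarity metrics named in this abstract, the Levenshtein (edit) distance is the easiest to show in a few lines. The sketch below is a generic textbook implementation applied to sequences of event labels; it is not the authors' code, and the event labels are hypothetical placeholders rather than actual Integrated Crisis Early Warning System categories.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (strings or lists of event labels).

    Counts the minimum number of insertions, deletions, and substitutions
    needed to turn `a` into `b`, using two rows of the dynamic-programming table.
    """
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty sequence takes i deletions
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


# Hypothetical event sequences preceding two different outcomes.
seq_a = ["protest", "arrest", "talks", "ceasefire"]
seq_b = ["protest", "talks", "ceasefire"]

print(levenshtein(seq_a, seq_b))  # one deletion separates the two sequences
```

A pairwise score like this could then serve as one predictor in a regression model, which is the role the abstract assigns to its similarity scores.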
30
Zhang Z, Zheng Z, Niu H, Mi Y, Wu S, Hu G. Solving the inverse problem of noise-driven dynamic networks. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2015; 91:012814. [PMID: 25679664 DOI: 10.1103/physreve.91.012814] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 06/19/2014] [Indexed: 06/04/2023]
Abstract
Nowadays, massive amounts of data are available for analysis in natural and social systems and the tasks to depict system structures from the data, i.e., the inverse problems, become one of the central issues in wide interdisciplinary fields. In this paper, we study the inverse problem of dynamic complex networks driven by white noise. A simple and universal inference formula of double correlation matrices and noise-decorrelation (DCMND) method is derived analytically, and numerical simulations confirm that the DCMND method can accurately depict both network structures and noise correlations by using available output data only. This inference performance has never been regarded possible by theoretical derivation, numerical computation, and experimental design.
Affiliation(s)
- Zhaoyang Zhang
- Department of Physics, Beijing Normal University, Beijing 100875, China
- Zhigang Zheng
- Department of Physics, Beijing Normal University, Beijing 100875, China
- Haijing Niu
- State Key Laboratory of Cognitive Neuroscience and Learning and International Digital Group (IDG)/McGovern Institute for Brain Research, Beijing Normal University, Beijing 100875, China and Center for Collaboration and Innovation in Brain and Learning Sciences, Beijing Normal University, Beijing 100875, China
- Yuanyuan Mi
- State Key Laboratory of Cognitive Neuroscience and Learning and International Digital Group (IDG)/McGovern Institute for Brain Research, Beijing Normal University, Beijing 100875, China and Center for Collaboration and Innovation in Brain and Learning Sciences, Beijing Normal University, Beijing 100875, China
- Si Wu
- State Key Laboratory of Cognitive Neuroscience and Learning and International Digital Group (IDG)/McGovern Institute for Brain Research, Beijing Normal University, Beijing 100875, China and Center for Collaboration and Innovation in Brain and Learning Sciences, Beijing Normal University, Beijing 100875, China
- Gang Hu
- Department of Physics, Beijing Normal University, Beijing 100875, China
31
Villaverde AF, Ross J, Morán F, Banga JR. MIDER: network inference with mutual information distance and entropy reduction. PLoS One 2014; 9:e96732. [PMID: 24806471 PMCID: PMC4013075 DOI: 10.1371/journal.pone.0096732] [Citation(s) in RCA: 91] [Impact Index Per Article: 9.1] [Received: 11/05/2013] [Accepted: 04/09/2014] [Indexed: 01/14/2023] Open
Abstract
The prediction of links among variables from a given dataset is a task referred to as network inference or reverse engineering. It is an open problem in bioinformatics and systems biology, as well as in other areas of science. Information theory, which uses concepts such as mutual information, provides a rigorous framework for addressing it. While a number of information-theoretic methods are already available, most of them focus on a particular type of problem, introducing assumptions that limit their generality. Furthermore, many of these methods lack a publicly available implementation. Here we present MIDER, a method for inferring network structures using information-theoretic concepts. It consists of two steps. First, it provides a representation of the network in which the distance among nodes indicates their statistical closeness. Second, it refines the prediction of the existing links to distinguish between direct and indirect interactions and to assign directionality. The method accepts as input time-series data related to some quantitative features of the network nodes (for example, concentrations, if the nodes are chemical species). It takes into account time delays between variables, and allows choosing among several definitions and normalizations of mutual information. It is general-purpose: it may be applied to any type of network, cellular or otherwise. A Matlab implementation including source code and data is freely available (http://www.iim.csic.es/~gingproc/mider.html). The performance of MIDER has been evaluated on seven different benchmark problems that cover the main types of cellular networks, including metabolic, gene regulatory, and signaling. Comparisons with state-of-the-art information-theoretic methods have demonstrated the competitive performance of MIDER, as well as its versatility. Its use does not demand any a priori knowledge from the user; the default settings and the adaptive nature of the method provide good results for a wide range of problems without requiring tuning.
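The time-delayed mutual information mentioned in this abstract can be illustrated with a minimal Python sketch. This is not the MIDER implementation (which is distributed as Matlab code). The function names, the equal-width binning, and the choice of four bins are assumptions made only for illustration.

```python
import math
from collections import Counter


def discretize(values, bins=4):
    """Map real values to equal-width bin indices 0..bins-1 (assumed binning)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def entropy(symbols):
    """Shannon entropy in bits of a sequence of discrete symbols."""
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in Counter(symbols).values())


def lagged_mutual_information(x, y, lag=0, bins=4):
    """Estimate I(X_t; Y_{t+lag}) in bits as H(X) + H(Y) - H(X, Y)."""
    xs, ys = discretize(x, bins), discretize(y, bins)
    if lag > 0:
        # Pair x at time t with y at time t + lag.
        xs, ys = xs[:-lag], ys[lag:]
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def best_lag(x, y, max_lag=3, bins=4):
    """Return the (lag, mutual information) pair with the largest estimate."""
    candidates = (
        (lag, lagged_mutual_information(x, y, lag, bins))
        for lag in range(max_lag + 1)
    )
    return max(candidates, key=lambda pair: pair[1])
```

If `y` is a copy of `x` delayed by two samples, `best_lag(x, y)` should select a lag of 2, which is the kind of delay-aware dependence the abstract describes. A histogram estimator of this kind is biased for short series, so the values are only indicative.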
Affiliation(s)
- John Ross, Department of Chemistry, Stanford University, Stanford, California, United States of America
- Federico Morán, Department of Biochemistry and Molecular Biology, Complutense University, Madrid, Spain
|
32
|
Villaverde AF, Banga JR. Reverse engineering and identification in systems biology: strategies, perspectives and challenges. J R Soc Interface 2014; 11:20130505. [PMID: 24307566 PMCID: PMC3869153 DOI: 10.1098/rsif.2013.0505] [Citation(s) in RCA: 163] [Impact Index Per Article: 16.3] [Received: 06/07/2013] [Accepted: 11/12/2013] [Indexed: 12/17/2022] Open
Abstract
The interplay of mathematical modelling with experiments is one of the central elements in systems biology. The aim of reverse engineering is to infer, analyse and understand, through this interplay, the functional and regulatory mechanisms of biological systems. Reverse engineering is not exclusive to systems biology and has been studied in different areas, such as inverse problem theory, machine learning, nonlinear physics, (bio)chemical kinetics, control theory and optimization, among others. However, it seems that many of these areas have been relatively closed to outsiders. In this contribution, we aim to compare and highlight the different perspectives and contributions from these fields, with emphasis on two key questions: (i) why are reverse engineering problems so hard to solve, and (ii) what methods are available for the particular problems arising from systems biology?
Affiliation(s)
- Julio R. Banga, BioProcess Engineering Group, IIM-CSIC, Spanish National Research Council, Vigo 36208, Spain
|