1
Lasri A, Shahrezaei V, Sturrock M. Benchmarking imputation methods for network inference using a novel method of synthetic scRNA-seq data generation. BMC Bioinformatics 2022; 23:236. [PMID: 35715748 PMCID: PMC9204969 DOI: 10.1186/s12859-022-04778-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2021] [Accepted: 05/31/2022] [Indexed: 11/30/2022]
Abstract
Background Single cell RNA-sequencing (scRNA-seq) has very rapidly become the new workhorse of modern biology providing an unprecedented global view on cellular diversity and heterogeneity. In particular, the structure of gene-gene expression correlation contains information on the underlying gene regulatory networks. However, interpretation of scRNA-seq data is challenging due to specific experimental error and biases that are unique to this kind of data including drop-out (or technical zeros). Methods To deal with this problem several methods for imputation of zeros for scRNA-seq have been developed. However, it is not clear how these processing steps affect inference of genetic networks from single cell data. Here, we introduce Biomodelling.jl, a tool for generation of synthetic scRNA-seq data using multiscale modelling of stochastic gene regulatory networks in growing and dividing cells. Results Our tool produces realistic transcription data with a known ground truth network topology that can be used to benchmark different approaches for gene regulatory network inference. Using this tool we investigate the impact of different imputation methods on the performance of several network inference algorithms. Conclusions Biomodelling.jl provides a versatile and useful tool for future development and benchmarking of network inference approaches using scRNA-seq data. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-022-04778-9
Affiliation(s)
- Ayoub Lasri
  - Department of Physiology and Medical Physics, Royal College of Surgeons in Ireland, Dublin, Ireland
- Vahid Shahrezaei
  - Department of Mathematics, Faculty of Natural Sciences, Imperial College London, London, SW7 2AZ, UK
- Marc Sturrock
  - Department of Physiology and Medical Physics, Royal College of Surgeons in Ireland, Dublin, Ireland
2
Angelin-Bonnet O, Biggs PJ, Vignes M. Gene Regulatory Networks: A Primer in Biological Processes and Statistical Modelling. Methods Mol Biol 2019; 1883:347-383. [PMID: 30547408 DOI: 10.1007/978-1-4939-8882-2_15] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 06/09/2023]
Abstract
Modelling gene regulatory networks requires not only a thorough understanding of the biological system depicted, but also the ability to accurately represent this system from a mathematical perspective. Throughout this chapter, we aim to familiarize the reader with the biological processes and molecular factors at play in the process of gene expression regulation. We first describe the different interactions controlling each step of the expression process, from transcription to mRNA and protein decay. In the second section, we provide statistical tools to accurately represent this biological complexity in the form of mathematical models. Among other considerations, we discuss the topological properties of biological networks, the application of deterministic and stochastic frameworks, and the quantitative modelling of regulation. We particularly focus on the use of such models for the simulation of expression data that can serve as a benchmark for the testing of network inference algorithms.
Affiliation(s)
- Olivia Angelin-Bonnet
  - Institute of Fundamental Sciences, Palmerston North, New Zealand
  - School of Veterinary Science, Massey University, Palmerston North, New Zealand
- Patrick J Biggs
  - Institute of Fundamental Sciences, Palmerston North, New Zealand
  - School of Veterinary Science, Massey University, Palmerston North, New Zealand
- Matthieu Vignes
  - Institute of Fundamental Sciences, Palmerston North, New Zealand
  - School of Veterinary Science, Massey University, Palmerston North, New Zealand
3
Tripathi S, Lloyd-Price J, Ribeiro A, Yli-Harja O, Dehmer M, Emmert-Streib F. sgnesR: An R package for simulating gene expression data from an underlying real gene network structure considering delay parameters. BMC Bioinformatics 2017; 18:325. [PMID: 28676075 PMCID: PMC5496254 DOI: 10.1186/s12859-017-1731-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 05/23/2016] [Accepted: 06/15/2017] [Indexed: 01/04/2023]
Abstract
Background sgnesR (Stochastic Gene Network Expression Simulator in R) is an R package that provides an interface to simulate gene expression data from a given gene network using the stochastic simulation algorithm (SSA). The package allows various options for delay parameters, which can easily be included in reactions for promoter delay, RNA delay and protein delay. A user can tune these parameters to model various types of reactions within a cell. As examples, we present two network models to generate expression profiles. We also demonstrate the inference of networks and the evaluation of association measures for edge and non-edge components from the generated expression profiles. Results The purpose of sgnesR is to enable an easy-to-use and quick implementation for generating realistic gene expression data from biologically relevant networks that can be user selected. Conclusions sgnesR is freely available for academic use. The R package has been tested for R 3.2.0 under Linux, Windows and Mac OS X. Electronic supplementary material The online version of this article (doi:10.1186/s12859-017-1731-8) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Shailesh Tripathi
  - Predictive Medicine and Data Analytics Lab, Department of Signal Processing, Tampere University of Technology, Tampere, Finland
- Jason Lloyd-Price
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Harvard University, Boston, USA
  - Laboratory of Biosystem Dynamics, Department of Signal Processing, Tampere University of Technology, Tampere, Finland
- Andre Ribeiro
  - Laboratory of Biosystem Dynamics, Department of Signal Processing, Tampere University of Technology, Tampere, Finland
  - Institute of Biosciences and Medical Technology, Tampere, Finland
- Olli Yli-Harja
  - Institute of Biosciences and Medical Technology, Tampere, Finland
  - Computational Systems Biology, Department of Signal Processing, Tampere University of Technology, Tampere, Finland
- Matthias Dehmer
  - Institute for Theoretical Informatics, Mathematics and Operations Research, Department of Computer Science, Universität der Bundeswehr München, Munich, Germany
- Frank Emmert-Streib
  - Predictive Medicine and Data Analytics Lab, Department of Signal Processing, Tampere University of Technology, Tampere, Finland
  - Institute of Biosciences and Medical Technology, Tampere, Finland
4
Coker EA, Mitsopoulos C, Workman P, Al-Lazikani B. SiGNet: A signaling network data simulator to enable signaling network inference. PLoS One 2017; 12:e0177701. [PMID: 28545060 PMCID: PMC5435248 DOI: 10.1371/journal.pone.0177701] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 06/10/2016] [Accepted: 05/02/2017] [Indexed: 12/22/2022]
Abstract
Network models are widely used to describe complex signaling systems. Cellular wiring varies in different cellular contexts and numerous inference techniques have been developed to infer the structure of a network from experimental data of the network's behavior. To objectively identify which inference strategy is best suited to a specific network, a gold standard network and dataset are required. However, suitable datasets for benchmarking are difficult to find. Numerous tools exist that can simulate data for transcriptional networks, but these are of limited use for the study of signaling networks. Here, we describe SiGNet (Signal Generator for Networks): a Cytoscape app that simulates experimental data for a signaling network of known structure. SiGNet has been developed and tested against published experimental data, incorporating information on network architecture, and the directionality and strength of interactions to create biological data in silico. SiGNet is the first tool to simulate biological signaling data, enabling an accurate and systematic assessment of inference strategies. SiGNet can also be used to produce preliminary models of key biological pathways following perturbation.
Affiliation(s)
- Elizabeth A. Coker
  - Cancer Research UK Cancer Therapeutics Unit, The Institute of Cancer Research, London, United Kingdom
- Costas Mitsopoulos
  - Cancer Research UK Cancer Therapeutics Unit, The Institute of Cancer Research, London, United Kingdom
- Paul Workman
  - Cancer Research UK Cancer Therapeutics Unit, The Institute of Cancer Research, London, United Kingdom
- Bissan Al-Lazikani
  - Cancer Research UK Cancer Therapeutics Unit, The Institute of Cancer Research, London, United Kingdom
5
Ni Y, Aghamirzaie D, Elmarakeby H, Collakova E, Li S, Grene R, Heath LS. A Machine Learning Approach to Predict Gene Regulatory Networks in Seed Development in Arabidopsis. Front Plant Sci 2016; 7:1936. [PMID: 28066488 PMCID: PMC5179539 DOI: 10.3389/fpls.2016.01936] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 08/03/2016] [Accepted: 12/06/2016] [Indexed: 05/29/2023]
Abstract
Gene regulatory networks (GRNs) provide a representation of relationships between regulators and their target genes. Several methods for GRN inference, both unsupervised and supervised, have been developed to date. Because regulatory relationships consistently reprogram in diverse tissues or under different conditions, GRNs inferred without specific biological contexts are of limited applicability. In this report, a machine learning approach is presented to predict GRNs specific to developing Arabidopsis thaliana embryos. We developed the Beacon GRN inference tool to predict GRNs occurring during seed development in Arabidopsis based on a support vector machine (SVM) model. We developed both global and local inference models and compared their performance, demonstrating that local models are generally superior for our application. Using both the expression levels of the genes expressed in developing embryos and prior known regulatory relationships, GRNs were predicted for specific embryonic developmental stages. The targets that are strongly positively correlated with their regulators are mostly expressed at the beginning of seed development. Potential direct targets were identified based on a match between the promoter regions of these inferred targets and the cis elements recognized by specific regulators. Our analysis also provides evidence for previously unknown inhibitory effects of three positive regulators of gene expression. The Beacon GRN inference tool provides a valuable model system for context-specific GRN inference and is freely available at https://github.com/BeaconProjectAtVirginiaTech/beacon_network_inference.git.
Affiliation(s)
- Ying Ni
  - Department of Computer Science, Virginia Polytechnic Institute and State University, Blacksburg, VA, USA
- Delasa Aghamirzaie
  - Genetics, Bioinformatics and Computational Biology, Virginia Polytechnic Institute and State University, Blacksburg, VA, USA
- Haitham Elmarakeby
  - Department of Computer Science, Virginia Polytechnic Institute and State University, Blacksburg, VA, USA
- Eva Collakova
  - Department of Plant Pathology, Physiology, and Weed Science, Virginia Polytechnic Institute and State University, Blacksburg, VA, USA
- Song Li
  - Department of Crop and Soil Environmental Sciences, Virginia Polytechnic Institute and State University, Blacksburg, VA, USA
- Ruth Grene
  - Department of Plant Pathology, Physiology, and Weed Science, Virginia Polytechnic Institute and State University, Blacksburg, VA, USA
- Lenwood S. Heath
  - Department of Computer Science, Virginia Polytechnic Institute and State University, Blacksburg, VA, USA
6
Brent MR. Past Roadblocks and New Opportunities in Transcription Factor Network Mapping. Trends Genet 2016; 32:736-750. [PMID: 27720190 DOI: 10.1016/j.tig.2016.08.009] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 05/08/2016] [Revised: 08/12/2016] [Accepted: 08/16/2016] [Indexed: 12/11/2022]
Abstract
One of the principal mechanisms by which cells differentiate and respond to changes in external signals or conditions is by changing the activity levels of transcription factors (TFs). This changes the transcription rates of target genes via the cell's TF network, which ultimately contributes to reconfiguring cellular state. Since microarrays provided our first window into global cellular state, computational biologists have eagerly attacked the problem of mapping TF networks, a key part of the cell's control circuitry. In retrospect, however, steady-state mRNA abundance levels were a poor substitute for TF activity levels and gene transcription rates. Likewise, mapping TF binding through chromatin immunoprecipitation proved less predictive of functional regulation and less amenable to systematic elucidation of complete networks than originally hoped. This review explains these roadblocks and the current, unprecedented blossoming of new experimental techniques built on second-generation sequencing, which hold out the promise of rapid progress in TF network mapping.
Affiliation(s)
- Michael R Brent
  - Departments of Computer Science and Genetics and Center for Genome Sciences and Systems Biology, Washington University, Saint Louis, MO, USA
7
BioPreDyn-bench: a suite of benchmark problems for dynamic modelling in systems biology. BMC Syst Biol 2015; 9:8. [PMID: 25880925 PMCID: PMC4342829 DOI: 10.1186/s12918-015-0144-4] [Citation(s) in RCA: 52] [Impact Index Per Article: 5.2] [Received: 07/21/2014] [Accepted: 01/15/2015] [Indexed: 11/21/2022]
Abstract
Background Dynamic modelling is one of the cornerstones of systems biology. Many research efforts are currently being invested in the development and exploitation of large-scale kinetic models. The associated problems of parameter estimation (model calibration) and optimal experimental design are particularly challenging. The community has already developed many methods and software packages which aim to facilitate these tasks. However, there is a lack of suitable benchmark problems which allow a fair and systematic evaluation and comparison of these contributions. Results Here we present BioPreDyn-bench, a set of challenging parameter estimation problems which aspire to serve as reference test cases in this area. This set comprises six problems including medium and large-scale kinetic models of the bacterium E. coli, baker’s yeast S. cerevisiae, the vinegar fly D. melanogaster, Chinese Hamster Ovary cells, and a generic signal transduction network. The level of description includes metabolism, transcription, signal transduction, and development. For each problem we provide (i) a basic description and formulation, (ii) implementations ready-to-run in several formats, (iii) computational results obtained with specific solvers, (iv) a basic analysis and interpretation. Conclusions This suite of benchmark problems can be readily used to evaluate and compare parameter estimation methods. Further, it can also be used to build test problems for sensitivity and identifiability analysis, model reduction and optimal experimental design methods. The suite, including codes and documentation, can be freely downloaded from the BioPreDyn-bench website, https://sites.google.com/site/biopredynbenchmarks/. Electronic supplementary material The online version of this article (doi:10.1186/s12918-015-0144-4) contains supplementary material, which is available to authorized users.
8
Sławek J, Arodź T. ENNET: inferring large gene regulatory networks from expression data using gradient boosting. BMC Syst Biol 2013; 7:106. [PMID: 24148309 PMCID: PMC4015806 DOI: 10.1186/1752-0509-7-106] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.3] [Received: 06/24/2013] [Accepted: 10/17/2013] [Indexed: 01/19/2023]
Abstract
BACKGROUND The regulation of gene expression by transcription factors is a key determinant of cellular phenotypes. Deciphering genome-wide networks that capture which transcription factors regulate which genes is one of the major efforts towards understanding and accurate modeling of living systems. However, reverse-engineering the network from gene expression profiles remains a challenge, because the data are noisy, high dimensional and sparse, and the regulation is often obscured by indirect connections. RESULTS We introduce a gene regulatory network inference algorithm ENNET, which reverse-engineers networks of transcriptional regulation from a variety of expression profiles with a superior accuracy compared to the state-of-the-art methods. The proposed method relies on the boosting of regression stumps combined with a relative variable importance measure for the initial scoring of transcription factors with respect to each gene. Then, we propose a technique for using a distribution of the initial scores and information about knockouts to refine the predictions. We evaluated the proposed method on the DREAM3, DREAM4 and DREAM5 data sets and achieved higher accuracy than the winners of those competitions and other established methods. CONCLUSIONS Superior accuracy achieved on the three different benchmark data sets shows that ENNET is a top contender in the task of network inference. It is a versatile method that uses information about which gene was knocked-out in which experiment if it is available, but remains the top performer even without such information. ENNET is available for download from https://github.com/slawekj/ennet under the GNU GPLv3 license.
Affiliation(s)
- Janusz Sławek
  - Department of Computer Science, Virginia Commonwealth University, Richmond, Virginia
- Tomasz Arodź
  - Department of Computer Science, Virginia Commonwealth University, Richmond, Virginia
9
10
Maetschke SR, Madhamshettiwar PB, Davis MJ, Ragan MA. Supervised, semi-supervised and unsupervised inference of gene regulatory networks. Brief Bioinform 2013; 15:195-211. [PMID: 23698722 PMCID: PMC3956069 DOI: 10.1093/bib/bbt034] [Citation(s) in RCA: 91] [Impact Index Per Article: 7.6] [Indexed: 11/22/2022]
Abstract
Inference of gene regulatory networks from expression data is a challenging task. Many methods have been developed for this purpose, but a comprehensive evaluation that covers unsupervised, semi-supervised and supervised methods, and provides guidelines for their practical application, is lacking. We performed an extensive evaluation of inference methods on simulated and experimental expression data. The results reveal low prediction accuracies for unsupervised techniques, with the notable exception of the Z-SCORE method on knockout data. In all other cases, the supervised approach achieved the highest accuracies and, even in a semi-supervised setting with small numbers of only positive samples, outperformed the unsupervised techniques.
Affiliation(s)
- Stefan R Maetschke
  - Institute for Molecular Bioscience and ARC Centre of Excellence in Bioinformatics, Brisbane, QLD 4072, Australia. Tel.: 61 7 3346 2616; Fax: 61 7 3346 2101
11
Camargo-Rodriguez AV, Kim JT. DoGeNetS: using optimisation to discriminate regulatory network topologies based on gene expression data. IET Syst Biol 2012; 6:1-8. [PMID: 22360266 DOI: 10.1049/iet-syb.2011.0004] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/20/2022]
Abstract
Gene regulatory networks (GRNs) determine the dynamics of gene expression. Interest often focuses on the topological structure of a GRN while numerical parameters (e.g. decay rates) are unknown and less important. For larger GRNs, inference of structure from gene expression data is prohibitively difficult. Models are often proposed based on integrative interpretation of multiple sources of information. We have developed DoGeNetS (Discrimination of Gene Network Structures), a method to directly assess candidate models of GRN structure against a target gene expression data set. The transsys language serves to model GRN structures. Numeric parameters are optimised to approximate the target data. Multiple restarts of optimisation yield score sets that provide a basis to statistically discriminate candidate models according to their potential to explain the target data. We demonstrate discrimination power of the DoGeNetS method by relating structural divergence to divergence between gene expression data sets. Known models are used to generate target expression data, and a set of candidate models with a defined structural divergence to the true model is produced. Structural divergence and divergence of expression profiles after optimisation are strongly correlated. We further show that discrimination is possible at noise levels exceeding those typical of contemporary microarray data. DoGeNetS is capable of discriminating the best GRN structure from among a small number of candidates. p values indicate whether differences in divergence of expression are significant. Although this study uses single gene knockouts, the DoGeNetS method can be adapted to simulate a virtually unlimited range of experimental conditions. [Includes supplementary material].
12
Qiu P, Plevritis SK. Reconstructing directed signed gene regulatory network from microarray data. IEEE Trans Biomed Eng 2011; 58:3518-21. [PMID: 21803675 DOI: 10.1109/tbme.2011.2163188] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 11/10/2022]
Abstract
Great efforts have been made to develop both algorithms that reconstruct gene regulatory networks and systems that simulate gene networks and expression data, for the purpose of benchmarking network reconstruction algorithms. An interesting observation is that although many simulation systems chose to use Hill kinetics to generate data, none of the reconstruction algorithms were developed based on Hill kinetics. One possible explanation is that, in Hill kinetics, activation and inhibition interactions take different mathematical forms, which brings additional combinatorial complexity into the reconstruction problem. We propose a new model that behaves qualitatively like Hill kinetics, but has the same mathematical form for both activation and inhibition. We developed an algorithm to reconstruct gene networks based on this new model. Simulation results suggested a novel biological hypothesis that in gene knockout experiments, repressing protein synthesis to a certain extent may lead to better expression data and higher network reconstruction accuracy.
Affiliation(s)
- Peng Qiu
  - Department of Bioinformatics and Computational Biology, University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
13
Schaffter T, Marbach D, Floreano D. GeneNetWeaver: in silico benchmark generation and performance profiling of network inference methods. Bioinformatics 2011; 27:2263-70. [PMID: 21697125 DOI: 10.1093/bioinformatics/btr373] [Citation(s) in RCA: 294] [Impact Index Per Article: 21.0] [Indexed: 11/13/2022]
Affiliation(s)
- Thomas Schaffter
  - Laboratory of Intelligent Systems, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
14
Lopes FM, Cesar RM, Costa LDF. Gene expression complex networks: synthesis, identification, and analysis. J Comput Biol 2011; 18:1353-67. [PMID: 21548810 DOI: 10.1089/cmb.2010.0118] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.0] [Indexed: 01/25/2023]
Abstract
Thanks to recent advances in molecular biology, allied to an ever increasing amount of experimental data, the functional state of thousands of genes can now be extracted simultaneously by using methods such as cDNA microarrays and RNA-Seq. Particularly important related investigations are the modeling and identification of gene regulatory networks from expression data sets. Such knowledge is fundamental for many applications, such as disease treatment, therapeutic intervention strategies and drug design, as well as for planning new high-throughput experiments. Methods have been developed for gene network modeling and identification from expression profiles. However, an important open problem is how to validate such approaches and their results. This work presents an objective approach for validation of gene network modeling and identification which comprises the following three main aspects: (1) Artificial Gene Networks (AGNs) model generation through theoretical models of complex networks, which is used to simulate temporal expression data; (2) a computational method for gene network identification from the simulated data, which is founded on a feature selection approach where a target gene is fixed and the expression profile is observed for all other genes in order to identify a relevant subset of predictors; and (3) validation of the identified AGN-based network through comparison with the original network. The proposed framework allows several types of AGNs to be generated and used in order to simulate temporal expression data. The results of the network identification method can then be compared to the original network in order to estimate its properties and accuracy. Some of the most important theoretical models of complex networks have been assessed: the uniformly-random Erdős-Rényi (ER), the small-world Watts-Strogatz (WS), the scale-free Barabási-Albert (BA), and geographical networks (GG). The experimental results indicate that the inference method was sensitive to variation in the average degree <k>, with its network recovery rate decreasing as <k> increased. The signal size was important for the inference method to get better accuracy in the network identification rate, presenting very good results with small expression profiles. However, the adopted inference method was not sensitive enough to recognize distinct structures of interaction among genes, presenting a similar behavior when applied to different network topologies. In summary, the proposed framework, though simple, was adequate for the validation of the inferred networks by identifying some properties of the evaluated method, which can be extended to other inference methods.
Affiliation(s)
- Fabrício M Lopes
  - Federal University of Technology-Paraná and Institute of Mathematics and Statistics, University of São Paulo, Brazil
15
Pritchard L, Birch P. A systems biology perspective on plant-microbe interactions: biochemical and structural targets of pathogen effectors. Plant Sci 2011; 180:584-603. [PMID: 21421407 DOI: 10.1016/j.plantsci.2010.12.008] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.3] [Received: 11/01/2010] [Revised: 12/13/2010] [Accepted: 12/15/2010] [Indexed: 05/22/2023]
Abstract
Plants have biochemical defences against stresses from predators, parasites and pathogens. In this review we discuss the interaction of plant defences with microbial pathogens such as bacteria, fungi and oomycetes, and viruses. We examine principles of complex dynamic networks that allow identification of network components that are differentially and predictably sensitive to perturbation, thus making them likely effector targets. We relate these principles to recent developments in our understanding of known effector targets in plant-pathogen systems, and propose a systems-level framework for the interpretation and modelling of host-microbe interactions mediated by effectors. We describe this framework briefly, and conclude by discussing useful experimental approaches for populating this framework.
Affiliation(s)
- Leighton Pritchard
  - Plant Pathology Programme, SCRI, Errol Road, Invergowrie, Dundee, Scotland DD2 5DA, UK
16
Greenfield A, Madar A, Ostrer H, Bonneau R. DREAM4: Combining genetic and dynamic information to identify biological networks and dynamical models. PLoS One 2010; 5:e13397. [PMID: 21049040 PMCID: PMC2963605 DOI: 10.1371/journal.pone.0013397] [Citation(s) in RCA: 125] [Impact Index Per Article: 8.3] [Received: 04/02/2010] [Accepted: 09/09/2010] [Indexed: 11/23/2022]
Abstract
Background Current technologies have led to the availability of multiple genomic data types in sufficient quantity and quality to serve as a basis for automatic global network inference. Accordingly, there is currently a large variety of network inference methods that learn regulatory networks to varying degrees of detail. These methods have different strengths and weaknesses and thus can be complementary. However, combining different methods in a mutually reinforcing manner remains a challenge. Methodology We investigate how three scalable methods can be combined into a useful network inference pipeline. The first is a novel t-test–based method that relies on a comprehensive steady-state knock-out dataset to rank regulatory interactions. The remaining two are previously published mutual information and ordinary differential equation based methods (tlCLR and Inferelator 1.0, respectively) that use both time-series and steady-state data to rank regulatory interactions; the latter has the added advantage of also inferring dynamic models of gene regulation which can be used to predict the system's response to new perturbations. Conclusion/Significance Our t-test based method proved powerful at ranking regulatory interactions, tying for first out of methods in the DREAM4 100-gene in-silico network inference challenge. We demonstrate complementarity between this method and the two methods that take advantage of time-series data by combining the three into a pipeline whose ability to rank regulatory interactions is markedly improved compared to either method alone. Moreover, the pipeline is able to accurately predict the response of the system to new conditions (in this case new double knock-out genetic perturbations). Our evaluation of the performance of multiple methods for network inference suggests avenues for future methods development and provides simple considerations for genomic experimental design. Our code is publicly available at http://err.bio.nyu.edu/inferelator/.
Affiliation(s)
- Alex Greenfield
- Computational Biology Program, New York University Sackler School of Medicine, New York, New York, United States of America
- Aviv Madar
- Department of Biology, Center for Genomics and Systems Biology, New York University, New York, New York, United States of America
- Harry Ostrer
- Human Genetics Program, Department of Pediatrics, New York University Langone Medical Center, New York, New York, United States of America
- Richard Bonneau
- Computational Biology Program, New York University Sackler School of Medicine, New York, New York, United States of America
- Department of Biology, Center for Genomics and Systems Biology, New York University, New York, New York, United States of America
- Computer Science Department, Courant Institute of Mathematical Sciences, New York University, New York, New York, United States of America
|
17
|
Castro-Melchor M, Charaniya S, Karypis G, Takano E, Hu WS. Genome-wide inference of regulatory networks in Streptomyces coelicolor. BMC Genomics 2010; 11:578. [PMID: 20955611 PMCID: PMC3224704 DOI: 10.1186/1471-2164-11-578] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.4] [Received: 06/22/2010] [Accepted: 10/18/2010] [Indexed: 11/10/2022]
Abstract
BACKGROUND The onset of antibiotics production in Streptomyces species is co-ordinated with differentiation events. An understanding of the genetic circuits that regulate these coupled biological phenomena is essential to discover and engineer the pharmacologically important natural products made by these species. The availability of genomic tools and access to a large warehouse of transcriptome data for the model organism, Streptomyces coelicolor, provides incentive to decipher the intricacies of the regulatory cascades and develop biologically meaningful hypotheses. RESULTS In this study, more than 500 samples of genome-wide temporal transcriptome data, comprising wild-type and more than 25 regulatory gene mutants of Streptomyces coelicolor probed across multiple stress and medium conditions, were investigated. Information based on transcript and functional similarity was used to update a previously-predicted whole-genome operon map and further applied to predict transcriptional networks constituting modules enriched in diverse functions such as secondary metabolism, and sigma factor. The predicted network displays a scale-free architecture with a small-world property observed in many biological networks. The networks were further investigated to identify functionally-relevant modules that exhibit functional coherence and a consensus motif in the promoter elements indicative of DNA-binding elements. CONCLUSIONS Despite the enormous experimental as well as computational challenges, a systems approach for integrating diverse genome-scale datasets to elucidate complex regulatory networks is beginning to emerge. We present an integrated analysis of transcriptome data and genomic features to refine a whole-genome operon map and to construct regulatory networks at the cistron level in Streptomyces coelicolor. The functionally-relevant modules identified in this study pose as potential targets for further studies and verification.
Affiliation(s)
- Marlene Castro-Melchor
- Department of Chemical Engineering and Materials Science, University of Minnesota, 421 Washington Avenue SE, Minneapolis, MN 55455, USA
|
18
|
Towards a rigorous assessment of systems biology models: the DREAM3 challenges. PLoS One 2010; 5:e9202. [PMID: 20186320 PMCID: PMC2826397 DOI: 10.1371/journal.pone.0009202] [Citation(s) in RCA: 298] [Impact Index Per Article: 19.9] [Received: 11/19/2009] [Accepted: 01/19/2010] [Indexed: 11/29/2022]
Abstract
Background Systems biology has embraced computational modeling in response to the quantitative nature and increasing scale of contemporary data sets. The onslaught of data is accelerating as molecular profiling technology evolves. The Dialogue for Reverse Engineering Assessments and Methods (DREAM) is a community effort to catalyze discussion about the design, application, and assessment of systems biology models through annual reverse-engineering challenges. Methodology and Principal Findings We describe our assessments of the four challenges associated with the third DREAM conference which came to be known as the DREAM3 challenges: signaling cascade identification, signaling response prediction, gene expression prediction, and the DREAM3 in silico network challenge. The challenges, based on anonymized data sets, tested participants in network inference and prediction of measurements. Forty teams submitted 413 predicted networks and measurement test sets. Overall, a handful of best-performer teams were identified, while a majority of teams made predictions that were equivalent to random. Counterintuitively, combining the predictions of multiple teams (including the weaker teams) can in some cases improve predictive power beyond that of any single method. Conclusions DREAM provides valuable feedback to practitioners of systems biology modeling. Lessons learned from the predictions of the community provide much-needed context for interpreting claims of efficacy of algorithms described in the scientific literature.
|
19
|
Anvar SY, 't Hoen PAC, Tucker A. The identification of informative genes from multiple datasets with increasing complexity. BMC Bioinformatics 2010; 11:32. [PMID: 20078860 PMCID: PMC2822754 DOI: 10.1186/1471-2105-11-32] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 08/11/2009] [Accepted: 01/15/2010] [Indexed: 12/22/2022]
Abstract
BACKGROUND In microarray data analysis, factors such as data quality, biological variation, and the increasingly multi-layered nature of more complex biological systems complicate the modelling of regulatory networks that can represent and capture the interactions among genes. We believe that the use of multiple datasets derived from related biological systems leads to more robust models. Therefore, we developed a novel framework for modelling regulatory networks that involves training and evaluation on independent datasets. Our approach includes the following steps: (1) ordering the datasets based on their level of noise and informativeness; (2) selection of a Bayesian classifier with an appropriate level of complexity by evaluation of predictive performance on independent data sets; (3) comparing the different gene selections and the influence of increasing the model complexity; (4) functional analysis of the informative genes. RESULTS In this paper, we identify the most appropriate model complexity using cross-validation and independent test set validation for predicting gene expression in three published datasets related to myogenesis and muscle differentiation. Furthermore, we demonstrate that models trained on simpler datasets can be used to identify interactions among genes and select the most informative. We also show that these models can explain the myogenesis-related genes (genes of interest) significantly better than others (P < 0.004) since the improvement in their rankings is much more pronounced. Finally, after further evaluating our results on synthetic datasets, we show that our approach outperforms a concordance method by Lai et al. in identifying informative genes from multiple datasets with increasing complexity whilst additionally modelling the interaction between genes. CONCLUSIONS We show that Bayesian networks derived from simpler controlled systems have better performance than those trained on datasets from more complex biological systems. Further, we show that highly predictive and consistent genes, from the pool of differentially expressed genes, across independent datasets are more likely to be fundamentally involved in the biological process under study. We conclude that networks trained on simpler controlled systems, such as in vitro experiments, can be used to model and capture interactions among genes in more complex datasets, such as in vivo experiments, where these interactions would otherwise be concealed by a multitude of other ongoing events.
Affiliation(s)
- S Yahya Anvar
- Center for Intelligent Data Analysis, School of Information Systems, Computing and Mathematics, Brunel University, Uxbridge, Middlesex, UB8 3PH, UK
- Center for Human and Clinical Genetics, Leiden University Medical Center, P.O. Box 9600, 2300 RC Leiden, The Netherlands
- Peter AC 't Hoen
- Center for Human and Clinical Genetics, Leiden University Medical Center, P.O. Box 9600, 2300 RC Leiden, The Netherlands
- Allan Tucker
- Center for Intelligent Data Analysis, School of Information Systems, Computing and Mathematics, Brunel University, Uxbridge, Middlesex, UB8 3PH, UK
|
20
|
He F, Balling R, Zeng AP. Reverse engineering and verification of gene networks: principles, assumptions, and limitations of present methods and future perspectives. J Biotechnol 2009; 144:190-203. [PMID: 19631244 DOI: 10.1016/j.jbiotec.2009.07.013] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.4] [Received: 03/30/2009] [Revised: 07/13/2009] [Accepted: 07/16/2009] [Indexed: 12/21/2022]
Abstract
Reverse engineering of gene networks aims at revealing the structure of the gene regulation network in a biological system by reasoning backward directly from experimental data. Many methods have recently been proposed for reverse engineering of gene networks by using gene transcript expression data measured by microarray. Whereas the potentials of the methods have been well demonstrated, the assumptions and limitations behind them are often not clearly stated or not well understood. In this review, we first briefly explain the principles of the major methods, identify the assumptions behind them and pinpoint the limitations and possible pitfalls in applying them to real biological questions. With regard to applications, we then discuss challenges in the experimental verification of gene networks generated from reverse engineering methods. We further propose an optimal experimental design for allocating sampling schedule and possible strategies for reducing the limitations of some of the current reverse engineering methods. Finally, we examine the perspectives for the development of reverse engineering and urge the need to move from revealing network structure to the dynamics of biological systems.
Affiliation(s)
- Feng He
- Helmholtz Centre for Infection Research, D-38124 Braunschweig, Germany
|