1
|
Ghahramani N, Shodja J, Rafat SA, Panahi B, Hasanpur K. Integrative Systems Biology Analysis Elucidates Mastitis Disease Underlying Functional Modules in Dairy Cattle. Front Genet 2021; 12:712306. [PMID: 34691146 PMCID: PMC8531812 DOI: 10.3389/fgene.2021.712306] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 05/20/2021] [Accepted: 08/30/2021] [Indexed: 11/13/2022] Open
Abstract
Background: Mastitis is the most prevalent disease in dairy cattle and one of the most significant bovine pathologies affecting milk production, animal health, and reproduction. In addition, mastitis is the most common, expensive, and contagious infection in the dairy industry. Methods: A meta-analysis of microarray and RNA-seq data was conducted to identify candidate genes and functional modules associated with mastitis disease. The results were then applied to systems biology analysis via weighted gene coexpression network analysis (WGCNA), Gene Ontology, enrichment analysis for the Kyoto Encyclopedia of Genes and Genomes (KEGG), and modeling using machine-learning algorithms. Results: Microarray and RNA-seq datasets were generated for 2,089 and 2,794 meta-genes, respectively. Between microarray and RNA-seq datasets, a total of 360 meta-genes were found that were significantly enriched as "peroxisome," "NOD-like receptor signaling pathway," "IL-17 signaling pathway," and "TNF signaling pathway" KEGG pathways. The turquoise module (n = 214 genes) and the brown module (n = 57 genes) were identified as critical functional modules associated with mastitis through WGCNA. PRDX5, RAB5C, ACTN4, SLC25A16, MAPK6, CD53, NCKAP1L, ARHGEF2, COL9A1, and PTPRC genes were detected as hub genes in identified functional modules. Finally, using attribute weighting and machine-learning methods, hub genes that are sufficiently informative in Escherichia coli mastitis were used to optimize predictive models. The constructed model proposed the optimal approach for the meta-genes and validated several high-ranked genes as biomarkers for E. coli mastitis using the decision tree (DT) method. Conclusion: The candidate genes and pathways proposed in this study may shed new light on the underlying molecular mechanisms of mastitis disease and suggest new approaches for diagnosing and treating E. coli mastitis in dairy cattle.
Affiliation(s)
- Nooshin Ghahramani
- Department of Animal Science, Faculty of Agriculture, University of Tabriz, Tabriz, Iran
- Jalil Shodja
- Department of Animal Science, Faculty of Agriculture, University of Tabriz, Tabriz, Iran
- Seyed Abbas Rafat
- Department of Animal Science, Faculty of Agriculture, University of Tabriz, Tabriz, Iran
- Bahman Panahi
- Department of Genomics, Branch for Northwest & West Region, Agricultural Biotechnology Research Institute of Iran (ABRII), Agricultural Research, Education and Extension Organization (AREEO), Tabriz, Iran
- Karim Hasanpur
- Department of Animal Science, Faculty of Agriculture, University of Tabriz, Tabriz, Iran
|
2
|
Kwon MS, Lee BT, Lee SY, Kim HU. Modeling regulatory networks using machine learning for systems metabolic engineering. Curr Opin Biotechnol 2020; 65:163-170. [DOI: 10.1016/j.copbio.2020.02.014] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 01/20/2020] [Revised: 02/23/2020] [Accepted: 02/26/2020] [Indexed: 12/18/2022]
|
3
|
Kravchenko-Balasha N. Translating Cancer Molecular Variability into Personalized Information Using Bulk and Single Cell Approaches. Proteomics 2020; 20:e1900227. [PMID: 32072740 DOI: 10.1002/pmic.201900227] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 09/22/2019] [Revised: 01/13/2020] [Indexed: 12/17/2022]
Abstract
Cancer research is striving toward new frontiers of assigning the correct personalized drug(s) to a given patient. However, extensive tumor heterogeneity poses a major obstacle. Tumors of the same type often respond differently to therapy, due to patient-specific molecular aberrations and/or untargeted tumor subpopulations. It is frequently not possible to determine a priori which patients will respond to a certain therapy or how an efficient patient-specific combined therapy should be designed. Large-scale datasets have been growing at an accelerated pace and various technologies and analytical tools for single cell and bulk level analyses are being developed to extract significant individualized signals from such heterogeneous data. However, personalized therapies that dramatically alter the course of the disease remain scarce, and most tumors still respond poorly to medical care. In this review, the basic concepts of bulk and single cell approaches are discussed, as well as their emerging role in individualized designs of drug therapies, including the advantages and limitations of their applications in personalized medicine.
Affiliation(s)
- Nataly Kravchenko-Balasha
- Department for Bio-Medical Research, Faculty of Dental Medicine, Hebrew University of Jerusalem, Jerusalem, 91120, Israel
|
4
|
|
5
|
Reconstructing Genetic Regulatory Networks Using Two-Step Algorithms with the Differential Equation Models of Neural Networks. Interdiscip Sci 2017; 10:823-835. [PMID: 28748400 DOI: 10.1007/s12539-017-0254-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/25/2016] [Revised: 07/01/2017] [Accepted: 07/14/2017] [Indexed: 10/19/2022]
Abstract
BACKGROUND The identification of genetic regulatory networks (GRNs) provides insights into complex cellular processes. A class of recurrent neural networks (RNNs) captures the dynamics of GRNs. Algorithms combining the RNN and machine learning schemes were proposed to reconstruct small-scale GRNs using gene expression time series. RESULTS We present new GRN reconstruction methods with neural networks. The RNN is extended to a class of recurrent multilayer perceptrons (RMLPs) with latent nodes. Our methods contain two steps: the edge rank assignment step and the network construction step. The former assigns ranks to all possible edges by a recursive procedure based on the estimated weights of wires of RNN/RMLP (RERNN/RERMLP), and the latter constructs a network consisting of top-ranked edges under which the optimized RNN simulates the gene expression time series. Particle swarm optimization (PSO) is applied to optimize the parameters of RNNs and RMLPs in a two-step algorithm. The proposed RERNN-RNN and RERMLP-RNN algorithms are tested on synthetic and experimental gene expression time series of small GRNs of about 10 genes. The experimental time series are from the studies of yeast cell cycle regulated genes and E. coli DNA repair genes. CONCLUSION The unstable estimation of RNN using experimental time series having limited data points can lead to fairly arbitrary predicted GRNs. Our methods incorporate RNN and RMLP into a two-step structure learning procedure. Results show that the RERMLP, using the RMLP with a suitable number of latent nodes to reduce the parameter dimension, often results in more accurate edge ranks than the RERNN using the regularized RNN on short simulated time series. When the networks derived by the RERMLP-RNN with different numbers of latent nodes in step one are combined by a weighted majority voting rule to infer the GRN, the method performs consistently and outperforms published algorithms for GRN reconstruction on most benchmark time series. The framework of two-step algorithms can potentially incorporate different nonlinear differential equation models to reconstruct the GRN.
|
6
|
Inferring causal networks using fuzzy cognitive maps and evolutionary algorithms with application to gene regulatory network reconstruction. Appl Soft Comput 2015. [DOI: 10.1016/j.asoc.2015.08.039] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.7] [Indexed: 11/17/2022]
|
7
|
Inferring Broad Regulatory Biology from Time Course Data: Have We Reached an Upper Bound under Constraints Typical of In Vivo Studies? PLoS One 2015; 10:e0127364. [PMID: 25984725 PMCID: PMC4435750 DOI: 10.1371/journal.pone.0127364] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 12/08/2014] [Accepted: 04/13/2015] [Indexed: 12/21/2022] Open
Abstract
There is a growing appreciation for the network biology that regulates the coordinated expression of molecular and cellular markers; however, questions persist regarding the identifiability of these networks. Here we explore some of the issues relevant to recovering directed regulatory networks from time course data collected under experimental constraints typical of in vivo studies. NetSim simulations of sparsely connected biological networks were used to evaluate two simple feature selection techniques used in the construction of linear Ordinary Differential Equation (ODE) models, namely truncation of terms versus latent vector projection. Performance was compared with ODE-based Time Series Network Identification (TSNI) integral, and the information-theoretic Time-Delay ARACNE (TD-ARACNE). Projection-based techniques and TSNI integral outperformed truncation-based selection and TD-ARACNE on aggregate networks with edge densities of 10-30%, i.e. transcription factor, protein-protein cliques and immune signaling networks. All were more robust to noise than truncation-based feature selection. Performance was comparable on the in silico 10-node DREAM 3 network, a 5-node Yeast synthetic network designed for In vivo Reverse-engineering and Modeling Assessment (IRMA) and a 9-node human HeLa cell cycle network of similar size and edge density. Performance was more sensitive to the number of time courses than to sample frequency and extrapolated better to larger networks by grouping experiments. In all cases performance declined rapidly in larger networks with lower edge density. Limited recovery and high false positive rates obtained overall bring into question our ability to generate informative time course data rather than the design of any particular reverse engineering algorithm.
|
8
|
Linde J, Schulze S, Henkel SG, Guthke R. Data- and knowledge-based modeling of gene regulatory networks: an update. EXCLI J 2015; 14:346-78. [PMID: 27047314 PMCID: PMC4817425 DOI: 10.17179/excli2015-168] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 01/29/2015] [Accepted: 02/10/2015] [Indexed: 02/01/2023]
Abstract
Gene regulatory network inference is a systems biology approach which predicts interactions between genes with the help of high-throughput data. In this review, we present current and updated network inference methods focusing on novel techniques for data acquisition, network inference assessment, network inference for interacting species and the integration of prior knowledge. After the advance of Next-Generation-Sequencing of cDNAs derived from RNA samples (RNA-Seq) we discuss in detail its application to network inference. Furthermore, we present progress for large-scale or even full-genomic network inference as well as for small-scale condensed network inference and review advances in the evaluation of network inference methods by crowdsourcing. Finally, we reflect the current availability of data and prior knowledge sources and give an outlook for the inference of gene regulatory networks that reflect interacting species, in particular pathogen-host interactions.
Affiliation(s)
- Jörg Linde
- Research Group Systems Biology / Bioinformatics, Leibniz Institute for Natural Product Research and Infection Biology - Hans-Knöll-Institute, Beutenbergstr. 11a, 07745 Jena, Germany
- Sylvie Schulze
- Research Group Systems Biology / Bioinformatics, Leibniz Institute for Natural Product Research and Infection Biology - Hans-Knöll-Institute, Beutenbergstr. 11a, 07745 Jena, Germany
- Reinhard Guthke
- Research Group Systems Biology / Bioinformatics, Leibniz Institute for Natural Product Research and Infection Biology - Hans-Knöll-Institute, Beutenbergstr. 11a, 07745 Jena, Germany
|
9
|
Emmert-Streib F, Dehmer M, Haibe-Kains B. Untangling statistical and biological models to understand network inference: the need for a genomics network ontology. Front Genet 2014; 5:299. [PMID: 25221572 PMCID: PMC4148777 DOI: 10.3389/fgene.2014.00299] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 06/28/2014] [Accepted: 08/12/2014] [Indexed: 12/31/2022] Open
Abstract
In this paper, we shed light on approaches that are currently used to infer networks from gene expression data with respect to their biological meaning. As we will show, the biological interpretation of these networks depends on the chosen theoretical perspective. For this reason, we distinguish a statistical perspective from a mathematical modeling perspective and elaborate their differences and implications. Our results indicate the imperative need for a genomic network ontology in order to avoid increasing confusion about the biological interpretation of inferred networks, which can be even enhanced by approaches that integrate multiple data sets, respectively, data types.
Affiliation(s)
- Frank Emmert-Streib
- Computational Biology and Machine Learning Laboratory, Faculty of Medicine, Health and Life Sciences, Center for Cancer Research and Cell Biology, School of Medicine, Dentistry and Biomedical Sciences, Queen's University Belfast, Belfast, UK
- Matthias Dehmer
- Institute for Bioinformatics and Translational Research, UMIT, Hall in Tyrol, Austria
- Benjamin Haibe-Kains
- Bioinformatics and Computational Genomics Laboratory, Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
|
10
|
Kupfer P, Huber R, Weber M, Vlaic S, Häupl T, Koczan D, Guthke R, Kinne RW. Novel application of multi-stimuli network inference to synovial fibroblasts of rheumatoid arthritis patients. BMC Med Genomics 2014; 7:40. [PMID: 24989895 PMCID: PMC4099018 DOI: 10.1186/1755-8794-7-40] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 07/09/2013] [Accepted: 06/25/2014] [Indexed: 11/19/2022] Open
Abstract
Background Network inference of gene expression data is an important challenge in systems biology. Novel algorithms may provide more detailed gene regulatory networks (GRN) for complex, chronic inflammatory diseases such as rheumatoid arthritis (RA), in which activated synovial fibroblasts (SFBs) play a major role. Since the detailed mechanisms underlying this activation are still unclear, simultaneous investigation of multi-stimuli activation of SFBs offers the possibility to elucidate the regulatory effects of multiple mediators and to gain new insights into disease pathogenesis. Methods A GRN was therefore inferred from RA-SFBs treated with 4 different stimuli (IL-1β, TNF-α, TGF-β, and PDGF-D). Data from time series microarray experiments (0, 1, 2, 4, 12 h; Affymetrix HG-U133 Plus 2.0) were batch-corrected applying ‘ComBat’, analyzed for differentially expressed genes over time with ‘Limma’, and used for the inference of a robust GRN with NetGenerator V2.0, a heuristic ordinary differential equation-based method with soft integration of prior knowledge. Results Using all genes differentially expressed over time in RA-SFBs for any stimulus, and selecting the genes belonging to the most significant gene ontology (GO) term, i.e., ‘cartilage development’, a dynamic, robust, moderately complex multi-stimuli GRN was generated with 24 genes and 57 edges in total, 31 of which were gene-to-gene edges. Prior literature-based knowledge derived from Pathway Studio or manual searches was reflected in the final network by 25/57 confirmed edges (44%). The model contained known network motifs crucial for dynamic cellular behavior, e.g., cross-talk among pathways, positive feed-back loops, and positive feed-forward motifs (including suppression of the transcriptional repressor OSR2 by all 4 stimuli). Conclusion A multi-stimuli GRN highly concordant with literature data was successfully generated by network inference from the gene expression of stimulated RA-SFBs. The GRN showed high reliability, since 10 predicted edges were independently validated by literature findings post network inference. The selected GO term ‘cartilage development’ contained a number of differentiation markers, growth factors, and transcription factors with potential relevance for RA. Finally, the model provided new insight into the response of RA-SFBs to multiple stimuli implicated in the pathogenesis of RA, in particular to the ‘novel’ potent growth factor PDGF-D.
Affiliation(s)
- Peter Kupfer
- Leibniz Institute for Natural Product Research and Infection Biology - Hans-Knöll-Institute, Beutenbergstr. 11a, 07745 Jena, Germany.
|
11
|
An algebra-based method for inferring gene regulatory networks. BMC Syst Biol 2014; 8:37. [PMID: 24669835 PMCID: PMC4022379 DOI: 10.1186/1752-0509-8-37] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Received: 08/12/2013] [Accepted: 03/06/2014] [Indexed: 11/10/2022]
Abstract
BACKGROUND The inference of gene regulatory networks (GRNs) from experimental observations is at the heart of systems biology. This includes the inference of both the network topology and its dynamics. While there are many algorithms available to infer the network topology from experimental data, less emphasis has been placed on methods that infer network dynamics. Furthermore, since the network inference problem is typically underdetermined, it is essential to have the option of incorporating into the inference process, prior knowledge about the network, along with an effective description of the search space of dynamic models. Finally, it is also important to have an understanding of how a given inference method is affected by experimental and other noise in the data used. RESULTS This paper contains a novel inference algorithm using the algebraic framework of Boolean polynomial dynamical systems (BPDS), meeting all these requirements. The algorithm takes as input time series data, including those from network perturbations, such as knock-out mutant strains and RNAi experiments. It allows for the incorporation of prior biological knowledge while being robust to significant levels of noise in the data used for inference. It uses an evolutionary algorithm for local optimization with an encoding of the mathematical models as BPDS. The BPDS framework allows an effective representation of the search space for algebraic dynamic models that improves computational performance. The algorithm is validated with both simulated and experimental microarray expression profile data. Robustness to noise is tested using a published mathematical model of the segment polarity gene network in Drosophila melanogaster. 
Benchmarking of the algorithm is done by comparison with a spectrum of state-of-the-art network inference methods on data from the synthetic IRMA network to demonstrate that our method has good precision and recall for the network reconstruction task, while also predicting several of the dynamic patterns present in the network. CONCLUSIONS Boolean polynomial dynamical systems provide a powerful modeling framework for the reverse engineering of gene regulatory networks that enables a rich mathematical structure on the model search space. A C++ implementation of the method, distributed under the LGPL license, is available, together with the source code, at http://www.paola-vera-licona.net/Software/EARevEng/REACT.html.
|
12
|
Altay G, Kurt Z, Dehmer M, Emmert-Streib F. Netmes: assessing gene network inference algorithms by network-based measures. Evol Bioinform Online 2014; 10:1-9. [PMID: 24526830 PMCID: PMC3921134 DOI: 10.4137/ebo.s13481] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 10/24/2013] [Revised: 12/01/2013] [Accepted: 12/07/2013] [Indexed: 11/27/2022] Open
Abstract
Gene regulatory network inference (GRNI) algorithms are essential for efficiently utilizing large-scale microarray datasets to elucidate biochemical interactions among molecules in a cell. Recently, the combination of network-based error measures complemented with an ensemble approach became popular for assessing the inference performance of the GRNI algorithms. For this reason, we developed a software package to facilitate the usage of such metrics. In this paper, we present netmes, an R software package that allows the assessment of GRNI algorithms. The software package netmes is available from the R-Forge web site https://r-forge.r-project.org/projects/netmes/.
Affiliation(s)
- Gökmen Altay
- Biomedical Engineering, Bahçeşehir University, Beşiktaş, Istanbul, Turkey
- Zeyneb Kurt
- Department of Computer Engineering, Yildiz Technical University, Davutpasa Campus, 34220, Esenler, Istanbul, Turkey
- Matthias Dehmer
- Institute for Bioinformatics and Translational Research, UMIT - The Health and Life Sciences University, Eduard Wallnoefer Zentrum 1, 6060 Hall in Tyrol, Austria
- Frank Emmert-Streib
- Computational Biology and Machine Learning Laboratory, Center for Cancer Research and Cell Biology, School of Medicine, Dentistry and Biomedical Sciences, Queen's University Belfast, Belfast, UK
|
13
|
Inference of Vohradský's models of genetic networks by solving two-dimensional function optimization problems. PLoS One 2014; 8:e83308. [PMID: 24386175 PMCID: PMC3875442 DOI: 10.1371/journal.pone.0083308] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 07/08/2013] [Accepted: 11/01/2013] [Indexed: 11/21/2022] Open
Abstract
The inference of a genetic network is a problem in which mutual interactions among genes are inferred from time-series of gene expression levels. While a number of models have been proposed to describe genetic networks, this study focuses on a mathematical model proposed by Vohradský. Because of its advantageous features, several researchers have proposed the inference methods based on Vohradský's model. When trying to analyze large-scale networks consisting of dozens of genes, however, these methods must solve high-dimensional non-linear function optimization problems. In order to resolve the difficulty of estimating the parameters of the Vohradský's model, this study proposes a new method that defines the problem as several two-dimensional function optimization problems. Through numerical experiments on artificial genetic network inference problems, we showed that, although the computation time of the proposed method is not the shortest, the method has the ability to estimate parameters of Vohradský's models more effectively with sufficiently short computation times. This study then applied the proposed method to an actual inference problem of the bacterial SOS DNA repair system, and succeeded in finding several reasonable regulations.
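The abstract above refers to Vohradský's model without stating it. For orientation, the model is commonly written in the form below. The symbols ($x_i$ for the expression level of gene $i$, $w_{ij}$ for the influence of gene $j$ on gene $i$, $b_i$ for a bias term, $k_{1,i}$ and $k_{2,i}$ for the maximal synthesis rate and the degradation rate) are notation assumed here and are not taken from the abstract.

```latex
\frac{dx_i}{dt}
  = k_{1,i}\,\sigma\!\left(\sum_{j=1}^{N} w_{ij}\,x_j + b_i\right) - k_{2,i}\,x_i,
\qquad
\sigma(z) = \frac{1}{1 + e^{-z}},
\qquad i = 1,\dots,N .
```

Under this notation each gene carries $N + 3$ parameters ($w_{i1},\dots,w_{iN}$, $b_i$, $k_{1,i}$, $k_{2,i}$), which is consistent with the abstract's remark that networks of dozens of genes lead to high-dimensional non-linear optimization problems.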
|
14
|
Kunkle BW, Yoo C, Roy D. Reverse engineering of modified genes by Bayesian network analysis defines molecular determinants critical to the development of glioblastoma. PLoS One 2013; 8:e64140. [PMID: 23737970 PMCID: PMC3667850 DOI: 10.1371/journal.pone.0064140] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.2] [Received: 03/08/2013] [Accepted: 03/28/2013] [Indexed: 12/22/2022] Open
Abstract
In this study we have identified key genes that are critical in the development of astrocytic tumors. Meta-analysis of microarray studies which compared normal tissue to astrocytoma revealed a set of 646 differentially expressed genes in the majority of astrocytomas. Reverse engineering of these 646 genes using Bayesian network analysis produced a gene network for each grade of astrocytoma (Grade I-IV), and 'key genes' within each grade were identified. Genes found to be most influential to development of the highest grade of astrocytoma, glioblastoma multiforme, were: COL4A1, EGFR, BTF3, MPP2, RAB31, CDK4, CD99, ANXA2, TOP2A, and SERBP1. All of these genes were up-regulated, except MPP2 (down-regulated). These 10 genes were able to predict tumor status with 96-100% confidence when using logistic regression, cross validation, and the support vector machine analysis. Markov genes interact with NF-κB, ERK, MAPK, VEGF, growth hormone and collagen to produce a network whose top biological functions are cancer, neurological disease, and cellular movement. Three of the 10 genes - EGFR, COL4A1, and CDK4, in particular, seemed to be potential 'hubs of activity'. Modified expression of these 10 Markov Blanket genes increases lifetime risk of developing glioblastoma compared to the normal population. The glioblastoma risk estimates were dramatically increased with joint effects of 4 or more Markov Blanket genes. Joint interaction effects of 4, 5, 6, 7, 8, 9 or 10 Markov Blanket genes produced 9, 13, 20.9, 26.7, 52.8, 53.2, 78.1 or 85.9%, respectively, increase in lifetime risk of developing glioblastoma compared to the normal population. In summary, it appears that modified expression of several 'key genes' may be required for the development of glioblastoma. Further studies are needed to validate these 'key genes' as useful tools for early detection and novel therapeutic options for these tumors.
Affiliation(s)
- Brian W. Kunkle
- Department of Environmental and Occupational Health, Florida International University, Miami, Florida, United States of America
- Changwon Yoo
- Department of Biostatistics, Florida International University, Miami, Florida, United States of America
- Deodutta Roy
- Department of Environmental and Occupational Health, Florida International University, Miami, Florida, United States of America
|
15
|
Le NT, Ho TB, Ho BH. Computational reconstruction of transcriptional relationships from ChIP-chip data. IEEE/ACM Trans Comput Biol Bioinform 2013; 10:300-307. [PMID: 22848139 DOI: 10.1109/tcbb.2012.102] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2023]
Abstract
Eukaryotic gene transcription is a complex process, which requires the orchestrated recruitment of a large number of proteins, such as sequence-specific DNA binding factors, chromatin remodelers and modifiers, and general transcription machinery, to regulatory regions. Previous works have shown that these regulatory proteins favor specific organizational themes along promoters. Details about how they cooperatively regulate the transcriptional process, however, remain unclear. We developed a method to reconstruct a Bayesian network (BN) model representing functional relationships among various transcriptional components. The positive/negative influence between these components was measured from protein binding and nucleosome occupancy data and embedded into the model. Application to S. cerevisiae ChIP-Chip data showed that the proposed method can recover confirmed relationships, such as Isw1-Pol II, TFIIH-Pol II, TFIIB-TBP, Pol II-H3K36Me3, H3K4Me3-H3K14Ac, etc. Moreover, it can distinguish colocating components from functionally related ones. Novel relationships, e.g., ones between Mediator and chromatin remodeling complexes (CRCs), and the combinatorial regulation of Pol II recruitment and activity by CRCs and general transcription factors (GTFs), were also suggested. CONCLUSION Protein binding events during transcription positively influence each other. Among contributing components, GTFs and CRCs play pivotal roles in transcriptional regulation. These findings provide insights into the regulatory mechanism.
Affiliation(s)
- Ngoc Tu Le
- School of Knowledge Science, Japan Advanced Institute of Science and Technology, Asahidai 1-1, Nomi, Ishikawa 923-1292, Japan.
|
16
|
Muraro D, Voß U, Wilson M, Bennett M, Byrne H, De Smet I, Hodgman C, King J. Inference of the genetic network regulating lateral root initiation in Arabidopsis thaliana. IEEE/ACM Trans Comput Biol Bioinform 2013; 10:50-60. [PMID: 23702543 DOI: 10.1109/tcbb.2013.3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 06/02/2023]
Abstract
Regulation of gene expression is crucial for organism growth, and it is one of the challenges in systems biology to reconstruct the underlying regulatory biological networks from transcriptomic data. The formation of lateral roots in Arabidopsis thaliana is stimulated by a cascade of regulators of which only the interactions of its initial elements have been identified. Using simulated gene expression data with known network topology, we compare the performance of inference algorithms, based on different approaches, for which ready-to-use software is available. We show that their performance improves with the network size and the inclusion of mutants. We then analyze two sets of genes, whose activity is likely to be relevant to lateral root initiation in Arabidopsis, and assess causality of their regulatory interactions by integrating sequence analysis with the intersection of the results of the best performing methods on time series and mutants. The methods applied capture known interactions between genes that are candidate regulators at early stages of development. The network inferred from genes significantly expressed during lateral root formation exhibits distinct scale free, small world and hierarchical properties and the nodes with a high out-degree may warrant further investigation.
Affiliation(s)
- Daniele Muraro
- Centre for Plant Integrative Biology, School of Biosciences, University of Nottingham, Loughborough, United Kingdom.
|
17
|
Das R, Mitra S, Murthy CA. Extracting gene-gene interactions through curve fitting. IEEE Trans Nanobioscience 2012; 11:402-9. [PMID: 22997274 DOI: 10.1109/tnb.2012.2217984] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/08/2022]
Abstract
This paper presents a simple and novel curve fitting approach for generating simple gene regulatory subnetworks from time series gene expression data. Microarray experiments simultaneously generate massive data sets and help immensely in the large-scale study of gene expression patterns. Initial biclustering reduces the search space in the high-dimensional microarray data. The least-squares error between fitting of gene pairs is minimized to extract a set of gene-gene interactions, involving transcriptional regulation of genes. The higher error values are eliminated to retain only the strong interacting gene pairs in the resultant gene regulatory subnetwork. Next the algorithm is extended to a generalized framework to enhance its capability. The methodology takes care of the higher-order dependencies involving multiple genes co-regulating a single gene, while eliminating the need for user-defined parameters. It has been applied to the time-series Yeast data, and the experimental results biologically validated using standard databases and literature.
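The core idea in the abstract above (fit one gene's profile against another's, then keep only the low-error pairs) can be illustrated with a minimal sketch. It is not the authors' algorithm: the polynomial curve family, the `degree` parameter, the fixed `threshold`, and the function names `pairwise_fit_errors` and `strong_pairs` are all assumptions made for illustration, and the biclustering and higher-order steps are omitted.

```python
import numpy as np


def pairwise_fit_errors(expr, degree=2):
    """Least-squares fit error for every ordered gene pair.

    expr has shape (n_genes, n_timepoints). Entry err[i, j] is the mean
    squared residual when gene j is fitted as a polynomial in gene i.
    The polynomial family is an assumption of this sketch.
    """
    n_genes = expr.shape[0]
    err = np.full((n_genes, n_genes), np.inf)  # diagonal stays at infinity
    for i in range(n_genes):
        for j in range(n_genes):
            if i == j:
                continue
            coeffs = np.polyfit(expr[i], expr[j], degree)
            residual = expr[j] - np.polyval(coeffs, expr[i])
            err[i, j] = np.mean(residual ** 2)
    return err


def strong_pairs(err, threshold):
    """Keep only the (regulator, target) pairs whose fit error is below threshold."""
    i_idx, j_idx = np.where(err < threshold)
    return list(zip(i_idx.tolist(), j_idx.tolist()))
```

A hand-picked threshold is used here only to keep the sketch short; the abstract states that the generalized method removes the need for user-defined parameters.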
Affiliation(s)
- Ranajit Das
- Machine Intelligence Unit, Indian Statistical Institute, Kolkata 700 108, India.
|
18
|
Abstract
Network inference approaches are now widely used in biological applications to probe regulatory relationships between molecular components such as genes or proteins. Many methods have been proposed for this setting, but the connections and differences between their statistical formulations have received less attention. In this paper, we show how a broad class of statistical network inference methods, including a number of existing approaches, can be described in terms of variable selection for the linear model. This reveals some subtle but important differences between the methods, including the treatment of time intervals in discretely observed data. In developing a general formulation, we also explore the relationship between single-cell stochastic dynamics and network inference on averages over cells. This clarifies the link between biochemical networks as they operate at the cellular level and network inference as carried out on data that are averages over populations of cells. We present empirical results, comparing thirty-two network inference methods that are instances of the general formulation we describe, using two published dynamical models. Our investigation sheds light on the applicability and limitations of network inference and provides guidance for practitioners and suggestions for experimental design.
Affiliation(s)
- C J Oates
- Centre for Complexity Science, University of Warwick, CV4 7AL, UK ; Department of Statistics, University of Warwick, CV4 7AL, UK ; Netherlands Cancer Institute, 1066 CX, Amsterdam, The Netherlands
|
19
|
Altay G. Empirically determining the sample size for large-scale gene network inference algorithms. IET Syst Biol 2012; 6:35-43. [PMID: 22519356 DOI: 10.1049/iet-syb.2010.0091] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Indexed: 01/10/2023] Open
Abstract
The performance of genome-wide gene regulatory network inference algorithms depends on the sample size. It is generally considered that the larger the sample size, the better the gene network inference performance. Nevertheless, there is not adequate information on determining the sample size for optimal performance. In this study, the author systematically demonstrates the effect of sample size on information-theory-based gene network inference algorithms with an ensemble approach. The empirical results showed that the inference performances of the considered algorithms tend to converge after a particular sample size region. As a specific example, a sample size of around 64 is sufficient to obtain most of the inference performance with respect to precision using the representative algorithm C3NET on synthetic steady-state data sets of Escherichia coli and on a time-series data set of a Homo sapiens subnetwork. The author verified the convergence result on a large, real data set of E. coli as well. The results give biologists evidence for designing better experiments to infer gene networks. Further, the effect of cutoff on inference performances over various sample sizes is considered. [Includes supplementary material].
Affiliation(s)
- G Altay
- University of Cambridge, Department of Oncology, Cambridge, UK.
|
20
|
Moulos P, Valavanis I, Klein J, Maglogiannis I, Schanstra J, Chatziioannou A. Unifying the integration, analysis and interpretation of multi-omic datasets: exploration of the disease networks of Obstructive Nephropathy in children. Annual International Conference of the IEEE Engineering in Medicine and Biology Society 2012; 2011:3716-9. [PMID: 22255147 DOI: 10.1109/iembs.2011.6090631] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/09/2022]
Abstract
The wealth of data amassed by the utilization of various high-throughput techniques, in various layers of molecular dissection, stresses the critical role of the unification of the computational methodologies applied in biological data handling, storage, analysis and visualization. In this article, a generic workflow is showcased in a multi-omic dataset that is used to study Obstructive Nephropathy (ON) in children, integrating microarray data from several biological layers (transcriptomic, post-transcriptomic, proteomic). The workflow exploits raw measurements and through several analytical stages (preprocessing, statistical and functional), which entail various parsing steps, reaches the visualization stage of the heterogeneous, broader, molecular interacting network derived. This network, where the interconnected entities are exploiting the knowledge stored in public repositories, represents a systems level interpretation of the pathological state probed.
Affiliation(s)
- Panagiotis Moulos
- Institute of Biological Research and Biotechnology, National Hellenic Research Foundation, Athens, Greece.
|
21
|
Titsias MK, Honkela A, Lawrence ND, Rattray M. Identifying targets of multiple co-regulating transcription factors from expression time-series by Bayesian model comparison. BMC Systems Biology 2012; 6:53. [PMID: 22647244 PMCID: PMC3527261 DOI: 10.1186/1752-0509-6-53] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Received: 12/20/2011] [Accepted: 05/30/2012] [Indexed: 02/02/2023]
Abstract
BACKGROUND Complete transcriptional regulatory network inference is a huge challenge because of the complexity of the network and sparsity of available data. One approach to make it more manageable is to focus on the inference of context-specific networks involving a few interacting transcription factors (TFs) and all of their target genes. RESULTS We present a computational framework for Bayesian statistical inference of target genes of multiple interacting TFs from high-throughput gene expression time-series data. We use ordinary differential equation models that describe transcription of target genes taking into account combinatorial regulation. The method consists of a training and a prediction phase. During the training phase we infer the unobserved TF protein concentrations on a subnetwork of approximately known regulatory structure. During the prediction phase we apply Bayesian model selection on a genome-wide scale and score all alternative regulatory structures for each target gene. We use our methodology to identify targets of five TFs regulating Drosophila melanogaster mesoderm development. We find that confident predicted links between TFs and targets are significantly enriched for supporting ChIP-chip binding events and annotated TF-gene interactions. Our method statistically significantly outperforms existing alternatives. CONCLUSIONS Our results show that it is possible to infer regulatory links between multiple interacting TFs and their target genes even from a single relatively short time series and in the presence of unmodelled confounders and unreliable prior knowledge on training network connectivity. Introducing data from several different experimental perturbations significantly increases the accuracy.
Affiliation(s)
- Michalis K Titsias
- The Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK.
|
22
|
Beg QK, Zampieri M, Klitgord N, Collins SB, Altafini C, Serres MH, Segrè D. Detection of transcriptional triggers in the dynamics of microbial growth: application to the respiratorily versatile bacterium Shewanella oneidensis. Nucleic Acids Res 2012; 40:7132-49. [PMID: 22638572 PMCID: PMC3424579 DOI: 10.1093/nar/gks467] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Indexed: 11/25/2022] Open
Abstract
The capacity of microorganisms to respond to variable external conditions requires a coordination of environment-sensing mechanisms and decision-making regulatory circuits. Here, we seek to understand the interplay between these two processes by combining high-throughput measurement of time-dependent mRNA profiles with a novel computational approach that searches for key genetic triggers of transcriptional changes. Our approach helped us understand the regulatory strategies of a respiratorily versatile bacterium with promising bioenergy and bioremediation applications, Shewanella oneidensis, in minimal and rich media. By comparing expression profiles across these two conditions, we unveiled components of the transcriptional program that depend mainly on the growth phase. Conversely, by integrating our time-dependent data with a previously available large compendium of static perturbation responses, we identified transcriptional changes that cannot be explained solely by internal network dynamics, but are rather triggered by specific genes acting as key mediators of an environment-dependent response. These transcriptional triggers include known and novel regulators that respond to carbon, nitrogen and oxygen limitation. Our analysis suggests a sequence of physiological responses, including a coupling between nitrogen depletion and glycogen storage, partially recapitulated through dynamic flux balance analysis, and experimentally confirmed by metabolite measurements. Our approach is broadly applicable to other systems.
Affiliation(s)
- Qasim K Beg
- Department of Biomedical Engineering, Boston University, Boston, MA 02215, USA
|
23
|
Abstract
Computers are organized into hardware and software. Using a theoretical approach to identify patterns in gene expression in a variety of species, organs, and cell types, we found that biological systems similarly are comprised of a relatively unchanging hardware-like gene pattern. Orthogonal patterns of software-like transcripts vary greatly, even among tumors of the same type from different individuals. Two distinguishable classes could be identified within the hardware-like component: those transcripts that are highly expressed and stable and an adaptable subset with lower expression that respond to external stimuli. Importantly, we demonstrate that this structure is conserved across organisms. Deletions of transcripts from the highly stable core are predicted to result in cell mortality. The approach provides a conceptual thermodynamic-like framework for the analysis of gene-expression levels and networks and their variations in diseased cells.
|
24
|
Kimura S, Araki D, Matsumura K, Okada-Hatakeyama M. Inference of S-system models of genetic networks by solving one-dimensional function optimization problems. Math Biosci 2012; 235:161-70. [PMID: 22155075 DOI: 10.1016/j.mbs.2011.11.008] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 07/21/2011] [Revised: 10/21/2011] [Accepted: 11/22/2011] [Indexed: 11/17/2022]
Affiliation(s)
- S Kimura
- Graduate School of Engineering, Tottori University, 4-101, Koyama-minami, Tottori 680-8552, Japan.
|
25
|
Vignes M, Vandel J, Allouche D, Ramadan-Alban N, Cierco-Ayrolles C, Schiex T, Mangin B, de Givry S. Gene regulatory network reconstruction using Bayesian networks, the Dantzig Selector, the Lasso and their meta-analysis. PLoS One 2011; 6:e29165. [PMID: 22216195 PMCID: PMC3246469 DOI: 10.1371/journal.pone.0029165] [Citation(s) in RCA: 74] [Impact Index Per Article: 5.7] [Received: 07/01/2011] [Accepted: 11/22/2011] [Indexed: 11/18/2022] Open
Abstract
Modern technologies, and especially next-generation sequencing facilities, are giving cheaper access to genotype and genomic data measured on the same sample at once. This creates an ideal situation for multifactorial experiments designed to infer gene regulatory networks. The fifth "Dialogue for Reverse Engineering Assessments and Methods" (DREAM5) challenges are aimed at assessing methods and associated algorithms devoted to the inference of biological networks. Challenge 3 on "Systems Genetics" proposed to infer causal gene regulatory networks from different genetical genomics data sets. We investigated a wide panel of methods ranging from Bayesian networks to penalised linear regressions to analyse such data, and proposed a simple yet very powerful meta-analysis, which combines these inference methods. We present results of the Challenge as well as more in-depth analysis of predicted networks in terms of structure and reliability. The developed meta-analysis was ranked first among the 16 teams participating in Challenge 3A. It paves the way for future extensions of our inference method and more accurate gene network estimates in the context of genetical genomics.
Affiliation(s)
- Matthieu Vignes
- SaAB Team/BIA Unit, INRA Toulouse, Castanet-Tolosan, France.
|
26
|
Hurley D, Araki H, Tamada Y, Dunmore B, Sanders D, Humphreys S, Affara M, Imoto S, Yasuda K, Tomiyasu Y, Tashiro K, Savoie C, Cho V, Smith S, Kuhara S, Miyano S, Charnock-Jones DS, Crampin EJ, Print CG. Gene network inference and visualization tools for biologists: application to new human transcriptome datasets. Nucleic Acids Res 2011; 40:2377-98. [PMID: 22121215 PMCID: PMC3315333 DOI: 10.1093/nar/gkr902] [Citation(s) in RCA: 54] [Impact Index Per Article: 4.2] [Indexed: 11/22/2022] Open
Abstract
Gene regulatory networks inferred from RNA abundance data have generated significant interest, but despite this, gene network approaches are used infrequently and often require input from bioinformaticians. We have assembled a suite of tools for analysing regulatory networks, and we illustrate their use with microarray datasets generated in human endothelial cells. We infer a range of regulatory networks, and based on this analysis discuss the strengths and limitations of network inference from RNA abundance data. We welcome contact from researchers interested in using our inference and visualization tools to answer biological questions.
Affiliation(s)
- Daniel Hurley
- Auckland Bioengineering Institute, Department of Molecular Medicine and Pathology, School of Medical Sciences, Faculty of Medical and Health Sciences, University of Auckland, Private Bag 92019, Auckland 1142, New Zealand
|
27
|
Inference of complex biological networks: distinguishability issues and optimization-based solutions. BMC Systems Biology 2011; 5:177. [PMID: 22034917 PMCID: PMC3305990 DOI: 10.1186/1752-0509-5-177] [Citation(s) in RCA: 59] [Impact Index Per Article: 4.5] [Received: 05/31/2011] [Accepted: 10/28/2011] [Indexed: 12/31/2022]
Abstract
Background The inference of biological networks from high-throughput data has received huge attention during the last decade and can be considered an important problem class in systems biology. However, it has been recognized that reliable network inference remains an unsolved problem. Most authors have identified lack of data and deficiencies in the inference algorithms as the main reasons for this situation. Results We claim that another major difficulty for solving these inference problems is the frequent lack of uniqueness of many of these networks, especially when prior assumptions have not been taken properly into account. Our contributions aid the distinguishability analysis of chemical reaction network (CRN) models with mass action dynamics. The novel methods are based on linear programming (LP), therefore they allow the efficient analysis of CRNs containing several hundred complexes and reactions. Using these new tools and also previously published ones to obtain the network structure of biological systems from the literature, we find that, often, a unique topology cannot be determined, even if the structure of the corresponding mathematical model is assumed to be known and all dynamical variables are measurable. In other words, certain mechanisms may remain undetected (or they are falsely detected) while the inferred model is fully consistent with the measured data. It is also shown that sparsity enforcing approaches for determining 'true' reaction structures are generally not enough without additional prior information. Conclusions The inference of biological networks can be an extremely challenging problem even in the utopian case of perfect experimental information. Unfortunately, the practical situation is often more complex than that, since the measurements are typically incomplete, noisy and sometimes dynamically not rich enough, introducing further obstacles to the structure/parameter estimation process. In this paper, we show how the structural uniqueness and identifiability of the models can be guaranteed by carefully adding extra constraints, and that these important properties can be checked through appropriate computation methods.
|
28
|
Abstract
In this paper, a set of data is assumed to be obtained from an experiment that satisfies a Boolean dynamic process. For instance, the dataset can be obtained from the diagnosis of describing the diffusion process of cancer cells. With the observed datasets, several methods to construct the dynamic models for such Boolean networks are proposed. Instead of building the logical dynamics of a Boolean network directly, its algebraic form is constructed first and then is converted back to the logical form. Firstly, a general construction technique is proposed. To reduce the size of required data, the model with the known network graph is considered. Motivated by this, the least in-degree model is constructed that can reduce the size of required data set tremendously. Next, the uniform network is investigated. The number of required data points for identification of such networks is independent of the size of the network. Finally, some principles are proposed for dealing with data with errors.
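To make the "least in-degree" idea in the abstract above more concrete, here is a minimal Python sketch that searches, for each node, for a smallest set of input nodes whose values determine that node's next value in the observed transitions. This brute-force search is a stand-in chosen for illustration, not the algebraic (matrix-based) construction the paper describes; the function names and the three-node toy network are hypothetical.

```python
from itertools import combinations, product


def infer_rule(transitions, target):
    """Find a smallest input set that explains the target's next value.

    transitions is a list of (state, next_state) pairs of 0/1 tuples.
    Returns (inputs, truth_table), or None if the data contradict each
    other (the same full state leading to different next values).
    """
    n = len(transitions[0][0])
    for k in range(n + 1):  # try small in-degrees first
        for inputs in combinations(range(n), k):
            table = {}
            for state, nxt in transitions:
                key = tuple(state[i] for i in inputs)
                if table.setdefault(key, nxt[target]) != nxt[target]:
                    break  # same inputs, different outcome: set too small
            else:
                return inputs, table
    return None


def step(state):
    """Hypothetical ground-truth network used only to generate data."""
    x0, x1, x2 = state
    return (x1 & x2, 1 - x0, x2)


# Observe one transition from every possible state of the toy network.
transitions = [(s, step(s)) for s in product((0, 1), repeat=3)]
rules = [infer_rule(transitions, t) for t in range(3)]
```

With fewer observed transitions the search can still return a consistent rule, but several input sets may then fit equally well, which is one reason the amount of required data depends on the in-degree.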
Affiliation(s)
- Daizhan Cheng
- Key Laboratory of Systems and Control, Academy of Mathematics and Systems Sciences, Chinese Academy of Sciences, Beijing 100190, China.
|
29
|
Summer G, Perkins TJ. Functional data analysis for identifying nonlinear models of gene regulatory networks. BMC Genomics 2010; 11 Suppl 4:S18. [PMID: 21143801 PMCID: PMC3005930 DOI: 10.1186/1471-2164-11-s4-s18] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 11/10/2022] Open
Abstract
Background A key problem in systems biology is estimating dynamical models of gene regulatory networks. Traditionally, this has been done using regression or other ad-hoc methods when the model is linear. More detailed, realistic modeling studies usually employ nonlinear dynamical models, which lead to computationally difficult parameter estimation problems. Functional data analysis methods, however, offer a means to simplify fitting by transforming the problem from one of matching modeled and observed dynamics to one of matching modeled and observed time derivatives, which is a regression problem, albeit a nonlinear one. Results We formulate a functional data analysis approach for estimating the parameters of nonlinear dynamical models and evaluate this approach on data from two real systems, the gap gene system of Drosophila melanogaster and the synthetic IRMA network, which was created expressly as a test case for genetic network inference. We also evaluate the approach on simulated data sets generated by the GeneNetWeaver program, the basis for the annual DREAM reverse engineering challenge. We assess the accuracy with which the correct regulatory relationships within the networks are extracted, and consider alternative methods of regularization for the purpose of overfitting avoidance. We also show that the computational efficiency of the functional data analysis approach, and the decomposability of the resulting regression problem, allow us to explicitly enumerate and evaluate all possible regulator combinations for every gene. This gives deeper insight into the relevance of different regulators or regulator combinations, and lets one check for alternative regulatory explanations. Conclusions Functional data analysis is a powerful approach for estimating detailed nonlinear models of gene expression dynamics, allowing efficient and accurate estimation of regulatory architecture.
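The derivative-matching idea in the abstract above can be sketched in a few lines of Python. This version makes several simplifying assumptions that are not from the paper: derivatives are estimated with central differences rather than a smoothing basis, and the model is the linear production-decay equation dx/dt = a - b*x rather than a nonlinear one, so the regression step reduces to an ordinary straight-line fit. All names and the simulated data are hypothetical.

```python
import math


def central_differences(t, x):
    """Approximate dx/dt at the interior time points."""
    return [
        (x[i + 1] - x[i - 1]) / (t[i + 1] - t[i - 1])
        for i in range(1, len(x) - 1)
    ]


def fit_production_decay(t, x):
    """Estimate a and b in dx/dt = a - b*x by regressing derivatives on x.

    No differential equation is solved: the model's right-hand side is
    matched directly to the estimated time derivatives.
    """
    dxdt = central_differences(t, x)
    xs = x[1:-1]  # expression values aligned with the derivative estimates
    n = len(xs)
    mean_x, mean_d = sum(xs) / n, sum(dxdt) / n
    sxx = sum((xi - mean_x) ** 2 for xi in xs)
    sxd = sum((xi - mean_x) * (di - mean_d) for xi, di in zip(xs, dxdt))
    slope = sxd / sxx
    return mean_d - slope * mean_x, -slope  # (a, b)


# Simulated noise-free trajectory for a = 2, b = 0.5 starting from x(0) = 0.
t = [0.1 * i for i in range(101)]
x = [4.0 * (1.0 - math.exp(-0.5 * ti)) for ti in t]
a_hat, b_hat = fit_production_decay(t, x)
```

On noisy data the finite-difference step would amplify measurement error, which is where a proper smoothing step becomes important.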
Affiliation(s)
- Georg Summer
- Ottawa Hospital Research Institute, Ottawa, Ontario, Canada.
|
30
|
Sarder P, Schierding W, Cobb JP, Nehorai A. Estimating sparse gene regulatory networks using a bayesian linear regression. IEEE Trans Nanobioscience 2010; 9:121-31. [PMID: 20650703 DOI: 10.1109/tnb.2010.2043444] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Indexed: 11/07/2022]
Abstract
In this paper, we propose a gene regulatory network (GRN) estimation method for time-series microarray datasets that assumes such networks are typically sparse. We represent the regulatory relationships between the genes using weights, with the "net" regulation influence on a gene's expression being the summation of the independent regulatory inputs. We estimate the weights using a Bayesian linear regression method for sparse parameter vectors. We apply our proposed method to genes of a human buffy-coat microarray expression profile dataset of ventilator-associated pneumonia (VAP) selected by the Extraction of Differential Gene Expression software, and compare the estimation result with the GRNs estimated using both a correlation coefficient method and a database-based method, Ingenuity Pathway Analysis. A biological analysis of the consensus network derived from the GRNs estimated with our method and the correlation-coefficient method yields four biologically meaningful subnetworks. Also, our method performs either better than or competitively with the existing well-established GRN estimation methods. Moreover, it performs comparably with respect to: 1) the ground-truth GRNs for the in silico 50- and 100-gene datasets reported recently in the DREAM3 challenge and 2) the GRN estimated using a mutual information-based method for the top-ranked genes of the VAP dataset selected by Bayesian Analysis of Time Series (user-friendly Bayesian software for analyzing time-series microarray experiments).
Affiliation(s)
- Pinaki Sarder
- Department of Electrical and Systems Engineering, Washington University, St. Louis, MO 63130, USA.
|
31
|
Tenenhaus A, Guillemot V, Gidrol X, Frouin V. Gene association networks from microarray data using a regularized estimation of partial correlation based on PLS regression. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2010; 7:251-262. [PMID: 20431145 DOI: 10.1109/tcbb.2008.87] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.3] [Indexed: 05/29/2023]
Abstract
Reconstruction of gene-gene interactions from large-scale data such as microarrays is a first step toward better understanding the mechanisms at work in the cell. Two main issues have to be managed in such a context: 1) choosing which measures have to be used to distinguish between direct and indirect interactions from high-dimensional microarray data and 2) constructing networks with a low proportion of false-positive edges. We present an efficient methodology for the reconstruction of gene interaction networks in a small-sample-size setting. The strength of independence of any two genes is measured, in such "high-dimensional network," by a regularized estimation of partial correlation based on Partial Least Squares Regression. We finally emphasize specific properties of the proposed method. To assess the sensitivity and specificity of the method, we carried out the reconstruction of networks from simulated data. We also tested PLS-based partial correlation network on static and dynamic real microarray data. An R implementation of the proposed algorithm is available from http://biodev.extra.cea.fr/plspcnetwork/.
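The distinction between direct and indirect interactions mentioned in the abstract above is what partial correlation is meant to capture. The Python sketch below uses the textbook first-order formula for three variables, not the regularized PLS-based estimator the paper proposes, and the simulated data (a hypothetical regulator `gene_z` driving both `gene_x` and `gene_y`) are an assumption made purely for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y given z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Hypothetical data: gene_z drives both gene_x and gene_y, which do not
# interact directly with each other.
rng = random.Random(0)
gene_z = [rng.gauss(0, 1) for _ in range(200)]
gene_x = [zi + rng.gauss(0, 0.3) for zi in gene_z]
gene_y = [zi + rng.gauss(0, 0.3) for zi in gene_z]

marginal = pearson(gene_x, gene_y)             # high: shared regulator
partial = partial_corr(gene_x, gene_y, gene_z)  # near zero: no direct link
```

With many genes and few samples the plain formula breaks down because the correlation matrix cannot be inverted reliably, which is the setting where a regularized estimate is needed.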
Affiliation(s)
- Arthur Tenenhaus
- Laboratoire d'Exploration Fonctionnelle des Genomes, Institut de Radiobiologie Cellulaire et Moléculaire, Commissariat à l'Energie Atomique, 2 rue Gaston Cremieux, F-91000 Evry, France.
|
32
|
Cicatiello L, Mutarelli M, Grober OMV, Paris O, Ferraro L, Ravo M, Tarallo R, Luo S, Schroth GP, Seifert M, Zinser C, Chiusano ML, Traini A, De Bortoli M, Weisz A. Estrogen receptor alpha controls a gene network in luminal-like breast cancer cells comprising multiple transcription factors and microRNAs. The American Journal of Pathology 2010; 176:2113-30. [PMID: 20348243 DOI: 10.2353/ajpath.2010.090837] [Citation(s) in RCA: 132] [Impact Index Per Article: 9.4] [Indexed: 01/24/2023]
Abstract
Luminal-like breast tumor cells express estrogen receptor alpha (ERalpha), a member of the nuclear receptor family of ligand-activated transcription factors that controls their proliferation, survival, and functional status. To identify the molecular determinants of this hormone-responsive tumor phenotype, a comprehensive genome-wide analysis was performed in estrogen stimulated MCF-7 and ZR-75.1 cells by integrating time-course mRNA expression profiling with global mapping of genomic ERalpha binding sites by chromatin immunoprecipitation coupled to massively parallel sequencing, microRNA expression profiling, and in silico analysis of transcription units and receptor binding regions identified. All 1270 genes that were found to respond to 17beta-estradiol in both cell lines cluster in 33 highly concordant groups, each of which showed defined kinetics of RNA changes. This hormone-responsive gene set includes several direct targets of ERalpha and is organized in a gene regulation cascade, stemming from ligand-activated receptor and reaching a large number of downstream targets via AP-2gamma, B-cell activating transcription factor, E2F1 and 2, E74-like factor 3, GTF2IRD1, hairy and enhancer of split homologue-1, MYB, SMAD3, RARalpha, and RXRalpha transcription factors. MicroRNAs are also integral components of this gene regulation network because miR-107, miR-424, miR-570, miR-618, and miR-760 are regulated by 17beta-estradiol along with other microRNAs that can target a significant number of transcripts belonging to one or more estrogen-responsive gene clusters.
Affiliation(s)
- Luigi Cicatiello
- Department of General Pathology, Second University of Naples, Napoli, Italy
|
33
|
Porreca R, Cinquemani E, Lygeros J, Ferrari-Trecate G. Identification of genetic network dynamics with unate structure. Bioinformatics 2010; 26:1239-45. [PMID: 20305266 DOI: 10.1093/bioinformatics/btq120] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.8] [Indexed: 11/12/2022]
Abstract
MOTIVATION Modern experimental techniques for time course measurement of gene expression enable the identification of dynamical models of genetic regulatory networks. In general, identification involves fitting appropriate network structures and parameters to the data. For a given set of genes, exploring all possible network structures is clearly prohibitive. Modelling and identification methods for the a priori selection of network structures compatible with biological knowledge and experimental data are necessary to make the identification problem tractable. RESULTS We propose a differential equation modelling framework where the regulatory interactions among genes are expressed in terms of unate functions, a class of gene activation rules commonly encountered in Boolean network modelling. We establish analytical properties of the models in the class and exploit them to devise a two-step procedure for gene network reconstruction from product concentration and synthesis rate time series. The first step isolates a family of model structures compatible with the data from a set of most relevant biological hypotheses. The second step explores this family and returns a pool of best fitting models along with estimates of their parameters. The method is tested on a simulated network and compared with state-of-the-art network inference methods on the benchmark synthetic network IRMA.
|
34
|
Jing L, Ng MK, Liu Y. Construction of gene networks with hybrid approach from expression profile and gene ontology. IEEE Trans Inf Technol Biomed 2009; 14:107-18. [PMID: 19789116 DOI: 10.1109/titb.2009.2033056] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/09/2022]
Abstract
Gene regulatory networks have long been studied in model organisms as a means of identifying functional relationships among genes or their corresponding products. Despite the many existing methods for genome-wide construction of such networks, solving the gene regulatory network problem is not trivial. Here, we present a hybrid approach with gene expression profiles and gene ontology (HAEO). HAEO makes use of multimethods (overlapping clustering and reverse engineering methods) to effectively and efficiently construct gene regulatory networks from multisources (gene expression profiles and gene ontology). Application to a yeast cell cycle dataset demonstrates HAEO's ability to construct validated gene regulatory networks, including some potential gene regulatory pairs that cannot be discovered by general inference methods, and to identify cycles (i.e., feedback loops) between genes. We also experimentally study the efficiency of building networks and show that the proposed method, HAEO, is much faster than the Bayesian network method.
Affiliation(s)
- Liping Jing
- School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China.
|
35
|
Baralla A, Mentzen WI, De La Fuente A. Inferring Gene Networks: Dream or Nightmare? Ann N Y Acad Sci 2009; 1158:246-56. [DOI: 10.1111/j.1749-6632.2008.04099.x] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.2] [Indexed: 12/16/2022]
|
36
|
Cuccato G, Gatta GD, di Bernardo D. Systems and Synthetic biology: tackling genetic networks and complex diseases. Heredity (Edinb) 2009; 102:527-32. [DOI: 10.1038/hdy.2009.18] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.7] [Indexed: 12/26/2022] Open
|
37
|
Kimura S, Nakayama S, Hatakeyama M. Genetic network inference as a series of discrimination tasks. Bioinformatics 2009; 25:918-25. [PMID: 19189976 DOI: 10.1093/bioinformatics/btp072] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.7] [Indexed: 11/14/2022]
Abstract
Motivation: Genetic network inference methods based on sets of differential equations generally require a great deal of time, as the equations must be solved many times. To reduce the computational cost, researchers have proposed other methods for inferring genetic networks by solving sets of differential equations only a few times, or even without solving them at all. When we try to obtain reasonable network models using these methods, however, we must estimate the time derivatives of the gene expression levels with great precision. In this study, we propose a new method to overcome the drawbacks of inference methods based on sets of differential equations. Results: Our method infers genetic networks by obtaining classifiers capable of predicting the signs of the derivatives of the gene expression levels. For this purpose, we defined a genetic network inference problem as a series of discrimination tasks, then solved the defined series of discrimination tasks with a linear programming machine. Our experimental results demonstrated that the proposed method is capable of correctly inferring genetic networks, and doing so more than 500 times faster than the other inference methods based on sets of differential equations. Next, we applied our method to actual expression data of the bacterial SOS DNA repair system. And finally, we demonstrated that our approach relates to the inference method based on the S-system model. Though our method provides no estimation of the kinetic parameters, it should be useful for researchers interested only in the network structure of a target system. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Shuhei Kimura
- Graduate School of Engineering, Tottori University, Koyama-minami, Tottori, Japan.
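The idea in the Kimura et al. abstract above, treating network inference as one sign-of-derivative classification task per gene, can be illustrated with a minimal sketch. This is not the authors' implementation: a plain perceptron stands in for the linear programming machine, derivative signs are approximated by forward differences, and the function names (`derivative_signs`, `train_perceptron`, `infer_network`) and the two-gene toy data are hypothetical, chosen only for illustration.

```python
def derivative_signs(series):
    """Sign (+1 or -1) of each forward difference of one gene's time series.

    Simplifying assumption: a zero difference is labeled -1.
    """
    return [1 if nxt - cur > 0 else -1 for cur, nxt in zip(series, series[1:])]


def train_perceptron(samples, labels, epochs=200, lr=0.1):
    """Fit a linear classifier with the perceptron rule.

    Stand-in for the linear programming machine named in the abstract.
    """
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in zip(samples, labels):
            score = sum(w * xi for w, xi in zip(weights, x)) + bias
            if y * score <= 0:  # misclassified: move the hyperplane toward x
                weights = [w + lr * y * xi for w, xi in zip(weights, x)]
                bias += lr * y
                errors += 1
        if errors == 0:  # every sample classified correctly
            break
    return weights, bias


def infer_network(expr):
    """Solve one discrimination task per target gene.

    expr maps gene name -> expression levels over time. The result maps
    target gene -> {candidate regulator: classifier weight}; a positive
    weight suggests activation, a negative weight suggests repression.
    """
    genes = sorted(expr)
    n_steps = len(expr[genes[0]]) - 1
    samples = [[expr[g][t] for g in genes] for t in range(n_steps)]
    network = {}
    for target in genes:
        labels = derivative_signs(expr[target])
        weights, _ = train_perceptron(samples, labels)
        network[target] = dict(zip(genes, weights))
    return network


# Toy data (hypothetical): gene A is switched on and off externally,
# and gene B relaxes toward A, so B's derivative has the sign of (A - B).
level_a = [1.0, 1.0, 1.0, 0.0, 0.0, 0.0] * 2
level_b = [0.0]
for t in range(len(level_a) - 1):
    level_b.append(level_b[-1] + 0.5 * (level_a[t] - level_b[-1]))

expr = {"A": level_a, "B": level_b}
network = infer_network(expr)
print(network["B"])  # weight on A is positive, reflecting activation of B by A
```

As in the abstract, the classifier weights indicate only network structure (which genes influence a target, and in which direction); they are not estimates of kinetic parameters.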
|