1
Kontio JAJ, Rinta-Aho MJ, Sillanpää MJ. Estimating Linear and Nonlinear Gene Coexpression Networks by Semiparametric Neighborhood Selection. Genetics 2020; 215:597-607. [PMID: 32414870 PMCID: PMC7337083 DOI: 10.1534/genetics.120.303186] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 03/13/2020] [Accepted: 05/11/2020] [Indexed: 11/18/2022] Open
Abstract
Whereas nonlinear relationships between genes are acknowledged, there exist only a few methods for estimating nonlinear gene coexpression networks or gene regulatory networks (GCNs/GRNs) with common deficiencies. These methods often consider only pairwise associations between genes, and are, therefore, poorly capable of identifying higher-order regulatory patterns when multiple genes should be considered simultaneously. Another critical issue in current nonlinear GCN/GRN estimation approaches is that they consider linear and nonlinear dependencies at the same time in confounded form nonparametrically. This severely undermines the possibilities for nonlinear associations to be found, since the power of detecting nonlinear dependencies is lower compared to linear dependencies, and the sparsity-inducing procedures might favor linear relationships over nonlinear ones only due to small sample sizes. In this paper, we propose a method to estimate undirected nonlinear GCNs independently from the linear associations between genes based on a novel semiparametric neighborhood selection procedure capable of identifying complex nonlinear associations between genes. Simulation studies using the common DREAM3 and DREAM9 datasets show that the proposed method compares superiorly to the current nonlinear GCN/GRN estimation methods.
Affiliation(s)
- Juho A J Kontio: Research Unit of Mathematical Sciences, Biocenter Oulu, University of Oulu, 90014, Finland
- Marko J Rinta-Aho: Research Unit of Mathematical Sciences, Biocenter Oulu, University of Oulu, 90014, Finland
- Mikko J Sillanpää: Research Unit of Mathematical Sciences, Biocenter Oulu, University of Oulu, 90014, Finland; Infotech Oulu, University of Oulu, 90014, Finland
2
Bakhteh S, Ghaffari-Hadigheh A, Chaparzadeh N. Identification of Minimum Set of Master Regulatory Genes in Gene Regulatory Networks. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:999-1009. [PMID: 30334767 DOI: 10.1109/tcbb.2018.2875692] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 06/08/2023]
Abstract
Identification of master regulatory genes is one of the primary challenges in systems biology. The minimum dominating set problem is a powerful paradigm for analyzing such complex networks. In these models, genes are represented as nodes and their interactions as edges. Members of a minimum dominating set can then be regarded as master genes. Because multiple minimum dominating sets may exist in a network, it is difficult to identify which one represents the most appropriate set of master genes. In this paper, we develop a weighted gene regulatory network problem with two objectives as a version of the dominating set problem. The collective influence of each gene is taken as its weight. The first objective aims to find a set of master regulatory genes with minimum cardinality, and the second objective identifies the set with maximum weight. The model is converted to a single-objective problem using a parameter varying between zero and one. The model is implemented on three human networks, and the results are reported and compared with the existing weighted network model. Parametric programming in linear optimization and logistic regression are also applied to the resulting relaxed problem to provide a deeper understanding of the results. The computational results of the parametric analysis show that, for some ranges of priorities between the objectives, the identified master regulatory genes are invariant, and some genes are identified for all priorities. This indicates that such genes are more likely to be master regulators, especially in noisy networks.
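As a minimal illustration of the two-objective idea described in this abstract, the sketch below enumerates the dominating sets of a tiny toy graph and scores each with a scalarized objective, `lam * |S| - (1 - lam) * weight(S)`, where `lam` varies between zero and one. The graph, the weights, the function names, and this particular form of the scalarization are assumptions made for illustration; the paper's actual model and solver are not reproduced here, and brute-force enumeration is only feasible for very small networks.

```python
from itertools import combinations

def is_dominating(adj, chosen):
    """True if every node is in `chosen` or adjacent to a chosen node."""
    covered = set(chosen)
    for u in chosen:
        covered |= adj[u]
    return covered == set(adj)

def best_master_set(adj, weight, lam):
    """Brute-force search over dominating sets S, minimizing
    lam * |S| - (1 - lam) * (sum of node weights in S)."""
    nodes = sorted(adj)
    best, best_score = None, float("inf")
    for k in range(1, len(nodes) + 1):
        for subset in combinations(nodes, k):
            if not is_dominating(adj, subset):
                continue
            score = lam * len(subset) - (1 - lam) * sum(weight[v] for v in subset)
            if score < best_score:
                best, best_score = set(subset), score
    return best, best_score

# Hypothetical toy network: an undirected path a - b - c - d.
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
# Hypothetical node weights standing in for "collective influence".
weight = {"a": 0.1, "b": 0.6, "c": 0.5, "d": 0.1}

for lam in (0.0, 0.5, 1.0):
    print(lam, best_master_set(adj, weight, lam))
```

Sweeping `lam` in this way mirrors the parametric analysis in spirit: nodes that stay in the selected set across many values of `lam` are the ones the abstract would describe as invariant.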
3
Liu J, Tian Z, Xiao Y, Liu H, Hao S, Zhang X, Wang C, Sun J, Yu H, Yan J. Gene Regulatory Relationship Mining Using Improved Three-Phase Dependency Analysis Approach. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:339-346. [PMID: 30281476 DOI: 10.1109/tcbb.2018.2872993] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
Mining gene regulatory relationships and constructing gene regulatory networks (GRNs) is of great interest to the biological community, but it remains a challenging problem because of the tremendous complexity of cellular systems. In the present work, we construct gene regulatory networks using an improved three-phase dependency analysis (TPDA) Bayesian network learning method, which includes the steps of Drafting, Thickening, and Thinning. Because learning results are unreliable when high-order conditional independence tests are used, we calculate the (conditional) mutual information between genes with an entropy estimation approach based on a Gaussian kernel probability density estimator. Experiments on public benchmark data sets show that the improved method outperforms nine other Bayesian network learning methods when the data have a large sample size, a small number of discrete values, and roughly equal frequencies of the different discrete values. In addition, the improved TPDA method was applied to a large real RNA-seq gene expression data set from a global collection of 368 elite maize inbred lines. The experimental results show that it performs significantly better than the original TPDA method and the nine other Bayesian network learning algorithms.
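The abstract's use of a Gaussian kernel density estimator for mutual information can be sketched roughly as follows. This is a plain resubstitution estimate of I(X;Y) with a fixed, hand-picked bandwidth; the function names, the bandwidth value, and the standardization step are assumptions for illustration, and the paper's own estimator (including its conditional variant) may differ.

```python
import numpy as np

def gaussian_kde_at_samples(data, bandwidth):
    """Evaluate a Gaussian product-kernel density estimate at each sample.

    data: array of shape (n, d); returns an array of n density values.
    """
    n, d = data.shape
    diff = data[:, None, :] - data[None, :, :]       # (n, n, d) pairwise differences
    sq = np.sum((diff / bandwidth) ** 2, axis=2)     # squared scaled distances
    norm = (2 * np.pi) ** (d / 2) * bandwidth ** d   # Gaussian normalizing constant
    return np.exp(-0.5 * sq).sum(axis=1) / (n * norm)

def kde_mutual_information(x, y, bandwidth=0.3):
    """Estimate I(X;Y) = E[log p(x,y) / (p(x) p(y))] in nats."""
    x = (x - x.mean()) / x.std()                     # standardize so one bandwidth fits both
    y = (y - y.mean()) / y.std()
    pxy = gaussian_kde_at_samples(np.column_stack([x, y]), bandwidth)
    px = gaussian_kde_at_samples(x[:, None], bandwidth)
    py = gaussian_kde_at_samples(y[:, None], bandwidth)
    return float(np.mean(np.log(pxy / (px * py))))

# Synthetic data, purely for illustration.
rng = np.random.default_rng(0)
gene_a = rng.normal(size=300)
unrelated = rng.normal(size=300)
gene_b = gene_a + 0.1 * unrelated                    # strongly dependent on gene_a
print(kde_mutual_information(gene_a, gene_b))
print(kde_mutual_information(gene_a, unrelated))
```

The estimate is biased for small samples and sensitive to the bandwidth, so in practice the bandwidth would be chosen by a rule of thumb or cross-validation rather than fixed as here.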
4
Tabar VR, Eskandari F, Salimi S, Zareifard H. Finding a set of candidate parents using dependency criterion for the K2 algorithm. Pattern Recognit Lett 2018. [DOI: 10.1016/j.patrec.2018.04.019] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Indexed: 11/15/2022]
5
Liu W, Zhu W, Liao B, Chen H, Ren S, Cai L. Improving gene regulatory network structure using redundancy reduction in the MRNET algorithm. RSC Adv 2017. [DOI: 10.1039/c7ra01557g] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Indexed: 11/21/2022] Open
Abstract
Inferring gene regulatory networks from expression data is a central problem in systems biology.
Affiliation(s)
- Wei Liu: College of Information Science and Engineering, Hunan University, Changsha, China
- Wen Zhu: College of Information Science and Engineering, Hunan University, Changsha, China
- Bo Liao: College of Information Science and Engineering, Hunan University, Changsha, China
- Haowen Chen: College of Information Science and Engineering, Hunan University, Changsha, China
- Siqi Ren: College of Information Science and Engineering, Hunan University, Changsha, China
- Lijun Cai: College of Information Science and Engineering, Hunan University, Changsha, China
6
Kiani NA, Kaderali L. Dynamic probabilistic threshold networks to infer signaling pathways from time-course perturbation data. BMC Bioinformatics 2014; 15:250. [PMID: 25047753 PMCID: PMC4133630 DOI: 10.1186/1471-2105-15-250] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 07/09/2013] [Accepted: 07/15/2014] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Network inference deals with the reconstruction of molecular networks from experimental data. Given N molecular species, the challenge is to find the underlying network. Due to data limitations, this typically is an ill-posed problem, and requires the integration of prior biological knowledge or strong regularization. We here focus on the situation when time-resolved measurements of a system's response after systematic perturbations are available. RESULTS We present a novel method to infer signaling networks from time-course perturbation data. We utilize dynamic Bayesian networks with probabilistic Boolean threshold functions to describe protein activation. The model posterior distribution is analyzed using evolutionary MCMC sampling and subsequent clustering, resulting in probability distributions over alternative networks. We evaluate our method on simulated data, and study its performance with respect to data set size and levels of noise. We then use our method to study EGF-mediated signaling in the ERBB pathway. CONCLUSIONS Dynamic Probabilistic Threshold Networks is a new method to infer signaling networks from time-series perturbation data. It exploits the dynamic response of a system after external perturbation for network reconstruction. On simulated data, we show that the approach outperforms current state of the art methods. On the ERBB data, our approach recovers a significant fraction of the known interactions, and predicts novel mechanisms in the ERBB pathway.
Affiliation(s)
- Narsis A Kiani: Technische Universität Dresden, Medical Faculty Carl Gustav Carus, Institute for Medical Informatics and Biometry, Fetscherstr. 74, 01307 Dresden, Germany
7
Skreti G, Bei ES, Kalantzaki K, Zervakis M. Temporal and Spatial Patterns of Gene Profiles during Chondrogenic Differentiation. IEEE J Biomed Health Inform 2014; 18:799-809. [DOI: 10.1109/jbhi.2014.2305770] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/07/2022]
8
Kalantzaki K, Bei ES, Exarchos KP, Zervakis M, Garofalakis M, Fotiadis DI. Nonparametric network design and analysis of disease genes in oral cancer progression. IEEE J Biomed Health Inform 2014; 18:562-73. [PMID: 24608056 DOI: 10.1109/jbhi.2013.2274643] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/05/2022]
Abstract
Biological networks in living organisms can be seen as the ultimate means of understanding the underlying mechanisms in complex diseases, such as oral cancer. During the last decade, many algorithms based on high-throughput genomic data have been developed to unravel the complexity of gene network construction and their progression in time. However, the small size of samples compared to the number of observed genes makes the inference of the network structure quite challenging. In this study, we propose a framework for constructing and analyzing gene networks from sparse experimental temporal data and investigate its potential in oral cancer. We use two network models based on partial correlations and kernel density estimation, in order to capture the genetic interactions. Using this network construction framework on real clinical data of the tissue and blood at different time stages, we identified common disease-related structures that may decipher the association between disease state and biological processes in oral cancer. Our study emphasizes an altered MET (hepatocyte growth factor receptor) network during oral cancer progression. In addition, we demonstrate that the functional changes of gene interactions during oral cancer progression might be particularly useful for patient categorization at the time of diagnosis and/or at follow-up periods.
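One of the two network models named in this abstract is based on partial correlations. A minimal sketch of how partial correlations are commonly obtained from the inverse covariance (precision) matrix is given below. It assumes more samples than genes so that the sample covariance is invertible; with the sparse, small-sample data the abstract describes, a regularized or shrinkage estimate would typically be needed instead, and the paper's actual estimator is not reproduced here. The function name and the synthetic data are illustrative.

```python
import numpy as np

def partial_correlations(samples):
    """Partial correlation matrix from the inverse of the sample covariance.

    samples: array of shape (n_samples, n_genes), with n_samples > n_genes.
    Entry (i, j) is the correlation between genes i and j after
    conditioning on all remaining genes.
    """
    precision = np.linalg.inv(np.cov(samples, rowvar=False))
    scale = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(scale, scale)
    np.fill_diagonal(pcor, 1.0)
    return pcor

# Synthetic chain g0 -> g1 -> g2: g0 and g2 are correlated only through g1.
rng = np.random.default_rng(1)
g0 = rng.normal(size=2000)
g1 = g0 + rng.normal(size=2000)
g2 = g1 + rng.normal(size=2000)
print(np.round(partial_correlations(np.column_stack([g0, g1, g2])), 2))
```

In the chain example, the partial correlation between `g0` and `g2` is close to zero even though their ordinary correlation is not, which is why partial correlations are used to separate direct from indirect associations.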
9
Reconstructing biological gene regulatory networks: where optimization meets big data. Evolutionary Intelligence 2013. [DOI: 10.1007/s12065-013-0098-7] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 10/26/2022]
10
Fogelberg C, Palade V. Dense Structural Expectation Maximisation with Parallelisation for Efficient Large-Network Structural Inference. Int J Artif Intell T 2013. [DOI: 10.1142/s0218213013500115] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/18/2022]
Abstract
Research on networks is increasingly popular in a wide range of machine learning fields, and structural inference of networks is a key problem. Unfortunately, network structural inference is time consuming and there is an increasing need to infer the structure of ever-larger networks. This article presents the Dense Structural Expectation Maximisation (DSEM) algorithm, a novel extension of the well-known SEM algorithm. DSEM increases the efficiency of structural inference by using the time-expensive calculations required in each SEM iteration more efficiently, and can be O(N) times faster than SEM, where N is the size of the network. The article has also combined DSEM with parallelisation and evaluated the impact of these improvements over SEM, individually and combined. The possibility of combining these novel approaches with other research on structural inference is also considered. The contributions also appear to be usable for all kinds of structural inference, and may greatly improve the range, variety and size of problems which can be tractably addressed. Code is freely available online at: http://syntilect.com/cgf/pubs:software .
Affiliation(s)
- Christopher Fogelberg: Department of Computer Science, University of Oxford, Oxford, OX1 3QD, United Kingdom
- Vasile Palade: Department of Computer Science, University of Oxford, Oxford, OX1 3QD, United Kingdom
11
Kim D, Kim JM. Analysis of directional dependence using asymmetric copula-based regression models. J Stat Comput Sim 2013. [DOI: 10.1080/00949655.2013.779696] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Indexed: 10/27/2022]
12
Sun X, Liu Y, Wei D, Xu M, Chen H, Han J. Selection of interdependent genes via dynamic relevance analysis for cancer diagnosis. J Biomed Inform 2013; 46:252-8. [DOI: 10.1016/j.jbi.2012.10.004] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.3] [Received: 04/04/2012] [Revised: 10/01/2012] [Accepted: 10/03/2012] [Indexed: 11/16/2022]
13
Han B, Chen XW, Talebizadeh Z, Xu H. Genetic studies of complex human diseases: characterizing SNP-disease associations using Bayesian networks. BMC Systems Biology 2012; 6 Suppl 3:S14. [PMID: 23281790 PMCID: PMC3524021 DOI: 10.1186/1752-0509-6-s3-s14] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.0] [Indexed: 12/28/2022]
Abstract
BACKGROUND Detecting epistatic interactions plays a significant role in improving pathogenesis, prevention, diagnosis, and treatment of complex human diseases. Applying machine learning or statistical methods to epistatic interaction detection will encounter some common problems, e.g., very limited number of samples, an extremely high search space, a large number of false positives, and ways to measure the association between disease markers and the phenotype. RESULTS To address the problems of computational methods in epistatic interaction detection, we propose a score-based Bayesian network structure learning method, EpiBN, to detect epistatic interactions. We apply the proposed method to both simulated datasets and three real disease datasets. Experimental results on simulation data show that our method outperforms some other commonly-used methods in terms of power and sample-efficiency, and is especially suitable for detecting epistatic interactions with weak or no marginal effects. Furthermore, our method is scalable to real disease data. CONCLUSIONS We propose a Bayesian network-based method, EpiBN, to detect epistatic interactions. In EpiBN, we develop a new scoring function, which can reflect higher-order epistatic interactions by estimating the model complexity from data, and apply a fast Branch-and-Bound algorithm to learn the structure of a two-layer Bayesian network containing only one target node. To make our method scalable to real data, we propose the use of a Markov chain Monte Carlo (MCMC) method to perform the screening process. Applications of the proposed method to some real GWAS (genome-wide association studies) datasets may provide helpful insights into understanding the genetic basis of Age-related Macular Degeneration, late-onset Alzheimer's disease, and autism.
Affiliation(s)
- Bing Han: Bioinformatics and Computational Life-Sciences Laboratory, ITTC, Department of Electrical Engineering and Computer Science, University of Kansas, 1520 West 15th Street, Lawrence, KS 66045, USA
- Xue-wen Chen: Department of Computer Science, Wayne State University, Detroit, MI 48202
- Zohreh Talebizadeh: Children's Mercy Hospital and University of Missouri-Kansas City School of Medicine, 2401 Gillham Road, Kansas City, MO 64108, USA
- Hua Xu: School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030
14
Chemmangattuvalappil N, Task K, Banerjee I. An integer optimization algorithm for robust identification of non-linear gene regulatory networks. BMC Systems Biology 2012; 6:119. [PMID: 22937832 PMCID: PMC3444924 DOI: 10.1186/1752-0509-6-119] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 03/08/2012] [Accepted: 08/27/2012] [Indexed: 11/16/2022]
Abstract
Background Reverse engineering gene networks and identifying regulatory interactions are integral to understanding cellular decision making processes. Advancement in high throughput experimental techniques has initiated innovative data driven analysis of gene regulatory networks. However, inherent noise associated with biological systems requires numerous experimental replicates for reliable conclusions. Furthermore, evidence of robust algorithms directly exploiting basic biological traits are few. Such algorithms are expected to be efficient in their performance and robust in their prediction. Results We have developed a network identification algorithm to accurately infer both the topology and strength of regulatory interactions from time series gene expression data in the presence of significant experimental noise and non-linear behavior. In this novel formulism, we have addressed data variability in biological systems by integrating network identification with the bootstrap resampling technique, hence predicting robust interactions from limited experimental replicates subjected to noise. Furthermore, we have incorporated non-linearity in gene dynamics using the S-system formulation. The basic network identification formulation exploits the trait of sparsity of biological interactions. Towards that, the identification algorithm is formulated as an integer-programming problem by introducing binary variables for each network component. The objective function is targeted to minimize the network connections subjected to the constraint of maximal agreement between the experimental and predicted gene dynamics. The developed algorithm is validated using both in silico and experimental data-sets. These studies show that the algorithm can accurately predict the topology and connection strength of the in silico networks, as quantified by high precision and recall, and small discrepancy between the actual and predicted kinetic parameters. 
Furthermore, in both the in silico and experimental case studies, the predicted gene expression profiles are in very close agreement with the dynamics of the input data. Conclusions Our integer programming algorithm effectively utilizes bootstrapping to identify robust gene regulatory networks from noisy, non-linear time-series gene expression data. With significant noise and non-linearities being inherent to biological systems, the present formulism, with the incorporation of network sparsity, is extremely relevant to gene regulatory networks, and while the formulation has been validated against in silico and E. coli data, it can be applied to any biological system.
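For reference, the S-system formulation mentioned in this abstract is usually written in the following standard form; the symbols below follow the common textbook convention and may not match the paper's own notation.

```latex
\frac{dX_i}{dt} = \alpha_i \prod_{j=1}^{N} X_j^{g_{ij}} - \beta_i \prod_{j=1}^{N} X_j^{h_{ij}},
\qquad i = 1, \dots, N
```

Here $X_i$ is the expression level of gene $i$, $\alpha_i, \beta_i \ge 0$ are rate constants for production and degradation, and $g_{ij}$, $h_{ij}$ are kinetic orders describing how gene $j$ influences each term. Under this reading, network sparsity corresponds to most $g_{ij}$ and $h_{ij}$ being zero, which is presumably what the binary variables in the integer-programming formulation select for.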
Affiliation(s)
- Nishanth Chemmangattuvalappil: Department of Chemical Engineering, University of Pittsburgh, 1249 Benedum Hall, 3700 O'Hara Street, Pittsburgh, PA 15261, USA
15
Loy CC, Xiang T, Gong S. Incremental activity modeling in multiple disjoint cameras. IEEE Transactions on Pattern Analysis and Machine Intelligence 2012; 34:1799-1813. [PMID: 22184260 DOI: 10.1109/tpami.2011.246] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 05/31/2023]
Abstract
Activity modeling and unusual event detection in a network of cameras is challenging, particularly when the camera views are not overlapped. We show that it is possible to detect unusual events in multiple disjoint cameras as context-incoherent patterns through incremental learning of time delayed dependencies between distributed local activities observed within and across camera views. Specifically, we model multicamera activities using a Time Delayed Probabilistic Graphical Model (TD-PGM) with different nodes representing activities in different decomposed regions from different views and the directed links between nodes encoding their time delayed dependencies. To deal with visual context changes, we formulate a novel incremental learning method for modeling time delayed dependencies that change over time. We validate the effectiveness of the proposed approach using a synthetic data set and videos captured from a camera network installed at a busy underground station.
Affiliation(s)
- Chen Change Loy: School of Electrical Engineering and Computer Science, Queen Mary University of London, London E1 4NS, United Kingdom
16
Alakwaa FM, Solouma NH, Kadah YM. Construction of gene regulatory networks using biclustering and Bayesian networks. Theor Biol Med Model 2011; 8:39. [PMID: 22018164 PMCID: PMC3231811 DOI: 10.1186/1742-4682-8-39] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Received: 05/09/2011] [Accepted: 10/22/2011] [Indexed: 11/25/2022] Open
Abstract
Background Understanding gene interactions in complex living systems can be seen as the ultimate goal of the systems biology revolution. Hence, to elucidate disease ontology fully and to reduce the cost of drug development, gene regulatory networks (GRNs) have to be constructed. During the last decade, many GRN inference algorithms based on genome-wide data have been developed to unravel the complexity of gene regulation. Time series transcriptomic data measured by genome-wide DNA microarrays are traditionally used for GRN modelling. One of the major problems with microarrays is that a dataset consists of relatively few time points with respect to the large number of genes. Dimensionality is one of the interesting problems in GRN modelling. Results In this paper, we develop a biclustering function enrichment analysis toolbox (BicAT-plus) to study the effect of biclustering in reducing data dimensions. The network generated from our system was validated via available interaction databases and was compared with previous methods. The results revealed the performance of our proposed method. Conclusions Because of the sparse nature of GRNs, the results of biclustering techniques differ significantly from those of previous methods.
17
Quantitative utilization of prior biological knowledge in the Bayesian network modeling of gene expression data. BMC Bioinformatics 2011; 12:359. [PMID: 21884587 PMCID: PMC3203352 DOI: 10.1186/1471-2105-12-359] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.6] [Received: 01/01/2011] [Accepted: 08/31/2011] [Indexed: 01/22/2023] Open
Abstract
Background Bayesian Network (BN) is a powerful approach to reconstructing genetic regulatory networks from gene expression data. However, expression data by itself suffers from high noise and lack of power. Incorporating prior biological knowledge can improve the performance. As each type of prior knowledge on its own may be incomplete or limited by quality issues, integrating multiple sources of prior knowledge to utilize their consensus is desirable. Results We introduce a new method to incorporate the quantitative information from multiple sources of prior knowledge. It first uses the Naïve Bayesian classifier to assess the likelihood of functional linkage between gene pairs based on prior knowledge. In this study we included cocitation in PubMed and schematic similarity in Gene Ontology annotation. A candidate network edge reservoir is then created in which the copy number of each edge is proportional to the estimated likelihood of linkage between the two corresponding genes. In network simulation, the Markov Chain Monte Carlo sampling algorithm is adopted, which samples from this reservoir at each iteration to generate new candidate networks. We evaluated the new algorithm using both simulated and real gene expression data, including data from a yeast cell cycle study and a mouse pancreas development/growth study. Incorporating prior knowledge led to a ~2-fold increase in the number of known transcription regulations recovered, without a significant change in the false positive rate. In contrast, without the prior knowledge, BN modeling is not always better than a random selection, demonstrating the necessity in network modeling of supplementing the gene expression data with additional information. Conclusion Our new development provides a statistical means to utilize the quantitative information in prior biological knowledge in the BN modeling of gene expression data, which significantly improves the performance.
18
Han B, Chen XW. bNEAT: a Bayesian network method for detecting epistatic interactions in genome-wide association studies. BMC Genomics 2011; 12 Suppl 2:S9. [PMID: 21989368 PMCID: PMC3194240 DOI: 10.1186/1471-2164-12-s2-s9] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.5] [Indexed: 02/04/2023] Open
Abstract
Background Detecting epistatic interactions plays a significant role in improving pathogenesis, prevention, diagnosis and treatment of complex human diseases. A recent study in automatic detection of epistatic interactions shows that Markov Blanket-based methods are capable of finding genetic variants strongly associated with common diseases and reducing false positives when the number of instances is large. Unfortunately, a typical dataset from genome-wide association studies consists of very limited number of examples, where current methods including Markov Blanket-based method may perform poorly. Results To address small sample problems, we propose a Bayesian network-based approach (bNEAT) to detect epistatic interactions. The proposed method also employs a Branch-and-Bound technique for learning. We apply the proposed method to simulated datasets based on four disease models and a real dataset. Experimental results show that our method outperforms Markov Blanket-based methods and other commonly-used methods, especially when the number of samples is small. Conclusions Our results show bNEAT can obtain a strong power regardless of the number of samples and is especially suitable for detecting epistatic interactions with slight or no marginal effects. The merits of the proposed approach lie in two aspects: a suitable score for Bayesian network structure learning that can reflect higher-order epistatic interactions and a heuristic Bayesian network structure learning method.
Affiliation(s)
- Bing Han: Bioinformatics and Computational Life Sciences Laboratory, ITTC, Department of Electrical Engineering and Computer Science, The University of Kansas, 1520 West 15th Street, Lawrence, KS 66045, USA
19
Emmert-Streib F, Dehmer M. Networks for systems biology: conceptual connection of data and function. IET Syst Biol 2011; 5:185-207. [PMID: 21639592 DOI: 10.1049/iet-syb.2010.0025] [Citation(s) in RCA: 99] [Impact Index Per Article: 7.1] [Indexed: 01/07/2023] Open
Abstract
The purpose of this study is to survey the use of networks and network-based methods in systems biology. This study starts with an introduction to graph theory and basic measures that quantify structural properties of networks. Then, the authors present important network classes and gene networks as well as methods for their analysis. In the last part of this study, the authors review approaches that aim at analysing the functional organisation of gene networks and the use of networks in medicine. In addition, the authors advocate networks as a systematic approach to general problems in systems biology, because networks are capable of assuming multiple roles that are very beneficial for connecting experimental data with a functional interpretation in biological terms.
Affiliation(s)
- F Emmert-Streib: Queen's University Belfast, Computational Biology and Machine Learning Lab, Center for Cancer Research and Cell Biology, School of Medicine, Dentistry and Biomedical Sciences, Belfast, UK
20
Dougherty ER. Validation of inference procedures for gene regulatory networks. Curr Genomics 2011; 8:351-9. [PMID: 19412435 PMCID: PMC2671720 DOI: 10.2174/138920207783406505] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.9] [Received: 09/10/2007] [Revised: 09/20/2007] [Accepted: 09/29/2007] [Indexed: 11/22/2022] Open
Abstract
The availability of high-throughput genomic data has motivated the development of numerous algorithms to infer gene regulatory networks. The validity of an inference procedure must be evaluated relative to its ability to infer a model network close to the ground-truth network from which the data have been generated. The input to an inference algorithm is a sample set of data and its output is a network. Since input, output, and algorithm are mathematical structures, the validity of an inference algorithm is a mathematical issue. This paper formulates validation in terms of a semi-metric distance between two networks, or the distance between two structures of the same kind deduced from the networks, such as their steady-state distributions or regulatory graphs. The paper sets up the validation framework, provides examples of distance functions, and applies them to some discrete Markov network models. It also considers approximate validation methods based on data for which the generating network is not known, the kind of situation one faces when using real data.
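To make the idea of a distance between an inferred network and a ground-truth network concrete, the sketch below computes a normalized Hamming distance between the adjacency matrices of two directed regulatory graphs. This is one simple choice of distance, picked here for illustration and not necessarily among the paper's own examples; the function name and the toy matrices are assumptions.

```python
import numpy as np

def hamming_distance(adj_a, adj_b):
    """Fraction of ordered node pairs (self-loops excluded) on which
    two directed graphs over the same nodes disagree."""
    a = np.asarray(adj_a, dtype=bool)
    b = np.asarray(adj_b, dtype=bool)
    if a.shape != b.shape or a.ndim != 2 or a.shape[0] != a.shape[1] or a.shape[0] < 2:
        raise ValueError("need two square adjacency matrices of equal size (at least 2 nodes)")
    n = a.shape[0]
    off_diag = ~np.eye(n, dtype=bool)          # ignore the diagonal
    return float(np.sum((a != b) & off_diag)) / (n * (n - 1))

# Hypothetical ground truth (0 -> 1 -> 2) and a hypothetical inferred network.
truth = [[0, 1, 0],
         [0, 0, 1],
         [0, 0, 0]]
inferred = [[0, 1, 1],
            [0, 0, 0],
            [0, 0, 0]]
print(hamming_distance(truth, inferred))
```

A distance on graphs ignores the dynamics; the abstract notes that distances can also be defined on other structures deduced from the networks, such as steady-state distributions, which would need a different function.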
Affiliation(s)
- Edward R Dougherty
- Department of Electrical and Computer Engineering, Texas A&M University; Computational Biology Division, Translational Genomics Research Institute; Department of Pathology, University of Texas M.D. Anderson Cancer Center, USA
|
21
|
Tamada Y, Imoto S, Araki H, Nagasaki M, Print C, Charnock-Jones DS, Miyano S. Estimating genome-wide gene networks using nonparametric Bayesian network models on massively parallel computers. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2011; 8:683-697. [PMID: 20714027 DOI: 10.1109/tcbb.2010.68] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.8] [Indexed: 05/29/2023]
Abstract
We present a novel algorithm to estimate genome-wide gene networks consisting of more than 20,000 genes from gene expression data using nonparametric Bayesian networks. Due to the difficulty of learning Bayesian network structures, existing algorithms cannot be applied to more than a few thousand genes. Our algorithm overcomes this limitation by repeatedly estimating subnetworks in parallel for genes selected by neighbor node sampling. Through numerical simulation, we confirmed that our algorithm outperformed a heuristic algorithm in a shorter time. We applied our algorithm to microarray data from human umbilical vein endothelial cells (HUVECs) treated with siRNAs, to construct a human genome-wide gene network, which we compared to a small gene network estimated for the genes extracted using a traditional bioinformatics method. The results showed that our genome-wide gene network contains many features of the small network, as well as others that could not be captured during the small network estimation. The results also revealed master-regulator genes that are not in the small network but that control many of the genes in the small network. These analyses were impossible to realize without our proposed algorithm.
Affiliation(s)
- Yoshinori Tamada
- Human Genome Center, Institute of Medical Science, The University of Tokyo, Laboratory of DNA Information Analysis, General Research Building 8F, 4-6-1 Shirokanedai, Minato-ku, Tokyo 108-8639, Japan.
|
22
|
Oh JH, Craft J, Al-Lozi R, Vaidya M, Meng Y, Deasy JO, Bradley JD, Naqa IE. A Bayesian network approach for modeling local failure in lung cancer. Phys Med Biol 2011; 56:1635-51. [PMID: 21335651 PMCID: PMC4646092 DOI: 10.1088/0031-9155/56/6/008] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.4] [Indexed: 11/11/2022]
Abstract
Locally advanced non-small cell lung cancer (NSCLC) patients suffer from a high local failure rate following radiotherapy. Despite many efforts to develop new dose-volume models for early detection of tumor local failure, there was no reported significant improvement in their application prospectively. Based on recent studies of biomarker proteins' role in hypoxia and inflammation in predicting tumor response to radiotherapy, we hypothesize that combining physical and biological factors with a suitable framework could improve the overall prediction. To test this hypothesis, we propose a graphical Bayesian network framework for predicting local failure in lung cancer. The proposed approach was tested using two different datasets of locally advanced NSCLC patients treated with radiotherapy. The first dataset was collected retrospectively, which comprises clinical and dosimetric variables only. The second dataset was collected prospectively in which in addition to clinical and dosimetric information, blood was drawn from the patients at various time points to extract candidate biomarkers as well. Our preliminary results show that the proposed method can be used as an efficient method to develop predictive models of local failure in these patients and to interpret relationships among the different variables in the models. We also demonstrate the potential use of heterogeneous physical and biological variables to improve the model prediction. With the first dataset, we achieved better performance compared with competing Bayesian-based classifiers. With the second dataset, the combined model had a slightly higher performance compared to individual physical and biological models, with the biological variables making the largest contribution. Our preliminary results highlight the potential of the proposed integrated approach for predicting post-radiotherapy local failure in NSCLC patients.
Affiliation(s)
- Jung Hun Oh
- Department of Radiation Oncology, Mallinckrodt Institute of Radiology, Washington University School of Medicine, MO 63110, USA
- Jeffrey Craft
- Department of Radiation Oncology, Mallinckrodt Institute of Radiology, Washington University School of Medicine, MO 63110, USA
- Rawan Al-Lozi
- Department of Radiation Oncology, Mallinckrodt Institute of Radiology, Washington University School of Medicine, MO 63110, USA
- Manushka Vaidya
- Department of Radiation Oncology, Mallinckrodt Institute of Radiology, Washington University School of Medicine, MO 63110, USA
- Yifan Meng
- Department of Radiation Oncology, Mallinckrodt Institute of Radiology, Washington University School of Medicine, MO 63110, USA
- Joseph O Deasy
- Department of Radiation Oncology, Mallinckrodt Institute of Radiology, Washington University School of Medicine, MO 63110, USA
- Jeffrey D Bradley
- Department of Radiation Oncology, Mallinckrodt Institute of Radiology, Washington University School of Medicine, MO 63110, USA
- Issam El Naqa
- Department of Radiation Oncology, Mallinckrodt Institute of Radiology, Washington University School of Medicine, MO 63110, USA
|
23
|
Dougherty ER. Validation of gene regulatory networks: scientific and inferential. Brief Bioinform 2010; 12:245-52. [PMID: 21183477 DOI: 10.1093/bib/bbq078] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.2] [Indexed: 11/15/2022] Open
Abstract
Gene regulatory network models are a major area of study in systems and computational biology, and the construction of network models is among the most important problems in these disciplines. The critical epistemological issue concerns validation. Validity can be approached from two different perspectives: (i) given a hypothesized network model, its scientific validity relates to the ability to make predictions from the model that can be checked against experimental observations; and (ii) the validity of a network inference procedure must be evaluated relative to its ability to infer a network from sample points generated by the network. This article examines both perspectives in the framework of a distance function between two networks. It considers some of the obstacles to validation and provides examples of both validation paradigms.
Affiliation(s)
- Edward R Dougherty
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, USA.
|
24
|
ModuleNet: An R package on regulatory network building. Chinese Science Bulletin-Chinese 2010. [DOI: 10.1007/s11434-010-3278-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
|
25
|
Kim DC, Wang X, Yang CR, Gao J. Learning biological network using mutual information and conditional independence. BMC Bioinformatics 2010; 11 Suppl 3:S9. [PMID: 20438656 PMCID: PMC2863068 DOI: 10.1186/1471-2105-11-s3-s9] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Biological networks offer us a new way to investigate the interactions among different components and address the biological system as a whole. In this paper, a reverse-phase protein microarray (RPPM) is used for the quantitative measurement of proteomic responses. RESULTS To discover the signaling pathway responsive to RPPM, a new structure learning algorithm of Bayesian networks is developed based on mutual information, conditional independence, and graph immorality. Trusted biology networks are thus predicted by the new approach. As an application example, we investigate signaling networks of ataxia telangiectasia mutated (ATM). The study was carried out at different time points under different dosages for cell lines with and without gene transfection. To validate the performance of the proposed algorithm, comparison experiments were also implemented using three well-known networks. From the experimental results, our approach produces more reliable networks with a relatively small number of wrong connections, especially in mid-size networks. By using the proposed method, we predicted different networks for ATM under different doses of radiation treatment, and those networks were compared with results from eight different protein-protein interaction (PPI) databases. CONCLUSIONS By using a new protein microarray technology in combination with a new computational framework, we demonstrate an application of the methodology to the study of biological networks of ATM cell lines under low-dose ionizing radiation.
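To make the mutual-information ingredient of this abstract concrete, here is a minimal sketch of the standard plug-in estimate I(X;Y) = sum over (x, y) of p(x,y) log2[p(x,y) / (p(x) p(y))] for two discretized variables. It only shows how a pairwise dependence score can be computed. It is not the paper's structure learning algorithm, and the function name and toy data are assumptions for this example.

```python
from collections import Counter
from math import log2
from typing import Hashable, Sequence


def mutual_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Plug-in estimate of I(X;Y) in bits from paired discrete observations."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    count_x, count_y = Counter(x), Counter(y)
    count_xy = Counter(zip(x, y))
    return sum(
        (c / n) * log2((c / n) / ((count_x[a] / n) * (count_y[b] / n)))
        for (a, b), c in count_xy.items()
    )


# Hypothetical discretized measurements (0 = low, 1 = high) for three proteins.
protein_a = [0, 0, 1, 1]
protein_b = [0, 0, 1, 1]   # moves together with protein_a
protein_c = [0, 1, 0, 1]   # unrelated to protein_a in this toy sample

print(mutual_information(protein_a, protein_b))
print(mutual_information(protein_a, protein_c))
```

A score near zero suggests no detectable pairwise dependence; deciding which edges to keep would additionally require the conditional-independence tests that the abstract mentions.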
Affiliation(s)
- Dong-Chul Kim
- Department of Computer Science and Engineering, The University of Texas at Arlington, 76019, USA
|
26
|
Maraziotis IA, Dragomir A, Thanos D. Gene regulatory networks modelling using a dynamic evolutionary hybrid. BMC Bioinformatics 2010; 11:140. [PMID: 20298548 PMCID: PMC2848237 DOI: 10.1186/1471-2105-11-140] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.5] [Received: 09/25/2009] [Accepted: 03/18/2010] [Indexed: 11/16/2022] Open
Abstract
Background Inference of gene regulatory networks is a key goal in the quest for understanding fundamental cellular processes and revealing underlying relations among genes. With the availability of gene expression data, computational methods aiming at regulatory networks reconstruction are facing challenges posed by the data's high dimensionality, temporal dynamics or measurement noise. We propose an approach based on a novel multi-layer evolutionary trained neuro-fuzzy recurrent network (ENFRN) that is able to select potential regulators of target genes and describe their regulation type. Results The recurrent, self-organizing structure and evolutionary training of our network yield an optimized pool of regulatory relations, while its fuzzy nature avoids noise-related problems. Furthermore, we are able to assign scores for each regulation, highlighting the confidence in the retrieved relations. The approach was tested by applying it to several benchmark datasets of yeast, managing to acquire biologically validated relations among genes. Conclusions The results demonstrate the effectiveness of the ENFRN in retrieving biologically valid regulatory relations and providing meaningful insights for better understanding the dynamics of gene regulatory networks. The algorithms and methods described in this paper have been implemented in a Matlab toolbox and are available from: http://bioserver-1.bioacademy.gr/DataRepository/Project_ENFRN_GRN/.
Affiliation(s)
- Ioannis A Maraziotis
- Institute of Molecular Biology, Genetics and Biotechnology, Biomedical Research Foundation, Academy of Athens, 4 Soranou Efesiou Street, Athens 11527, Greece
|
27
|
Sun X, Hong P. Automatic inference of multicellular regulatory networks using informative priors. International Journal of Computational Biology and Drug Design 2010; 2:115-33. [PMID: 20090166 DOI: 10.1504/ijcbdd.2009.028820] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/21/2022]
Abstract
To fully understand the mechanisms governing animal development, computational models and algorithms are needed to enable quantitative studies of the underlying regulatory networks. We developed a mathematical model based on dynamic Bayesian networks to model multicellular regulatory networks that govern cell differentiation processes. A machine-learning method was developed to automatically infer such a model from heterogeneous data. We show that the model inference procedure can be greatly improved by incorporating interaction data across species. The proposed approach was applied to C. elegans vulval induction to reconstruct a model capable of simulating C. elegans vulval induction under 73 different genetic conditions.
Affiliation(s)
- Xiaoyun Sun
- Department of Computer Science, Brandeis University, Waltham, MA 02454, USA.
|
28
|
Yousef M, Ketany M, Manevitz L, Showe LC, Showe MK. Classification and biomarker identification using gene network modules and support vector machines. BMC Bioinformatics 2009; 10:337. [PMID: 19832995 PMCID: PMC2774324 DOI: 10.1186/1471-2105-10-337] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.2] [Received: 03/13/2009] [Accepted: 10/15/2009] [Indexed: 11/12/2022] Open
Abstract
BACKGROUND: Classification using microarray datasets is usually based on a small number of samples for which tens of thousands of gene expression measurements have been obtained. The selection of the genes most significant to the classification problem is a challenging issue in high-dimensional data analysis and interpretation. A previous study with SVM-RCE (Recursive Cluster Elimination) suggested that classification based on groups of correlated genes sometimes exhibits better performance than classification using single genes. Large databases of gene interaction networks provide an important resource for the analysis of genetic phenomena and for classification studies using interacting genes. We now demonstrate that an algorithm which integrates network information with recursive feature elimination based on SVM exhibits good performance and improves the biological interpretability of the results. We refer to the method as SVM with Recursive Network Elimination (SVM-RNE). RESULTS: Initially, one thousand genes selected by t-test from a training set are filtered so that only genes that map to a gene network database remain. The Gene Expression Network Analysis Tool (GXNA) is applied to the remaining genes to form n clusters of genes that are highly connected in the network. Linear SVM is used to classify the samples using these clusters, and a weight is assigned to each cluster based on its importance to the classification. The least informative clusters are removed while retaining the remainder for the next classification step. This process is repeated until an optimal classification is obtained. CONCLUSION: More than 90% accuracy can be obtained in classification of selected microarray datasets by integrating the interaction network information with the gene expression information from the microarrays. The Matlab version of SVM-RNE can be downloaded from http://web.macam.ac.il/~myousef.
Affiliation(s)
- Malik Yousef
- The Institute of Applied Research - The Galilee Society, Shefa-Amr, Israel
- Al-Qasemi Academic College, Baqa Algharbiya, Israel
- Mohamed Ketany
- Computer Science Department, University of Haifa, Haifa, Israel
- Larry Manevitz
- Computer Science Department, University of Haifa, Haifa, Israel
- Louise C Showe
- Molecular Oncogenesis/Systems Biology Program, The Wistar Institute, Philadelphia, PA 19104, USA
- Michael K Showe
- Molecular Oncogenesis/Systems Biology Program, The Wistar Institute, Philadelphia, PA 19104, USA
|
29
|
Neumann J, Fox PT, Turner R, Lohmann G. Learning partially directed functional networks from meta-analysis imaging data. Neuroimage 2009; 49:1372-84. [PMID: 19815079 DOI: 10.1016/j.neuroimage.2009.09.056] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.2] [Received: 12/29/2008] [Revised: 09/18/2009] [Accepted: 09/24/2009] [Indexed: 11/17/2022] Open
Abstract
We propose a new exploratory method for the discovery of partially directed functional networks from fMRI meta-analysis data. The method performs structure learning of Bayesian networks in search of directed probabilistic dependencies between brain regions. Learning is based on the co-activation of brain regions observed across several independent imaging experiments. In a series of simulations, we first demonstrate the reliability of the method. We then present the application of our approach in an extensive meta-analysis including several thousand activation coordinates from more than 500 imaging studies. Results show that our method is able to automatically infer Bayesian networks that capture both directed and undirected probabilistic dependencies between a number of brain regions, including regions that are frequently observed in motor-related and cognitive control tasks.
Affiliation(s)
- Jane Neumann
- Department of Neurophysics, Max Planck Institute for Human Cognitive and Brain Sciences, Stephanstrasse 1a, D-04103, Leipzig, Germany.
|
30
|
Integrating multiple microarray data for cancer pathway analysis using bootstrapping K-S test. J Biomed Biotechnol 2009; 2009:707580. [PMID: 19704919 PMCID: PMC2688657 DOI: 10.1155/2009/707580] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 01/14/2009] [Accepted: 03/04/2009] [Indexed: 11/17/2022] Open
Abstract
Previous applications of microarray technology for cancer research have mostly focused on identifying genes that are differentially expressed between a particular cancer and normal cells. In a biological system, genes perform different molecular functions and regulate various biological processes via interactions with other genes thus forming a variety of complex networks. Therefore, it is critical to understand the relationship (e.g., interactions) between genes across different types of cancer in order to gain insights into the molecular mechanisms of cancer. Here we propose an integrative method based on the bootstrapping Kolmogorov-Smirnov test and a large set of microarray data produced with various types of cancer to discover common molecular changes in cells from normal state to cancerous state. We evaluate our method using three key pathways related to cancer and demonstrate that it is capable of finding meaningful alterations in gene relations.
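The abstract does not describe how the bootstrapping Kolmogorov-Smirnov test is set up, so the following is only a generic sketch of one common variant: compute the two-sample K-S statistic, then approximate its null distribution by resampling both groups with replacement from the pooled data. The function names, the pooling scheme, the default number of resamples, and the toy values are all assumptions made for this example.

```python
import random
from bisect import bisect_right
from typing import Sequence


def ks_statistic(x: Sequence[float], y: Sequence[float]) -> float:
    """Largest absolute gap between the empirical CDFs of x and y."""
    if not x or not y:
        raise ValueError("both samples must be non-empty")
    xs, ys = sorted(x), sorted(y)
    gap = 0.0
    # Both empirical CDFs are step functions, so the largest gap occurs at a data point.
    for v in xs + ys:
        fx = bisect_right(xs, v) / len(xs)
        fy = bisect_right(ys, v) / len(ys)
        gap = max(gap, abs(fx - fy))
    return gap


def bootstrap_ks_pvalue(x: Sequence[float], y: Sequence[float],
                        n_boot: int = 2000, seed: int = 0) -> float:
    """Approximate p-value under the null that x and y share one distribution."""
    rng = random.Random(seed)
    observed = ks_statistic(x, y)
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_boot):
        # Resample each group from the pooled data (an assumed null model).
        boot_x = [rng.choice(pooled) for _ in x]
        boot_y = [rng.choice(pooled) for _ in y]
        if ks_statistic(boot_x, boot_y) >= observed:
            hits += 1
    return (hits + 1) / (n_boot + 1)  # add-one correction keeps the estimate above zero


# Hypothetical expression-derived values in normal and tumour samples.
normal = [0.1 * i for i in range(10)]
tumour = [5.0 + 0.1 * i for i in range(10)]
print(ks_statistic(normal, tumour), bootstrap_ks_pvalue(normal, tumour, n_boot=500))
```

Fixing the seed makes the resampling reproducible; a small p-value indicates that the two groups are unlikely to come from the same distribution under the assumed null model.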
|
31
|
Lin X, Liu M, Chen XW. Assessing reliability of protein-protein interactions by integrative analysis of data in model organisms. BMC Bioinformatics 2009; 10 Suppl 4:S5. [PMID: 19426453 PMCID: PMC2681066 DOI: 10.1186/1471-2105-10-s4-s5] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 11/30/2022] Open
Abstract
Background Protein-protein interactions play vital roles in nearly all cellular processes and are involved in the construction of biological pathways such as metabolic and signal transduction pathways. Although large-scale experiments have enabled the discovery of thousands of previously unknown linkages among proteins in many organisms, the high-throughput interaction data is often associated with high error rates. Since protein interaction networks have been utilized in numerous biological inferences, the inclusive experimental errors inevitably affect the quality of such prediction. Thus, it is essential to assess the quality of the protein interaction data. Results In this paper, a novel Bayesian network-based integrative framework is proposed to assess the reliability of protein-protein interactions. We develop a cross-species in silico model that assigns likelihood scores to individual protein pairs based on the information entirely extracted from model organisms. Our proposed approach integrates multiple microarray datasets and novel features derived from gene ontology. Furthermore, the confidence scores for cross-species protein mappings are explicitly incorporated into our model. Applying our model to predict protein interactions in the human genome, we are able to achieve 80% in sensitivity and 70% in specificity. Finally, we assess the overall quality of the experimentally determined yeast protein-protein interaction dataset. We observe that the more high-throughput experiments confirming an interaction, the higher the likelihood score, which confirms the effectiveness of our approach. Conclusion This study demonstrates that model organisms certainly provide important information for protein-protein interaction inference and assessment. The proposed method is able to assess not only the overall quality of an interaction dataset, but also the quality of individual protein-protein interactions. 
We expect the method to improve continually as more high-quality interaction data from more model organisms become available, and it is readily scalable to a genome-wide application.
Affiliation(s)
- Xiaotong Lin
- Bioinformatics and Computational Life-Science Laboratory, ITTC, Department of Electrical Engineering and Computer Science, The University of Kansas, 1520 west 15th Street, Lawrence, KS 66045, USA.
|
32
|
Abstract
Attaining a detailed understanding of the various biological networks in an organism lies at the core of the emerging discipline of systems biology. A precise description of the relationships formed between genes, mRNA molecules, and proteins is a necessary step toward a complete description of the dynamic behavior of an organism at the cellular level, and toward intelligent, efficient, and directed modification of an organism. The importance of understanding such regulatory, signaling, and interaction networks has fueled the development of numerous in silico inference algorithms, as well as new experimental techniques and a growing collection of public databases. The Software Environment for BIological Network Inference (SEBINI) has been created to provide an interactive environment for the deployment, evaluation, and improvement of algorithms used to reconstruct the structure of biological regulatory and interaction networks. SEBINI can be used to analyze high-throughput gene expression, protein abundance, or protein activation data via a suite of state-of-the-art network inference algorithms. It also allows algorithm developers to compare and train network inference methods on artificial networks and simulated gene expression perturbation data. SEBINI can therefore be used by software developers wishing to evaluate, refine, or combine inference techniques, as well as by bioinformaticians analyzing experimental data. Networks inferred from the SEBINI software platform can be further analyzed using the Collective Analysis of Biological Interaction Networks (CABIN) tool, which is an exploratory data analysis software that enables integration and analysis of protein-protein interaction and gene-to-gene regulatory evidence obtained from multiple sources. The collection of edges in a public database, along with the confidence held in each edge (if available), can be fed into CABIN as one "evidence network," using the Cytoscape SIF file format.
Using CABIN, one may increase the confidence in individual edges in a network inferred by an algorithm in SEBINI, as well as extend such a network by combining it with species-specific or generic information, e.g., known protein-protein interactions or target genes identified for known transcription factors. Thus, the combined SEBINI-CABIN toolkit aids in the more accurate reconstruction of biological networks, with less effort, in less time. A demonstration web site for SEBINI can be accessed from https://www.emsl.pnl.gov/SEBINI/RootServlet . Source code and PostgreSQL database schema are available under open source license. CONTACT ronald.taylor@pnl.gov. For commercial use, some algorithms included in SEBINI require licensing from the original developers. CABIN can be downloaded from http://www.sysbio.org/dataresources/cabin.stm . CONTACT mudita.singhal@pnl.gov.
Affiliation(s)
- Ronald Taylor
- Computational Biology and Bioinformatics Group, Computational and Informational Sciences Directorate, Pacific Northwest National Laboratory, Richland, WA, USA.
|
33
|
Kim J, Bates DG, Postlethwaite I, Heslop-Harrison P, Cho KH. Linear time-varying models can reveal non-linear interactions of biomolecular regulatory networks using multiple time-series data. Bioinformatics 2008; 24:1286-92. [PMID: 18367478 DOI: 10.1093/bioinformatics/btn107] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 03/07/2024] Open
Abstract
MOTIVATION Inherent non-linearities in biomolecular interactions make the identification of network interactions difficult. One of the principal problems is that all methods based on the use of linear time-invariant models will have fundamental limitations in their capability to infer certain non-linear network interactions. Another difficulty is the multiplicity of possible solutions, since, for a given dataset, there may be many different possible networks which generate the same time-series expression profiles. RESULTS A novel algorithm for the inference of biomolecular interaction networks from temporal expression data is presented. Linear time-varying models, which can represent a much wider class of time-series data than linear time-invariant models, are employed in the algorithm. From time-series expression profiles, the model parameters are identified by solving a non-linear optimization problem. In order to systematically reduce the set of possible solutions for the optimization problem, a filtering process is performed using a phase-portrait analysis with random numerical perturbations. The proposed approach has the advantages of not requiring the system to be in a stable steady state, of using time-series profiles which have been generated by a single experiment, and of allowing non-linear network interactions to be identified. The ability of the proposed algorithm to correctly infer network interactions is illustrated by its application to three examples: a non-linear model for cAMP oscillations in Dictyostelium discoideum, the cell-cycle data for Saccharomyces cerevisiae and a large-scale non-linear model of a group of synchronized Dictyostelium cells. AVAILABILITY The software used in this article is available from http://sbie.kaist.ac.kr/software
Affiliation(s)
- Jongrae Kim
- Department of Aerospace Engineering, University of Glasgow, Glasgow, UK
|
34
|
Kim JM, Jung YS, Sungur EA, Han KH, Park C, Sohn I. A copula method for modeling directional dependence of genes. BMC Bioinformatics 2008; 9:225. [PMID: 18447957 PMCID: PMC2386493 DOI: 10.1186/1471-2105-9-225] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.7] [Received: 11/23/2007] [Accepted: 05/01/2008] [Indexed: 11/22/2022] Open
Abstract
Background Genes interact with each other as basic building blocks of life, forming a complicated network. The relationship between groups of genes with different functions can be represented as gene networks. With the deposition of huge microarray data sets in public domains, study on gene networking is now possible. In recent years, there has been an increasing interest in the reconstruction of gene networks from gene expression data. Recent work includes linear models, Boolean network models, and Bayesian networks. Among them, Bayesian networks seem to be the most effective in constructing gene networks. A major problem with the Bayesian network approach is the excessive computational time. This problem is due to the interactive feature of the method that requires large search space. Since fitting a model by using the copulas does not require iterations, elicitation of the priors, and complicated calculations of posterior distributions, the need for reference to extensive search spaces can be eliminated, leading to manageable computational efforts. The Bayesian network approach produces a discrete expression of conditional probabilities. Discreteness of the characteristics is not required in the copula approach, which involves use of uniform representation of the continuous random variables. Our method is able to overcome the limitation of the Bayesian network method for gene-gene interaction, i.e. information loss due to binary transformation. Results We analyzed the gene interactions for two gene data sets (one group is eight histone genes and the other group is 19 genes which include DNA polymerases, DNA helicase, type B cyclin genes, DNA primases, radiation sensitive genes, repair-related genes, replication protein A encoding gene, DNA replication initiation factor, securin gene, nucleosome assembly factor, and a subunit of the cohesin complex) by adopting a measure of directional dependence based on a copula function. 
We have compared our results with those from other methods in the literature. Although microarray results show a transcriptional co-regulation pattern and do not imply that the gene products are physically interactive, this tight genetic connection may suggest that each gene product has either direct or indirect connections between the other gene products. Indeed, recent comprehensive analysis of a protein interaction map revealed that those histone genes are physically connected with each other, supporting the results obtained by our method. Conclusion The results illustrate that our method can be an alternative to Bayesian networks in modeling gene interactions. One advantage of our approach is that dependence between genes is not assumed to be linear. Another advantage is that our approach can detect directional dependence. We expect that our study may help to design artificial drug candidates, which can block or activate biologically meaningful pathways. Moreover, our copula approach can be extended to investigate the effects of local environments on protein-protein interactions. The copula mutual information approach will help to propose the new variant of ARACNE (Algorithm for the Reconstruction of Accurate Cellular Networks): an algorithm for the reconstruction of gene regulatory networks.
Affiliation(s)
- Jong-Min Kim
- Division of Science and Mathematics, University of Minnesota, Morris, MN, 56267, USA.
|
35
|
Abstract
Non-independent evolution of amino acid sites has become a noticeable limitation of most methods aimed at identifying selective constraints at functionally important amino acid sites or protein regions. The need for a generalised framework to account for non-independence of amino acid sites has fuelled the design and development of new mathematical models and computational tools centred on resolving this problem. Molecular coevolution is one of the most active areas of research, with an increasing rate of new models and methods being developed every day. Both parametric and non-parametric methods have been developed to account for correlated variability of amino acid sites. These methods have been utilised for detecting phylogenetic, functional and structural coevolution as well as for identifying surfaces of amino acid sites involved in protein-protein interactions. Here we discuss and briefly describe these methods, and identify their advantages and limitations.
Affiliation(s)
- Francisco M. Codoñer
- Evolutionary Genetics and Bioinformatics Laboratory, Department of Genetics, Smurfit Institute of Genetics, University of Dublin, Trinity College
- Institute of Immunology, Biology Department, National University of Ireland Maynooth
- Mario A. Fares
- Evolutionary Genetics and Bioinformatics Laboratory, Department of Genetics, Smurfit Institute of Genetics, University of Dublin, Trinity College
|
36
|
Abstract
Common human diseases like obesity and diabetes are driven by complex networks of genes and any number of environmental factors. To understand this complexity in hopes of identifying targets and developing drugs against disease, a systematic approach is required to elucidate the genetic and environmental factors and interactions among and between these factors, and to establish how these factors induce changes in gene networks that in turn lead to disease. The explosion of large-scale, high-throughput technologies in the biological sciences has enabled researchers to take a more systems biology approach to study complex traits like disease. Genotyping of hundreds of thousands of DNA markers and profiling tens of thousands of molecular phenotypes simultaneously in thousands of individuals is now possible, and this scale of data is making it possible for the first time to reconstruct whole gene networks associated with disease. In the following sections, we review different approaches for integrating genetic expression and clinical data to infer causal relationships among gene expression traits and between expression and disease traits. We further review methods to integrate these data in a more comprehensive manner to identify common pathways shared by the causal factors driving disease, including the reconstruction of association and probabilistic causal networks. Particular attention is paid to integrating diverse information to refine these types of networks so that they are more predictive. To highlight these different approaches in practice, we step through an example on how Insig2 was identified as a causal factor for plasma cholesterol levels in mice.
|
37
|
Wang M, Chen Z, Cloutier S. A hybrid Bayesian network learning method for constructing gene networks. Comput Biol Chem 2007; 31:361-72. [PMID: 17889617 DOI: 10.1016/j.compbiolchem.2007.08.005] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/12/2007] [Accepted: 08/13/2007] [Indexed: 11/20/2022]
Abstract
A Bayesian network (BN) is a knowledge representation formalism that has proven to be a promising tool for analyzing gene expression data. Several problems still restrict its successful application. Typical gene expression databases contain measurements for thousands of genes and no more than several hundred samples, but most existing BN learning algorithms do not scale beyond a few hundred variables. Current methods result in poor-quality BNs when applied to such high-dimensional datasets. We propose a hybrid constraint-based scored-searching method that is effective for learning gene networks from DNA microarray data. In the first phase of this method, a novel algorithm is used to generate a skeleton BN based on dependency analysis. Then the resulting BN structure is searched by a scoring metric combined with the knowledge learned from the first phase. Computational tests have shown that the proposed method achieves more accurate results than state-of-the-art methods. This method can also be scaled beyond datasets with several hundred variables.
Affiliation(s)
- Mingyi Wang
- Agriculture and Agri-Food Canada, Cereal Research Centre, Winnipeg, MB R3T 2M9, Canada
|
38
|
Han S, Yoon Y, Cho KH. Inferring biomolecular interaction networks based on convex optimization. Comput Biol Chem 2007; 31:347-54. [PMID: 17890159 DOI: 10.1016/j.compbiolchem.2007.08.003] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/04/2007] [Revised: 07/20/2007] [Accepted: 08/10/2007] [Indexed: 11/28/2022]
Abstract
We present an optimization-based inference scheme to unravel the functional interaction structure of biomolecular components within a cell. The regulatory network of a cell is inferred from the data obtained by perturbation of adjustable parameters or initial concentrations of specific components. It turns out that the identification procedure leads to a convex optimization problem with regularization, as we have to achieve the sparsity of a network and also reflect any a priori information on the network structure. Since convex optimization has been well studied for a long time, a variety of efficient algorithms have been developed and many numerical solvers are freely available. In order to estimate time derivatives from discrete-time samples, a cubic spline fitting is incorporated into the proposed optimization procedure. Through simulation studies on several examples, it is shown that the proposed convex optimization scheme can effectively uncover the functional interaction structure of a biomolecular regulatory network with reasonable accuracy.
Affiliation(s)
- Soohee Han
- Bio-MAX Institute, Seoul National University, Seoul 151-818, Republic of Korea
|