1
Kizhakkethil Youseph AS, Chetty M, Karmakar G. Reverse engineering genetic networks using nonlinear saturation kinetics. Biosystems 2019; 182:30-41. [PMID: 31185246 DOI: 10.1016/j.biosystems.2019.103977] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 11/16/2018] [Revised: 04/25/2019] [Accepted: 05/27/2019] [Indexed: 01/01/2023]
Abstract
A gene regulatory network (GRN) represents a set of genes along with their regulatory interactions. Cellular behavior is driven by genetic level interactions. Dynamics of such systems show nonlinear saturation kinetics which can be best modeled by Michaelis-Menten (MM) and Hill equations. Although MM equation is being widely used for modeling biochemical processes, it has been applied rarely for reverse engineering GRNs. In this paper, we develop a complete framework for a novel model for GRN inference using MM kinetics. A set of coupled equations is first proposed for modeling GRNs. In the coupled model, Michaelis-Menten constant associated with regulation by a gene is made invariant irrespective of the gene being regulated. The parameter estimation of the proposed model is carried out using an evolutionary optimization method, namely, trigonometric differential evolution (TDE). Subsequently, the model is further improved and the regulations of different genes by a given gene are made distinct by allowing varying values of Michaelis-Menten constants for each regulation. Apart from making the model more relevant biologically, the improvement results in a decoupled GRN model with fast estimation of model parameters. Further, to enhance exploitation of the search, we propose a local search algorithm based on hill climbing heuristics. A novel mutation operation is also proposed to avoid population stagnation and premature convergence. Real life benchmark data sets generated in vivo are used for validating the proposed model. Further, we also analyze realistic in silico datasets generated using GeneNetweaver. The comparison of the performance of proposed model with other existing methods shows the potential of the proposed model.
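The saturation kinetics named in the abstract above can be illustrated with the textbook Michaelis-Menten and Hill rate laws. This is a minimal sketch of those two functions only, not the authors' coupled or decoupled GRN model; the function and parameter names (`michaelis_menten`, `hill`, `v_max`, `k_m`, `k`, `n`) are illustrative assumptions.

```python
def michaelis_menten(x, v_max, k_m):
    """Saturating rate v_max * x / (k_m + x) for a regulator level x >= 0."""
    return v_max * x / (k_m + x)


def hill(x, v_max, k, n):
    """Hill function; n = 1 reduces to the Michaelis-Menten form with k_m = k."""
    return v_max * x**n / (k**n + x**n)


# The rate is half of v_max when the regulator level equals the constant,
# and it approaches v_max as the regulator level grows.
half = michaelis_menten(2.0, v_max=1.0, k_m=2.0)
near_saturation = michaelis_menten(2000.0, v_max=1.0, k_m=2.0)
```

In a model where each regulation has its own constant, as the abstract describes for the improved version, `k_m` would be indexed by both the regulating and the regulated gene; that indexing is not shown here.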
Affiliation(s)
- Madhu Chetty
  - School of Science, Engineering and Information Technology, Federation University Australia, Gippsland 3842, Australia
- Gour Karmakar
  - School of Science, Engineering and Information Technology, Federation University Australia, Gippsland 3842, Australia
2
Evaluation of artificial time series microarray data for dynamic gene regulatory network inference. J Theor Biol 2017; 426:1-16. [PMID: 28528256 DOI: 10.1016/j.jtbi.2017.05.010] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 07/25/2016] [Revised: 03/13/2017] [Accepted: 05/05/2017] [Indexed: 11/21/2022]
Abstract
High-throughput technology like microarrays is widely used in the inference of gene regulatory networks (GRNs). We focused on time series data since we are interested in the dynamics of GRNs and the identification of dynamic networks. We evaluated the amount of information that exists in artificial time series microarray data and the ability of an inference process to produce accurate models based on them. We used dynamic artificial gene regulatory networks in order to create artificial microarray data. Key features that characterize microarray data such as the time separation of directly triggered genes, the percentage of directly triggered genes and the triggering function type were altered in order to reveal the limits that are imposed by the nature of microarray data on the inference process. We examined the effect of various factors on the inference performance such as the network size, the presence of noise in microarray data, and the network sparseness. We used a system theory approach and examined the relationship between the pole placement of the inferred system and the inference performance. We examined the relationship between the inference performance in the time domain and the true system parameter identification. Simulation results indicated that time separation and the percentage of directly triggered genes are crucial factors. Also, network sparseness, the triggering function type and noise in input data affect the inference performance. When two factors were simultaneously varied, it was found that variation of one parameter significantly affects the dynamic response of the other. Crucial factors were also examined using a real GRN and acquired results confirmed simulation findings with artificial data. Different initial conditions were also used as an alternative triggering approach. Relevant results confirmed that the number of datasets constitutes the most significant parameter with regard to the inference performance.
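The abstract above relates pole placement of the inferred system to inference performance. As a rough illustration of what "poles" means here, the sketch below assumes a continuous-time linear model dx/dt = A x for two genes, where the poles are the eigenvalues of A. The model form, the matrix values and the helper names are assumptions for illustration; the abstract does not state the model the authors used.

```python
import cmath


def poles_2x2(a):
    """Eigenvalues of a 2x2 state matrix a = [[a11, a12], [a21, a22]]."""
    trace = a[0][0] + a[1][1]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    root = cmath.sqrt(trace * trace - 4.0 * det)
    return (trace + root) / 2.0, (trace - root) / 2.0


def is_stable_continuous(poles):
    """Continuous-time stability: every pole has a negative real part."""
    return all(p.real < 0.0 for p in poles)


# Hypothetical two-gene interaction matrix (gene 2 activates gene 1).
a = [[-1.0, 0.5], [0.0, -2.0]]
poles = poles_2x2(a)
stable = is_stable_continuous(poles)
```

Comparing the poles of an inferred matrix with those of the true matrix is one way to judge whether the inferred dynamics decay or oscillate at similar rates.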
3
Gui S, Rice AP, Chen R, Wu L, Liu J, Miao H. A scalable algorithm for structure identification of complex gene regulatory network from temporal expression data. BMC Bioinformatics 2017; 18:74. [PMID: 28143596 PMCID: PMC5294888 DOI: 10.1186/s12859-017-1489-z] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 10/22/2016] [Accepted: 01/20/2017] [Indexed: 12/31/2022]
Abstract
Background: Gene regulatory interactions are of fundamental importance to various biological functions and processes. However, only a few previous computational studies have claimed success in revealing genome-wide regulatory landscapes from temporal gene expression data, especially for complex eukaryotes like human. Moreover, recent work suggests that these methods still suffer from the curse of dimensionality if a network size increases to 100 or higher.
Results: Here we present a novel scalable algorithm for identifying genome-wide gene regulatory network (GRN) structures, and we have verified the algorithm performances by extensive simulation studies based on the DREAM challenge benchmark data. The highlight of our method is that its superior performance does not degenerate even for a network size on the order of 10^4, and is thus readily applicable to large-scale complex networks. Such a breakthrough is achieved by considering both prior biological knowledge and multiple topological properties (i.e., sparsity and hub gene structure) of complex networks in the regularized formulation. We also validate and illustrate the application of our algorithm in practice using the time-course gene expression data from a study on human respiratory epithelial cells in response to influenza A virus (IAV) infection, as well as the ChIP-seq data from ENCODE on transcription factor (TF) and target gene interactions. An interesting finding, owing to the proposed algorithm, is that the biggest hub structures (e.g., top ten) in the GRN all center at some transcription factors in the context of epithelial cell infection by IAV.
Conclusions: The proposed algorithm is the first scalable method for large complex network structure identification. The GRN structure identified by our algorithm could reveal possible biological links and help researchers to choose which gene functions to investigate in a biological event. The algorithm described in this article is implemented in MATLAB®, and the source code is freely available from https://github.com/Hongyu-Miao/DMI.git.
Electronic supplementary material: The online version of this article (doi:10.1186/s12859-017-1489-z) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Shupeng Gui
  - Department of Computer Science, University of Rochester, Rochester, 14620, NY, USA
- Andrew P Rice
  - Department of Molecular Virology and Microbiology, Baylor College of Medicine, Houston, 77030, TX, USA
- Rui Chen
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, 77030, TX, USA
- Liang Wu
  - Department of Biostatistics, University of Texas Health Science Center, Houston, 77030, TX, USA
- Ji Liu
  - Department of Computer Science, University of Rochester, Rochester, 14620, NY, USA; Goergen Institute for Data Science, University of Rochester, Rochester, 14620, NY, USA
- Hongyu Miao
  - Department of Biostatistics, University of Texas Health Science Center, Houston, 77030, TX, USA
4
Ogundijo OE, Elmas A, Wang X. Reverse engineering gene regulatory networks from measurement with missing values. EURASIP Journal on Bioinformatics & Systems Biology 2017; 2017:2. [PMID: 28127303 PMCID: PMC5225239 DOI: 10.1186/s13637-016-0055-8] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 08/11/2016] [Accepted: 12/15/2016] [Indexed: 12/31/2022]
Abstract
Background: Gene expression time series data are usually in the form of high-dimensional arrays. Unfortunately, the data may sometimes contain missing values: for either the expression values of some genes at some time points or the entire expression values of a single time point or some sets of consecutive time points. This significantly affects the performance of many algorithms for gene expression analysis that take as input the complete matrix of gene expression measurement. For instance, previous works have shown that gene regulatory interactions can be estimated from the complete matrix of gene expression measurement. Yet, to date, few algorithms have been proposed for the inference of gene regulatory network from gene expression data with missing values.
Results: We describe a nonlinear dynamic stochastic model for the evolution of gene expression. The model captures the structural, dynamical, and the nonlinear natures of the underlying biomolecular systems. We present point-based Gaussian approximation (PBGA) filters for joint state and parameter estimation of the system with one-step or two-step missing measurements. The PBGA filters use Gaussian approximation and various quadrature rules, such as the unscented transform (UT), the third-degree cubature rule and the central difference rule for computing the related posteriors. The proposed algorithm is evaluated with satisfying results for synthetic networks, in silico networks released as a part of the DREAM project, and the real biological network, the in vivo reverse engineering and modeling assessment (IRMA) network of yeast Saccharomyces cerevisiae.
Conclusion: PBGA filters are proposed to elucidate the underlying gene regulatory network (GRN) from time series gene expression data that contain missing values. In our state-space model, we proposed a measurement model that incorporates the effect of the missing data points into the sequential algorithm. This approach produces a better inference of the model parameters and hence, more accurate prediction of the underlying GRN compared to when using the conventional Gaussian approximation (GA) filters ignoring the missing data points.
Electronic supplementary material: The online version of this article (doi:10.1186/s13637-016-0055-8) contains supplementary material, which is available to authorized users.
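Of the quadrature rules listed in the abstract above, the unscented transform is the easiest to show in a few lines. The sketch below is a scalar-state version using the classic kappa parameterization (three sigma points, kappa = 2 for a one-dimensional Gaussian). It is not the authors' PBGA filter and does not handle missing measurements; the function name and the example nonlinearity are assumptions for illustration.

```python
import math


def unscented_transform_1d(f, mean, var, kappa=2.0):
    """Approximate the mean and variance of f(x) for x ~ N(mean, var)."""
    n = 1  # state dimension
    spread = math.sqrt((n + kappa) * var)
    points = [mean, mean + spread, mean - spread]
    weights = [kappa / (n + kappa), 0.5 / (n + kappa), 0.5 / (n + kappa)]
    ys = [f(p) for p in points]
    y_mean = sum(w * y for w, y in zip(weights, ys))
    y_var = sum(w * (y - y_mean) ** 2 for w, y in zip(weights, ys))
    return y_mean, y_var


# A linear map is propagated exactly: mean 2*1 + 1, variance 2**2 * 4.
lin_mean, lin_var = unscented_transform_1d(lambda x: 2.0 * x + 1.0, 1.0, 4.0)

# For x ~ N(0, 1), x**2 has mean 1 and variance 2; this rule matches both.
sq_mean, sq_var = unscented_transform_1d(lambda x: x * x, 0.0, 1.0)
```

A Gaussian approximation filter would apply a transform like this twice per time step, once through the state transition and once through the measurement function.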
Affiliation(s)
- Oyetunji E Ogundijo
  - Department of Electrical Engineering, Columbia University, 500 W 120th Street, New York, 10027 NY USA
- Abdulkadir Elmas
  - Department of Electrical Engineering, Columbia University, 500 W 120th Street, New York, 10027 NY USA
- Xiaodong Wang
  - Department of Electrical Engineering, Columbia University, 500 W 120th Street, New York, 10027 NY USA
5
Use of systems biology to decipher host-pathogen interaction networks and predict biomarkers. Clin Microbiol Infect 2016; 22:600-6. [PMID: 27113568 DOI: 10.1016/j.cmi.2016.04.014] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.7] [Received: 02/12/2016] [Revised: 04/13/2016] [Accepted: 04/15/2016] [Indexed: 02/06/2023]
Abstract
In systems biology, researchers aim to understand complex biological systems as a whole, which is often achieved by mathematical modelling and the analyses of high-throughput data. In this review, we give an overview of medical applications of systems biology approaches with special focus on host-pathogen interactions. After introducing general ideas of systems biology, we focus on (1) the detection of putative biomarkers for improved diagnosis and support of therapeutic decisions, (2) network modelling for the identification of regulatory interactions between cellular molecules to reveal putative drug targets and (3) module discovery for the detection of phenotype-specific modules in molecular interaction networks. Biomarker detection applies supervised machine learning methods utilizing high-throughput data (e.g. single nucleotide polymorphism (SNP) detection, RNA-seq, proteomics) and clinical data. We demonstrate structural analysis of molecular networks, especially by identification of disease modules as a novel strategy, and discuss possible applications to host-pathogen interactions. Pioneering work was done to predict molecular host-pathogen interaction networks based on dual RNA-seq data. However, currently this network modelling is restricted to a small number of genes. With increasing number and quality of databases and data repositories, the prediction of large-scale networks that can be used for multidimensional diagnosis and decision support for prevention and therapy of diseases will also be feasible. Finally, we outline further perspective issues such as support of personalized medicine with high-throughput data and generation of multiscale host-pathogen interaction models.
6
Wang CCN, Sheu PCY, Tsai JJP. Towards Semantic Biomedical Problem Solving. International Journal of Semantic Computing 2015. [DOI: 10.1142/s1793351x15500075] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
Biological and medical intelligence (BMI) has been studied in silos, lacking a systematic methodology. In this paper, we describe how Semantic Computing can enhance biological and medical intelligence. Specifically, we show how Structured Natural Language (SNL) can express many problems in BMI with a finite number of sentence patterns, and show how biological tools, OLAP, data mining tools and statistical analysis tools may be linked to solve problems related to biomedical data.
Affiliation(s)
- Charles C. N. Wang
  - Department of Biomedical Informatics, Asia University, 500, Lioufeng Rd., Wufeng, Taichung 41354, Taiwan
- Phillip C.-Y. Sheu
  - Department of Electrical Engineering and Computer Science, University of California – Irvine, 5200 Engineering Hall, Irvine, CA 92697, USA
- Jeffrey J. P. Tsai
  - Department of Biomedical Informatics, Asia University, 500, Lioufeng Rd., Wufeng, Taichung 41354, Taiwan
7
Liu LZ, Wu FX, Zhang WJ. Properties of sparse penalties on inferring gene regulatory networks from time-course gene expression data. IET Syst Biol 2015; 9:16-24. [PMID: 25569860 DOI: 10.1049/iet-syb.2013.0060] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 12/17/2022]
Abstract
Genes regulate each other and form a gene regulatory network (GRN) to realise biological functions. Elucidating GRNs from experimental data remains a challenging problem in systems biology. Numerous techniques have been developed, and sparse linear regression methods have become a promising approach to infer accurate GRNs. However, most linear methods are either based on steady-state gene expression data or their statistical properties are not analysed. Here, two sparse penalties, adaptive least absolute shrinkage and selection operator and smoothly clipped absolute deviation, are proposed to infer GRNs from time-course gene expression data based on an auto-regressive model and their Oracle properties are proved under mild conditions. The effectiveness of those methods is demonstrated by applications to in silico and real biological data.
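The two penalties named in the abstract above have standard closed forms, sketched here for reference. The SCAD definition follows the usual three-piece form with shape parameter a > 2 (3.7 is a common default), and the adaptive LASSO weights each coefficient by the inverse of an initial estimate. The function names, the default values and the example numbers are illustrative assumptions, not taken from the paper.

```python
def scad_penalty(theta, lam, a=3.7):
    """Smoothly clipped absolute deviation penalty for one coefficient."""
    t = abs(theta)
    if t <= lam:
        return lam * t  # LASSO-like near zero
    if t <= a * lam:
        return (2.0 * a * lam * t - t * t - lam * lam) / (2.0 * (a - 1.0))
    return lam * lam * (a + 1.0) / 2.0  # constant: large effects not shrunk


def adaptive_lasso_penalty(beta, beta_init, lam, gamma=1.0):
    """Weighted L1 penalty with weights 1 / |initial estimate| ** gamma."""
    return lam * sum(abs(b) / abs(b0) ** gamma
                     for b, b0 in zip(beta, beta_init))


small = scad_penalty(0.5, lam=1.0)    # linear region
large = scad_penalty(10.0, lam=1.0)   # flat region
weighted = adaptive_lasso_penalty([1.0, -2.0], [0.5, 2.0], lam=0.1)
```

Both penalties shrink small coefficients toward zero while penalizing large, well-supported coefficients less than a plain L1 penalty would, which is the behavior behind the oracle properties the abstract mentions.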
Affiliation(s)
- Li-Zhi Liu
  - Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, SK, Canada
- Fang-Xiang Wu
  - Department of Mechanical Engineering, Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, SK, Canada
- Wen-Jun Zhang
  - Department of Mechanical Engineering, Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, SK, Canada
8
Akutekwe A, Seker H. Inference of nonlinear gene regulatory networks through optimized ensemble of support vector regression and dynamic Bayesian networks. Annual International Conference of the IEEE Engineering in Medicine and Biology Society 2015; 2015:8177-8180. [PMID: 26738192 DOI: 10.1109/embc.2015.7320292] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 06/05/2023]
Abstract
Comprehensive understanding of gene regulatory networks (GRNs) is a major challenge in systems biology. Most methods for modeling and inferring the dynamics of GRNs, such as those based on state space models, vector autoregressive models and the G1DBN algorithm, assume linear dependencies among genes. However, this strong assumption does not make for a true representation of time-course relationships across the genes, which are inherently nonlinear. Nonlinear modeling methods such as S-systems and causal structure identification (CSI) have been proposed, but are known to be statistically inefficient and analytically intractable in high dimensions. To overcome these limitations, we propose an optimized ensemble approach based on support vector regression (SVR) and dynamic Bayesian networks (DBNs). The method, called SVR-DBN, uses nonlinear kernels of the SVR to infer the temporal relationships among genes within the DBN framework. The two-stage ensemble is further improved by SVR parameter optimization using Particle Swarm Optimization. Results on eight in silico-generated datasets, and two real-world datasets of Drosophila melanogaster and Escherichia coli, show that our method outperformed the G1DBN algorithm by a total average accuracy of 12%. We further applied our method to model the time-course relationships of ovarian carcinoma. From our results, four hub genes were discovered. Stratified analysis further showed that the expression levels of the Prostate differentiation factor and BTG family member 2 genes were significantly increased by the cisplatin and oxaliplatin platinum drugs, while expression levels of the Polo-like kinase and Cyclin B1 genes were both decreased by the platinum drugs. These hub genes might be potential biomarkers for ovarian carcinoma.
10
Liu LZ, Wu FX, Zhang WJ. A group LASSO-based method for robustly inferring gene regulatory networks from multiple time-course datasets. BMC Systems Biology 2014; 8 Suppl 3:S1. [PMID: 25350697 PMCID: PMC4243122 DOI: 10.1186/1752-0509-8-s3-s1] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Indexed: 11/10/2022]
Abstract
BACKGROUND As an abstract mapping of the gene regulations in the cell, gene regulatory network is important to both biological research study and practical applications. The reverse engineering of gene regulatory networks from microarray gene expression data is a challenging research problem in systems biology. With the development of biological technologies, multiple time-course gene expression datasets might be collected for a specific gene network under different circumstances. The inference of a gene regulatory network can be improved by integrating these multiple datasets. It is also known that gene expression data may be contaminated with large errors or outliers, which may affect the inference results. RESULTS A novel method, Huber group LASSO, is proposed to infer the same underlying network topology from multiple time-course gene expression datasets as well as to take the robustness to large error or outliers into account. To solve the optimization problem involved in the proposed method, an efficient algorithm which combines the ideas of auxiliary function minimization and block descent is developed. A stability selection method is adapted to our method to find a network topology consisting of edges with scores. The proposed method is applied to both simulation datasets and real experimental datasets. It shows that Huber group LASSO outperforms the group LASSO in terms of both areas under receiver operating characteristic curves and areas under the precision-recall curves. CONCLUSIONS The convergence analysis of the algorithm theoretically shows that the sequence generated from the algorithm converges to the optimal solution of the problem. The simulation and real data examples demonstrate the effectiveness of the Huber group LASSO in integrating multiple time-course gene expression datasets and improving the resistance to large errors or outliers.
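The two ingredients named in the abstract above, the Huber loss and the group LASSO penalty, can be written down in their standard forms. The sketch below is not the authors' objective or their block-descent solver. It assumes that each group collects the coefficients of one candidate edge across the datasets, which is one natural reading of "the same underlying network topology"; the function names and example values are illustrative.

```python
import math


def huber_loss(residual, delta=1.0):
    """Quadratic for small residuals, linear for large ones (outliers)."""
    r = abs(residual)
    if r <= delta:
        return 0.5 * r * r
    return delta * (r - 0.5 * delta)


def group_lasso_penalty(groups, lam):
    """lam times the sum of Euclidean norms of the coefficient groups."""
    return lam * sum(math.sqrt(sum(b * b for b in g)) for g in groups)


inlier = huber_loss(0.5)    # 0.5 * 0.5**2
outlier = huber_loss(3.0)   # grows linearly, not quadratically

# Hypothetical groups: one edge present in both datasets, one absent in both.
penalty = group_lasso_penalty([[3.0, 4.0], [0.0, 0.0]], lam=2.0)
```

Because the group norm is zero only when every coefficient in the group is zero, the penalty tends to keep or drop an edge in all datasets together, while the Huber loss limits the influence of large errors on the fit.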
11
Abstract
Biochemical systems theory (BST) is the foundation for a set of analytical and modeling tools that facilitate the analysis of dynamic biological systems. This paper depicts major developments in BST up to the current state of the art in 2012. It discusses its rationale, describes the typical strategies and methods of designing, diagnosing, analyzing, and utilizing BST models, and reviews areas of application. The paper is intended as a guide for investigators entering the fascinating field of biological systems analysis and as a resource for practitioners and experts.
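For readers new to BST, its canonical S-system form writes each rate as a difference of two power-law terms, dX_i/dt = alpha_i * prod_j X_j^g_ij - beta_i * prod_j X_j^h_ij. The sketch below evaluates that right-hand side; the function name and the one-variable example are illustrative assumptions and are not taken from the review.

```python
def s_system_rhs(x, alpha, beta, g, h):
    """Right-hand side of an S-system: production minus degradation."""
    rates = []
    for i in range(len(x)):
        production = alpha[i]
        degradation = beta[i]
        for j, xj in enumerate(x):
            production *= xj ** g[i][j]   # kinetic orders of production
            degradation *= xj ** h[i][j]  # kinetic orders of degradation
        rates.append(production - degradation)
    return rates


# One variable with constant production 2 and degradation X**0.5:
# the steady state is at X = 4, where the rate is zero.
rates = s_system_rhs([4.0], alpha=[2.0], beta=[1.0], g=[[0.0]], h=[[0.5]])
```

The exponents g and h play the role of network edges: a zero kinetic order means the variable has no direct effect on that term.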