1
Yang G, Lei S, Yang G. Robust Model-Free Identification of the Causal Networks Underlying Complex Nonlinear Systems. ENTROPY (BASEL, SWITZERLAND) 2024; 26:1063. [PMID: 39766692 PMCID: PMC11675911 DOI: 10.3390/e26121063] [Received: 10/08/2024] [Revised: 11/28/2024] [Accepted: 11/30/2024] [Indexed: 01/11/2025]
Abstract
Inferring causal networks from noisy observations is of vital importance in various fields. Because modeling such systems is complex, developing universal and feasible inference algorithms is a key challenge for network reconstruction. In this study, without any assumptions, we develop a novel model-free framework to uncover only the direct relationships in networked systems from observations of their nonlinear dynamics. The proposed methods are termed multiple-order Polynomial Conditional Granger Causality (PCGC) and sparse PCGC (SPCGC). PCGC adopts polynomial functions to approximate the whole system model, and the fitted model is then used to judge the interactions among nodes through nonlinear Granger causality analysis. In SPCGC, Lasso optimization is first used for dimension reduction, and PCGC is then executed to obtain the final network. Specifically, the conditional variables are fused into this general, model-free framework regardless of how they enter the system model, which effectively distinguishes direct interactions from indirect influences. The performances of PCGC and SPCGC are analyzed and verified on many classical dynamical systems. Overall, the proposed framework is promising as guidance for data-driven modeling when the model is unknown.
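A minimal sketch of the polynomial conditional Granger causality idea follows, under stated assumptions: a single time lag, a hypothetical basis with first- and second-order terms per variable (no cross terms, unlike a full multiple-order polynomial), and ordinary least squares without the Lasso pre-selection used in SPCGC. The score from source to target is the log-ratio of residual variances between a restricted model that omits the source and a full model that includes it, with every other variable, including the target's own past, kept as a conditioning variable. The helper names (`residual_variance`, `poly_terms`, `conditional_gc`) and the synthetic chain x to y to z are illustrative, not the authors' code.

```python
import math
import random


def residual_variance(rows, target):
    """Ordinary least squares via the normal equations; returns the mean squared residual."""
    p = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    b = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(p)]
    # Gaussian elimination with partial pivoting on (X^T X) coef = X^T y.
    for col in range(p):
        piv = max(range(col, p), key=lambda k: abs(a[k][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for k in range(col + 1, p):
            f = a[k][col] / a[col][col]
            for j in range(col, p):
                a[k][j] -= f * a[col][j]
            b[k] -= f * b[col]
    coef = [0.0] * p
    for i in reversed(range(p)):
        coef[i] = (b[i] - sum(a[i][j] * coef[j] for j in range(i + 1, p))) / a[i][i]
    resid = [t - sum(c * v for c, v in zip(coef, r)) for r, t in zip(rows, target)]
    return sum(e * e for e in resid) / len(resid)


def poly_terms(value):
    """Hypothetical polynomial basis for one lagged variable: first and second order."""
    return [value, value * value]


def conditional_gc(series, source, target):
    """Polynomial conditional Granger causality score source -> target (lag 1)."""
    conditions = [k for k in range(len(series)) if k != source]  # includes the target's own past
    full, restricted, response = [], [], []
    for t in range(1, len(series[0])):
        row = [1.0]  # intercept
        for k in conditions:
            row += poly_terms(series[k][t - 1])
        restricted.append(row)
        full.append(row + poly_terms(series[source][t - 1]))
        response.append(series[target][t])
    return math.log(residual_variance(restricted, response) / residual_variance(full, response))


# Usage: a chain x -> y -> z with a quadratic x -> y coupling (synthetic data).
rng = random.Random(0)
n = 2000
x, y, z = [0.0] * n, [0.0] * n, [0.0] * n
for t in range(1, n):
    x[t] = 0.5 * x[t - 1] + rng.gauss(0, 1)
    y[t] = 0.3 * y[t - 1] + 0.5 * x[t - 1] ** 2 + rng.gauss(0, 1)
    z[t] = 0.3 * z[t - 1] + 0.8 * y[t - 1] + rng.gauss(0, 1)
gc_x_to_y = conditional_gc([x, y, z], source=0, target=1)  # direct link
gc_x_to_z = conditional_gc([x, y, z], source=0, target=2)  # indirect, via y
```

Because y is among the conditioning variables, the indirect path from x to z should receive a score near zero, while the direct quadratic link from x to y scores clearly above it.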
Affiliation(s)
- Guanxue Yang
- School of Electrical and Information Engineering, Jiangsu University, Zhenjiang 212013, China
- Shimin Lei
- School of Electrical and Information Engineering, Jiangsu University, Zhenjiang 212013, China
- Guanxiao Yang
- College of Automation, Jiangsu University of Science and Technology, Zhenjiang 212100, China
2
Nabuco Leva Ferreira de Freitas JA, Bischof O. Dynamic modeling of the cellular senescence gene regulatory network. Heliyon 2023; 9:e14007. [PMID: 36938415 PMCID: PMC10015196 DOI: 10.1016/j.heliyon.2023.e14007] [Received: 07/15/2022] [Revised: 02/13/2023] [Accepted: 02/17/2023] [Indexed: 02/27/2023]
Abstract
Cellular senescence is a cell fate that prominently impacts physiological and pathophysiological processes. Diverse cellular stresses induce it, and dramatic gene expression changes accompany it. However, determining the interactions comprising the gene regulatory network (GRN) governing senescence remains challenging. Recent advances in signal processing techniques provide opportunities to reconstruct GRNs. Here, we describe a GRN for senescence integrating time-series transcriptome and transcription factor depletion datasets. Specifically, we infer a set of differential equations using the "Sparse Identification of Nonlinear Dynamics" (SINDy) algorithm, discriminate genes with potential hidden regulators, validate the inferred GRN for time-points not included in the training data, and comprehensively benchmark our approach. Our work is a proof of concept for a data-driven GRN reconstruction method, consolidating an iterative, powerful mathematical platform for senescence modeling that can be used to test hypotheses in silico and has the potential for future discoveries of clinical impact.
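SINDy fits each time derivative as a sparse combination of candidate functions, and the core of the published algorithm is sequentially thresholded least squares. The sketch below implements that loop for a single variable under two assumptions: the derivatives are already available (here computed exactly from a hypothetical toy model dx/dt = -2x + 0.5x^3, whereas a real GRN application would have to estimate them from noisy time series), and the candidate library is a hypothetical polynomial set [1, x, x^2, x^3]. It is not the authors' pipeline, which also handles hidden regulators and validation on held-out time points.

```python
def solve_lstsq(rows, target):
    """Least-squares coefficients via the normal equations and Gaussian elimination."""
    p = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    b = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda k: abs(a[k][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for k in range(col + 1, p):
            f = a[k][col] / a[col][col]
            for j in range(col, p):
                a[k][j] -= f * a[col][j]
            b[k] -= f * b[col]
    coef = [0.0] * p
    for i in reversed(range(p)):
        coef[i] = (b[i] - sum(a[i][j] * coef[j] for j in range(i + 1, p))) / a[i][i]
    return coef


def stlsq(library, dxdt, threshold=0.1, max_iter=10):
    """Sequentially thresholded least squares: refit on surviving terms until stable."""
    n_terms = len(library[0])
    active = list(range(n_terms))
    coef = [0.0] * n_terms
    for _ in range(max_iter):
        fitted = solve_lstsq([[row[j] for j in active] for row in library], dxdt)
        coef = [0.0] * n_terms
        for j, c in zip(active, fitted):
            coef[j] = c
        survivors = [j for j in active if abs(coef[j]) >= threshold]
        for j in active:
            if j not in survivors:
                coef[j] = 0.0  # prune terms below the sparsity threshold
        if survivors == active or not survivors:
            break
        active = survivors
    return coef


xs = [0.1 * k for k in range(-20, 21)]
library = [[1.0, x, x ** 2, x ** 3] for x in xs]  # candidate terms evaluated at each sample
dxdt = [-2.0 * x + 0.5 * x ** 3 for x in xs]      # exact derivatives of the toy model
coefficients = stlsq(library, dxdt)
```

With noise-free derivatives the first pass already isolates the x and x^3 terms; the refit on those two columns then returns the toy model's coefficients, and the other library terms are set exactly to zero.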
Affiliation(s)
- José Américo Nabuco Leva Ferreira de Freitas
- IMRB, Mondor Institute for Biomedical Research, INSERM U955 – Université Paris Est Créteil, UPEC, Faculté de Médecine de Créteil 8, rue du Général Sarrail, 94010 Créteil
- Sorbonne Université, UMR 8256, Biological Adaptation and Ageing B2A–IBPS, F-75005, Paris, France
- INSERM U1164, F-75005, Paris, France
- Oliver Bischof
- IMRB, Mondor Institute for Biomedical Research, INSERM U955 – Université Paris Est Créteil, UPEC, Faculté de Médecine de Créteil 8, rue du Général Sarrail, 94010 Créteil
- Corresponding author.
3
Yamamoto A, Shibuya T. Privacy-Preserving Statistical Analysis of Genomic Data Using Compressive Mechanism with Haar Wavelet Transform. J Comput Biol 2023; 30:176-188. [PMID: 36374238 DOI: 10.1089/cmb.2022.0246] [Indexed: 11/16/2022]
Abstract
To promote the use of personal genome information in medicine, it is important to analyze the relationship between diseases and the human genome. Statistical analyses of genomic data are therefore often conducted, but releasing the statistics as they are raises privacy concerns. Existing methods that address this problem using the concept of differential privacy cannot provide accurate outputs under strong privacy guarantees, which makes them less practical. In this study, for the first time, we investigate the application of a compressive mechanism to genomic statistical data and propose two approaches. The first applies the standard compressive mechanism to the statistics vector, together with an algorithm that determines the number of nonzero entries in a sparse representation. The second alters the mechanism based on the data, aiming to release significant single nucleotide polymorphisms with high probability. In this algorithm, we apply the compressive mechanism to a sparse vector of the significant data and the Laplace mechanism to the nonsignificant data. By using the Haar wavelet transform for the compressive mechanism, we can determine the number of nonzero elements and the amount of noise. In addition, we give theoretical guarantees that the proposed methods achieve ϵ-differential privacy. We evaluated our methods in terms of accuracy and rank error against the Laplace and exponential mechanisms. The results show that the second method in particular can provide a high level of privacy assurance as well as utility.
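The Haar transform underlying this approach is simple to state. The sketch below shows an orthonormal Haar transform and its inverse, plus a simplified compressive release that keeps a fixed number k of coarse-scale coefficients, perturbs them with Laplace noise of scale `l1_sensitivity / epsilon`, and transforms back. The retained positions are fixed in advance rather than chosen from the data, and `l1_sensitivity` (the L1 sensitivity of the retained coefficients) is a caller-supplied assumption. This is not the authors' mechanism or privacy analysis, and their rule for choosing the number of nonzero entries is not reproduced; `compressive_release` is a hypothetical helper.

```python
import math
import random

SQRT2 = math.sqrt(2.0)


def haar_forward(signal):
    """Orthonormal Haar transform; len(signal) must be a power of two.
    Output order: overall coarse coefficient first, finest-scale details last."""
    data = [float(v) for v in signal]
    length = len(data)
    if length == 0 or length & (length - 1):
        raise ValueError("length must be a power of two")
    while length > 1:
        half = length // 2
        averages = [(data[2 * i] + data[2 * i + 1]) / SQRT2 for i in range(half)]
        details = [(data[2 * i] - data[2 * i + 1]) / SQRT2 for i in range(half)]
        data[:length] = averages + details
        length = half
    return data


def haar_inverse(coeffs):
    """Inverse of haar_forward."""
    data = [float(c) for c in coeffs]
    length = 1
    while length < len(data):
        merged = []
        for avg, det in zip(data[:length], data[length:2 * length]):
            merged += [(avg + det) / SQRT2, (avg - det) / SQRT2]
        data[:2 * length] = merged
        length *= 2
    return data


def compressive_release(stats, k, epsilon, l1_sensitivity, rng):
    """Keep the first k Haar coefficients, add Laplace noise to them, reconstruct."""
    coeffs = haar_forward(stats)
    scale = l1_sensitivity / epsilon
    noisy = [0.0] * len(coeffs)
    for i in range(k):
        # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
        noisy[i] = coeffs[i] + rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return haar_inverse(noisy)
```

Keeping only the first coefficient reconstructs a constant vector equal to the mean of the statistics (plus noise), which shows how truncation trades resolution for a smaller amount of noise to add.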
Affiliation(s)
- Akito Yamamoto
- Human Genome Center, The Institute of Medical Science, The University of Tokyo, Tokyo, Japan
- Tetsuo Shibuya
- Human Genome Center, The Institute of Medical Science, The University of Tokyo, Tokyo, Japan
4
Wu K, Hao X, Liu J, Liu P, Shen F. Online Reconstruction of Complex Networks From Streaming Data. IEEE TRANSACTIONS ON CYBERNETICS 2022; 52:5136-5147. [PMID: 33147156 DOI: 10.1109/tcyb.2020.3027642] [Indexed: 06/11/2023]
Abstract
The problem of reconstructing nonlinear and complex dynamical systems from available data or time series is prominent in many fields, including engineering, physical, computer, biological, and social sciences. Many methods have been proposed to address this problem and their performance is satisfactory. However, none of them can reconstruct network structure from large-scale real-time streaming data, which leads to the failure of real-time and online analysis or control of complex systems. In this article, to overcome the limitations of current methods, we first extend the network reconstruction problem (NRP) to online settings, and then develop a follow-the-regularized-leader (FTRL)-Proximal style method to address the online complex NRP; we refer to it as Online-NR. The performance of Online-NR is validated on synthetic evolutionary game network reconstruction datasets and eight real-world networks. The experimental results demonstrate that Online-NR can effectively solve the problem of online network reconstruction with large-scale real-time streaming data. Moreover, Online-NR outperforms or matches nine state-of-the-art network reconstruction methods.
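Online-NR is described as an FTRL-Proximal style method. For orientation, the sketch below implements the generic per-coordinate FTRL-Proximal update with a squared loss, treating the recovery of one node's incoming links as a streaming linear regression of that node's observations on the other nodes' states. The squared loss, the hyperparameters `alpha`, `beta`, `l1` and `l2`, the `FTRLProximal` class name, and the one-sample-at-a-time framing are assumptions for illustration, not details taken from the paper.

```python
import math
import random


class FTRLProximal:
    """Per-coordinate FTRL-Proximal for a linear model with squared loss."""

    def __init__(self, dim, alpha=0.5, beta=1.0, l1=0.1, l2=0.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = [0.0] * dim  # accumulated gradients, adjusted by the proximal terms
        self.n = [0.0] * dim  # accumulated squared gradients (per-coordinate learning rates)

    def weights(self):
        # Closed-form minimizer of the FTRL objective, coordinate by coordinate;
        # the L1 term keeps a weight exactly at zero while |z| stays below l1.
        w = []
        for z, n in zip(self.z, self.n):
            if abs(z) <= self.l1:
                w.append(0.0)
            else:
                denom = (self.beta + math.sqrt(n)) / self.alpha + self.l2
                w.append(-(z - math.copysign(self.l1, z)) / denom)
        return w

    def update(self, x, y):
        """Process one streaming sample (x, y); return the prediction made before updating."""
        w = self.weights()
        pred = sum(wi * xi for wi, xi in zip(w, x))
        for i, xi in enumerate(x):
            g = (pred - y) * xi  # gradient of 0.5 * (pred - y)^2 w.r.t. w_i
            sigma = (math.sqrt(self.n[i] + g * g) - math.sqrt(self.n[i])) / self.alpha
            self.z[i] += g - sigma * w[i]
            self.n[i] += g * g
        return pred


# Usage: a node whose next state depends on neighbour 0 only (synthetic, noise-free stream).
rng = random.Random(1)
model = FTRLProximal(dim=2)
for _ in range(5000):
    sample = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
    model.update(sample, 2.0 * sample[0])
w = model.weights()
```

Each sample is seen once and then discarded, so memory stays constant in the stream length; only the two accumulators `z` and `n` per candidate link are kept.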
5
The identifiability of gene regulatory networks: the role of observation data. J Biol Phys 2022; 48:93-110. [PMID: 34988715 PMCID: PMC8866611 DOI: 10.1007/s10867-021-09595-4] [Received: 11/07/2020] [Accepted: 11/07/2021] [Indexed: 10/19/2022]
Abstract
Identifying gene regulatory networks (GRNs) from observation data is important for understanding biological systems. Conventional studies focus on improving the performance of identification algorithms. However, beyond algorithm performance, GRN identification depends strongly on the observation data. In this work, three observation data collection schemes are used to perform an identifiability test on three GRN S-system models. A modified genetic algorithm-particle swarm optimization algorithm, including a multi-level mutation operation and a velocity limitation strategy, is proposed to implement this task. The results show that in scheme 1 (starting from a special initial condition), the GRN systems are identifiable using sufficient transient observation data. In scheme 2, the observation data lack sufficient system dynamics, and the GRN systems are not identifiable even though the state trajectories can be reproduced. For a special case of scheme 2, namely steady-state observation data, an equilibrium point analysis is given to explain why GRN identification is infeasible. In schemes 1 and 2, the observation data are obtained from zero-input GRN systems, which eventually evolve to a steady state. The sufficient transient observation data in scheme 1 can be obtained by changing the experimental conditions. Valid observation data can also be obtained by adding an impulse excitation signal to the GRN systems (scheme 3), and the GRN systems are consequently identifiable under scheme 3. Owing to their universality and simplicity, these results provide a guide for biologists to collect valid observation data for identifying GRNs and for further understanding GRN dynamics.
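The equilibrium-point argument for steady-state data can be illustrated numerically. An S-system model has the form dX_i/dt = alpha_i * prod_j X_j^(g_ij) - beta_i * prod_j X_j^(h_ij). The sketch below integrates a hypothetical two-gene S-system with the forward Euler method and shows that two different parameter sets (the second doubles alpha_1 and beta_1) settle to the same steady state, so steady-state observations alone cannot tell them apart. The parameter values, step size and integration scheme are assumptions chosen for illustration, not the models used in the paper.

```python
import math


def s_system_rates(x, alpha, beta, g, h):
    """dX_i/dt = alpha_i * prod_j X_j**g_ij - beta_i * prod_j X_j**h_ij."""
    return [
        alpha[i] * math.prod(xj ** g[i][j] for j, xj in enumerate(x))
        - beta[i] * math.prod(xj ** h[i][j] for j, xj in enumerate(x))
        for i in range(len(x))
    ]


def simulate(x0, alpha, beta, g, h, dt=0.01, steps=5000):
    """Forward-Euler integration of a zero-input S-system; returns the final state."""
    x = list(x0)
    for _ in range(steps):
        rates = s_system_rates(x, alpha, beta, g, h)
        x = [xi + dt * r for xi, r in zip(x, rates)]
    return x


# Hypothetical 2-gene system: gene 2 represses gene 1 (g_12 < 0), gene 1 activates gene 2.
g = [[0.0, -0.5], [0.5, 0.0]]
h = [[0.5, 0.0], [0.0, 0.5]]
steady_a = simulate([1.0, 1.0], alpha=[2.0, 1.0], beta=[1.0, 1.0], g=g, h=h)
steady_b = simulate([1.0, 1.0], alpha=[4.0, 1.0], beta=[2.0, 1.0], g=g, h=h)
```

At equilibrium each production term equals its degradation term, which constrains only ratios such as alpha_1 / beta_1; both parameter sets therefore share the fixed point (2, 2), although their transients differ, which is why transient or impulse-excited data are needed.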
6
On the Fourier transform of a quantitative trait: Implications for compressive sensing. J Theor Biol 2021; 540:110985. [PMID: 34953868 DOI: 10.1016/j.jtbi.2021.110985] [Received: 10/28/2020] [Revised: 12/01/2021] [Accepted: 12/09/2021] [Indexed: 11/23/2022]
Abstract
This paper explores the genotype-phenotype relationship. It outlines conditions under which the dependence of a quantitative trait on the genome might be predictable from measurements of a limited subset of genotypes. It systematically uses the theory of real-valued Boolean functions to translate trait data into the Fourier domain. Important trait features, such as the roughness of the trait landscape or the modularity of a trait, have a simple Fourier interpretation. Roughness at a gene location corresponds to high sensitivity to mutation, while a modular organization of gene activity reduces such sensitivity. Traits in which rugged loci are rare naturally compress gene data in the Fourier domain, yielding a sparse representation of trait data concentrated in identifiable, low-level coefficients. This Fourier representation of a trait organizes epistasis in a form that is isometric to the trait data. Because Fourier matrices are known to be maximally incoherent with the standard basis, this permits the use of compressive sensing techniques on data sets that are relatively small (sometimes even of polynomial size) compared with the exponentially large set of possible genomes. This theory provides a theoretical underpinning for the systematic use of Boolean function machinery to dissect the dependence of a trait on the genome and environment.
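For a trait f on binary genotypes x in {0,1}^n, one common normalization of the Fourier (Walsh) coefficients is f_hat(S) = 2^(-n) * sum_x f(x) * (-1)^|S ∩ x|, where S is a subset of loci. The sketch below computes all of them with the fast Walsh-Hadamard transform, encoding genotypes and subsets as bitmasks. The toy additive trait (no epistasis) is a hypothetical example, and the paper's own normalization may differ; it shows the low-order concentration described above, since only the empty-set and single-locus coefficients are nonzero.

```python
def walsh_fourier(values):
    """Fourier coefficients of f: {0,1}^n -> R given as a list of 2**n values.

    values[x] is f at the genotype whose bit i is locus i; result[S] is the
    coefficient of the parity function chi_S(x) = (-1)**popcount(x & S).
    """
    coeffs = [float(v) for v in values]
    size = len(coeffs)
    if size == 0 or size & (size - 1):
        raise ValueError("need 2**n values")
    h = 1
    while h < size:  # in-place fast Walsh-Hadamard butterflies
        for start in range(0, size, 2 * h):
            for i in range(start, start + h):
                a, b = coeffs[i], coeffs[i + h]
                coeffs[i], coeffs[i + h] = a + b, a - b
        h *= 2
    return [c / size for c in coeffs]


# Hypothetical additive trait over 3 loci: locus 0 adds 0.5, locus 2 adds 0.25, locus 1 is neutral.
trait = [1.0 + 0.5 * (x & 1) + 0.25 * ((x >> 2) & 1) for x in range(8)]
spectrum = walsh_fourier(trait)
orders = [bin(s).count("1") for s in range(8)]  # interaction order of each coefficient
```

Adding an epistatic term such as a product of loci 0 and 1 would switch on the order-2 coefficient at index 3, which is how epistasis appears in this representation.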
7
Pyne S, Anand A. Rapid Reconstruction of Time-varying Gene Regulatory Networks with Limited Main Memory. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:1608-1619. [PMID: 31613774 DOI: 10.1109/tcbb.2019.2946826] [Indexed: 06/10/2023]
Abstract
Reconstructing the time-varying gene regulatory networks underlying time-series gene expression data is a fundamental challenge in computational systems biology. The challenge increases manifold when the target networks must be constructed for hundreds to thousands of genes. There have been constant efforts to design an algorithm that performs the reconstruction task correctly and also scales efficiently (in both time and memory) to such large numbers of genes. However, existing algorithms either do not offer time efficiency or offer it at other costs: memory inefficiency or the imposition of a constraint known as the 'smoothly time-varying assumption'. In this article, two novel algorithms, 'an algorithm for reconstructing Time-varying Gene regulatory networks with Shortlisted candidate regulators - which is Light on memory' (TGS-Lite) and 'TGS-Lite Plus' (TGS-Lite+), are proposed; they are time-efficient and memory-efficient and do not impose the smoothly time-varying assumption. Additionally, they offer state-of-the-art reconstruction correctness, as demonstrated on three benchmark datasets. Source Code: https://github.com/sap01/TGS-Lite-supplem/tree/master/sourcecode.
8
Pyne S, Kumar AR, Anand A. Rapid Reconstruction of Time-Varying Gene Regulatory Networks. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2020; 17:278-291. [PMID: 30072338 DOI: 10.1109/tcbb.2018.2861698] [Indexed: 06/08/2023]
Abstract
Rapid advancements in high-throughput technologies have produced genome-scale time-series datasets. Uncovering the temporal sequence of gene regulatory events, in the form of time-varying gene regulatory networks (GRNs), demands computationally fast, accurate, and scalable algorithms. Existing algorithms fall into two categories: those that are time-intensive and hence unscalable, and those that impose structural constraints to become scalable. In this paper, a novel algorithm, 'an algorithm for reconstructing Time-varying Gene regulatory networks with Shortlisted candidate regulators' (TGS), is proposed. TGS is time-efficient and does not impose any structural constraints, and it provides this flexibility and time efficiency without sacrificing accuracy. TGS consistently outperforms state-of-the-art algorithms in true positive detection on three benchmark synthetic datasets. However, TGS does not perform as well in false positive rejection. To mitigate this issue, TGS+ is proposed. TGS+ demonstrates competitive false positive rejection power while maintaining the superior speed and true positive detection power of TGS. Nevertheless, the main memory requirements of both TGS variants grow exponentially with the number of genes, which they tackle by restricting the maximum number of regulators for each gene. Relaxing this restriction remains a challenge, as the actual number of regulators is not known a priori.
9
Zhang W, Zhang F, Zhang J, Wang N. Hierarchical parameter estimation of GRN based on topological analysis. IET Syst Biol 2019; 12:294-303. [PMID: 30472694 DOI: 10.1049/iet-syb.2018.5015] [Indexed: 11/19/2022]
Abstract
Reverse engineering of gene regulatory networks (GRNs) is an important and challenging task in systems biology. Existing parameter estimation approaches that treat all model parameters as equally important are usually computationally expensive or infeasible, especially for complex biological networks. To improve the efficiency of computational modeling, this paper applies a hierarchical estimation methodology, based on topological analysis, to the computational modeling of GRNs. Nodes in a network are divided into priority levels using a graph-based measure and a genetic algorithm. The nodes in the first level, which correspond to root strongly connected components (SCCs) in the digraph of the GRN, are given top priority in parameter estimation. The estimated parameters of vertices in each priority level are then used to infer the parameters of nodes in the next level. Compared with a single-estimation methodology, the proposed hierarchical methodology obtains lower error indexes while consuming fewer computational resources. Experimental outcomes with in silico networks and a realistic network show that gene networks are decomposed into no more than four levels, which is consistent with the inherent modularity of GRNs. In addition, the proposed hierarchical parameter estimation achieves a balance between computational efficiency and accuracy.
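The first priority level consists of the root strongly connected components of the GRN digraph. The sketch below finds SCCs with Tarjan's algorithm and then assigns each node a level equal to the length of the longest path from a root SCC in the condensation DAG, so that all upstream parameters would be estimated before a node's own. This layering rule is a simplifying assumption: the paper assigns levels using a graph-based measure together with a genetic algorithm, which is not reproduced here. The graph `grn` is a hypothetical example with edges pointing from regulator to target.

```python
def strongly_connected_components(graph):
    """Tarjan's algorithm; graph maps each node to an iterable of target nodes."""
    nodes = set(graph) | {w for targets in graph.values() for w in targets}
    index, low, stack, on_stack, components = {}, {}, [], set(), []
    counter = 0

    def visit(v):
        nonlocal counter
        index[v] = low[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of an SCC: pop the whole component
            component = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == v:
                    break
            components.append(frozenset(component))

    for v in sorted(nodes):
        if v not in index:
            visit(v)
    return components


def priority_levels(graph):
    """Level 1 = root SCCs; otherwise 1 + the highest level among upstream SCCs."""
    components = strongly_connected_components(graph)
    comp_of = {v: i for i, comp in enumerate(components) for v in comp}
    upstream = {i: set() for i in range(len(components))}
    for v, targets in graph.items():
        for w in targets:
            if comp_of[v] != comp_of[w]:
                upstream[comp_of[w]].add(comp_of[v])
    level = {}

    def level_of(i):  # memoised longest path in the (acyclic) condensation
        if i not in level:
            level[i] = 1 + max((level_of(j) for j in upstream[i]), default=0)
        return level[i]

    return {v: level_of(comp_of[v]) for v in comp_of}


grn = {"A": ["B", "E"], "B": ["A", "C"], "C": ["D"], "D": ["C", "E"]}
levels = priority_levels(grn)
```

Here the feedback loop A-B forms the root SCC, the loop C-D sits one level below it, and E, which is regulated from both loops, lands on level 3.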
Affiliation(s)
- Wei Zhang
- Department of Control Science and Engineering, Zhejiang University, Zheda Road 38, Hangzhou, People's Republic of China
- Feng Zhang
- Department of Control Science and Engineering, Zhejiang University, Zheda Road 38, Hangzhou, People's Republic of China
- Jianming Zhang
- Department of Control Science and Engineering, Zhejiang University, Zheda Road 38, Hangzhou, People's Republic of China
- Ning Wang
- Department of Control Science and Engineering, Zhejiang University, Zheda Road 38, Hangzhou, People's Republic of China
10
Burbano Lombana DA, Freeman RA, Lynch KM. Discovering the topology of complex networks via adaptive estimators. CHAOS (WOODBURY, N.Y.) 2019; 29:083121. [PMID: 31472515 DOI: 10.1063/1.5088657] [Received: 01/12/2019] [Accepted: 07/24/2019] [Indexed: 06/10/2023]
Abstract
Behind any complex system in nature or engineering, there is an intricate network of interconnections that is often unknown. Using a control-theoretical approach, we study the problem of network reconstruction (NR): inferring both the network structure and the coupling weights based on measurements of each node's activity. We derive two new methods for NR, a low-complexity reduced-order estimator (which projects each node's dynamics to a one-dimensional space) and a full-order estimator for cases where a reduced-order estimator is not applicable. We prove their convergence to the correct network structure using Lyapunov-like theorems and persistency of excitation. Importantly, these estimators apply to systems with partial state measurements, a broad class of node dynamics and internode coupling functions, and in the case of the reduced-order estimator, node dynamics and internode coupling functions that are not fully known. The effectiveness of the estimators is illustrated using both numerical and experimental results on networks of chaotic oscillators.
Affiliation(s)
- Randy A Freeman
- Department of Electrical Engineering and Computer Science, Northwestern University, Evanston, Illinois 60208, USA
- Kevin M Lynch
- Department of Mechanical Engineering, Northwestern University, Evanston, Illinois 60208, USA
11
12
Inferring dynamic topology for decoding spatiotemporal structures in complex heterogeneous networks. Proc Natl Acad Sci U S A 2018; 115:9300-9305. [PMID: 30150403 PMCID: PMC6140519 DOI: 10.1073/pnas.1721286115] [Indexed: 12/11/2022]
Abstract
Inferring connections is a critical step toward understanding large and diverse complex networks. To date, reliable and efficient methods for reconstructing network topology from measurement data remain a challenge because of the high complexity and nonlinearity of the system dynamics. These obstacles also form a bottleneck for analyzing and controlling the dynamic structures (e.g., synchrony) and collective behavior in such complex networks. The novel contribution of this work is a unified data-driven approach that reliably and efficiently reveals the dynamic topology of complex networks at scales ranging from cells to societies. The developed technique also provides guidelines for refining experimental designs toward a comprehensive understanding of complex heterogeneous networks.
Extracting complex interactions (i.e., dynamic topologies) has been an essential but difficult step toward understanding large, complex, and diverse systems, including biological, financial, and electrical networks. However, reliable and efficient methods for recovering or estimating network topology remain a challenge due to the tremendous scale of emerging systems (e.g., brain and social networks) and the inherent nonlinearity within and between individual units. We develop a unified, data-driven approach to efficiently infer connections of networks (ICON). We apply ICON to determine the topology of networks of oscillators with different periodicities, node degrees, coupling functions, and time scales, arising in silico and in electrochemistry, neuronal networks, and groups of mice. This method formulates these large-scale, nonlinear estimation problems as a linear inverse problem that can be solved using parallel computing. Working with data from networks, ICON is robust and versatile enough to reliably reveal full and partial resonance among fast chemical oscillators, coherent circadian rhythms among hundreds of cells, and functional connectivity mediating social synchronization of circadian rhythmicity among mice over weeks.
13
Esteves GH, Reis LFL. A statistical method for measuring activation of gene regulatory networks. Stat Appl Genet Mol Biol 2018; 17:sagmb-2016-0059. [PMID: 29897889 DOI: 10.1515/sagmb-2016-0059] [Indexed: 11/15/2022]
Abstract
MOTIVATION: Gene expression data analysis is of great importance for modern molecular biology, given our ability to measure the expression profiles of thousands of genes, which enables studies rooted in systems biology. In this work, we propose a simple statistical model for measuring the activation of gene regulatory networks, instead of the traditional gene co-expression networks.
RESULTS: We present the mathematical construction of a statistical procedure for testing hypotheses regarding gene regulatory network activation. The actual probability distribution of the test statistic is evaluated by a permutation-based study. To illustrate the functionality of the proposed methodology, we also present a simple example based on a small hypothetical network and the activation measurement of two KEGG networks, both based on gene expression data collected from gastric and esophageal samples. The two KEGG networks were also analyzed with data from a public database available through NCBI-GEO, presented as Supplementary Material.
AVAILABILITY: This method was implemented in an R package that is available on the Bioconductor project website under the name maigesPack.
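The permutation step can be sketched generically. The code below estimates a p-value by repeatedly shuffling sample labels and recomputing a statistic; the absolute difference in group means (`mean_difference`) is a hypothetical stand-in for the paper's network-activation statistic, which the abstract does not specify, and the sample values are invented for illustration. Adding one to the numerator and the denominator keeps the Monte Carlo estimate away from exactly zero.

```python
import random


def mean_difference(a, b):
    """Hypothetical test statistic: absolute difference between group means."""
    return abs(sum(a) / len(a) - sum(b) / len(b))


def permutation_pvalue(group_a, group_b, statistic=mean_difference, n_perm=5000, seed=0):
    """Monte Carlo permutation p-value for H0: group labels are exchangeable."""
    rng = random.Random(seed)
    observed = statistic(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    k = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # relabel samples at random under the null hypothesis
        if statistic(pooled[:k], pooled[k:]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


activated = [10.2, 11.1, 12.4, 13.0, 14.3]  # hypothetical expression scores
baseline = [0.4, 1.2, 2.1, 3.3, 4.0]
p_value = permutation_pvalue(activated, baseline)
```

Because the null distribution is built from the data themselves, no parametric form has to be assumed for the statistic, which matches the motivation for a permutation-based evaluation.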
Affiliation(s)
- Gustavo H Esteves
- Statistics Department, University of Paraíba State, Campina Grande, PB, Brazil
- Luiz F L Reis
- Teaching and Research Institute, Sírio-Libanês Hospital, São Paulo-SP, Brazil
14
Yang G, Wang L, Wang X. Reconstruction of Complex Directional Networks with Group Lasso Nonlinear Conditional Granger Causality. Sci Rep 2017; 7:2991. [PMID: 28592807 PMCID: PMC5462833 DOI: 10.1038/s41598-017-02762-5] [Received: 02/08/2017] [Accepted: 04/18/2017] [Indexed: 12/19/2022]
Abstract
Reconstructing the networks underlying complex systems is one of the most crucial problems in many areas of engineering and science. In this paper, rather than identifying the parameters of complex systems governed by predefined models, or taking polynomial and rational functions as prior information for subsequent model selection, we put forward a general framework for nonlinear causal network reconstruction from time series with limited observations. Using multi-source datasets obtained through a data-fusion strategy, we propose a novel method, group lasso nonlinear conditional Granger causality, to handle the nonlinearity and directionality of complex networked systems. Specifically, our method exploits different sets of radial basis functions to approximate the nonlinear interactions between each pair of nodes and integrates sparsity into grouped variable selection. The performance of our approach is first assessed on two types of simulated datasets, from a nonlinear vector autoregressive model and from nonlinear dynamic models, and then verified on the benchmark datasets from DREAM3 Challenge 4. The effects of data size and noise intensity are also discussed. All of the results demonstrate that the proposed method performs better, achieving a higher area under the precision-recall curve.
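In a group lasso, the coefficients of all basis functions that model one candidate interaction form a group, and the penalty lambda * sum_g ||beta_g||_2 keeps or removes each group as a whole. The sketch below shows two ingredients under stated assumptions: a Gaussian radial basis expansion of one input (the centres and width are hypothetical), and the group soft-thresholding operator, the proximal step a proximal-gradient group-lasso solver applies to each block after a gradient step. The coefficient blocks are invented, and this is not the authors' full estimator.

```python
import math


def rbf_features(value, centres, width):
    """Gaussian radial basis expansion of one scalar input (one group of regressors)."""
    return [math.exp(-((value - c) ** 2) / (2.0 * width ** 2)) for c in centres]


def group_soft_threshold(group, threshold):
    """Proximal operator of threshold * ||group||_2: shrink the whole block, or zero it."""
    norm = math.sqrt(sum(b * b for b in group))
    if norm <= threshold:
        return [0.0] * len(group)  # the candidate link is removed entirely
    scale = 1.0 - threshold / norm
    return [scale * b for b in group]


# Hypothetical coefficient blocks after a gradient step (one block per candidate source node).
blocks = {"node_1": [3.0, 4.0], "node_2": [0.3, 0.4]}
shrunk = {name: group_soft_threshold(b, threshold=2.5) for name, b in blocks.items()}
```

The weak block is zeroed as a unit, so the corresponding edge disappears from the inferred network, while the strong block is shrunk but keeps its direction; this is the sense in which sparsity is imposed on grouped variables rather than on individual coefficients.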
Affiliation(s)
- Guanxue Yang
- Department of Automation, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, P. R. China
- Lin Wang
- Department of Automation, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, P. R. China
- Xiaofan Wang
- Department of Automation, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, P. R. China
15
Henriques D, Villaverde AF, Rocha M, Saez-Rodriguez J, Banga JR. Data-driven reverse engineering of signaling pathways using ensembles of dynamic models. PLoS Comput Biol 2017; 13:e1005379. [PMID: 28166222 PMCID: PMC5319798 DOI: 10.1371/journal.pcbi.1005379] [Received: 05/11/2016] [Revised: 02/21/2017] [Accepted: 01/24/2017] [Indexed: 11/19/2022]
Abstract
Despite significant efforts and remarkable progress, the inference of signaling networks from experimental data remains very challenging. The problem is particularly difficult when the objective is to obtain a dynamic model capable of predicting the effect of novel perturbations not considered during model training. The problem is ill-posed because of the nonlinear nature of these systems, the fact that only a fraction of the involved proteins and their post-translational modifications can be measured, and limitations of the technologies used for growing cells in vitro, perturbing them, and measuring their variations. As a consequence, there is a pervasive lack of identifiability. To overcome these issues, we present a methodology called SELDOM (enSEmbLe of Dynamic lOgic-based Models), which builds an ensemble of logic-based dynamic models, trains them to experimental data, and combines their individual simulations into an ensemble prediction. It also includes a model reduction step to prune spurious interactions and mitigate overfitting. SELDOM is a data-driven method in the sense that it does not require any prior knowledge of the system: the interaction networks that act as scaffolds for the dynamic models are inferred from data using mutual information. We have tested SELDOM on a number of experimental and in silico signal transduction case studies, including the recent HPN-DREAM breast cancer challenge. We found that its performance is highly competitive with state-of-the-art methods for recovering network topology. More importantly, the utility of SELDOM goes beyond basic network inference (i.e., uncovering static interaction networks): it builds dynamic models based on ordinary differential equations, which can be used for mechanistic interpretation and reliable dynamic predictions in new experimental conditions (i.e., conditions not used in training). For this task, SELDOM's ensemble prediction is not only consistently better than the predictions of individual models but also often outperforms the state of the art represented by the methods used in the HPN-DREAM challenge.
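SELDOM infers its scaffold networks with mutual information. A minimal plug-in estimator on equal-width bins is sketched below; the number of bins and the binning scheme are assumptions, since the abstract does not say which estimator SELDOM uses. Plug-in estimates are biased upward for small samples, so in practice a threshold or null distribution would be needed before declaring an edge; the signals here are synthetic.

```python
import math
import random
from collections import Counter


def discretize(values, bins):
    """Map each value to one of `bins` equal-width bins spanning the observed range."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # guard against a constant signal
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(xs, ys, bins=4):
    """Plug-in mutual information (in nats) between two discretised signals."""
    dx, dy = discretize(xs, bins), discretize(ys, bins)
    n = len(dx)
    joint, px, py = Counter(zip(dx, dy)), Counter(dx), Counter(dy)
    return sum(c / n * math.log(c * n / (px[a] * py[b])) for (a, b), c in joint.items())


rng = random.Random(0)
signal = [i / 99 for i in range(100)]
shuffled = signal[:]
rng.shuffle(shuffled)
mi_self = mutual_information(signal, signal)        # 25 samples per bin: exactly log(4)
mi_unrelated = mutual_information(signal, shuffled)  # small, but not exactly zero
```

Scoring every pair of measured species this way and keeping the pairs above a chosen cutoff would give an undirected scaffold on which dynamic models could then be built.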
Affiliation(s)
- David Henriques
- Bioprocess Engineering Group, Spanish National Research Council, IIM-CSIC, Vigo, Spain
- Alejandro F. Villaverde
- Bioprocess Engineering Group, Spanish National Research Council, IIM-CSIC, Vigo, Spain
- Centre of Biological Engineering, University of Minho, Braga, Portugal
- Miguel Rocha
- Centre of Biological Engineering, University of Minho, Braga, Portugal
- Julio Saez-Rodriguez
- Joint Research Center for Computational Biomedicine, RWTH-Aachen University, Aachen, Germany
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, United Kingdom
- Julio R. Banga
- Bioprocess Engineering Group, Spanish National Research Council, IIM-CSIC, Vigo, Spain
16
Guo Q, Liang G, Fu JQ, Han JT, Liu JG. Roles of mixing patterns in the network reconstruction. Phys Rev E 2016; 94:052303. [PMID: 27967098 DOI: 10.1103/physreve.94.052303] [Received: 03/31/2016] [Indexed: 11/07/2022]
Abstract
Compressive sensing is an effective way to reconstruct network structure. In this paper, we investigate the effect of mixing patterns, measured by the assortativity coefficient, on the performance of network reconstruction. First, we present a model to generate networks with different assortativity coefficients; we then reconstruct the network structure using the compressive sensing method. The experimental results show that the accuracy of network reconstruction reaches its maximum when the assortativity coefficient r = 0.2, which suggests that compressive sensing is more effective for uncovering the links of social networks. Moreover, the accuracy of network reconstruction increases as the network size increases.
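The assortativity coefficient r is usually computed as Newman's degree correlation over edges: r = [M^-1 * sum(j*k) - (M^-1 * sum((j+k)/2))^2] / [M^-1 * sum((j^2+k^2)/2) - (M^-1 * sum((j+k)/2))^2], where j and k are the degrees at the two ends of each of the M edges. A minimal sketch for a simple undirected edge list follows; the example graphs are hypothetical, and the code reproduces neither the authors' network generator nor their compressive sensing reconstruction.

```python
from collections import Counter


def degree_assortativity(edges):
    """Newman's degree assortativity coefficient for an undirected simple graph."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    m = len(edges)
    prod = sum(degree[u] * degree[v] for u, v in edges) / m
    mean = sum(0.5 * (degree[u] + degree[v]) for u, v in edges) / m
    second = sum(0.5 * (degree[u] ** 2 + degree[v] ** 2) for u, v in edges) / m
    denom = second - mean ** 2
    if denom == 0:
        raise ValueError("r is undefined when all edge endpoints have the same degree")
    return (prod - mean ** 2) / denom


star = [(0, 1), (0, 2), (0, 3), (0, 4)]  # hub linked only to leaves: r = -1
path = [(0, 1), (1, 2), (2, 3)]          # r = -1/2
r_star = degree_assortativity(star)
r_path = degree_assortativity(path)
```

Negative values indicate that high-degree nodes tend to attach to low-degree nodes (disassortative mixing), while positive values, such as the r = 0.2 reported above, indicate that nodes of similar degree tend to link, as is typical of social networks.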
Affiliation(s)
- Qiang Guo
- Research Center of Complex Systems Science, University of Shanghai for Science and Technology, Shanghai 200093, People's Republic of China
- Guang Liang
- Research Center of Complex Systems Science, University of Shanghai for Science and Technology, Shanghai 200093, People's Republic of China
- Jia-Qi Fu
- Research Center of Complex Systems Science, University of Shanghai for Science and Technology, Shanghai 200093, People's Republic of China
- Jing-Ti Han
- Data Science and Cloud Service Research Centre, Shanghai University of Finance and Economics, Shanghai 200433, People's Republic of China
- Jian-Guo Liu
- Research Center of Complex Systems Science, University of Shanghai for Science and Technology, Shanghai 200093, People's Republic of China
- Data Science and Cloud Service Research Centre, Shanghai University of Finance and Economics, Shanghai 200433, People's Republic of China
17. Lim H, Gray P, Xie L, Poleksic A. Improved genome-scale multi-target virtual screening via a novel collaborative filtering approach to cold-start problem. Sci Rep 2016; 6:38860. [PMID: 27958331 PMCID: PMC5153628 DOI: 10.1038/srep38860] [Received: 08/22/2016] [Accepted: 11/15/2016]
Abstract
The conventional one-drug-one-gene approach has had limited success in modern drug discovery. Polypharmacology, which searches for multi-targeted drugs that perturb disease-causing networks instead of designing selective ligands for individual proteins, has emerged as a new drug discovery paradigm. Although many single-target virtual screening methods have been developed to improve the efficiency of drug discovery, few of these algorithms are designed for polypharmacology. Here, we present a novel theoretical framework and a corresponding algorithm for genome-scale multi-target virtual screening based on the one-class collaborative filtering technique. Our method overcomes the sparseness of protein-chemical interaction data by means of interaction matrix weighting and dual regularization from both chemicals and proteins. While the statistical foundation behind our method is general enough to encompass genome-wide drug off-target prediction, the program is specifically tailored to find protein targets for new chemicals with little to no available interaction data. We extensively evaluate our method on a number of the most widely accepted gene-specific and cross-gene-family benchmarks and demonstrate that it outperforms other state-of-the-art algorithms at predicting the interactions of new chemicals with multiple proteins. Thus, the proposed algorithm may provide a powerful tool for multi-target drug design.
Affiliation(s)
- Hansaim Lim
- Department of Computer Science, Hunter College, The City University of New York, New York, New York 10065, United States
- Paul Gray
- Department of Computer Science, University of Northern Iowa, Cedar Falls, Iowa 50614, United States
- Lei Xie
- Department of Computer Science, Hunter College, The City University of New York, New York, New York 10065, United States; Ph.D. Program in Computer Science, Biochemistry and Biology, The Graduate Center, The City University of New York, New York, New York 10065, United States
- Aleksandar Poleksic
- Department of Computer Science, University of Northern Iowa, Cedar Falls, Iowa 50614, United States
18. Wu K, Liu J, Wang S. Reconstructing Networks from Profit Sequences in Evolutionary Games via a Multiobjective Optimization Approach with Lasso Initialization. Sci Rep 2016; 6:37771. [PMID: 27886244 PMCID: PMC5122890 DOI: 10.1038/srep37771] [Received: 08/19/2016] [Accepted: 11/01/2016]
Abstract
Evolutionary games (EG) model a common type of interaction in many complex, networked natural and social systems. When only profit sequences are available, reconstructing the interaction structure of an EG network is fundamental to understanding and controlling its collective dynamics. Existing approaches to this problem, such as the lasso (a convex optimization method), need a user-defined constant to control the tradeoff between the natural sparsity of networks and measurement error (the difference between observed and simulated data). However, it is not easy to choose the value of this key parameter that maximizes performance. In contrast to these approaches, we first model EG network reconstruction as a multiobjective optimization problem (MOP) and then develop a framework, termed MOEANet, that combines a multiobjective evolutionary algorithm (MOEA) with solution selection based on knee regions to solve this MOP. We also design an effective lasso-based initialization operator for the MOEA. We apply the proposed method to reconstruct various types of synthetic and real-world networks, and the results show that our approach avoids the parameter-selection problem described above and reconstructs EG networks with high accuracy.
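The lasso baseline that this abstract contrasts with MOEANet can be sketched in a few lines. Below is a minimal pure-Python lasso solved by cyclic coordinate descent. It assumes the standard objective ½‖y − Ax‖² + λ‖x‖₁. Consider a linear formulation of EG reconstruction (an assumption here, not taken from the abstract). In that setting, y would hold one node's profit sequence, and the columns of A would hold the payoffs it could receive from each potential partner. The nonzero entries of x would be its inferred links. This is not MOEANet, and the function names and toy data are hypothetical.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t; values inside [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_coordinate_descent(A, y, lam, n_sweeps=200):
    """Minimise 0.5 * ||y - A x||^2 + lam * ||x||_1 by cyclic coordinate descent.

    A is a list of n rows with p entries each; y has n entries.
    """
    n, p = len(A), len(A[0])
    x = [0.0] * p
    residual = list(y)  # y - A x, with x starting at zero
    col_sq = [sum(A[i][j] ** 2 for i in range(n)) for j in range(p)]

    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # Correlation of column j with the residual that excludes x[j].
            rho = sum(A[i][j] * (residual[i] + A[i][j] * x[j]) for i in range(n))
            new_xj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_xj - x[j]
            if delta != 0.0:
                # Keep the residual y - A x in sync with the updated x[j].
                for i in range(n):
                    residual[i] -= A[i][j] * delta
                x[j] = new_xj
    return x


if __name__ == "__main__":
    # Hypothetical toy data: y is generated exactly by x_true = [2, 0].
    A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    y = [2.0, 0.0, 2.0]
    for lam in (0.0, 1.0, 5.0):
        print(lam, lasso_coordinate_descent(A, y, lam))
```

On this toy problem, λ = 0 recovers [2.0, 0.0] and λ = 1 shrinks the first coefficient to 1.5. At λ = 5, both coefficients are set to zero. That sensitivity to the single constant λ is the parameter-selection problem the authors sidestep by treating sparsity and measurement error as separate objectives.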
Affiliation(s)
- Kai Wu
- Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education, Xidian University, Xi'an 710071, China
- Jing Liu
- Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education, Xidian University, Xi'an 710071, China
- Shuai Wang
- Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education, Xidian University, Xi'an 710071, China
19
Abstract
Using computational methods to predict gene regulatory networks (GRNs) from gene expression data is a challenging task. Many studies have tackled it with unsupervised methods; however, such methods usually yield low prediction accuracy because of the lack of training data. In this article, we propose semi-supervised methods for GRN prediction that use two machine learning algorithms, namely support vector machines (SVM) and random forests (RF). The semi-supervised methods make use of unlabelled data for training. We investigated inductive and transductive learning approaches, both of which adopt an iterative procedure to obtain reliable negative training data from the unlabelled data. We then applied our semi-supervised methods to gene expression data of Escherichia coli and Saccharomyces cerevisiae and evaluated their performance on these data. Our analysis indicated that the transductive learning approach outperformed the inductive learning approach for both organisms. However, no conclusive difference in performance was found between SVM and RF. Experimental results also showed that the proposed semi-supervised methods performed better than existing supervised methods for both organisms.
20. Chang YH, Dobbe R, Bhushan P, Gray JW, Tomlin CJ. Reconstruction of Gene Regulatory Networks based on Repairing Sparse Low-rank Matrices. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2016; 13:767-777. [PMID: 27990101 PMCID: PMC5154690 DOI: 10.1109/tcbb.2015.2465952]
Abstract
With the growth of high-throughput proteomic data, in particular time series gene expression data from various perturbations, a general question has arisen: how to organize inherently heterogeneous data into meaningful structures. Biological systems such as breast cancer tumors respond differently to various treatments, yet little is known about exactly how their gene regulatory networks (GRNs) operate under different stimuli. This lack of knowledge not only complicates modeling the dynamics of a GRN but also introduces bias or uncertainty when identifying parameters or inferring the GRN structure. This paper describes a new algorithm that estimates the bias error caused by perturbations and correctly identifies the graph structure common to the biased inferred graph structures. To do this, we recover the dynamics that the GRN shares across the various perturbations. We call this task "repairing", by analogy with "image repairing" in computer vision. The method automatically repairs the common graph structure across perturbed GRNs, even without precise information about the effects of the perturbations. We evaluate the method on synthetic data sets, demonstrate an application to the DREAM data sets, and discuss its implications for experimental design.
Affiliation(s)
- Young Hwan Chang
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA 94720 USA
- Roel Dobbe
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA 94720 USA
- Palak Bhushan
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA 94720 USA
- Joe W Gray
- Department of Biomedical Engineering and the Center for Spatial Systems Biomedicine, Oregon Health and Science University, Portland, OR 97239, USA
- Claire J Tomlin
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA 94720 USA; Faculty Scientist, Life Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720 USA
21. Dholaniya PS, Ghosh S, Surampudi BR, Kondapi AK. A knowledge driven supervised learning approach to identify gene network of differentially up-regulated genes during neuronal senescence in Rattus norvegicus. Biosystems 2015; 135:9-14. [PMID: 26163927 DOI: 10.1016/j.biosystems.2015.07.002] [Received: 02/20/2015] [Revised: 05/18/2015] [Accepted: 07/06/2015]
Abstract
Various approaches have been described for inferring gene interaction networks from expression data, and several models based on computational and mathematical methods are available. The key consideration in identifying gene interactions is their biological relevance: two genes in the same pathway are more likely to affect each other's expression than two genes from different pathways. In the present study, we describe a gene interaction network built from genes upregulated during neuronal senescence in rat cerebellar granule neurons. We adopted a supervised learning method and combined it with the genes' biological pathway information to develop the network. Modular analysis of the network was then performed to identify senescence-related marker genes. Adequate information about the genes implicated in neuronal senescence is currently lacking, so identifying multipath genes belonging to the pathway affected by senescence may be very useful in studying the senescence process.
Affiliation(s)
- Pankaj Singh Dholaniya
- Department of Biotechnology and Bioinformatics, School of Life Sciences, University of Hyderabad, Hyderabad 500046, Telangana, India; Cognitive Science Lab, International Institute of Information Technology (IIIT) Hyderabad, Hyderabad 500032, Telangana, India
- Soumitra Ghosh
- School of Computer and Information Sciences, University of Hyderabad, Hyderabad 500046, Telangana, India; Cognitive Science Lab, International Institute of Information Technology (IIIT) Hyderabad, Hyderabad 500032, Telangana, India
- Bapi Raju Surampudi
- School of Computer and Information Sciences, University of Hyderabad, Hyderabad 500046, Telangana, India; Cognitive Science Lab, International Institute of Information Technology (IIIT) Hyderabad, Hyderabad 500032, Telangana, India
- Anand K Kondapi
- Department of Biotechnology and Bioinformatics, School of Life Sciences, University of Hyderabad, Hyderabad 500046, Telangana, India; Cognitive Science Lab, International Institute of Information Technology (IIIT) Hyderabad, Hyderabad 500032, Telangana, India