1
Li L, Sun L, Chen G, Wong CW, Ching WK, Liu ZP. LogBTF: gene regulatory network inference using Boolean threshold network model from single-cell gene expression data. Bioinformatics 2023; 39:btad256. [PMID: 37079737 PMCID: PMC10172039 DOI: 10.1093/bioinformatics/btad256] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 11/28/2022] [Revised: 02/25/2023] [Accepted: 04/13/2023] [Indexed: 04/22/2023]
Abstract
MOTIVATION: From a systematic perspective, it is crucial to infer and analyze gene regulatory networks (GRNs) from high-throughput single-cell RNA sequencing data. However, most existing GRN inference methods focus mainly on network topology, and only a few explicitly describe the logic rules by which regulation in GRNs is updated, which are needed to obtain the network dynamics. Moreover, some inference methods fail to deal with the over-fitting caused by noise in time series data.
RESULTS: In this article, we propose a novel embedded Boolean threshold network method called LogBTF, which infers GRNs by integrating regularized logistic regression with a Boolean threshold function. First, the continuous gene expression values are converted into Boolean values, and an elastic net regression model is fitted to the binarized time series data. Then, the estimated regression coefficients are used to represent the unknown Boolean threshold function of the candidate Boolean threshold network as the dynamical equations. To overcome the multi-collinearity and over-fitting problems, a new approach is designed to optimize the network topology by adding a perturbation design matrix to the input data and then setting sufficiently small elements of the output coefficient vector to zero. In addition, a cross-validation procedure is incorporated into the Boolean threshold network model framework to strengthen the inference capability. Finally, extensive experiments on one simulated Boolean value dataset, dozens of simulation datasets, and three real single-cell RNA sequencing datasets demonstrate that LogBTF can infer GRNs from time series data more accurately than several alternative GRN inference methods.
AVAILABILITY AND IMPLEMENTATION: The source data and code are available at https://github.com/zpliulab/LogBTF.
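As a rough illustration of the Boolean threshold function mentioned in this abstract, the sketch below switches each gene on when the weighted sum of its regulators exceeds a threshold. The weights, thresholds, three-gene example, and function names are all hypothetical; in LogBTF the weights would come from the fitted regularized logistic regression, which is not reproduced here.

```python
from typing import List, Sequence


def threshold_step(state: Sequence[int],
                   weights: Sequence[Sequence[float]],
                   thresholds: Sequence[float]) -> List[int]:
    """One synchronous update of a Boolean threshold network.

    Gene i is set to 1 when sum_j weights[i][j] * state[j] exceeds
    thresholds[i], and to 0 otherwise.
    """
    return [
        1 if sum(w * x for w, x in zip(row, state)) > theta else 0
        for row, theta in zip(weights, thresholds)
    ]


def simulate(state: Sequence[int],
             weights: Sequence[Sequence[float]],
             thresholds: Sequence[float],
             steps: int) -> List[List[int]]:
    """Return the trajectory [x(0), x(1), ..., x(steps)]."""
    trajectory = [list(state)]
    for _ in range(steps):
        trajectory.append(threshold_step(trajectory[-1], weights, thresholds))
    return trajectory


# Hypothetical 3-gene network (illustrative values only):
# gene 2 represses gene 0, gene 0 activates gene 1, gene 1 activates gene 2.
WEIGHTS = [
    [0.0, 0.0, -1.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
]
THRESHOLDS = [-0.5, 0.5, 0.5]

if __name__ == "__main__":
    for t, s in enumerate(simulate([1, 0, 0], WEIGHTS, THRESHOLDS, 6)):
        print(t, s)
```

With these made-up values the trajectory starting from [1, 0, 0] returns to its starting state after six updates, which shows how a threshold rule set defines dynamics as well as topology.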
Affiliation(s)
- Lingyu Li
  - Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan 250061, China
  - Advanced Modeling and Applied Computing Laboratory, Department of Mathematics, The University of Hong Kong, Hong Kong, China
- Liangjie Sun
  - Advanced Modeling and Applied Computing Laboratory, Department of Mathematics, The University of Hong Kong, Hong Kong, China
- Guangyi Chen
  - Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan 250061, China
- Chi-Wing Wong
  - Advanced Modeling and Applied Computing Laboratory, Department of Mathematics, The University of Hong Kong, Hong Kong, China
- Wai-Ki Ching
  - Advanced Modeling and Applied Computing Laboratory, Department of Mathematics, The University of Hong Kong, Hong Kong, China
- Zhi-Ping Liu
  - Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan 250061, China
2
Suter P, Kuipers J, Beerenwinkel N. Discovering gene regulatory networks of multiple phenotypic groups using dynamic Bayesian networks. Brief Bioinform 2022; 23:bbac219. [PMID: 35679575 PMCID: PMC9294428 DOI: 10.1093/bib/bbac219] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2021] [Revised: 04/29/2022] [Accepted: 05/10/2022] [Indexed: 11/13/2022]
Abstract
Dynamic Bayesian networks (DBNs) can be used for the discovery of gene regulatory networks (GRNs) from time series gene expression data. Here, we suggest a strategy for learning DBNs from gene expression data by employing a Bayesian approach that is scalable to large networks and is targeted at learning models with high predictive accuracy. Our framework can be used to learn DBNs for multiple groups of samples and highlight differences and similarities in their GRNs. We learn these DBN models based on different structural and parametric assumptions and select the optimal model based on the cross-validated predictive accuracy. We show in simulation studies that our approach is better equipped to prevent overfitting than techniques used in previous studies. We applied the proposed DBN-based approach to two time series transcriptomic datasets from the Gene Expression Omnibus database, each comprising data from distinct phenotypic groups of the same tissue type. In the first case, we used DBNs to characterize responders and non-responders to anti-cancer therapy. In the second case, we compared normal to tumor cells of colorectal tissue. The classification accuracy reached by the DBN-based classifier for both datasets was higher than reported previously. For the colorectal cancer dataset, our analysis suggested that GRNs for cancer and normal tissues differ substantially, most markedly in the neighborhoods of oncogenes and known cancer tissue markers. The identified differences in gene networks of cancer and normal cells may be used for the discovery of targeted therapies.
Affiliation(s)
- Polina Suter
  - Department of Biosystems Science and Engineering, ETH Zurich, Matternstrasse 26, 4058 Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, Switzerland
- Jack Kuipers
  - Department of Biosystems Science and Engineering, ETH Zurich, Matternstrasse 26, 4058 Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, Switzerland
- Niko Beerenwinkel
  - Department of Biosystems Science and Engineering, ETH Zurich, Matternstrasse 26, 4058 Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, Switzerland
3
Lu W, Leung CS, Sum J, Xiao Y. DNN-kWTA With Bounded Random Offset Voltage Drifts in Threshold Logic Units. IEEE Transactions on Neural Networks and Learning Systems 2022; 33:3184-3192. [PMID: 33513113 DOI: 10.1109/tnnls.2021.3050493] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]
Abstract
The dual neural network-based k-winner-take-all (DNN-kWTA) model is an analog neural model used to identify the k largest numbers among n inputs. Since threshold logic units (TLUs) are key elements of the model, offset voltage drifts in TLUs may affect the operational correctness of a DNN-kWTA network. Previous studies assume that drifts in TLUs follow particular distributions. This brief assumes that only the drift range, given by [-∆, ∆], is available. We consider two drift cases: time-invariant and time-varying. For the time-invariant case, we show that the state of a DNN-kWTA network converges, and we give a sufficient condition for the network to operate correctly. Furthermore, for uniformly distributed inputs, we prove that the probability that a DNN-kWTA network operates properly is greater than (1-2∆)^n. These results are generalized to the time-varying case. In addition, for the time-invariant case, we derive a method to compute the exact convergence time for a given data set. For uniformly distributed inputs, we further derive the mean and variance of the convergence time. The convergence time results indicate the operational speed of the DNN-kWTA model. Finally, simulation experiments have been conducted to validate the theoretical results.
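To make the quantities in this abstract concrete, the sketch below shows the input-output behavior a kWTA network is meant to realize (not the analog DNN-kWTA dynamics, which the abstract does not specify) and evaluates the quoted lower bound, read here as (1-2∆) raised to the power n. The function names and the range check on ∆ are assumptions made for illustration.

```python
from typing import List, Sequence


def ideal_kwta(inputs: Sequence[float], k: int) -> List[int]:
    """Return 1 for each of the k largest inputs and 0 for the rest.

    Inputs are assumed to be distinct, so the set of winners is unique.
    """
    if not 0 <= k <= len(inputs):
        raise ValueError("k must lie between 0 and len(inputs)")
    order = sorted(range(len(inputs)), key=lambda i: inputs[i], reverse=True)
    winners = set(order[:k])
    return [1 if i in winners else 0 for i in range(len(inputs))]


def correct_operation_lower_bound(delta: float, n: int) -> float:
    """Evaluate the lower bound (1 - 2*delta)**n quoted in the abstract.

    The check 0 <= delta <= 0.5 is an assumption that keeps the
    expression a valid probability.
    """
    if not 0.0 <= delta <= 0.5:
        raise ValueError("delta is assumed to lie in [0, 0.5]")
    if n < 0:
        raise ValueError("n must be non-negative")
    return (1.0 - 2.0 * delta) ** n


if __name__ == "__main__":
    print(ideal_kwta([0.2, 0.9, 0.5, 0.7], k=2))
    print(correct_operation_lower_bound(delta=0.01, n=10))
```

For example, with ∆ = 0.01 and n = 10 the bound is 0.98 to the tenth power, roughly 0.817, and it shrinks geometrically as n grows.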
4
Liu X, Shi N, Wang Y, Ji Z, He S. Data-Driven Boolean Network Inference Using a Genetic Algorithm With Marker-Based Encoding. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:1558-1569. [PMID: 33513105 DOI: 10.1109/tcbb.2021.3055646] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/12/2023]
Abstract
The inference of Boolean networks is crucial for analyzing the topology and dynamics of gene regulatory networks. Many data-driven approaches using evolutionary algorithms have been proposed based on time-series data. However, their ability to infer both network topology and dynamics is restricted by inflexible encoding schemes. To address this problem, we propose a novel Boolean network inference algorithm that infers network topology and dynamics simultaneously. The main idea is to use a marker-based genetic algorithm that encodes both regulatory nodes and logical operators in a chromosome. By using the markers and introducing more logical operators, the proposed algorithm can infer more diverse candidate Boolean functions. The proposed algorithm is applied to five networks, including two artificial Boolean networks and three real-world gene regulatory networks. Compared with other algorithms, the experimental results demonstrate that our proposed algorithm infers more accurate topology and dynamics.
5
Liu X, Wang Y, Shi N, Ji Z, He S. GAPORE: Boolean network inference using a genetic algorithm with novel polynomial representation and encoding scheme. Knowl Based Syst 2021. [DOI: 10.1016/j.knosys.2021.107277] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/26/2022]
6
Liu P, Melkman AA, Akutsu T. Extracting boolean and probabilistic rules from trained neural networks. Neural Netw 2020; 126:300-311. [PMID: 32278262 DOI: 10.1016/j.neunet.2020.03.024] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 12/28/2018] [Revised: 01/18/2020] [Accepted: 03/27/2020] [Indexed: 10/24/2022]
Abstract
This paper presents two approaches to extracting rules from a trained neural network consisting of linear threshold functions. The first one leads to an algorithm that extracts rules in the form of Boolean functions. Compared with an existing one, this algorithm outputs much more concise rules if the threshold functions correspond to 1-decision lists, majority functions, or certain combinations of these. The second one extracts probabilistic rules representing relations between some of the input variables and the output using a dynamic programming algorithm. The algorithm runs in pseudo-polynomial time if each hidden layer has a constant number of neurons. We demonstrate the effectiveness of these two approaches by computational experiments.
Affiliation(s)
- Pengyu Liu
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Uji, Kyoto, 611-0011, Japan.
- Avraham A Melkman
  - Department of Computer Science, Ben-Gurion University of the Negev, Beer-Sheva, 84105, Israel.
- Tatsuya Akutsu
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Uji, Kyoto, 611-0011, Japan.
7
Apostolopoulou I, Marculescu D. Tractable Learning and Inference for Large-Scale Probabilistic Boolean Networks. IEEE Transactions on Neural Networks and Learning Systems 2019; 30:2720-2734. [PMID: 30629517 DOI: 10.1109/tnnls.2018.2886207] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Abstract
Probabilistic Boolean networks (PBNs) were proposed to gain insight into complex dynamical systems. However, identifying large networks and the underlying discrete Markov chain that describes their temporal evolution remains a challenge. In this paper, we introduce an equivalent representation for PBNs, the stochastic conjunctive normal form network (SCNFN), which enables a scalable learning algorithm and helps predict the long-run dynamic behavior of large-scale systems. When attempting to achieve comparable accuracy, state-of-the-art methods turn out to be 400 times slower than SCNFN-based learning for medium-sized networks (i.e., containing 100 nodes) and incapable of terminating for large networks (i.e., containing 1000 nodes). In addition, in contrast to currently used methods, which impose strict restrictions on the structure of the learned PBNs, the hypothesis space of our training paradigm is the set of all possible PBNs. Moreover, SCNFNs enable efficient sampling to statistically infer multistep transition probabilities, which can provide information on the activity levels of individual nodes in the long run. Extensive experimental results showcase the scalability of the proposed approach in terms of both sample and runtime complexity. In addition, we provide examples that study large and complex cell signaling networks to show the potential of our model. Finally, we suggest several directions for future research on model variations, theoretical analysis, and potential applications of SCNFNs.
8
Akutsu T, Melkman AA. Identification of the Structure of a Probabilistic Boolean Network From Samples Including Frequencies of Outcomes. IEEE Transactions on Neural Networks and Learning Systems 2019; 30:2383-2396. [PMID: 30582556 DOI: 10.1109/tnnls.2018.2884454] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 06/09/2023]
Abstract
We study the problem of identifying the structure of a probabilistic Boolean network (PBN), a probabilistic model of biological networks, from a given set of samples. This problem can be regarded as an identification of a set of Boolean functions from samples. Existing studies on the identification of the structure of a PBN only use information on the occurrences of samples. In this paper, we also make use of the frequencies of occurrences of subtuples, information that is obtainable from the samples. We show that under this model, it is possible to identify a PBN from among a class of PBNs, for much broader classes of PBNs. In particular, we prove that, under a reasonable assumption, the structure of a PBN can be identified from among the class of PBNs that have at most three functions assigned to each node, but that identification may be impossible if four or more functions are assigned to each node. We also analyze the sample complexity for exactly identifying the structure of a PBN, and present an efficient algorithm for the identification of a PBN consisting of threshold functions from samples.
9
Li F, Xie L. Set Stabilization of Probabilistic Boolean Networks Using Pinning Control. IEEE Transactions on Neural Networks and Learning Systems 2019; 30:2555-2561. [PMID: 30530342 DOI: 10.1109/tnnls.2018.2881279] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Indexed: 06/09/2023]
Abstract
A probabilistic Boolean network (PBN) is a stochastic logical system in which, at each time step, update functions are randomly selected from a set of candidate Boolean functions according to a prescribed probability distribution. In this brief, a pinning controller design algorithm is proposed to set stabilize any PBN with probability one. First, an algorithm is given to change the columns of its transition matrix. Then, according to the newly obtained transition matrix, a fraction of nodes can be selected as pinning nodes into which control inputs are injected to achieve set stabilization. The problem is challenging because the Boolean functions in a PBN are not deterministic but are randomly chosen from several candidate Boolean functions. Furthermore, the structure matrices of the pinning controllers are obtained by solving logical matrix equations, based on which a pinning controller design algorithm is provided to set stabilize the PBN with probability one. Finally, the theoretical results are validated using several examples.
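The update rule that defines a PBN can be sketched in a few lines. The example below assumes that each node draws its update function independently of the other nodes, which is one common reading of the definition and may differ from the setting used in the paper; the two-node network, its probabilities, and all names are hypothetical, and the pinning controller itself is not reproduced.

```python
import random
from typing import Callable, List, Sequence, Tuple

BoolFn = Callable[[Sequence[int]], int]
# For each node: a list of (candidate function, selection probability) pairs.
NodeSpec = List[Tuple[BoolFn, float]]


def pbn_step(state: Sequence[int],
             network: Sequence[NodeSpec],
             rng: random.Random) -> List[int]:
    """One synchronous PBN update.

    For every node, one candidate Boolean function is drawn according to
    its selection probability and applied to the current state.
    """
    next_state = []
    for candidates in network:
        functions = [f for f, _ in candidates]
        probabilities = [p for _, p in candidates]
        chosen = rng.choices(functions, weights=probabilities, k=1)[0]
        next_state.append(chosen(state))
    return next_state


# Hypothetical two-node PBN (illustrative only):
# node 0 uses AND with probability 0.7 and OR with probability 0.3,
# node 1 always uses NOT x0.
NETWORK: List[NodeSpec] = [
    [(lambda s: s[0] & s[1], 0.7), (lambda s: s[0] | s[1], 0.3)],
    [(lambda s: 1 - s[0], 1.0)],
]

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    state = [1, 0]
    for t in range(5):
        print(t, state)
        state = pbn_step(state, NETWORK, rng)
```

Because the drawn function can change at every step, the same state can lead to different successors, which is why the abstract speaks of stabilization with probability one instead of deterministic convergence.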
10
Computational methods for Gene Regulatory Networks reconstruction and analysis: A review. Artif Intell Med 2019; 95:133-145. [DOI: 10.1016/j.artmed.2018.10.006] [Citation(s) in RCA: 71] [Impact Index Per Article: 11.8] [Received: 06/21/2018] [Revised: 10/23/2018] [Accepted: 10/23/2018] [Indexed: 01/14/2023]