1
Jung S. Advances in modeling cellular state dynamics: integrating omics data and predictive techniques. Anim Cells Syst (Seoul) 2025; 29:72-83. [PMID: 39807350 PMCID: PMC11727055 DOI: 10.1080/19768354.2024.2449518] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/15/2024] [Revised: 12/19/2024] [Accepted: 12/29/2024] [Indexed: 01/16/2025] Open
Abstract
Dynamic modeling of cellular states has emerged as a pivotal approach for understanding complex biological processes such as cell differentiation, disease progression, and tissue development. This review provides a comprehensive overview of current approaches for modeling cellular state dynamics, focusing on techniques ranging from dynamic or static biomolecular network models to deep learning models. We highlight how these approaches, integrated with various omics data such as transcriptomics and single-cell RNA sequencing, could be used to capture and predict cellular behavior and transitions. We also discuss applications of these modeling approaches in predicting gene knockout effects, designing targeted interventions, and simulating organ development. This review emphasizes the importance of selecting appropriate modeling strategies based on scalability and resolution requirements, which vary according to the complexity and size of the biological systems under study. By evaluating the strengths, limitations, and recent advancements of these methodologies, we aim to guide future research in developing more robust and interpretable models for understanding and manipulating cellular state dynamics in various biological contexts, ultimately advancing therapeutic strategies and precision medicine.
Affiliation(s)
- Sungwon Jung
- Department of Genome Medicine and Science, Gachon University College of Medicine, Incheon, Republic of Korea
- Gachon Institute of Genome Medicine and Science, Gachon University Gil Medical Center, Incheon, Republic of Korea
2
Barman S, Farid FA, Gope HL, Hafiz MFB, Khan NA, Ahmad S, Mansor S. LBF-MI: Limited Boolean Functions and Mutual Information to Infer a Gene Regulatory Network from Time-Series Gene Expression Data. Genes (Basel) 2024; 15:1530. [PMID: 39766797 PMCID: PMC11675687 DOI: 10.3390/genes15121530] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2024] [Revised: 11/23/2024] [Accepted: 11/26/2024] [Indexed: 01/11/2025] Open
Abstract
BACKGROUND In systems biology, inferring a gene regulatory network from time-series gene expression data is challenging. Numerous Boolean network inference techniques have emerged for reconstructing a gene regulatory network from a time-series gene expression dataset. However, most of these techniques raise scalability concerns because they can consider only two to three regulatory genes per target gene. METHODS To overcome this limitation, a novel inference method, LBF-MI, is proposed in this research. This two-phase method utilizes limited Boolean functions and multivariate mutual information to reconstruct a Boolean gene regulatory network from time-series gene expression data. Initially, Boolean functions are applied to determine the optimum solutions. In case of failure, multivariate mutual information is applied to obtain the optimum solutions. RESULTS This research conducted a performance-comparison experiment between LBF-MI and three other methods: mutual information-based Boolean network inference, context likelihood relatedness, and relevance network. When examined on artificial as well as real time-series gene expression data, the proposed LBF-MI method outperformed mutual information-based Boolean network inference, context likelihood relatedness, and relevance network on artificial datasets and on two real Escherichia coli datasets (the E. coli gene regulatory network and the SOS response regulatory network of E. coli). CONCLUSIONS LBF-MI's superior performance in gene regulatory network inference enables researchers to uncover the regulatory mechanisms and cellular behaviors of various organisms.
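The mutual-information phase described in this abstract can be pictured with a minimal Python sketch. It is not the LBF-MI implementation: it scores single regulators only (the paper uses multivariate mutual information), and the function names and the binarized series below are hypothetical, chosen for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) of two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi


def lagged_mi(regulator, target):
    """Mutual information between the regulator at time t and the target at time t+1."""
    return mutual_information(regulator[:-1], target[1:])


# Hypothetical binarized time series (illustrative only, not data from the paper).
# g3 was constructed so that g3(t+1) = g1(t).
series = {
    "g1": [0, 1, 1, 0, 1, 0, 0, 1],
    "g2": [1, 1, 0, 1, 0, 0, 1, 0],
    "g3": [0, 0, 1, 1, 0, 1, 0, 0],
}

# Score each candidate regulator of g3 and keep the most informative one.
scores = {name: lagged_mi(series[name], series["g3"]) for name in ("g1", "g2")}
best_regulator = max(scores, key=scores.get)
print(scores, best_regulator)
```

Because g3 copies g1 with a one-step delay, g1 attains the maximum possible score (the entropy of the lagged target), while g2 scores strictly lower.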
Affiliation(s)
- Shohag Barman
- Department of Computer Science and Engineering, Bangabandhu Sheikh Mujibur Rahman Science and Technology University, Pirojpur 8500, Bangladesh
- Fahmid Al Farid
- Faculty of Engineering, Multimedia University, Cyberjaya 63000, Selangor, Malaysia
- Hira Lal Gope
- Faculty of Agricultural Engineering and Technology, Sylhet Agricultural University, Sylhet 3100, Bangladesh
- Md. Ferdous Bin Hafiz
- Department of Computer Science and Engineering, Southeast University, Dhaka 1208, Bangladesh
- Niaz Ashraf Khan
- Department of Computer Science and Engineering, University of Liberal Arts Bangladesh, Dhaka 1207, Bangladesh
- Sabbir Ahmad
- Department of Computer Science and Engineering, University of Chittagong, Chittagong 4331, Bangladesh
- Sarina Mansor
- Faculty of Engineering, Multimedia University, Cyberjaya 63000, Selangor, Malaysia
3
Chee FT, Harun S, Mohd Daud K, Sulaiman S, Nor Muhammad NA. Exploring gene regulation and biological processes in insects: Insights from omics data using gene regulatory network models. Prog Biophys Mol Biol 2024; 189:1-12. [PMID: 38604435 DOI: 10.1016/j.pbiomolbio.2024.04.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2023] [Revised: 12/18/2023] [Accepted: 04/03/2024] [Indexed: 04/13/2024]
Abstract
A gene regulatory network (GRN) comprises complicated, intertwined gene-regulator relationships. Understanding GRN dynamics helps unravel the complexity behind observed gene expression. Insect gene regulation is often complicated by complex life cycles and diverse ecological adaptations. The main aim of this review is to provide an update on current mathematical modelling methods for GRNs as applied to insect science. Several popular GRN architecture models are discussed, together with examples of applications in insect science. In the last part of this review, the models are compared from different aspects, including network scalability, computational complexity, robustness to noise, and biological relevance.
Affiliation(s)
- Fong Ting Chee
- Institute of Systems Biology, Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Selangor, Malaysia
- Sarahani Harun
- Institute of Systems Biology, Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Selangor, Malaysia
- Kauthar Mohd Daud
- Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, 43600, UKM Bangi, Selangor, Malaysia
- Suhaila Sulaiman
- FGV R&D Sdn Bhd, FGV Innovation Center, PT23417 Lengkuk Teknologi, Bandar Baru Enstek, 71760 Nilai, Negeri Sembilan, Malaysia
- Nor Azlan Nor Muhammad
- Institute of Systems Biology, Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Selangor, Malaysia
4
Pušnik Ž, Mraz M, Zimic N, Moškon M. SAILoR: Structure-Aware Inference of Logic Rules. PLoS One 2024; 19:e0304102. [PMID: 38861487 PMCID: PMC11166287 DOI: 10.1371/journal.pone.0304102] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2024] [Accepted: 05/07/2024] [Indexed: 06/13/2024] Open
Abstract
Boolean networks provide an effective mechanism for describing interactions and dynamics of gene regulatory networks (GRNs). Deriving accurate Boolean descriptions of GRNs is a challenging task. The number of experiments is usually much smaller than the number of genes. In addition, binarization leads to a loss of information and inconsistencies arise in binarized time-series data. The inference of Boolean networks from binarized time-series data alone often leads to complex and overfitted models. To obtain relevant Boolean models of gene regulatory networks, inference methods could incorporate data from multiple sources and prior knowledge in terms of general network structure and/or exact interactions. We propose the Boolean network inference method SAILoR (Structure-Aware Inference of Logic Rules). SAILoR incorporates time-series gene expression data in combination with provided reference networks to infer accurate Boolean models. SAILoR automatically extracts topological properties from reference networks. These can describe a more general structure of the GRN or can be more precise and describe specific interactions. SAILoR infers a Boolean network by learning from both continuous and binarized time-series data. It navigates between two main objectives, topological similarity to reference networks and correspondence with gene expression data. By incorporating the NSGA-II multi-objective genetic algorithm, SAILoR relies on the wisdom of crowds. Our results indicate that SAILoR can infer accurate and biologically relevant Boolean descriptions of GRNs from both a static and a dynamic perspective. We show that SAILoR improves the static accuracy of the inferred network compared to the network inference method dynGENIE3. Furthermore, we compared the performance of SAILoR with other Boolean network inference approaches including Best-Fit, REVEAL, MIBNI, GABNI, ATEN, and LogBTF. 
We have shown that by incorporating prior knowledge about the overall network structure, SAILoR can improve the structural correctness of the inferred Boolean networks while maintaining dynamic accuracy. To demonstrate the applicability of SAILoR, we inferred context-specific Boolean subnetworks of female Drosophila melanogaster before and after mating.
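The trade-off between the two objectives named in this abstract (topological similarity to reference networks and agreement with expression data) can be illustrated with a Pareto non-dominance filter, a building block of multi-objective algorithms such as NSGA-II. The following Python sketch is only an illustration under stated assumptions: the candidate names and objective values are hypothetical, both objectives are treated as quantities to minimize, and it is not SAILoR's code.

```python
def dominates(a, b):
    """True if score a is no worse than b on every objective and strictly better on one.

    Both objectives are minimized.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(candidates):
    """Return the names of candidates that no other candidate dominates."""
    return [
        name
        for name, score in candidates.items()
        if not any(
            dominates(other, score)
            for other_name, other in candidates.items()
            if other_name != name
        )
    ]


# Hypothetical candidate networks scored as
# (topological distance to the reference, dynamic error); lower is better.
candidates = {
    "net_a": (0.10, 0.30),
    "net_b": (0.25, 0.12),
    "net_c": (0.30, 0.35),  # worse than net_a on both objectives
    "net_d": (0.18, 0.20),
}

front = pareto_front(candidates)
print(front)
```

Here `net_c` is dropped because `net_a` beats it on both objectives, while the remaining three candidates represent different compromises between structure and dynamics.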
Affiliation(s)
- Žiga Pušnik
- Faculty of Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia
- Miha Mraz
- Faculty of Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia
- Nikolaj Zimic
- Faculty of Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia
- Miha Moškon
- Faculty of Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia
5
Zhang D, Gao S, Liu ZP, Gao R. LogicGep: Boolean networks inference using symbolic regression from time-series transcriptomic profiling data. Brief Bioinform 2024; 25:bbae286. [PMID: 38886006 PMCID: PMC11182660 DOI: 10.1093/bib/bbae286] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2024] [Revised: 05/09/2024] [Accepted: 06/06/2024] [Indexed: 06/20/2024] Open
Abstract
Reconstructing the topology of gene regulatory networks from gene expression data has been extensively studied. With the abundance of functional transcriptomic data available, it is now feasible to systematically decipher regulatory interaction dynamics in a logic form such as a Boolean network (BN) framework, which qualitatively indicates how multiple regulators aggregate to affect a common target gene. However, inferring both the network topology and gene interaction dynamics simultaneously is still a challenging problem, since gene expression data are typically noisy and data discretization is prone to information loss. We propose a new method for BN inference from time-series transcriptional profiles, called LogicGep. LogicGep formulates the identification of Boolean functions as a symbolic regression problem that learns the Boolean function expression, and solves it efficiently through multi-objective optimization using an improved gene expression programming algorithm. To avoid overly emphasizing dynamic characteristics at the expense of topological structure, as traditional methods often do, a set of promising Boolean formulas for each target gene is evolved first, and a feed-forward neural network trained with continuous expression data is subsequently employed to pick out the final solution. We validated the efficacy of LogicGep using multiple datasets including both synthetic and real-world experimental data. The results show that LogicGep infers accurate BN models, outperforming other representative BN inference algorithms in both network topology reconstruction and the identification of Boolean functions. Moreover, the execution of LogicGep is hundreds of times faster than other methods, especially in the case of large network inference.
Affiliation(s)
- Dezhen Zhang
- Center of Intelligent Medicine, School of Control Science and Engineering, Shandong University, Jinan, Shandong 250061, China
- Shuhua Gao
- Center of Intelligent Medicine, School of Control Science and Engineering, Shandong University, Jinan, Shandong 250061, China
- Zhi-Ping Liu
- Center of Intelligent Medicine, School of Control Science and Engineering, Shandong University, Jinan, Shandong 250061, China
- Rui Gao
- Center of Intelligent Medicine, School of Control Science and Engineering, Shandong University, Jinan, Shandong 250061, China
6
Li L, Sun L, Chen G, Wong CW, Ching WK, Liu ZP. LogBTF: gene regulatory network inference using Boolean threshold network model from single-cell gene expression data. Bioinformatics 2023; 39:btad256. [PMID: 37079737 PMCID: PMC10172039 DOI: 10.1093/bioinformatics/btad256] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 11/28/2022] [Revised: 02/25/2023] [Accepted: 04/13/2023] [Indexed: 04/22/2023] Open
Abstract
MOTIVATION From a systematic perspective, it is crucial to infer and analyze gene regulatory networks (GRNs) from high-throughput single-cell RNA sequencing data. However, most existing GRN inference methods mainly focus on the network topology, and only a few of them consider how to explicitly describe the updated logic rules of regulation in GRNs to obtain their dynamics. Moreover, some inference methods also fail to deal with the over-fitting problem caused by noise in time series data. RESULTS In this article, we propose a novel embedded Boolean threshold network method called LogBTF, which effectively infers GRNs by integrating regularized logistic regression and Boolean threshold functions. First, the continuous gene expression values are converted into Boolean values and the elastic net regression model is adopted to fit the binarized time series data. Then, the estimated regression coefficients are applied to represent the unknown Boolean threshold function of the candidate Boolean threshold network as the dynamical equations. To overcome the multi-collinearity and over-fitting problems, a new and effective approach is designed to optimize the network topology by adding a perturbation design matrix to the input data and thereafter setting sufficiently small elements of the output coefficient vector to zero. In addition, a cross-validation procedure is implemented in the Boolean threshold network model framework to strengthen the inference capability. Finally, extensive experiments on one simulated Boolean value dataset, dozens of simulation datasets, and three real single-cell RNA sequencing datasets demonstrate that the LogBTF method can infer GRNs from time series data more accurately than some alternative methods for GRN inference. AVAILABILITY AND IMPLEMENTATION The source data and code are available at https://github.com/zpliulab/LogBTF.
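To make the idea of a Boolean threshold network concrete, the following Python sketch shows how a coefficient matrix can act as the update rule once small entries are zeroed. It assumes the common threshold form x_i(t+1) = 1 if sum_j w_ij x_j(t) + b_i > 0, else 0; the weights, biases, the cut-off `eps`, and all function names are hypothetical, and the regression step that would estimate the coefficients is not shown. It is not the LogBTF code.

```python
def sparsify(weights, eps=0.05):
    """Set coefficients with magnitude below eps to zero (eps is an assumed cut-off)."""
    return [[w if abs(w) >= eps else 0.0 for w in row] for row in weights]


def step(state, weights, bias):
    """One synchronous update: gene i switches on when its weighted input exceeds zero."""
    return [
        1 if sum(w * x for w, x in zip(row, state)) + b > 0 else 0
        for row, b in zip(weights, bias)
    ]


def edges(weights):
    """List (regulator, target, sign) for every non-zero coefficient."""
    return [
        (j, i, "+" if w > 0 else "-")
        for i, row in enumerate(weights)
        for j, w in enumerate(row)
        if w != 0.0
    ]


# Hypothetical coefficients: row i holds the weights of all genes acting on gene i.
weights = [
    [0.0, -1.2, 0.01],  # gene 0 is repressed by gene 1
    [0.9, 0.0, 0.0],    # gene 1 is activated by gene 0
    [0.7, 0.8, -0.02],  # gene 2 needs both gene 0 and gene 1
]
bias = [0.5, -0.4, -1.0]

sparse = sparsify(weights)
trajectory = [[1, 0, 0]]
for _ in range(4):
    trajectory.append(step(trajectory[-1], sparse, bias))

print(edges(sparse))
print(trajectory)
```

The non-zero coefficients give the inferred topology with activation or repression signs, and iterating `step` gives the dynamics; in this toy example the trajectory returns to its initial state after four updates.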
Affiliation(s)
- Lingyu Li
- Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan 250061, China
- Advanced Modeling and Applied Computing Laboratory, Department of Mathematics, The University of Hong Kong, Hong Kong, China
- Liangjie Sun
- Advanced Modeling and Applied Computing Laboratory, Department of Mathematics, The University of Hong Kong, Hong Kong, China
- Guangyi Chen
- Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan 250061, China
- Chi-Wing Wong
- Advanced Modeling and Applied Computing Laboratory, Department of Mathematics, The University of Hong Kong, Hong Kong, China
- Wai-Ki Ching
- Advanced Modeling and Applied Computing Laboratory, Department of Mathematics, The University of Hong Kong, Hong Kong, China
- Zhi-Ping Liu
- Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan 250061, China
7
Beneš N, Brim L, Huvar O, Pastva S, Šafránek D. Boolean network sketches: a unifying framework for logical model inference. Bioinformatics 2023; 39:btad158. [PMID: 37004199 PMCID: PMC10122605 DOI: 10.1093/bioinformatics/btad158] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2023] [Revised: 03/02/2023] [Accepted: 03/20/2023] [Indexed: 04/03/2023] Open
Abstract
MOTIVATION The problem of model inference is of fundamental importance to systems biology. Logical models (e.g. Boolean networks; BNs) represent a computationally attractive approach capable of handling large biological networks. The models are typically inferred from experimental data. However, even with a substantial amount of experimental data supported by some prior knowledge, existing inference methods often focus on a small sample of admissible candidate models only. RESULTS We propose Boolean network sketches as a new formal instrument for the inference of Boolean networks. A sketch integrates (typically partial) knowledge about the network's topology and the update logic (obtained through, e.g. a biological knowledge base or a literature search), as well as further assumptions about the properties of the network's transitions (e.g. the form of its attractor landscape), and additional restrictions on the model dynamics given by the measured experimental data. Our new BNs inference algorithm starts with an 'initial' sketch, which is extended by adding restrictions representing experimental data to a 'data-informed' sketch and subsequently computes all BNs consistent with the data-informed sketch. Our algorithm is based on a symbolic representation and coloured model-checking. Our approach is unique in its ability to cover a broad spectrum of knowledge and efficiently produce a compact representation of all inferred BNs. We evaluate the method on a non-trivial collection of real-world and simulated data. AVAILABILITY AND IMPLEMENTATION All software and data are freely available as a reproducible artefact at https://doi.org/10.5281/zenodo.7688740.
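The notion of computing all Boolean networks consistent with the data can be illustrated at the scale of a single update function. The Python sketch below enumerates every truth table over two regulators by brute force and keeps those that agree with a set of observations. This is a deliberate simplification: the paper's algorithm uses a symbolic representation and coloured model checking rather than explicit enumeration, and the observations and function name here are hypothetical.

```python
from itertools import product


def consistent_functions(observations, n_inputs):
    """Return all truth tables over n_inputs regulators that match every observation.

    observations is a list of (input_tuple, output_bit) pairs.
    Each truth table is a dict mapping an input tuple to an output bit.
    """
    inputs = list(product([0, 1], repeat=n_inputs))
    result = []
    for outputs in product([0, 1], repeat=len(inputs)):
        table = dict(zip(inputs, outputs))
        if all(table[x] == y for x, y in observations):
            result.append(table)
    return result


# Hypothetical observations of a target gene with two regulators (A, B):
# regulator states at time t paired with the target state at time t+1.
observations = [((0, 0), 0), ((1, 1), 1), ((1, 0), 0)]

candidates = consistent_functions(observations, 2)
for table in candidates:
    print(table)
```

Three of the four input combinations are fixed by the data, so exactly two update functions remain: `A AND B` and the function that simply copies `B`. Additional knowledge, such as a measurement for the input (0, 1), would narrow the set further.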
Affiliation(s)
- Nikola Beneš
- Faculty of Informatics, Masaryk University, Brno 602 00, Czech Republic
- Luboš Brim
- Faculty of Informatics, Masaryk University, Brno 602 00, Czech Republic
- Ondřej Huvar
- Faculty of Informatics, Masaryk University, Brno 602 00, Czech Republic
- Samuel Pastva
- Institute of Science and Technology Austria, Klosterneuburg 3400, Austria
- David Šafránek
- Faculty of Informatics, Masaryk University, Brno 602 00, Czech Republic
8
Malekpour SA, Shahdoust M, Aghdam R, Sadeghi M. wpLogicNet: logic gate and structure inference in gene regulatory networks. Bioinformatics 2023; 39:7039679. [PMID: 36790055 PMCID: PMC9936836 DOI: 10.1093/bioinformatics/btad072] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 10/06/2022] [Revised: 01/25/2023] [Accepted: 02/14/2023] [Indexed: 02/16/2023] Open
Abstract
MOTIVATION The gene regulatory process resembles a logic system in which a target gene is regulated by a logic gate among its regulators. While various computational techniques have been developed for gene regulatory network (GRN) reconstruction, the study of logical relationships has received little attention. Here, we propose a novel tool called wpLogicNet that simultaneously infers both the directed GRN structures and the logic gates among genes or transcription factors (TFs) that regulate their target genes, based on continuous steady-state gene expression. RESULTS wpLogicNet proposes a framework to infer the logic gates among any number of regulators, with low time complexity. This distinguishes wpLogicNet from existing logic-based models, which are limited to inferring the gate between two genes or TFs. Our method applies a Bayesian mixture model to estimate the likelihood of the target gene profile and to infer the logic gate a posteriori. Furthermore, in structure-aware mode, wpLogicNet reconstructs the logic gates in TF-gene or gene-gene interaction networks with known structures. The predicted logic gates are validated on simulated datasets of TF-gene interaction networks from Escherichia coli. For directed-edge inference, the method is validated on datasets from E. coli and the DREAM project. The results show that, compared to other well-known methods, wpLogicNet is more precise in reconstructing the network and the logical relationships among genes. AVAILABILITY AND IMPLEMENTATION The datasets and R package of wpLogicNet are available in the GitHub repository, https://github.com/CompBioIPM/wpLogicNet. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Seyed Amir Malekpour
- School of Biological Sciences, Institute for Research in Fundamental Sciences (IPM), Tehran 19395-5746, Iran
- Maryam Shahdoust
- School of Biological Sciences, Institute for Research in Fundamental Sciences (IPM), Tehran 19395-5746, Iran
- Rosa Aghdam
- School of Biological Sciences, Institute for Research in Fundamental Sciences (IPM), Tehran 19395-5746, Iran
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI 53715, USA
- Mehdi Sadeghi
- School of Biological Sciences, Institute for Research in Fundamental Sciences (IPM), Tehran 19395-5746, Iran
9
Ye Q, Guo NL. Inferencing Bulk Tumor and Single-Cell Multi-Omics Regulatory Networks for Discovery of Biomarkers and Therapeutic Targets. Cells 2022; 12:101. [PMID: 36611894 PMCID: PMC9818242 DOI: 10.3390/cells12010101] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2022] [Revised: 12/22/2022] [Accepted: 12/24/2022] [Indexed: 12/28/2022] Open
Abstract
There are insufficient accurate biomarkers and effective therapeutic targets in current cancer treatment. Multi-omics regulatory networks in patient bulk tumors and single cells can shed light on molecular disease mechanisms. Integration of multi-omics data with large-scale patient electronic medical records (EMRs) can lead to the discovery of biomarkers and therapeutic targets. In this review, multi-omics data harmonization methods were introduced, and common approaches to molecular network inference were summarized. Our Prediction Logic Boolean Implication Networks (PLBINs) have advantages over other methods in constructing genome-scale multi-omics networks in bulk tumors and single cells in terms of computational efficiency, scalability, and accuracy. Based on the constructed multi-modal regulatory networks, graph theory network centrality metrics can be used in the prioritization of candidates for discovering biomarkers and therapeutic targets. Our approach to integrating multi-omics profiles in a patient cohort with large-scale patient EMRs such as the SEER-Medicare cancer registry combined with extensive external validation can identify potential biomarkers applicable in large patient populations. These methodologies form a conceptually innovative framework to analyze various available information from research laboratories and healthcare systems, accelerating the discovery of biomarkers and therapeutic targets to ultimately improve cancer patient survival outcomes.
Affiliation(s)
- Qing Ye
- West Virginia University Cancer Institute, Morgantown, WV 26506, USA
- Lane Department of Computer Science and Electrical Engineering, West Virginia University, Morgantown, WV 26506, USA
- Nancy Lan Guo
- West Virginia University Cancer Institute, Morgantown, WV 26506, USA
- Department of Occupational and Environmental Health Sciences, School of Public Health, West Virginia University, Morgantown, WV 26506, USA
10
Galindez G, Sadegh S, Baumbach J, Kacprowski T, List M. Network-based approaches for modeling disease regulation and progression. Comput Struct Biotechnol J 2022; 21:780-795. [PMID: 36698974 PMCID: PMC9841310 DOI: 10.1016/j.csbj.2022.12.022] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 08/15/2022] [Revised: 12/14/2022] [Accepted: 12/14/2022] [Indexed: 12/23/2022] Open
Abstract
Molecular interaction networks lay the foundation for studying how biological functions are controlled by the complex interplay of genes and proteins. Investigating perturbed processes using biological networks has been instrumental in uncovering mechanisms that underlie complex disease phenotypes. Rapid advances in omics technologies have prompted the generation of high-throughput datasets, enabling large-scale, network-based analyses. Consequently, various modeling techniques, including network enrichment, differential network extraction, and network inference, have proven to be useful for gaining new mechanistic insights. We provide an overview of recent network-based methods and their core ideas to facilitate the discovery of disease modules or candidate mechanisms. Knowledge generated from these computational efforts will benefit biomedical research, especially drug development and precision medicine. We further discuss current challenges and provide perspectives in the field, highlighting the need for more integrative and dynamic network approaches to model disease development and progression.
Affiliation(s)
- Gihanna Galindez
- Division Data Science in Biomedicine, Peter L. Reichertz Institute for Medical Informatics of Technische Universität Braunschweig and Hannover Medical School, Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), TU Braunschweig, Braunschweig, Germany
- Sepideh Sadegh
- Chair of Experimental Bioinformatics, TUM School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
- Jan Baumbach
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
- Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
- Tim Kacprowski
- Division Data Science in Biomedicine, Peter L. Reichertz Institute for Medical Informatics of Technische Universität Braunschweig and Hannover Medical School, Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), TU Braunschweig, Braunschweig, Germany
- Markus List
- Chair of Experimental Bioinformatics, TUM School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany
11
Gamage HN, Chetty M, Shatte A, Hallinan J. Filter feature selection based Boolean Modelling for Genetic Network Inference. Biosystems 2022; 221:104757. [PMID: 36007675 DOI: 10.1016/j.biosystems.2022.104757] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2022] [Revised: 08/04/2022] [Accepted: 08/04/2022] [Indexed: 11/02/2022]
Abstract
The reconstruction of Gene Regulatory Networks (GRNs) from time series gene expression data is highly relevant for the discovery of complex biological interactions and dynamics. Various computational strategies have been developed for this task, but most approaches have low computational efficiency and are not able to cope with high-dimensional, low sample-number, gene expression data. In this paper, we introduce a novel combined filter feature selection approach for efficient and accurate inference of GRNs. A Boolean framework for network modelling is used to demonstrate the efficacy of the proposed approach. Using discretized microarray expression data, the genes most relevant to each target gene are first filtered using ReliefF, an instance-based feature ranking method that is here applied for the first time to GRN inference. Then, further gene selection from the filtered-gene list is done using a mutual information-based min-redundancy max-relevance criterion by eliminating irrelevant genes. This combined method is executed on resampled datasets to finalize the optimal set of regulatory genes. Building upon our previous research, a Pearson correlation coefficient-based Boolean modelling approach is utilized for the efficient identification of the optimal regulatory rules associated with selected regulatory genes. The proposed approach was evaluated using gene expression datasets from small-scale and medium-scale real gene networks, and was observed to be more effective than Linear Discriminant Analysis, performed better than the individual feature selection methods, and obtained improved Structural Accuracy with a higher number of true positives than other state-of-the-art methods, while outperforming these methods with respect to Dynamic Accuracy and efficiency.
Affiliation(s)
- Madhu Chetty
- Health Innovation and Transformation Centre, Federation University, Victoria, Australia
- Adrian Shatte
- Health Innovation and Transformation Centre, Federation University, Victoria, Australia
12
Pušnik Ž, Mraz M, Zimic N, Moškon M. Review and assessment of Boolean approaches for inference of gene regulatory networks. Heliyon 2022; 8:e10222. [PMID: 36033302 PMCID: PMC9403406 DOI: 10.1016/j.heliyon.2022.e10222] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 02/09/2022] [Revised: 04/22/2022] [Accepted: 08/03/2022] [Indexed: 10/25/2022] Open
Abstract
Boolean descriptions of gene regulatory networks can provide insight into interactions between genes. Boolean networks hold predictive power, are easy to understand, and can be used to simulate the observed networks in different scenarios. We review fundamental and state-of-the-art methods for inference of Boolean networks. We introduce a methodology for a straightforward evaluation of Boolean inference approaches based on the generation of evaluation datasets, application of selected inference methods, and evaluation of performance measures to guide the selection of the best method for a given inference problem. We demonstrate this procedure on the inference methods REVEAL (REVerse Engineering ALgorithm), Best-Fit Extension, MIBNI (Mutual Information-based Boolean Network Inference), GABNI (Genetic Algorithm-based Boolean Network Inference) and ATEN (AND/OR Tree ENsemble algorithm), which infer Boolean descriptions of gene regulatory networks from discretised time series data. Boolean inference approaches tend to perform better in terms of dynamic accuracy, and slightly worse in terms of structural correctness. We believe that the proposed methodology and provided guidelines will help researchers to develop Boolean inference approaches with good predictive capability while maintaining structural correctness and biological relevance.
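The two kinds of performance measure contrasted in this abstract can be sketched in a few lines of Python. The definitions below are common choices and are assumptions here, not necessarily the exact measures used in the review: dynamic accuracy is taken as the fraction of gene states at t+1 that a model predicts correctly from the observed state at t, and structural correctness is summarized by edge precision and recall. The model, the series, and the edge sets are hypothetical.

```python
def dynamic_accuracy(observed, predict):
    """Fraction of gene states at t+1 predicted correctly from the observed state at t."""
    correct = total = 0
    for current, following in zip(observed[:-1], observed[1:]):
        predicted = predict(current)
        correct += sum(p == o for p, o in zip(predicted, following))
        total += len(following)
    return correct / total


def structural_scores(inferred_edges, true_edges):
    """Precision and recall of inferred (regulator, target) edges against a reference."""
    true_positives = len(inferred_edges & true_edges)
    precision = true_positives / len(inferred_edges) if inferred_edges else 0.0
    recall = true_positives / len(true_edges) if true_edges else 0.0
    return precision, recall


def inferred_model(state):
    """Hypothetical inferred rules: a' = NOT c, b' = a, c' = a AND b."""
    a, b, c = state
    return (int(not c), a, int(a and b))


# Hypothetical binarized time series for genes (a, b, c).
observed = [(1, 0, 0), (1, 1, 0), (1, 1, 1), (0, 1, 1), (0, 0, 1)]

# Edges implied by the inferred rules versus a hypothetical reference network.
inferred_edges = {("c", "a"), ("a", "b"), ("a", "c"), ("b", "c")}
true_edges = {("c", "a"), ("a", "b"), ("b", "c"), ("c", "c")}

accuracy = dynamic_accuracy(observed, inferred_model)
precision, recall = structural_scores(inferred_edges, true_edges)
print(accuracy, precision, recall)
```

In this toy case the model reproduces 11 of 12 observed gene states while recovering only three of the four reference edges, which mirrors how a model can fit the dynamics well and still be structurally imperfect.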
Affiliation(s)
- Žiga Pušnik
- University of Ljubljana, Faculty of Computer and Information Science, Večna pot 113, Ljubljana, SI-1000, Slovenia
- Miha Mraz
- University of Ljubljana, Faculty of Computer and Information Science, Večna pot 113, Ljubljana, SI-1000, Slovenia
- Nikolaj Zimic
- University of Ljubljana, Faculty of Computer and Information Science, Večna pot 113, Ljubljana, SI-1000, Slovenia
- Miha Moškon
- University of Ljubljana, Faculty of Computer and Information Science, Večna pot 113, Ljubljana, SI-1000, Slovenia
|
13
|
Tan Y, Neto FBL, Neto UB. PALLAS: Penalized mAximum LikeLihood and pArticle Swarms for Inference of Gene Regulatory Networks From Time Series Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:1807-1816. [PMID: 33170782 DOI: 10.1109/tcbb.2020.3037090] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/11/2023]
Abstract
We present PALLAS, a practical method for gene regulatory network (GRN) inference from time series data, which employs penalized maximum likelihood and particle swarms for optimization. PALLAS is based on the Partially-Observable Boolean Dynamical System (POBDS) model and thus does not require ad-hoc binarization of the data. The penalty in the likelihood is a LASSO regularization term, which encourages the resulting network to be sparse. PALLAS is able to scale to networks of realistic size under no prior knowledge, by virtue of a novel continuous-discrete Fish School Search particle swarm algorithm for efficient simultaneous maximization of the penalized likelihood over the discrete space of networks and the continuous space of observational parameters. The performance of PALLAS is demonstrated by a comprehensive set of experiments using synthetic data generated from real and artificial networks, as well as real time series microarray and RNA-seq data, where it is compared to several other well-known methods for gene regulatory network inference. The results show that PALLAS can infer GRNs more accurately than other methods, while being capable of working directly on gene expression data, without the need for ad-hoc binarization. PALLAS is a fully-fledged program, written in Python, and available on GitHub (https://github.com/yukuntan92/PALLAS).
|
14
|
Liu X, Shi N, Wang Y, Ji Z, He S. Data-Driven Boolean Network Inference Using a Genetic Algorithm With Marker-Based Encoding. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:1558-1569. [PMID: 33513105 DOI: 10.1109/tcbb.2021.3055646] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/12/2023]
Abstract
The inference of Boolean networks is crucial for analyzing the topology and dynamics of gene regulatory networks. Many data-driven approaches using evolutionary algorithms have been proposed based on time-series data. However, their ability to infer both network topology and dynamics is restricted by their inflexible encoding schemes. To address this problem, we propose a novel Boolean network inference algorithm for inferring both network topology and dynamics simultaneously. The main idea is to use a marker-based genetic algorithm to encode both regulatory nodes and logical operators in a chromosome. By using the markers and introducing more logical operators, the proposed algorithm can infer more diverse candidate Boolean functions. The proposed algorithm is applied to five networks, including two artificial Boolean networks and three real-world gene regulatory networks. Compared with other algorithms, the experimental results demonstrate that our proposed algorithm infers more accurate topology and dynamics.
|
15
|
Putnins M, Campagne O, Mager DE, Androulakis IP. From data to QSP models: a pipeline for using Boolean networks for hypothesis inference and dynamic model building. J Pharmacokinet Pharmacodyn 2022; 49:101-115. [PMID: 34988912 PMCID: PMC9876619 DOI: 10.1007/s10928-021-09797-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/01/2021] [Accepted: 10/27/2021] [Indexed: 01/27/2023]
Abstract
Quantitative Systems Pharmacology (QSP) models capture the physiological underpinnings driving the response to a drug and express those in a semi-mechanistic way, often involving ordinary differential equations (ODEs). The process of developing a QSP model generally starts with the definition of a set of reasonable hypotheses that would support a mechanistic interpretation of the expected response which are used to form a network of interacting elements. This is a hypothesis-driven and knowledge-driven approach, relying on prior information about the structure of the network. However, with recent advances in our ability to generate large datasets rapidly, often in a hypothesis-neutral manner, the opportunity emerges to explore data-driven approaches to establish the network topologies and models in a robust, repeatable manner. In this paper, we explore the possibility of developing complex network representations of physiological responses to pharmaceuticals using a logic-based analysis of available data and then convert the logic relations to dynamic ODE-based models. We discuss an integrated pipeline for converting data to QSP models. This pipeline includes using k-means clustering to binarize continuous data, inferring likely network relationships using a Best-Fit Extension method to create a Boolean network, and finally converting the Boolean network to a continuous ODE model. We utilized an existing QSP model for the dual-affinity re-targeting antibody flotetuzumab to demonstrate the robustness of the process. Key output variables from the QSP model were used to generate a continuous data set for use in the pipeline. This dataset was used to reconstruct a possible model. This reconstruction had no false-positive relationships, and the output of each of the species was similar to that of the original QSP model. This demonstrates the ability to accurately infer relationships in a hypothesis-neutral manner without prior knowledge of a system using this pipeline.
Affiliation(s)
- M. Putnins
- Biomedical Engineering Department, Rutgers University, Piscataway, USA
- O. Campagne
- Department of Pharmaceutical Sciences, University at Buffalo, State University of New York, Buffalo, USA
- D. E. Mager
- Department of Pharmaceutical Sciences, University at Buffalo, State University of New York, Buffalo, USA
- I. P. Androulakis
- Biomedical Engineering Department, Rutgers University, Piscataway, USA
- Chemical & Biochemical Engineering Department, Rutgers University, Piscataway, USA
|
16
|
Alali M, Imani M. Inference of regulatory networks through temporally sparse data. Frontiers in Control Engineering 2022; 3:1017256. [PMID: 36582942 PMCID: PMC9795458 DOI: 10.3389/fcteg.2022.1017256] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/14/2022]
Abstract
A major goal in genomics is to properly capture the complex dynamical behaviors of gene regulatory networks (GRNs). This includes inferring the complex interactions between genes, which can be used for a wide range of genomics analyses, including diagnosis or prognosis of diseases and finding effective treatments for chronic diseases such as cancer. Boolean networks have emerged as a successful class of models for capturing the behavior of GRNs. In most practical settings, inference of GRNs should be achieved through limited and temporally sparse genomics data. A large number of genes in GRNs leads to a large possible topology candidate space, which often cannot be exhaustively searched due to the limitation in computational resources. This paper develops a scalable and efficient topology inference for GRNs using Bayesian optimization and kernel-based methods. Rather than an exhaustive search over possible topologies, the proposed method constructs a Gaussian Process (GP) with a topology-inspired kernel function to account for correlation in the likelihood function. Then, using the posterior distribution of the GP model, the Bayesian optimization efficiently searches for the topology with the highest likelihood value by optimally balancing between exploration and exploitation. The performance of the proposed method is demonstrated through comprehensive numerical experiments using a well-known mammalian cell-cycle network.
|
17
|
Liu X, Wang Y, Shi N, Ji Z, He S. GAPORE: Boolean network inference using a genetic algorithm with novel polynomial representation and encoding scheme. Knowl Based Syst 2021. [DOI: 10.1016/j.knosys.2021.107277] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/26/2022]
|
18
|
Tan K, Huang W, Liu X, Hu J, Dong S. A Hierarchical Graph Convolution Network for Representation Learning of Gene Expression Data. IEEE J Biomed Health Inform 2021; 25:3219-3229. [PMID: 33449889 DOI: 10.1109/jbhi.2021.3052008] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/09/2022]
Abstract
The curse of dimensionality, which is caused by high dimensionality and low sample size, is a major challenge in gene expression data analysis. However, the real situation is even worse: labelling data is laborious and time-consuming, so only a small part of the limited samples will be labelled. Having so few labelled samples further increases the difficulty of training deep learning models. Interpretability is an important requirement in biomedicine. Many existing deep learning methods try to provide interpretability, but they are rarely applied to gene expression data. Recent semi-supervised graph convolution network methods try to address these problems by smoothing the label information over a graph. However, to the best of our knowledge, these methods only utilize graphs in either the feature space or the sample space, which restricts their performance. We propose a transductive semi-supervised representation learning method called a hierarchical graph convolution network (HiGCN) to aggregate the information of gene expression data in both feature and sample spaces. HiGCN first utilizes external knowledge to construct a feature graph and a similarity kernel to construct a sample graph. Then, two spatial-based GCNs are used to aggregate information on these graphs. To validate the model's performance, synthetic and real datasets are provided to lend empirical support. Compared with two recent models and three traditional models, HiGCN learns better representations of gene expression data, and these representations improve the performance of downstream tasks, especially when the model is trained on a few labelled samples. Important features can be extracted from our model to provide reliable interpretability.
|
19
|
Trinh HC, Kwon YK. A novel constrained genetic algorithm-based Boolean network inference method from steady-state gene expression data. Bioinformatics 2021; 37:i383-i391. [PMID: 34252959 PMCID: PMC8275338 DOI: 10.1093/bioinformatics/btab295] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Accepted: 04/24/2021] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION It is a challenging problem in systems biology to infer both the network structure and dynamics of a gene regulatory network from steady-state gene expression data. Some methods based on Boolean or differential equation models have been proposed, but they were not efficient in the inference of large-scale networks. Therefore, it is necessary to develop a method to infer the network structure and dynamics accurately on large-scale networks using steady-state expression. RESULTS In this study, we propose a novel constrained genetic algorithm-based Boolean network inference (CGA-BNI) method where a Boolean canalyzing update rule scheme was employed to capture coarse-grained dynamics. Given steady-state gene expression data as an input, CGA-BNI identifies a set of path consistency-based constraints by comparing the gene expression level between the wild-type and the mutant experiments. It then searches for Boolean networks which satisfy the constraints and induce attractors most similar to steady-state expressions. We devised a heuristic mutation operation for faster convergence and implemented a parallel evaluation routine for execution time reduction. Through extensive simulations on the artificial and the real gene expression datasets, CGA-BNI showed better performance than four other existing methods in terms of both structural and dynamics prediction accuracies. Taken together, CGA-BNI is a promising tool to predict both the structure and the dynamics of a gene regulatory network when the highest accuracy is needed at the cost of execution time. AVAILABILITY AND IMPLEMENTATION Source code and data are freely available at https://github.com/csclab/CGA-BNI. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Hung-Cuong Trinh
- Faculty of Information Technology, Ton Duc Thang University, Ho Chi Minh 758307, Vietnam
- Yung-Keun Kwon
- Department of IT Convergence, University of Ulsan, Ulsan 680-749, Korea
|
20
|
Barman S, Kwon YK. A neuro-evolution approach to infer a Boolean network from time-series gene expressions. Bioinformatics 2020; 36:i762-i769. [PMID: 33381823 DOI: 10.1093/bioinformatics/btaa840] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Accepted: 09/14/2020] [Indexed: 11/14/2022] Open
Abstract
SUMMARY In systems biology, it is challenging to accurately infer a regulatory network from time-series gene expression data, and a variety of methods have been proposed. Most of them were computationally inefficient in inferring very large networks, though, because of the increasing number of candidate regulatory genes. Although a recent approach called GABNI (genetic algorithm-based Boolean network inference) was presented to resolve this problem using a genetic algorithm, there is room for performance improvement because it employed a limited representation model of regulatory functions. In this regard, we devised a novel genetic algorithm combined with a neural network for Boolean network inference, where a neural network is used to represent the regulatory function instead of the incomplete Boolean truth table used in GABNI. In addition, our new method extended the range of the time-step lag parameter value between the regulatory and the target genes for more flexible representation of the regulatory function. Extensive simulations with the gene expression datasets of the artificial and real networks were conducted to compare our method with five well-known existing methods including GABNI. Our proposed method significantly outperformed them in terms of both structural and dynamics accuracy. CONCLUSION Our method can be a promising tool to infer a large-scale Boolean regulatory network from time-series gene expression data. AVAILABILITY AND IMPLEMENTATION The source code is freely available at https://github.com/kwon-uou/NNBNI. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Shohag Barman
- Department of Computer Science, American International University-Bangladesh (AIUB), Dhaka 1229, Bangladesh
- Yung-Keun Kwon
- School of IT Convergence, University of Ulsan, Ulsan 44610, Republic of Korea
|
21
|
Shi N, Zhu Z, Tang K, Parker D, He S. ATEN: And/Or tree ensemble for inferring accurate Boolean network topology and dynamics. Bioinformatics 2020; 36:578-585. [PMID: 31368481 DOI: 10.1093/bioinformatics/btz563] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 11/28/2018] [Revised: 07/02/2019] [Accepted: 07/24/2019] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION Inferring gene regulatory networks from gene expression time series data is important for gaining insights into the complex processes of cell life. A popular approach is to infer Boolean networks. However, it is still a pressing open problem to infer accurate Boolean networks from experimental data that are typically short and noisy. RESULTS To address the problem, we propose a Boolean network inference algorithm which is able to infer accurate Boolean network topology and dynamics from short and noisy time series data. The main idea is that, for each target gene, we use an And/Or tree ensemble algorithm to select prime implicants of which each is a conjunction of a set of input genes. The selected prime implicants are important features for predicting the states of the target gene. Using these important features we then infer the Boolean function of the target gene. Finally, the Boolean functions of all target genes are combined as a Boolean network. Using the data generated from artificial and real-world gene regulatory networks, we show that our algorithm can infer more accurate Boolean network topology and dynamics from short and noisy time series data than other algorithms. Our algorithm enables us to gain better insights into complex regulatory mechanisms of cell life. AVAILABILITY AND IMPLEMENTATION Package ATEN is freely available at https://github.com/ningshi/ATEN. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ning Shi
- School of Computer Science, University of Birmingham, Birmingham B15 2TT, UK
- Zexuan Zhu
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, China
- Ke Tang
- Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen 518055, China
- David Parker
- School of Computer Science, University of Birmingham, Birmingham B15 2TT, UK
- Shan He
- School of Computer Science, University of Birmingham, Birmingham B15 2TT, UK
- Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen 518055, China
|
22
|
NetExtractor: Extracting a Cerebellar Tissue Gene Regulatory Network Using Differentially Expressed High Mutual Information Binary RNA Profiles. G3: Genes, Genomes, Genetics 2020; 10:2953-2963. [PMID: 32665353 PMCID: PMC7466957 DOI: 10.1534/g3.120.401067] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 11/18/2022]
Abstract
Bigenic expression relationships are conventionally defined based on metrics such as Pearson or Spearman correlation that cannot typically detect latent, non-linear dependencies or require the relationship to be monotonic. Further, the combination of intrinsic and extrinsic noise as well as embedded relationships between sample sub-populations reduces the probability of extracting biologically relevant edges during the construction of gene co-expression networks (GCNs). In this report, we address these problems via our NetExtractor algorithm. NetExtractor examines all pairwise gene expression profiles first with Gaussian mixture models (GMMs) to identify sample sub-populations followed by mutual information (MI) analysis that is capable of detecting non-linear differential bigenic expression relationships. We applied NetExtractor to brain tissue RNA profiles from the Genotype-Tissue Expression (GTEx) project to obtain a brain tissue specific gene expression relationship network centered on cerebellar and cerebellar hemisphere enriched edges. We leveraged the PsychENCODE pre-frontal cortex (PFC) gene regulatory network (GRN) to construct a cerebellar cortex (cerebellar) GRN associated with transcriptionally active regions in cerebellar tissue. Thus, we demonstrate the utility of our NetExtractor approach to detect biologically relevant and novel non-linear binary gene relationships.
|
23
|
Hayama Nishida CE, Costa Bianchi RA, Reali Costa AH. A framework to shift basins of attraction of gene regulatory networks through batch reinforcement learning. Artif Intell Med 2020; 107:101853. [PMID: 32828434 DOI: 10.1016/j.artmed.2020.101853] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/15/2019] [Revised: 03/23/2020] [Accepted: 03/31/2020] [Indexed: 11/25/2022]
Abstract
A major challenge in gene regulatory networks (GRN) of biological systems is to discover when and what interventions should be applied to shift them to healthy phenotypes. A set of gene activity profiles, called basin of attraction (BOA), takes this network to a specific phenotype; therefore, a healthy BOA leads the GRN to a healthy phenotype. However, without the complete observability of the genes, it is not possible to identify whether the current BOA is healthy. In this article we investigate external interventions in GRN with partial observability aiming to bring it to healthy BOAs. We propose a new batch reinforcement learning method (BRL), called mSFQI, to define intervention strategies based on the probabilities of the gene activity profiles being in healthy BOAs, which are calculated from a set of previous observed experiences. BRL uses approximation functions and repeated applications of previous experiences to accelerate learning. Results demonstrate that our proposal can quickly shift a partially observable GRN to healthy BOAs, while reducing the number of interventions. In addition, when observability is poor, mSFQI produces better results when the probabilities for a greater amount of previous observations are available.
|
24
|
Barman S, Kwon YK. A Boolean network inference from time-series gene expression data using a genetic algorithm. Bioinformatics 2019; 34:i927-i933. [PMID: 30423074 DOI: 10.1093/bioinformatics/bty584] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Indexed: 11/13/2022] Open
Abstract
Motivation Inferring a gene regulatory network from time-series gene expression data is a fundamental problem in systems biology, and many methods have been proposed. However, most of them were not efficient in inferring regulatory relations involved by a large number of genes because they limited the number of regulatory genes or computed an approximated reliability of multivariate relations. Therefore, an improved method is needed to efficiently search more generalized and scalable regulatory relations. Results In this study, we propose a genetic algorithm-based Boolean network inference (GABNI) method which can search an optimal Boolean regulatory function of a large number of regulatory genes. For an efficient search, it solves the problem in two stages. GABNI first exploits an existing method, a mutual information-based Boolean network inference (MIBNI), because it can quickly find an optimal solution in a small-scale inference problem. When MIBNI fails to find an optimal solution, a genetic algorithm (GA) is applied to search an optimal set of regulatory genes in a wider solution space. In particular, we modified a typical GA framework to efficiently reduce a search space. We compared GABNI with four well-known inference methods through extensive simulations on both the artificial and the real gene expression datasets. Our results demonstrated that GABNI significantly outperformed them in both structural and dynamics accuracies. Conclusion The proposed method is an efficient and scalable tool to infer a Boolean network from time-series gene expression data. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Shohag Barman
- Department of IT Convergence, University of Ulsan, 93 Nam-gu, Ulsan, Republic of Korea
- Yung-Keun Kwon
- Department of IT Convergence, University of Ulsan, 93 Nam-gu, Ulsan, Republic of Korea
|
25
|
Pirgazi J, Khanteymoori AR, Jalilkhani M. TIGRNCRN: Trustful inference of gene regulatory network using clustering and refining the network. J Bioinform Comput Biol 2019; 17:1950018. [DOI: 10.1142/s0219720019500185] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 11/18/2022]
Abstract
In this study, to deal with the noise and uncertainty in gene expression data, learning networks, especially Bayesian networks, which can use prior knowledge, were used to infer gene regulatory networks. Learning networks are methods that have a network structure and a learning process for obtaining relationships. Correlation metrics are one way of measuring the relationship between genes, but high correlation between genes does not necessarily mean that they have a causal effect on each other. Studies of common methods for inferring gene regulatory networks have yet to pay attention to biological importance, and as such, the predictions of these methods are less accurate in terms of biological significance. Hence, in the proposed method, genes with high correlation were grouped into one cluster using clustering, and edges between genes in the same cluster were prevented. Finally, after the Bayesian network modeling, a refining phase based on the knowledge gained from clustering improved the regulatory interactions using biological correlation. To show its efficiency, the proposed method was compared with several common methods in this area, including GENIE3 and BMALR. The results of the evaluation indicate that the proposed method recognized regulatory relations well in the Bayesian modeling process, owing to its use of biological knowledge hidden in the data collection, and that it can recognize gene regulatory networks on a par with important methods in this field.
Affiliation(s)
- Jamshid Pirgazi
- Department of Computer Engineering, Engineering Faculty, University of Zanjan, Zanjan, Iran
- Ali Reza Khanteymoori
- Department of Computer Engineering, Engineering Faculty, University of Zanjan, Zanjan, Iran
- School of Biological Sciences, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran
- Maryam Jalilkhani
- Department of Computer Engineering, Engineering Faculty, University of Zanjan, Zanjan, Iran
|
26
|
Crawford B, Soto R, Peña A, Astorga G. A Binary Grasshopper Optimisation Algorithm Applied to the Set Covering Problem. Advances in Intelligent Systems and Computing 2019. [DOI: 10.1007/978-3-319-91192-2_1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 12/29/2022]
|
27
|
Yang B, Chen Y, Zhang W, Lv J, Bao W, Huang DS. HSCVFNT: Inference of Time-Delayed Gene Regulatory Network Based on Complex-Valued Flexible Neural Tree Model. Int J Mol Sci 2018; 19:E3178. [PMID: 30326663 PMCID: PMC6214043 DOI: 10.3390/ijms19103178] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 09/11/2018] [Revised: 10/08/2018] [Accepted: 10/10/2018] [Indexed: 11/17/2022] Open
Abstract
Gene regulatory network (GRN) inference can help us understand the growth and development of animals and plants and reveal the mysteries of biology. Many computational approaches have been proposed to infer GRNs. However, these inference approaches have hardly met the needs of modeling, and redundancy-reduction methods based on a single information-theoretic measure have poor universality and stability. To overcome these limitations, this paper proposes a novel algorithm, named HSCVFNT, to infer gene regulatory networks with time-delayed regulations by utilizing a hybrid scoring method and a complex-valued flexible neural tree (CVFNT). The regulations of each target gene can be obtained by iteratively performing HSCVFNT. For each target gene, the HSCVFNT algorithm utilizes a novel scoring method based on time-delayed mutual information (TDMI), time-delayed maximum information coefficient (TDMIC) and time-delayed correlation coefficient (TDCC) to reduce the redundancy of regulatory relationships and obtain the candidate regulatory factor set. Then, the TDCC method is utilized to create a time-delayed gene expression time-series matrix. Finally, a complex-valued flexible neural tree model is proposed to infer the time-delayed regulations of each target gene from the time-delayed time-series matrix. Three real time-series expression datasets, from the SOS (Save Our Soul) DNA repair system in E. coli and from Saccharomyces cerevisiae, are utilized to evaluate the performance of the HSCVFNT algorithm. As a result, HSCVFNT obtains outstanding F-scores of 0.923, 0.8 and 0.625 for SOS network and IRMA (In vivo Reverse-Engineering and Modeling Assessment) network inference, respectively, which are 5.5%, 14.3% and 72.2% higher than the best performance of other state-of-the-art GRN inference methods and time-delayed methods.
Affiliation(s)
- Bin Yang
- School of Information Science and Engineering, Zaozhuang University, Zaozhuang 277100, China
- Yuehui Chen
- School of Information Science and Engineering, University of Jinan, Jinan 250002, China
- Wei Zhang
- School of Information Science and Engineering, Zaozhuang University, Zaozhuang 277100, China
- Jiaguo Lv
- School of Information Science and Engineering, Zaozhuang University, Zaozhuang 277100, China
- Wenzheng Bao
- School of Computer Science, China University of Mining and Technology, Xuzhou 221000, China
- De-Shuang Huang
- Institute of Machine Learning and Systems Biology, Tongji University, Shanghai 200092, China
|
28
|
Wang Z, Gudibanda A, Ugwuowo U, Trail F, Townsend JP. Using evolutionary genomics, transcriptomics, and systems biology to reveal gene networks underlying fungal development. Fungal Biol Rev 2018. [DOI: 10.1016/j.fbr.2018.02.001] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Indexed: 12/20/2022]
|
29
|
Pirgazi J, Khanteymoori AR. A robust gene regulatory network inference method base on Kalman filter and linear regression. PLoS One 2018; 13:e0200094. [PMID: 30001352 PMCID: PMC6044105 DOI: 10.1371/journal.pone.0200094] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 01/15/2018] [Accepted: 06/19/2018] [Indexed: 11/24/2022] Open
Abstract
The reconstruction of the topology of gene regulatory networks (GRNs) using high-throughput genomic data such as microarray gene expression data is an important problem in systems biology. The main challenge in gene expression data is the high number of genes and low number of samples; the data are also often contaminated with noise. In this paper, to deal with the noisy data, a Kalman filter-based method that can use prior knowledge when learning the network was used. In the first phase of the proposed method, named KFLR, noisy regulations with low correlations were removed using mutual information. The proposed method utilized a new closed-form solution to compute the posterior probabilities of the edges from regulators to the target gene within a hybrid framework of Bayesian model averaging and linear regression methods. To show its efficiency, the proposed method was compared with several well-known methods. The results of the evaluation indicate that the proposed method improved inference accuracy and also recovered better regulatory relations from the noisy data.
Affiliation(s)
- Jamshid Pirgazi
  - Department of Computer Engineering, Engineering Faculty, University of Zanjan, Zanjan, Iran
- Ali Reza Khanteymoori
  - Department of Computer Engineering, Engineering Faculty, University of Zanjan, Zanjan, Iran
|
30
|
Xing L, Guo M, Liu X, Wang C, Zhang L. Gene Regulatory Networks Reconstruction Using the Flooding-Pruning Hill-Climbing Algorithm. Genes (Basel) 2018; 9:E342. [PMID: 29986472 PMCID: PMC6071145 DOI: 10.3390/genes9070342] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 05/23/2018] [Revised: 06/28/2018] [Accepted: 07/02/2018] [Indexed: 11/17/2022]
Abstract
The explosion of genomic data provides new opportunities to improve gene regulatory network reconstruction. Because of its inherently probabilistic character, the Bayesian network is one of the most promising methods. However, excessive computation time and the requirement for a large number of biological samples reduce its effectiveness and its applicability to gene regulatory network reconstruction. In this paper, the Flooding-Pruning Hill-Climbing algorithm (FPHC) is proposed as a novel hybrid method based on Bayesian networks for gene regulatory network reconstruction. Building on our previous work, we propose the concept of DPI Level, based on the data processing inequality (DPI), to better identify the neighbors of each gene when biological samples are insufficient. Then, we use the search-and-score approach to learn the final network structure in the restricted search space. We first analyze and validate the effectiveness of FPHC in theory. Then, extensive comparison experiments are carried out on known Bayesian networks and biological networks from the DREAM (Dialogue on Reverse Engineering Assessment and Methods) challenge. The results show that the FPHC algorithm, under recommended parameters, outperforms, on average, the original hill-climbing and Max-Min Hill-Climbing (MMHC) methods with respect to network structure and running time. In addition, our results show that FPHC is more suitable for gene regulatory network reconstruction with limited data.
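For readers unfamiliar with the data processing inequality mentioned in the abstract above, the sketch below shows the plain form of DPI-based pruning: in any fully connected triple of genes, the edge with the lowest mutual information is treated as indirect and dropped. This is the generic rule as commonly used in information-theoretic network inference, not the paper's "DPI Level" concept. The function name `dpi_prune`, the `tolerance` parameter, and the mutual information values are hypothetical.

```python
from itertools import combinations

def dpi_prune(mi, tolerance=0.0):
    """Drop the weakest edge of every fully connected gene triple.

    mi maps frozenset({gene_a, gene_b}) to a mutual information value.
    Returns the set of edges that survive pruning.
    """
    genes = sorted({g for edge in mi for g in edge})
    removed = set()
    for a, b, c in combinations(genes, 3):
        edges = [frozenset((a, b)), frozenset((b, c)), frozenset((a, c))]
        if not all(e in mi for e in edges):
            continue  # DPI only applies to closed triangles
        weakest = min(edges, key=lambda e: mi[e])
        others = [e for e in edges if e != weakest]
        # Remove only if clearly weaker than both other edges.
        if mi[weakest] < min(mi[e] for e in others) * (1 - tolerance):
            removed.add(weakest)
    return {e for e in mi if e not in removed}

# Hypothetical mutual information values for a chain A -> B -> C.
mi = {
    frozenset(("A", "B")): 0.9,
    frozenset(("B", "C")): 0.8,
    frozenset(("A", "C")): 0.3,  # indirect association through B
}
kept = dpi_prune(mi)
print(sorted(tuple(sorted(e)) for e in kept))
```

Pruning of this kind restricts the candidate neighbors of each gene, which is the role the abstract assigns to the step that precedes search-and-score structure learning.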
Affiliation(s)
- Linlin Xing
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China.
- Maozu Guo
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China.
  - School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing 100044, China.
  - Beijing Key Laboratory of Intelligent Processing for Building Big Data, Beijing 100044, China.
- Xiaoyan Liu
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China.
- Chunyu Wang
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China.
- Lei Zhang
  - School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing 100044, China.
|
31
|
Muñoz S, Carrillo M, Azpeitia E, Rosenblueth DA. Griffin: A Tool for Symbolic Inference of Synchronous Boolean Molecular Networks. Front Genet 2018; 9:39. [PMID: 29559993 PMCID: PMC5845696 DOI: 10.3389/fgene.2018.00039] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 10/20/2017] [Accepted: 01/29/2018] [Indexed: 11/30/2022]
Abstract
Boolean networks are important models of biochemical systems, located at the high end of the abstraction spectrum. A number of Boolean gene networks have been inferred following essentially the same method. Such a method first considers experimental data for a typically underdetermined “regulation” graph. Next, Boolean networks are inferred by using biological constraints to narrow the search space, such as a desired set of (fixed-point or cyclic) attractors. We describe Griffin, a computer tool enhancing this method. Griffin incorporates a number of well-established algorithms, such as Dubrova and Teslenko's algorithm for finding attractors in synchronous Boolean networks. In addition, a formal definition of regulation allows Griffin to employ “symbolic” techniques, able to represent both large sets of network states and Boolean constraints. We observe that when the set of attractors is required to be an exact set, prohibiting additional attractors, a naive Boolean coding of this constraint may be unfeasible. Such cases may be intractable even with symbolic methods, as the number of Boolean constraints may be astronomically large. To overcome this problem, we employ an Artificial Intelligence technique known as “clause learning” considerably increasing Griffin's scalability. Without clause learning only toy examples prohibiting additional attractors are solvable: only one out of seven queries reported here is answered. With clause learning, by contrast, all seven queries are answered. We illustrate Griffin with three case studies drawn from the Arabidopsis thaliana literature. Griffin is available at: http://turing.iimas.unam.mx/griffin.
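As a small illustration of what fixed-point and cyclic attractors mean in a synchronous Boolean network, the sketch below enumerates them by brute force: every state is followed until it revisits a state. This exhaustive search only works for toy networks. It is neither Dubrova and Teslenko's algorithm nor Griffin's symbolic method, and the two-gene network, `step`, and `attractors` are hypothetical names chosen for the example.

```python
from itertools import product

def step(state, rules):
    """Apply every update rule at once (synchronous update)."""
    return tuple(rule(state) for rule in rules)

def attractors(rules):
    """Enumerate attractors by following each state until a state repeats."""
    found = set()
    for start in product((False, True), repeat=len(rules)):
        seen = {}  # state -> position along the trajectory
        state = start
        while state not in seen:
            seen[state] = len(seen)
            state = step(state, rules)
        # States from the first repeated state onward form the cycle.
        cycle = [s for s, i in seen.items() if i >= seen[state]]
        # Rotate to a canonical start so each cycle is stored once.
        k = cycle.index(min(cycle))
        found.add(tuple(cycle[k:] + cycle[:k]))
    return found

# Hypothetical two-gene network with mutual repression.
rules = [
    lambda s: not s[1],  # gene 0 is on when gene 1 is off
    lambda s: not s[0],  # gene 1 is on when gene 0 is off
]
result = attractors(rules)
for cycle in sorted(result):
    print(len(cycle), cycle)
```

The state space doubles with every added gene, which is why tools of the kind described in the abstract rely on symbolic representations instead of enumeration.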
Affiliation(s)
- Stalin Muñoz
  - Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de México, Mexico City, Mexico
  - Facultad de Ingeniería, Universidad Nacional Autónoma de México, Mexico City, Mexico
  - Maestría en Ciencias de la Complejidad, Universidad Autónoma de la Ciudad de México, Mexico City, Mexico
- Miguel Carrillo
  - Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de México, Mexico City, Mexico
- Eugenio Azpeitia
  - Institut National de Recherche en Informatique et en Automatique Project-Team Virtual Plants, Inria, CIRAD, INRA, Montpellier, France
  - Department of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland
- David A Rosenblueth
  - Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de México, Mexico City, Mexico
  - Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de México, Mexico City, Mexico
|
32
|
Kim SJ, Ka S, Ha JW, Kim J, Yoo D, Kim K, Lee HK, Lim D, Cho S, Hanotte O, Mwai OA, Dessie T, Kemp S, Oh SJ, Kim H. Cattle genome-wide analysis reveals genetic signatures in trypanotolerant N'Dama. BMC Genomics 2017; 18:371. [PMID: 28499406 PMCID: PMC5427609 DOI: 10.1186/s12864-017-3742-2] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Received: 12/12/2016] [Accepted: 04/27/2017] [Indexed: 12/21/2022]
Abstract
BACKGROUND: Indigenous cattle in Africa have adapted to various local environments to acquire superior phenotypes that enhance their survival under harsh conditions. While many studies have investigated the adaptation of African cattle overall, the genetic characteristics of each breed have been poorly studied. RESULTS: We performed a comparative genome-wide analysis to assess evidence for subspeciation within species at the genetic level in trypanotolerant N'Dama cattle. We analysed genetic variation patterns in N'Dama from the genomes of 101 cattle breeds including 48 samples of five indigenous African cattle breeds and 53 samples of various commercial breeds. Analysis of SNP variances between cattle breeds using wMI, XP-CLR, and XP-EHH detected genes containing N'Dama-specific genetic variants and their potential associations. Functional annotation analysis revealed that these genes are associated with ossification and with the neurological and immune systems. In particular, the genes involved in bone formation indicate that local adaptation of N'Dama may involve skeletal growth as well as the immune system. CONCLUSIONS: Our results imply that N'Dama might have acquired distinct genotypes associated with growth and regulation of regional diseases including trypanosomiasis. Moreover, this study offers significant insights into identifying genetic signatures for natural and artificial selection of diverse African cattle breeds.
Affiliation(s)
- Soo-Jin Kim
  - Department of Agricultural Biotechnology and Research Institute of Agriculture and Life Sciences, Seoul National University, Seoul, 08826, Republic of Korea
  - C&K Genomics, Seoul National University Research Park, Seoul, 151-919, Republic of Korea
- Sojeong Ka
  - Department of Agricultural Biotechnology and Research Institute of Agriculture and Life Sciences, Seoul National University, Seoul, 08826, Republic of Korea
- Jung-Woo Ha
  - Clova, NAVER Corp., Seongnam, 13561, Republic of Korea
- Jaemin Kim
  - C&K Genomics, Seoul National University Research Park, Seoul, 151-919, Republic of Korea
- DongAhn Yoo
  - C&K Genomics, Seoul National University Research Park, Seoul, 151-919, Republic of Korea
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 08826, Republic of Korea
- Kwondo Kim
  - C&K Genomics, Seoul National University Research Park, Seoul, 151-919, Republic of Korea
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 08826, Republic of Korea
- Hak-Kyo Lee
  - Department of Animal Biotechnology, Chonbuk National University, Jeonju, 66414, Republic of Korea
- Dajeong Lim
  - Division of Animal Genomics and Bioinformatics, National Institute of Animal Science, RDA, Jeonju, 55365, Republic of Korea
- Seoae Cho
  - C&K Genomics, Seoul National University Research Park, Seoul, 151-919, Republic of Korea
- Olivier Hanotte
  - University of Nottingham, School of Life Sciences, Nottingham, NG7 2RD, UK
  - International Livestock Research Institute, Addis Ababa, Ethiopia
- Okeyo Ally Mwai
  - International Livestock Research Institute, Box 30709-00100, Nairobi, Kenya
- Tadelle Dessie
  - International Livestock Research Institute, Addis Ababa, Ethiopia
- Stephen Kemp
  - International Livestock Research Institute, Box 30709-00100, Nairobi, Kenya
  - The Centre for Tropical Livestock Genetics and Health, The Roslin Institute, University of Edinburgh, Easter Bush Campus, Edinburgh, Scotland, UK
- Sung Jong Oh
  - National Institute of Animal Science, RDA, Wanju, 55365, Republic of Korea
- Heebal Kim
  - Department of Agricultural Biotechnology and Research Institute of Agriculture and Life Sciences, Seoul National University, Seoul, 08826, Republic of Korea
  - C&K Genomics, Seoul National University Research Park, Seoul, 151-919, Republic of Korea
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 08826, Republic of Korea
|