1
|
Barman S, Farid FA, Gope HL, Hafiz MFB, Khan NA, Ahmad S, Mansor S. LBF-MI: Limited Boolean Functions and Mutual Information to Infer a Gene Regulatory Network from Time-Series Gene Expression Data. Genes (Basel) 2024; 15:1530. [PMID: 39766797 PMCID: PMC11675687 DOI: 10.3390/genes15121530] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/29/2024] [Revised: 11/23/2024] [Accepted: 11/26/2024] [Indexed: 01/11/2025] Open
Abstract
BACKGROUND In the realm of system biology, it is a challenging endeavor to infer a gene regulatory network from time-series gene expression data. Numerous Boolean network inference techniques have emerged for reconstructing a gene regulatory network from a time-series gene expression dataset. However, most of these techniques pose scalability concerns given their capability to consider only two to three regulatory genes over a specific target gene. METHODS To overcome this limitation, a novel inference method, LBF-MI, has been proposed in this research. This two-phase method utilizes limited Boolean functions and multivariate mutual information to reconstruct a Boolean gene regulatory network from time-series gene expression data. Initially, Boolean functions are applied to determine the optimum solutions. In case of failure, multivariate mutual information is applied to obtain the optimum solutions. RESULTS This research conducted a performance-comparison experiment between LBF-MI and three other methods: mutual information-based Boolean network inference, context likelihood relatedness, and relevance network. When examined on artificial as well as real-time-series gene expression data, the outcomes exhibited that the proposed LBF-MI method outperformed mutual information-based Boolean network inference, context likelihood relatedness, and relevance network on artificial datasets, and two real Escherichia coli datasets (E. coli gene regulatory network, and SOS response of E. coli regulatory network). CONCLUSIONS LBF-MI's superior performance in gene regulatory network inference enables researchers to uncover the regulatory mechanisms and cellular behaviors of various organisms.
Collapse
Affiliation(s)
- Shohag Barman
- Department of Computer Science and Engineering, Bangabandhu Sheikh Mujibur Rahman Science and Technology University, Pirojpur 8500, Bangladesh
| | - Fahmid Al Farid
- Faculty of Engineering, Multimedia University, Cyberjaya 63000, Selangor, Malaysia;
| | - Hira Lal Gope
- Faculty of Agricultural Engineering and Technology, Sylhet Agricultural University, Sylhet 3100, Bangladesh;
| | - Md. Ferdous Bin Hafiz
- Department of Computer Science and Engineering, Southeast University, Dhaka 1208, Bangladesh;
| | - Niaz Ashraf Khan
- Department of Computer Science and Engineering, University of Liberal Arts Bangladesh, Dhaka 1207, Bangladesh;
| | - Sabbir Ahmad
- Department of Computer Science and Engineering, University of Chittagong, Chittagong 4331, Bangladesh;
| | - Sarina Mansor
- Faculty of Engineering, Multimedia University, Cyberjaya 63000, Selangor, Malaysia;
| |
Collapse
|
2
|
Chee FT, Harun S, Mohd Daud K, Sulaiman S, Nor Muhammad NA. Exploring gene regulation and biological processes in insects: Insights from omics data using gene regulatory network models. PROGRESS IN BIOPHYSICS AND MOLECULAR BIOLOGY 2024; 189:1-12. [PMID: 38604435 DOI: 10.1016/j.pbiomolbio.2024.04.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/02/2023] [Revised: 12/18/2023] [Accepted: 04/03/2024] [Indexed: 04/13/2024]
Abstract
Gene regulatory network (GRN) comprises complicated yet intertwined gene-regulator relationships. Understanding the GRN dynamics will unravel the complexity behind the observed gene expressions. Insect gene regulation is often complicated due to their complex life cycles and diverse ecological adaptations. The main interest of this review is to have an update on the current mathematical modelling methods of GRNs to explain insect science. Several popular GRN architecture models are discussed, together with examples of applications in insect science. In the last part of this review, each model is compared from different aspects, including network scalability, computation complexity, robustness to noise and biological relevancy.
Collapse
Affiliation(s)
- Fong Ting Chee
- Institute of Systems Biology, Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Selangor, Malaysia
| | - Sarahani Harun
- Institute of Systems Biology, Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Selangor, Malaysia
| | - Kauthar Mohd Daud
- Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, 43600, UKM Bangi, Selangor, Malaysia
| | - Suhaila Sulaiman
- FGV R&D Sdn Bhd, FGV Innovation Center, PT23417 Lengkuk Teknologi, Bandar Baru Enstek, 71760 Nilai, Negeri Sembilan, Malaysia
| | - Nor Azlan Nor Muhammad
- Institute of Systems Biology, Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Selangor, Malaysia.
| |
Collapse
|
3
|
Pušnik Ž, Mraz M, Zimic N, Moškon M. SAILoR: Structure-Aware Inference of Logic Rules. PLoS One 2024; 19:e0304102. [PMID: 38861487 PMCID: PMC11166287 DOI: 10.1371/journal.pone.0304102] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/03/2024] [Accepted: 05/07/2024] [Indexed: 06/13/2024] Open
Abstract
Boolean networks provide an effective mechanism for describing interactions and dynamics of gene regulatory networks (GRNs). Deriving accurate Boolean descriptions of GRNs is a challenging task. The number of experiments is usually much smaller than the number of genes. In addition, binarization leads to a loss of information and inconsistencies arise in binarized time-series data. The inference of Boolean networks from binarized time-series data alone often leads to complex and overfitted models. To obtain relevant Boolean models of gene regulatory networks, inference methods could incorporate data from multiple sources and prior knowledge in terms of general network structure and/or exact interactions. We propose the Boolean network inference method SAILoR (Structure-Aware Inference of Logic Rules). SAILoR incorporates time-series gene expression data in combination with provided reference networks to infer accurate Boolean models. SAILoR automatically extracts topological properties from reference networks. These can describe a more general structure of the GRN or can be more precise and describe specific interactions. SAILoR infers a Boolean network by learning from both continuous and binarized time-series data. It navigates between two main objectives, topological similarity to reference networks and correspondence with gene expression data. By incorporating the NSGA-II multi-objective genetic algorithm, SAILoR relies on the wisdom of crowds. Our results indicate that SAILoR can infer accurate and biologically relevant Boolean descriptions of GRNs from both a static and a dynamic perspective. We show that SAILoR improves the static accuracy of the inferred network compared to the network inference method dynGENIE3. Furthermore, we compared the performance of SAILoR with other Boolean network inference approaches including Best-Fit, REVEAL, MIBNI, GABNI, ATEN, and LogBTF. We have shown that by incorporating prior knowledge about the overall network structure, SAILoR can improve the structural correctness of the inferred Boolean networks while maintaining dynamic accuracy. To demonstrate the applicability of SAILoR, we inferred context-specific Boolean subnetworks of female Drosophila melanogaster before and after mating.
Collapse
Affiliation(s)
- Žiga Pušnik
- Faculty of Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia
| | - Miha Mraz
- Faculty of Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia
| | - Nikolaj Zimic
- Faculty of Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia
| | - Miha Moškon
- Faculty of Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia
| |
Collapse
|
4
|
Zhang D, Gao S, Liu ZP, Gao R. LogicGep: Boolean networks inference using symbolic regression from time-series transcriptomic profiling data. Brief Bioinform 2024; 25:bbae286. [PMID: 38886006 PMCID: PMC11182660 DOI: 10.1093/bib/bbae286] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/30/2024] [Revised: 05/09/2024] [Accepted: 06/06/2024] [Indexed: 06/20/2024] Open
Abstract
Reconstructing the topology of gene regulatory network from gene expression data has been extensively studied. With the abundance functional transcriptomic data available, it is now feasible to systematically decipher regulatory interaction dynamics in a logic form such as a Boolean network (BN) framework, which qualitatively indicates how multiple regulators aggregated to affect a common target gene. However, inferring both the network topology and gene interaction dynamics simultaneously is still a challenging problem since gene expression data are typically noisy and data discretization is prone to information loss. We propose a new method for BN inference from time-series transcriptional profiles, called LogicGep. LogicGep formulates the identification of Boolean functions as a symbolic regression problem that learns the Boolean function expression and solve it efficiently through multi-objective optimization using an improved gene expression programming algorithm. To avoid overly emphasizing dynamic characteristics at the expense of topology structure ones, as traditional methods often do, a set of promising Boolean formulas for each target gene is evolved firstly, and a feed-forward neural network trained with continuous expression data is subsequently employed to pick out the final solution. We validated the efficacy of LogicGep using multiple datasets including both synthetic and real-world experimental data. The results elucidate that LogicGep adeptly infers accurate BN models, outperforming other representative BN inference algorithms in both network topology reconstruction and the identification of Boolean functions. Moreover, the execution of LogicGep is hundreds of times faster than other methods, especially in the case of large network inference.
Collapse
Affiliation(s)
- Dezhen Zhang
- Center of Intelligent Medicine, School of Control Science and Engineering, Shandong University, Jinan, Shandong 250061, China
| | - Shuhua Gao
- Center of Intelligent Medicine, School of Control Science and Engineering, Shandong University, Jinan, Shandong 250061, China
| | - Zhi-Ping Liu
- Center of Intelligent Medicine, School of Control Science and Engineering, Shandong University, Jinan, Shandong 250061, China
| | - Rui Gao
- Center of Intelligent Medicine, School of Control Science and Engineering, Shandong University, Jinan, Shandong 250061, China
| |
Collapse
|
5
|
Gao Z, Tang J, Xia J, Zheng CH, Wei PJ. CNNGRN: A Convolutional Neural Network-Based Method for Gene Regulatory Network Inference From Bulk Time-Series Expression Data. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:2853-2861. [PMID: 37267145 DOI: 10.1109/tcbb.2023.3282212] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/04/2023]
Abstract
Gene regulatory networks (GRNs) participate in many biological processes, and reconstructing them plays an important role in systems biology. Although many advanced methods have been proposed for GRN reconstruction, their predictive performance is far from the ideal standard, so it is urgent to design a more effective method to reconstruct GRN. Moreover, most methods only consider the gene expression data, ignoring the network structure information contained in GRN. In this study, we propose a supervised model named CNNGRN, which infers GRN from bulk time-series expression data via convolutional neural network (CNN) model, with a more informative feature. Bulk time series gene expression data imply the intricate regulatory associations between genes, and the network structure feature of ground-truth GRN contains rich neighbor information. Hence, CNNGRN integrates the above two features as model inputs. In addition, CNN is adopted to extract intricate features of genes and infer the potential associations between regulators and target genes. Moreover, feature importance visualization experiments are implemented to seek the key features. Experimental results show that CNNGRN achieved competitive performance on benchmark datasets compared to the state-of-the-art computational methods. Finally, hub genes identified based on CNNGRN have been confirmed to be involved in biological processes through literature.
Collapse
|
6
|
Gamage HN, Chetty M, Shatte A, Hallinan J. Filter feature selection based Boolean Modelling for Genetic Network Inference. Biosystems 2022; 221:104757. [PMID: 36007675 DOI: 10.1016/j.biosystems.2022.104757] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/24/2022] [Revised: 08/04/2022] [Accepted: 08/04/2022] [Indexed: 11/02/2022]
Abstract
The reconstruction of Gene Regulatory Networks (GRNs) from time series gene expression data is highly relevant for the discovery of complex biological interactions and dynamics. Various computational strategies have been developed for this task, but most approaches have low computational efficiency and are not able to cope with high-dimensional, low sample-number, gene expression data. In this paper, we introduce a novel combined filter feature selection approach for efficient and accurate inference of GRNs. A Boolean framework for network modelling is used to demonstrate the efficacy of the proposed approach. Using discretized microarray expression data, the genes most relevant to each target gene are first filtered using ReliefF, an instance-based feature ranking method that is here applied for the first time to GRN inference. Then, further gene selection from the filtered-gene list is done using a mutual information-based min-redundancy max-relevance criterion by eliminating irrelevant genes. This combined method is executed on resampled datasets to finalize the optimal set of regulatory genes. Building upon our previous research, a Pearson correlation coefficient-based Boolean modelling approach is utilized for the efficient identification of the optimal regulatory rules associated with selected regulatory genes. The proposed approach was evaluated using gene expression datasets from small-scale and medium-scale real gene networks, and was observed to be more effective than Linear Discriminant Analysis, performed better than the individual feature selection methods, and obtained improved Structural Accuracy with a higher number of true positives than other state-of-the-art methods, while outperforming these methods with respect to Dynamic Accuracy and efficiency.
Collapse
Affiliation(s)
| | - Madhu Chetty
- Health Innovation and Transformation Centre, Federation University, Victoria, Australia
| | - Adrian Shatte
- Health Innovation and Transformation Centre, Federation University, Victoria, Australia
| | | |
Collapse
|
7
|
Pušnik Ž, Mraz M, Zimic N, Moškon M. Review and assessment of Boolean approaches for inference of gene regulatory networks. Heliyon 2022; 8:e10222. [PMID: 36033302 PMCID: PMC9403406 DOI: 10.1016/j.heliyon.2022.e10222] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/09/2022] [Revised: 04/22/2022] [Accepted: 08/03/2022] [Indexed: 10/25/2022] Open
Abstract
Boolean descriptions of gene regulatory networks can provide an insight into interactions between genes. Boolean networks hold predictive power, are easy to understand, and can be used to simulate the observed networks in different scenarios. We review fundamental and state-of-the-art methods for inference of Boolean networks. We introduce a methodology for a straightforward evaluation of Boolean inference approaches based on the generation of evaluation datasets, application of selected inference methods, and evaluation of performance measures to guide the selection of the best method for a given inference problem. We demonstrate this procedure on inference methods REVEAL (REVerse Engineering ALgorithm), Best-Fit Extension, MIBNI (Mutual Information-based Boolean Network Inference), GABNI (Genetic Algorithm-based Boolean Network Inference) and ATEN (AND/OR Tree ENsemble algorithm), which infers Boolean descriptions of gene regulatory networks from discretised time series data. Boolean inference approaches tend to perform better in terms of dynamic accuracy, and slightly worse in terms of structural correctness. We believe that the proposed methodology and provided guidelines will help researchers to develop Boolean inference approaches with a good predictive capability while maintaining structural correctness and biological relevance.
Collapse
Affiliation(s)
- Žiga Pušnik
- University of Ljubljana, Faculty of Computer and Information Science, Večna pot 113, Ljubljana, SI-1000, Slovenia
| | - Miha Mraz
- University of Ljubljana, Faculty of Computer and Information Science, Večna pot 113, Ljubljana, SI-1000, Slovenia
| | - Nikolaj Zimic
- University of Ljubljana, Faculty of Computer and Information Science, Večna pot 113, Ljubljana, SI-1000, Slovenia
| | - Miha Moškon
- University of Ljubljana, Faculty of Computer and Information Science, Večna pot 113, Ljubljana, SI-1000, Slovenia
| |
Collapse
|