1
|
Yang B, Li J, Li X, Liu S. Gene regulatory network inference based on novel ensemble method. Brief Funct Genomics 2024; 23:866-878. [PMID: 39324652 DOI: 10.1093/bfgp/elae036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/11/2024] [Revised: 08/09/2024] [Accepted: 09/06/2024] [Indexed: 09/27/2024] Open
Abstract
Gene regulatory networks (GRNs) contribute toward understanding the function of genes and the development of cancer or the impact of key genes on diseases. Hence, this study proposes an ensemble method based on 13 basic classification methods and a flexible neural tree (FNT) to improve GRN identification accuracy. The primary classification methods contain ridge classification, stochastic gradient descent, Gaussian process classification, Bernoulli Naive Bayes, adaptive boosting, gradient boosting decision tree, hist gradient boosting classification, eXtreme gradient boosting (XGBoost), multilayer perceptron, light gradient boosting machine, random forest, support vector machine, and k-nearest neighbor algorithm, which are regarded as the input variable set of FNT model. Additionally, a hybrid evolutionary algorithm based on a gene programming variant and particle swarm optimization is developed to search for the optimal FNT model. Experiments on three simulation datasets and three real single-cell RNA-seq datasets demonstrate that the proposed ensemble feature outperforms 13 supervised algorithms, seven unsupervised algorithms (ARACNE, CLR, GENIE3, MRNET, PCACMI, GENECI, and EPCACMI) and four single cell-specific methods (SCODE, BiRGRN, LEAP, and BiGBoost) based on the area under the receiver operating characteristic curve, area under the precision-recall curve, and F1 metrics.
Collapse
Affiliation(s)
- Bin Yang
- School of Information Science and Engineering, Zaozhuang University, No. 1 Beian Road, Zaozhuang 277160, China
| | - Jing Li
- School of Information Science and Engineering, Zaozhuang University, No. 1 Beian Road, Zaozhuang 277160, China
| | - Xiang Li
- Information Department, Qingdao Eighth People's Hospital, No. 84 Fengshan Road, Qingdao 266121, China
| | - Sanrong Liu
- School of Information Science and Engineering, Zaozhuang University, No. 1 Beian Road, Zaozhuang 277160, China
| |
Collapse
|
2
|
Dong H, Ma B, Meng Y, Wu Y, Liu Y, Zeng T, Huang J. GRNMOPT: Inference of gene regulatory networks based on a multi-objective optimization approach. Comput Biol Chem 2024; 113:108223. [PMID: 39340962 DOI: 10.1016/j.compbiolchem.2024.108223] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/17/2024] [Revised: 08/21/2024] [Accepted: 09/20/2024] [Indexed: 09/30/2024]
Abstract
BACKGROUND AND OBJECTIVE The reconstruction of gene regulatory networks (GRNs) stands as a vital approach in deciphering complex biological processes. The application of nonlinear ordinary differential equations (ODEs) models has demonstrated considerable efficacy in predicting GRNs. Notably, the decay rate and time delay are pivotal in authentic gene regulation, yet their systematic determination in ODEs models remains underexplored. The development of a comprehensive optimization framework for the effective estimation of these key parameters is essential for accurate GRN inference. METHOD This study introduces GRNMOPT, an innovative methodology for inferring GRNs from time-series and steady-state data. GRNMOPT employs a combined use of decay rate and time delay in constructing ODEs models to authentically represent gene regulatory processes. It incorporates a multi-objective optimization approach, optimizing decay rate and time delay concurrently to derive Pareto optimal sets for these factors, thereby maximizing accuracy metrics such as AUROC (Area Under the Receiver Operating Characteristic curve) and AUPR (Area Under the Precision-Recall curve). Additionally, the use of XGBoost for calculating feature importance aids in identifying potential regulatory gene links. RESULTS Comprehensive experimental evaluations on two simulated datasets from DREAM4 and three real gene expression datasets (Yeast, In vivo Reverse-engineering and Modeling Assessment [IRMA], and Escherichia coli [E. coli]) reveal that GRNMOPT performs commendably across varying network scales. Furthermore, cross-validation experiments substantiate the robustness of GRNMOPT. CONCLUSION We propose a novel approach called GRNMOPT to infer GRNs based on a multi-objective optimization framework, which effectively improves inference accuracy and provides a powerful tool for GRNs inference.
Collapse
Affiliation(s)
- Heng Dong
- School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
| | - Baoshan Ma
- School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China.
| | - Yangyang Meng
- School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
| | - Yiming Wu
- School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
| | - Yongjing Liu
- Biomedical big data center, the First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou 310003, China; Zhejiang Provincial Key Laboratory of Pancreatic Disease, Zhejiang University School of Medicine First Affiliated Hospital, Hangzhou 310003, China; Zhejiang University Cancer Center, Zhejiang University, Hangzhou 310058, China
| | - Tao Zeng
- Biomedical big data center, the First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou 310003, China; Zhejiang Provincial Key Laboratory of Pancreatic Disease, Zhejiang University School of Medicine First Affiliated Hospital, Hangzhou 310003, China; Zhejiang University Cancer Center, Zhejiang University, Hangzhou 310058, China
| | - Jinyan Huang
- Biomedical big data center, the First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou 310003, China; Zhejiang Provincial Key Laboratory of Pancreatic Disease, Zhejiang University School of Medicine First Affiliated Hospital, Hangzhou 310003, China; Zhejiang University Cancer Center, Zhejiang University, Hangzhou 310058, China.
| |
Collapse
|
3
|
Nourisa J, Passemiers A, Shakeri F, Omidi M, Helmholz H, Raimondi D, Moreau Y, Tomforde S, Schlüter H, Luthringer-Feyerabend B, Cyron CJ, Aydin RC, Willumeit-Römer R, Zeller-Plumhoff B. Gene regulatory network analysis identifies MYL1, MDH2, GLS, and TRIM28 as the principal proteins in the response of mesenchymal stem cells to Mg 2+ ions. Comput Struct Biotechnol J 2024; 23:1773-1785. [PMID: 38689715 PMCID: PMC11058716 DOI: 10.1016/j.csbj.2024.04.033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/15/2023] [Revised: 04/12/2024] [Accepted: 04/12/2024] [Indexed: 05/02/2024] Open
Abstract
Magnesium (Mg)-based implants have emerged as a promising alternative for orthopedic applications, owing to their bioactive properties and biodegradability. As the implants degrade, Mg2+ ions are released, influencing all surrounding cell types, especially mesenchymal stem cells (MSCs). MSCs are vital for bone tissue regeneration, therefore, it is essential to understand their molecular response to Mg2+ ions in order to maximize the potential of Mg-based biomaterials. In this study, we conducted a gene regulatory network (GRN) analysis to examine the molecular responses of MSCs to Mg2+ ions. We used time-series proteomics data collected at 11 time points across a 21-day period for the GRN construction. We studied the impact of Mg2+ ions on the resulting networks and identified the key proteins and protein interactions affected by the application of Mg2+ ions. Our analysis highlights MYL1, MDH2, GLS, and TRIM28 as the primary targets of Mg2+ ions in the response of MSCs during 1-21 days phase. Our results also identify MDH2-MYL1, MDH2-RPS26, TRIM28-AK1, TRIM28-SOD2, and GLS-AK1 as the critical protein relationships affected by Mg2+ ions. By offering a comprehensive understanding of the regulatory role of Mg2+ ions on MSCs, our study contributes valuable insights into the molecular response of MSCs to Mg-based materials, thereby facilitating the development of innovative therapeutic strategies for orthopedic applications.
Collapse
Affiliation(s)
- Jalil Nourisa
- Institute of Material Systems Modeling, Helmholtz Zentrum Hereon, Geesthacht, Germany
| | | | - Farhad Shakeri
- Institute of Medical Biometry, Informatics and Epidemiology, Medical Faculty, University of Bonn, Bonn, Germany
| | - Maryam Omidi
- Institute of Clinical Chemistry/Central Laboratories, University Medical Center Hamburg, Hamburg, Germany
| | - Heike Helmholz
- Institute of Metallic Biomaterials, Helmholtz Zentrum Hereon, Geesthacht, Germany
| | | | | | - Sven Tomforde
- Department of Computer Science, Intelligent Systems, University of Kiel, Kiel, Germany
| | - Hartmuth Schlüter
- Institute of Clinical Chemistry and Laboratory Medicine Diagnostic Center, University of Hamburg, Hamburg, Germany
| | | | - Christian J. Cyron
- Institute of Material Systems Modeling, Helmholtz Zentrum Hereon, Geesthacht, Germany
- Institute for Continuum and Material Mechanics, Hamburg University of Technology, Hamburg, Germany
| | - Roland C. Aydin
- Institute of Material Systems Modeling, Helmholtz Zentrum Hereon, Geesthacht, Germany
- Institute for Continuum and Material Mechanics, Hamburg University of Technology, Hamburg, Germany
| | | | | |
Collapse
|
4
|
Karamveer, Uzun Y. Approaches for Benchmarking Single-Cell Gene Regulatory Network Methods. Bioinform Biol Insights 2024; 18:11779322241287120. [PMID: 39502448 PMCID: PMC11536393 DOI: 10.1177/11779322241287120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/17/2024] [Accepted: 09/10/2024] [Indexed: 11/08/2024] Open
Abstract
Gene regulatory networks are powerful tools for modeling genetic interactions that control the expression of genes driving cell differentiation, and single-cell sequencing offers a unique opportunity to build these networks with high-resolution genomic data. There are many proposed computational methods to build these networks using single-cell data, and different approaches are used to benchmark these methods. However, a comprehensive discussion specifically focusing on benchmarking approaches is missing. In this article, we lay the GRN terminology, present an overview of common gold-standard studies and data sets, and define the performance metrics for benchmarking network construction methodologies. We also point out the advantages and limitations of different benchmarking approaches, suggest alternative ground truth data sets that can be used for benchmarking, and specify additional considerations in this context.
Collapse
Affiliation(s)
- Karamveer
- Department of Pediatrics, The Pennsylvania State University College of Medicine, Hershey, PA, USA
| | - Yasin Uzun
- Department of Pediatrics, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Penn State Cancer Institute, The Pennsylvania State University College of Medicine, Hershey, PA, USA
| |
Collapse
|
5
|
Wang Y, Zheng P, Cheng YC, Wang Z, Aravkin A. WENDY: Covariance dynamics based gene regulatory network inference. Math Biosci 2024; 377:109284. [PMID: 39168402 DOI: 10.1016/j.mbs.2024.109284] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/25/2024] [Revised: 06/25/2024] [Accepted: 08/16/2024] [Indexed: 08/23/2024]
Abstract
Determining gene regulatory network (GRN) structure is a central problem in biology, with a variety of inference methods available for different types of data. For a widely prevalent and challenging use case, namely single-cell gene expression data measured after intervention at multiple time points with unknown joint distributions, there is only one known specifically developed method, which does not fully utilize the rich information contained in this data type. We develop an inference method for the GRN in this case, netWork infErence by covariaNce DYnamics, dubbed WENDY. The core idea of WENDY is to model the dynamics of the covariance matrix, and solve this dynamics as an optimization problem to determine the regulatory relationships. To evaluate its effectiveness, we compare WENDY with other inference methods using synthetic data and experimental data. Our results demonstrate that WENDY performs well across different data sets.
Collapse
Affiliation(s)
- Yue Wang
- Irving Institute for Cancer Dynamics and Department of Statistics, Columbia University, New York, 10027, NY, USA.
| | - Peng Zheng
- Institute for Health Metrics and Evaluation, Seattle, 98195, WA, USA; Department of Health Metrics Sciences, University of Washington, Seattle, 98195, WA, USA
| | - Yu-Chen Cheng
- Department of Data Science, Dana-Farber Cancer Institute, Boston, 02215, MA, USA; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, 02115, MA, USA; Center for Cancer Evolution, Dana-Farber Cancer Institute, Boston, 02215, MA, USA; Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, 02138, MA, USA
| | - Zikun Wang
- Laboratory of Genetics, The Rockefeller University, New York, 10065, NY, USA
| | - Aleksandr Aravkin
- Department of Applied Mathematics, University of Washington, Seattle, 98195, WA, USA
| |
Collapse
|
6
|
Emadi M, Boroujeni FZ, Pirgazi J. Improved Fuzzy Cognitive Maps for Gene Regulatory Networks Inference Based on Time Series Data. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2024; 21:1816-1829. [PMID: 38963747 DOI: 10.1109/tcbb.2024.3423383] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 07/06/2024]
Abstract
Microarray data provide lots of information regarding gene expression levels. Due to the large amount of such data, their analysis requires sufficient computational methods for identifying and analyzing gene regulation networks; however, researchers in this field are faced with numerous challenges such as consideration for too many genes and at the same time, the limited number of samples and their noisy nature of the data. In this paper, a hybrid method base on fuzzy cognitive map and compressed sensing is used to identify interactions between genes. For this purpose, in inference of the gene regulation network, the Ensemble Kalman filtered compressed sensing is used to learn the fuzzy cognitive map. Using the Ensemble Kalman filter and compressed sensing, the fuzzy cognitive map will be robust against noise. The proposed algorithm is evaluated using several metrics and compared with several well know methods such as LASSOFCM, KFRegular, CMI2NI. The experimental results show that the proposed method outperforms methods proposed in recent years in terms of SSmean, Data Error and accuracy.
Collapse
|
7
|
Wei PJ, Bao JJ, Gao Z, Tan JY, Cao RF, Su Y, Zheng CH, Deng L. MEFFGRN: Matrix enhancement and feature fusion-based method for reconstructing the gene regulatory network of epithelioma papulosum cyprini cells by spring viremia of carp virus infection. Comput Biol Med 2024; 179:108835. [PMID: 38996550 DOI: 10.1016/j.compbiomed.2024.108835] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/01/2024] [Revised: 06/05/2024] [Accepted: 06/29/2024] [Indexed: 07/14/2024]
Abstract
Gene regulatory networks (GRNs) are crucial for understanding organismal molecular mechanisms and processes. Construction of GRN in the epithelioma papulosum cyprini (EPC) cells of cyprinid fish by spring viremia of carp virus (SVCV) infection helps understand the immune regulatory mechanisms that enhance the survival capabilities of cyprinid fish. Although many computational methods have been used to infer GRNs, specialized approaches for predicting the GRN of EPC cells following SVCV infection are lacking. In addition, most existing methods focus primarily on gene expression features, neglecting the valuable network structural information in known GRNs. In this study, we propose a novel supervised deep neural network, named MEFFGRN (Matrix Enhancement- and Feature Fusion-based method for Gene Regulatory Network inference), to accurately predict the GRN of EPC cells following SVCV infection. MEFFGRN considers both gene expression data and network structure information of known GRN and introduces a matrix enhancement method to address the sparsity issue of known GRN, extracting richer network structure information. To optimize the benefits of CNN (Convolutional Neural Network) in image processing, gene expression and enhanced GRN data were transformed into histogram images for each gene pair respectively. Subsequently, these histograms were separately fed into CNNs for training to obtain the corresponding gene expression and network structural features. Furthermore, a feature fusion mechanism was introduced to comprehensively integrate the gene expression and network structural features. This integration considers the specificity of each feature and their interactive information, resulting in a more comprehensive and precise feature representation during the fusion process. Experimental results from both real-world and benchmark datasets demonstrate that MEFFGRN achieves competitive performance compared with state-of-the-art computational methods. Furthermore, study findings from SVCV-infected EPC cells suggest that MEFFGRN can predict novel gene regulatory relationships.
Collapse
Affiliation(s)
- Pi-Jing Wei
- Key Laboratory of Intelligent Computing & Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China
| | - Jin-Jin Bao
- Key Laboratory of Intelligent Computing & Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China
| | - Zhen Gao
- Key Laboratory of Intelligent Computing & Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China
| | - Jing-Yun Tan
- Shenzhen Key Laboratory of Microbial Genetic Engineering, College of Life Sciences and Oceanology, Shenzhen University, Shenzhen, 518055, Guangdong, China
| | - Rui-Fen Cao
- Key Laboratory of Intelligent Computing & Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China
| | - Yansen Su
- School of Artificial Intelligence, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China
| | - Chun-Hou Zheng
- School of Artificial Intelligence, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China.
| | - Li Deng
- Shenzhen Key Laboratory of Microbial Genetic Engineering, College of Life Sciences and Oceanology, Shenzhen University, Shenzhen, 518055, Guangdong, China.
| |
Collapse
|
8
|
Peng H, Xu J, Liu K, Liu F, Zhang A, Zhang X. EIEPCF: accurate inference of functional gene regulatory networks by eliminating indirect effects from confounding factors. Brief Funct Genomics 2024; 23:373-383. [PMID: 37642217 DOI: 10.1093/bfgp/elad040] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/05/2023] [Revised: 07/07/2023] [Accepted: 08/14/2023] [Indexed: 08/31/2023] Open
Abstract
Reconstructing functional gene regulatory networks (GRNs) is a primary prerequisite for understanding pathogenic mechanisms and curing diseases in animals, and it also provides an important foundation for cultivating vegetable and fruit varieties that are resistant to diseases and corrosion in plants. Many computational methods have been developed to infer GRNs, but most of the regulatory relationships between genes obtained by these methods are biased. Eliminating indirect effects in GRNs remains a significant challenge for researchers. In this work, we propose a novel approach for inferring functional GRNs, named EIEPCF (eliminating indirect effects produced by confounding factors), which eliminates indirect effects caused by confounding factors. This method eliminates the influence of confounding factors on regulatory factors and target genes by measuring the similarity between their residuals. The validation results of the EIEPCF method on simulation studies, the gold-standard networks provided by the DREAM3 Challenge and the real gene networks of Escherichia coli demonstrate that it achieves significantly higher accuracy compared to other popular computational methods for inferring GRNs. As a case study, we utilized the EIEPCF method to reconstruct the cold-resistant specific GRN from gene expression data of cold-resistant in Arabidopsis thaliana. The source code and data are available at https://github.com/zhanglab-wbgcas/EIEPCF.
Collapse
Affiliation(s)
- Huixiang Peng
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074 China
- University of Chinese Academy of Sciences, Beijing 100049 China
| | - Jing Xu
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074 China
- University of Chinese Academy of Sciences, Beijing 100049 China
| | - Kangchen Liu
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074 China
- University of Chinese Academy of Sciences, Beijing 100049 China
| | - Fang Liu
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074 China
| | - Aidi Zhang
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074 China
| | - Xiujun Zhang
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074 China
- Center of Economic Botany, Core Botanical Gardens, Chinese Academy of Sciences, Wuhan 430074, China
| |
Collapse
|
9
|
Wu S, Jin K, Tang M, Xia Y, Gao W. Inference of Gene Regulatory Networks Based on Multi-view Hierarchical Hypergraphs. Interdiscip Sci 2024; 16:318-332. [PMID: 38342857 DOI: 10.1007/s12539-024-00604-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/16/2023] [Revised: 11/26/2023] [Accepted: 01/03/2024] [Indexed: 02/13/2024]
Abstract
Since gene regulation is a complex process in which multiple genes act simultaneously, accurately inferring gene regulatory networks (GRNs) is a long-standing challenge in systems biology. Although graph neural networks can formally describe intricate gene expression mechanisms, current GRN inference methods based on graph learning regard only transcription factor (TF)-target gene interactions as pairwise relationships, and cannot model the many-to-many high-order regulatory patterns that prevail among genes. Moreover, these methods often rely on limited prior regulatory knowledge, ignoring the structural information of GRNs in gene expression profiles. Therefore, we propose a multi-view hierarchical hypergraphs GRN (MHHGRN) inference model. Specifically, multiple heterogeneous biological information is integrated to construct multi-view hierarchical hypergraphs of TFs and target genes, using hypergraph convolution networks to model higher order complex regulatory relationships. Meanwhile, the coupled information diffusion mechanism and the cross-domain messaging mechanism facilitate the information sharing between genes to optimise gene embedding representations. Finally, a unique channel attention mechanism is used to adaptively learn feature representations from multiple views for GRN inference. Experimental results show that MHHGRN achieves better results than the baseline methods on the E. coli and S. cerevisiae benchmark datasets of the DREAM5 challenge, and it has excellent cross-species generalization, achieving comparable or better performance on scRNA-seq datasets from five mouse and two human cell lines.
Collapse
Affiliation(s)
- Songyang Wu
- School of Information Science and Technology, Yunnan Normal University, Kunming, 650500, China
| | - Kui Jin
- School of Information Science and Technology, Yunnan Normal University, Kunming, 650500, China
| | - Mingjing Tang
- School of Life Science, Yunnan Normal University, Kunming, 650500, China.
- Engineering Research Center of Sustainable Development and Utilization of Biomass Energy, Ministry of Education, Yunnan Normal University, Kunming, 650500, China.
| | - Yuelong Xia
- School of Information Science and Technology, Yunnan Normal University, Kunming, 650500, China
| | - Wei Gao
- School of Information Science and Technology, Yunnan Normal University, Kunming, 650500, China
| |
Collapse
|
10
|
Chen Q, Zheng X, Shi H, Zhou Q, Hu H, Sun M, Xu Y, Zhang X. Prediction of influenza outbreaks in Fuzhou, China: comparative analysis of forecasting models. BMC Public Health 2024; 24:1399. [PMID: 38796443 PMCID: PMC11127308 DOI: 10.1186/s12889-024-18583-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/18/2023] [Accepted: 04/12/2024] [Indexed: 05/28/2024] Open
Abstract
BACKGROUND Influenza is a highly contagious respiratory disease that presents a significant challenge to public health globally. Therefore, effective influenza prediction and prevention are crucial for the timely allocation of resources, the development of vaccine strategies, and the implementation of targeted public health interventions. METHOD In this study, we utilized historical influenza case data from January 2013 to December 2021 in Fuzhou to develop four regression prediction models: SARIMA, Prophet, Holt-Winters, and XGBoost models. Their predicted performance was assessed by using influenza data from the period from January 2022 to December 2022 in Fuzhou. These models were used for fitting and prediction analysis. The evaluation metrics, including Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and Mean Absolute Error (MAE), were employed to compare the performance of these models. RESULTS The results indicate that the epidemic of influenza in Fuzhou exhibits a distinct seasonal and cyclical pattern. The influenza cases data displayed a noticeable upward trend and significant fluctuations. In our study, we employed SARIMA, Prophet, Holt-Winters, and XGBoost models to predict influenza outbreaks in Fuzhou. Among these models, the XGBoost model demonstrated the best performance on both the training and test sets, yielding the lowest values for MSE, RMSE, and MAE among the four models. CONCLUSION The utilization of the XGBoost model significantly enhances the prediction accuracy of influenza in Fuzhou. This study makes a valuable contribution to the field of influenza prediction and provides substantial support for future influenza response efforts.
Collapse
Affiliation(s)
- Qingquan Chen
- The Affiliated Fuzhou Center for Disease Control and Prevention of Fujian Medical University, Fuzhou, 350005, China
- The School of Public Health, Fujian Medical University, Fuzhou, 350108, China
| | - Xiaoyan Zheng
- The Affiliated Fuzhou Center for Disease Control and Prevention of Fujian Medical University, Fuzhou, 350005, China
- The School of Public Health, Fujian Medical University, Fuzhou, 350108, China
| | - Huanhuan Shi
- The Affiliated Fuzhou Center for Disease Control and Prevention of Fujian Medical University, Fuzhou, 350005, China
- The School of Public Health, Fujian Medical University, Fuzhou, 350108, China
| | - Quan Zhou
- The Affiliated Fuzhou Center for Disease Control and Prevention of Fujian Medical University, Fuzhou, 350005, China
- The School of Public Health, Fujian Medical University, Fuzhou, 350108, China
| | - Haiping Hu
- The Affiliated Fuzhou Center for Disease Control and Prevention of Fujian Medical University, Fuzhou, 350005, China
- The School of Public Health, Fujian Medical University, Fuzhou, 350108, China
| | - Mengcai Sun
- The Affiliated Fuzhou Center for Disease Control and Prevention of Fujian Medical University, Fuzhou, 350005, China
- The School of Public Health, Fujian Medical University, Fuzhou, 350108, China
| | - Youqiong Xu
- The Affiliated Fuzhou Center for Disease Control and Prevention of Fujian Medical University, Fuzhou, 350005, China.
- The School of Public Health, Fujian Medical University, Fuzhou, 350108, China.
| | - Xiaoyang Zhang
- The Affiliated Fuzhou Center for Disease Control and Prevention of Fujian Medical University, Fuzhou, 350005, China.
- The School of Public Health, Fujian Medical University, Fuzhou, 350108, China.
| |
Collapse
|
11
|
Wei PJ, Guo Z, Gao Z, Ding Z, Cao RF, Su Y, Zheng CH. Inference of gene regulatory networks based on directed graph convolutional networks. Brief Bioinform 2024; 25:bbae309. [PMID: 38935070 PMCID: PMC11209731 DOI: 10.1093/bib/bbae309] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/26/2023] [Revised: 05/17/2024] [Indexed: 06/28/2024] Open
Abstract
Inferring gene regulatory network (GRN) is one of the important challenges in systems biology, and many outstanding computational methods have been proposed; however there remains some challenges especially in real datasets. In this study, we propose Directed Graph Convolutional neural network-based method for GRN inference (DGCGRN). To better understand and process the directed graph structure data of GRN, a directed graph convolutional neural network is conducted which retains the structural information of the directed graph while also making full use of neighbor node features. The local augmentation strategy is adopted in graph neural network to solve the problem of poor prediction accuracy caused by a large number of low-degree nodes in GRN. In addition, for real data such as E.coli, sequence features are obtained by extracting hidden features using Bi-GRU and calculating the statistical physicochemical characteristics of gene sequence. At the training stage, a dynamic update strategy is used to convert the obtained edge prediction scores into edge weights to guide the subsequent training process of the model. The results on synthetic benchmark datasets and real datasets show that the prediction performance of DGCGRN is significantly better than existing models. Furthermore, the case studies on bladder uroepithelial carcinoma and lung cancer cells also illustrate the performance of the proposed model.
Collapse
Affiliation(s)
- Pi-Jing Wei
- Key Laboratory of Intelligent Computing & Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, 111 Jiulong Road, 230601, Anhui, China
| | - Ziqiang Guo
- Key Laboratory of Intelligent Computing & Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, 111 Jiulong Road, 230601, Anhui, China
| | - Zhen Gao
- Key Laboratory of Intelligent Computing & Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, 111 Jiulong Road, 230601, Anhui, China
| | - Zheng Ding
- Key Laboratory of Intelligent Computing & Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, 111 Jiulong Road, 230601, Anhui, China
| | - Rui-Fen Cao
- Key Laboratory of Intelligent Computing & Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, 111 Jiulong Road, 230601, Anhui, China
| | - Yansen Su
- Key Laboratory of Intelligent Computing & Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, 111 Jiulong Road, 230601, Anhui, China
| | - Chun-Hou Zheng
- Key Laboratory of Intelligent Computing & Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, 111 Jiulong Road, 230601, Anhui, China
| |
Collapse
|
12
|
Shen B, Coruzzi GM, Shasha D. Bipartite networks represent causality better than simple networks: evidence, algorithms, and applications. Front Genet 2024; 15:1371607. [PMID: 38798697 PMCID: PMC11120958 DOI: 10.3389/fgene.2024.1371607] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/16/2024] [Accepted: 04/17/2024] [Indexed: 05/29/2024] Open
Abstract
A network, whose nodes are genes and whose directed edges represent positive or negative influences of a regulatory gene and its targets, is often used as a representation of causality. To infer a network, researchers often develop a machine learning model and then evaluate the model based on its match with experimentally verified "gold standard" edges. The desired result of such a model is a network that may extend the gold standard edges. Since networks are a form of visual representation, one can compare their utility with architectural or machine blueprints. Blueprints are clearly useful because they provide precise guidance to builders in construction. If the primary role of gene regulatory networks is to characterize causality, then such networks should be good tools of prediction because prediction is the actionable benefit of knowing causality. But are they? In this paper, we compare prediction quality based on "gold standard" regulatory edges from previous experimental work with non-linear models inferred from time series data across four different species. We show that the same non-linear machine learning models have better predictive performance, with improvements from 5.3% to 25.3% in terms of the reduction in the root mean square error (RMSE) compared with the same models based on the gold standard edges. Having established that networks fail to characterize causality properly, we suggest that causality research should focus on four goals: (i) predictive accuracy; (ii) a parsimonious enumeration of predictive regulatory genes for each target gene g; (iii) the identification of disjoint sets of predictive regulatory genes for each target g of roughly equal accuracy; and (iv) the construction of a bipartite network (whose node types are genes and models) representation of causality. We provide algorithms for all goals.
Collapse
Affiliation(s)
- Bingran Shen
- Courant Institute of Mathematical Sciences, Department of Computer Science, New York University, New York, United States
| | - Gloria M. Coruzzi
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York, United States
| | - Dennis Shasha
- Courant Institute of Mathematical Sciences, Department of Computer Science, New York University, New York, United States
| |
Collapse
|
13
|
Gan Y, Yu J, Xu G, Yan C, Zou G. Inferring gene regulatory networks from single-cell transcriptomics based on graph embedding. Bioinformatics 2024; 40:btae291. [PMID: 38810116 PMCID: PMC11142726 DOI: 10.1093/bioinformatics/btae291] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/12/2024] [Revised: 03/06/2024] [Accepted: 05/28/2024] [Indexed: 05/31/2024] Open
Abstract
MOTIVATION Gene regulatory networks (GRNs) encode gene regulation in living organisms, and have become a critical tool to understand complex biological processes. However, due to the dynamic and complex nature of gene regulation, inferring GRNs from scRNA-seq data is still a challenging task. Existing computational methods usually focus on the close connections between genes, and ignore the global structure and distal regulatory relationships. RESULTS In this study, we develop a supervised deep learning framework, IGEGRNS, to infer GRNs from scRNA-seq data based on graph embedding. In the framework, contextual information of genes is captured by GraphSAGE, which aggregates gene features and neighborhood structures to generate low-dimensional embedding for genes. Then, the k most influential nodes in the whole graph are filtered through Top-k pooling. Finally, potential regulatory relationships between genes are predicted by stacking CNNs. Compared with nine competing supervised and unsupervised methods, our method achieves better performance on six time-series scRNA-seq datasets. AVAILABILITY AND IMPLEMENTATION Our method IGEGRNS is implemented in Python using the Pytorch machine learning library, and it is freely available at https://github.com/DHUDBlab/IGEGRNS.
Collapse
Affiliation(s)
- Yanglan Gan
- School of Computer Science and Technology, Donghua University, Shanghai 201620, China
| | - Jiacheng Yu
- School of Computer Science and Technology, Donghua University, Shanghai 201620, China
| | - Guangwei Xu
- School of Computer Science and Technology, Donghua University, Shanghai 201620, China
| | - Cairong Yan
- School of Computer Science and Technology, Donghua University, Shanghai 201620, China
| | - Guobing Zou
- School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China
| |
Collapse
|
14
|
Gao Z, Su Y, Xia J, Cao RF, Ding Y, Zheng CH, Wei PJ. DeepFGRN: inference of gene regulatory network with regulation type based on directed graph embedding. Brief Bioinform 2024; 25:bbae143. [PMID: 38581416 PMCID: PMC10998536 DOI: 10.1093/bib/bbae143] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/17/2024] [Revised: 03/02/2024] [Accepted: 03/15/2024] [Indexed: 04/08/2024] Open
Abstract
The inference of gene regulatory networks (GRNs) from gene expression profiles has been a key issue in systems biology, prompting many researchers to develop diverse computational methods. However, most of these methods do not reconstruct directed GRNs with regulatory types because of the lack of benchmark datasets or defects in the computational methods. Here, we collect benchmark datasets and propose a deep learning-based model, DeepFGRN, for reconstructing fine gene regulatory networks (FGRNs) with both regulation types and directions. In addition, the GRNs of real species are always large graphs with direction and high sparsity, which impede the advancement of GRN inference. Therefore, DeepFGRN builds a node bidirectional representation module to capture the directed graph embedding representation of the GRN. Specifically, the source and target generators are designed to learn the low-dimensional dense embedding of the source and target neighbors of a gene, respectively. An adversarial learning strategy is applied to iteratively learn the real neighbors of each gene. In addition, because the expression profiles of genes with regulatory associations are correlative, a correlation analysis module is designed. Specifically, this module not only fully extracts gene expression features, but also captures the correlation between regulators and target genes. Experimental results show that DeepFGRN has a competitive capability for both GRN and FGRN inference. Potential biomarkers and therapeutic drugs for breast cancer, liver cancer, lung cancer and coronavirus disease 2019 are identified based on the candidate FGRNs, providing a possible opportunity to advance our knowledge of disease treatments.
Collapse
Affiliation(s)
- Zhen Gao
- The Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China
| | - Yansen Su
- The Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China
| | - Junfeng Xia
- Information Materials and Intelligent Sensing Laboratory of Anhui Province, Institute of Physical Science and Information Technology, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China
| | - Rui-Fen Cao
- The Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China
| | - Yun Ding
- The Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China
| | - Chun-Hou Zheng
- The Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China
| | - Pi-Jing Wei
- Information Materials and Intelligent Sensing Laboratory of Anhui Province, Institute of Physical Science and Information Technology, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China
| |
Collapse
|
15
|
Zheng R, Xu Z, Zeng Y, Wang E, Li M. SPIDE: A single cell potency inference method based on the local cell-specific network entropy. Methods 2023; 220:90-97. [PMID: 37952704 DOI: 10.1016/j.ymeth.2023.11.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/01/2023] [Revised: 10/25/2023] [Accepted: 11/06/2023] [Indexed: 11/14/2023] Open
Abstract
For a given single cell RNA-seq data, it is critical to pinpoint key cellular stages and quantify cells' differentiation potency along a differentiation pathway in a time course manner. Currently, several methods based on the entropy of gene functions or PPI network have been proposed to solve the problem. Nevertheless, these methods still suffer from the inaccurate interactions and noises originating from scRNA-seq profile. In this study, we proposed a cell potency inference method based on cell-specific network entropy, called SPIDE. SPIDE introduces the local weighted cell-specific network for each cell to maintain cell heterogeneity and calculates the entropy by incorporating gene expression with network structure. In this study, we compared three cell entropy estimation models on eight scRNA-Seq datasets. The results show that SPIDE obtains consistent conclusions with real cell differentiation potency on most datasets. Moreover, SPIDE accurately recovers the continuous changes of potency during cell differentiation and significantly correlates with the stemness of tumor cells in Colorectal cancer. To conclude, our study provides a universal and accurate framework for cell entropy estimation, which deepens our understanding of cell differentiation, the development of diseases and other related biological research.
Collapse
Affiliation(s)
- Ruiqing Zheng
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
| | - Ziwei Xu
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
| | - Yanping Zeng
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
| | - Edwin Wang
- Department of Biochemistry and Molecular Biology, Cumming School of Medicine, University of Calgary, Calgary T2N 4N1, Alberta, Canada
| | - Min Li
- School of Computer Science and Engineering, Central South University, Changsha 410083, China.
| |
Collapse
|
16
|
Wu Y, Qian B, Wang A, Dong H, Zhu E, Ma B. iLSGRN: inference of large-scale gene regulatory networks based on multi-model fusion. Bioinformatics 2023; 39:btad619. [PMID: 37851379 PMCID: PMC10589915 DOI: 10.1093/bioinformatics/btad619] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/17/2023] [Revised: 10/04/2023] [Accepted: 10/17/2023] [Indexed: 10/19/2023] Open
Abstract
MOTIVATION Gene regulatory networks (GRNs) are a way of describing the interaction between genes, which contribute to revealing the different biological mechanisms in the cell. Reconstructing GRNs based on gene expression data has been a central computational problem in systems biology. However, due to the high dimensionality and non-linearity of large-scale GRNs, accurately and efficiently inferring GRNs is still a challenging task. RESULTS In this article, we propose a new approach, iLSGRN, to reconstruct large-scale GRNs from steady-state and time-series gene expression data based on non-linear ordinary differential equations. Firstly, the regulatory gene recognition algorithm calculates the Maximal Information Coefficient between genes and excludes redundant regulatory relationships to achieve dimensionality reduction. Then, the feature fusion algorithm constructs a model leveraging the feature importance derived from XGBoost (eXtreme Gradient Boosting) and RF (Random Forest) models, which can effectively train the non-linear ordinary differential equations model of GRNs and improve the accuracy and stability of the inference algorithm. The extensive experiments on different scale datasets show that our method makes sensible improvement compared with the state-of-the-art methods. Furthermore, we perform cross-validation experiments on the real gene datasets to validate the robustness and effectiveness of the proposed method. AVAILABILITY AND IMPLEMENTATION The proposed method is written in the Python language, and is available at: https://github.com/lab319/iLSGRN.
Collapse
Affiliation(s)
- Yiming Wu
- School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
| | - Bing Qian
- School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
| | - Anqi Wang
- Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong 999077, China
| | - Heng Dong
- School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
| | - Enqiang Zhu
- Institution of Computing Science and Technology, Guangzhou University, Guangzhou 510006, China
| | - Baoshan Ma
- School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
| |
Collapse
|
17
|
Zeng Y, He Y, Zheng R, Li M. Inferring single-cell gene regulatory network by non-redundant mutual information. Brief Bioinform 2023; 24:bbad326. [PMID: 37715282 DOI: 10.1093/bib/bbad326] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/19/2023] [Revised: 07/12/2023] [Accepted: 08/08/2023] [Indexed: 09/17/2023] Open
Abstract
Gene regulatory network plays a crucial role in controlling the biological processes of living creatures. Deciphering the complex gene regulatory networks from experimental data remains a major challenge in system biology. Recent advances in single-cell RNA sequencing technology bring massive high-resolution data, enabling computational inference of cell-specific gene regulatory networks (GRNs). Many relevant algorithms have been developed to achieve this goal in the past years. However, GRN inference is still less ideal due to the extra noises involved in pseudo-time information and large amounts of dropouts in datasets. Here, we present a novel GRN inference method named Normi, which is based on non-redundant mutual information. Normi manipulates these problems by employing a sliding size-fixed window approach on the entire trajectory and conducts average smoothing strategy on the gene expression of the cells in each window to obtain representative cells. To further alleviate the impact of dropouts, we utilize the mixed KSG estimator to quantify the high-order time-delayed mutual information among genes, then filter out the redundant edges by adopting Max-Relevance and Min Redundancy algorithm. Moreover, we determined the optimal time delay for each gene pair by distance correlation. Normi outperforms other state-of-the-art GRN inference methods on both simulated data and single-cell RNA sequencing (scRNA-seq) datasets, demonstrating its superiority in robustness. The performance of Normi in real scRNA-seq data further reveals its ability to identify the key regulators and crucial biological processes.
Collapse
Affiliation(s)
- Yanping Zeng
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
| | - Yongxin He
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
| | - Ruiqing Zheng
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
| | - Min Li
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
| |
Collapse
|
18
|
Gao Z, Tang J, Xia J, Zheng CH, Wei PJ. CNNGRN: A Convolutional Neural Network-Based Method for Gene Regulatory Network Inference From Bulk Time-Series Expression Data. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:2853-2861. [PMID: 37267145 DOI: 10.1109/tcbb.2023.3282212] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/04/2023]
Abstract
Gene regulatory networks (GRNs) participate in many biological processes, and reconstructing them plays an important role in systems biology. Although many advanced methods have been proposed for GRN reconstruction, their predictive performance is far from the ideal standard, so it is urgent to design a more effective method to reconstruct GRN. Moreover, most methods only consider the gene expression data, ignoring the network structure information contained in GRN. In this study, we propose a supervised model named CNNGRN, which infers GRN from bulk time-series expression data via convolutional neural network (CNN) model, with a more informative feature. Bulk time series gene expression data imply the intricate regulatory associations between genes, and the network structure feature of ground-truth GRN contains rich neighbor information. Hence, CNNGRN integrates the above two features as model inputs. In addition, CNN is adopted to extract intricate features of genes and infer the potential associations between regulators and target genes. Moreover, feature importance visualization experiments are implemented to seek the key features. Experimental results show that CNNGRN achieved competitive performance on benchmark datasets compared to the state-of-the-art computational methods. Finally, hub genes identified based on CNNGRN have been confirmed to be involved in biological processes through literature.
Collapse
|
19
|
Mbebi AJ, Nikoloski Z. Gene regulatory network inference using mixed-norms regularized multivariate model with covariance selection. PLoS Comput Biol 2023; 19:e1010832. [PMID: 37523414 PMCID: PMC10414675 DOI: 10.1371/journal.pcbi.1010832] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/21/2022] [Revised: 08/10/2023] [Accepted: 07/11/2023] [Indexed: 08/02/2023] Open
Abstract
Despite extensive research efforts, reconstruction of gene regulatory networks (GRNs) from transcriptomics data remains a pressing challenge in systems biology. While non-linear approaches for reconstruction of GRNs show improved performance over simpler alternatives, we do not yet have understanding if joint modelling of multiple target genes may improve performance, even under linearity assumptions. To address this problem, we propose two novel approaches that cast the GRN reconstruction problem as a blend between regularized multivariate regression and graphical models that combine the L2,1-norm with classical regularization techniques. We used data and networks from the DREAM5 challenge to show that the proposed models provide consistently good performance in comparison to contenders whose performance varies with data sets from simulation and experiments from model unicellular organisms Escherichia coli and Saccharomyces cerevisiae. Since the models' formulation facilitates the prediction of master regulators, we also used the resulting findings to identify master regulators over all data sets as well as their plasticity across different environments. Our results demonstrate that the identified master regulators are in line with experimental evidence from the model bacterium E. coli. Together, our study demonstrates that simultaneous modelling of several target genes results in improved inference of GRNs and can be used as an alternative in different applications.
Collapse
Affiliation(s)
- Alain J. Mbebi
- Bioinformatics Department, Institute of Biochemistry and Biology, University of Potsdam, Karl-Liebknecht-Str. 24-25, Germany
- Systems Biology and Mathematical Modeling Group, Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, Germany
| | - Zoran Nikoloski
- Bioinformatics Department, Institute of Biochemistry and Biology, University of Potsdam, Karl-Liebknecht-Str. 24-25, Germany
- Systems Biology and Mathematical Modeling Group, Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, Germany
| |
Collapse
|
20
|
Shen B, Coruzzi G, Shasha D. EnsInfer: a simple ensemble approach to network inference outperforms any single method. BMC Bioinformatics 2023; 24:114. [PMID: 36964499 PMCID: PMC10037858 DOI: 10.1186/s12859-023-05231-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/12/2022] [Accepted: 03/15/2023] [Indexed: 03/26/2023] Open
Abstract
This study evaluates both a variety of existing base causal inference methods and a variety of ensemble methods. We show that: (i) base network inference methods vary in their performance across different datasets, so a method that works poorly on one dataset may work well on another; (ii) a non-homogeneous ensemble method in the form of a Naive Bayes classifier leads overall to as good or better results than using the best single base method or any other ensemble method; (iii) for the best results, the ensemble method should integrate all methods that satisfy a statistical test of normality on training data. The resulting ensemble model EnsInfer easily integrates all kinds of RNA-seq data as well as new and existing inference methods. The paper categorizes and reviews state-of-the-art underlying methods, describes the EnsInfer ensemble approach in detail, and presents experimental results. The source code and data used will be made available to the community upon publication.
Collapse
Affiliation(s)
- Bingran Shen
- Department of Computer Science, Courant Institute of Mathematical Sciences, New York University, 251 Mercer St, New York, 10012 USA
| | - Gloria Coruzzi
- Department of Biology, Center for Genomics and Systems Biology, New York University, 12 Waverly Pl, New York, 10003 USA
| | - Dennis Shasha
- Department of Computer Science, Courant Institute of Mathematical Sciences, New York University, 251 Mercer St, New York, 10012 USA
| |
Collapse
|
21
|
Liu Q, Li J, Dong M, Liu M, Chai Y. Identification of Gene Regulatory Networks Using Variational Bayesian Inference in the Presence of Missing Data. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:399-409. [PMID: 35061589 DOI: 10.1109/tcbb.2022.3144418] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/04/2023]
Abstract
The identification of gene regulatory networks (GRN) from gene expression time series data is a challenge and open problem in system biology. This paper considers the structure inference of GRN from the incomplete and noisy gene expression data, which is a not well-studied issue for GRN inference. In this paper, the dynamical behavior of the gene expression process is described by a stochastic nonlinear state-space model with unknown noise information. A variational Bayesian (VB) framework are proposed to estimate the parameters and gene expression levels simultaneously. One of the advantages of this method is that it can easily handle the missing observations by generating the prediction values. Considering the sparsity of GRN, the smoothed gene data are modeled by the extreme gradient boosting tree, and the regulatory interactions among genes are identified by the importance scores based on the tree model. The proposed method is tested on the artificial DREAM4 datasets and one real gene expression dataset of yeast. The comparative results show that the proposed method can effectively recover the regulatory interactions of GRN in the presence of missing observations and outperforms the existing methods for GRN identification.
Collapse
|
22
|
Inference of gene regulatory networks based on the Light Gradient Boosting Machine. Comput Biol Chem 2022; 101:107769. [DOI: 10.1016/j.compbiolchem.2022.107769] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/22/2022] [Revised: 08/12/2022] [Accepted: 09/06/2022] [Indexed: 11/23/2022]
|
23
|
Mao G, Zeng R, Peng J, Zuo K, Pang Z, Liu J. Reconstructing gene regulatory networks of biological function using differential equations of multilayer perceptrons. BMC Bioinformatics 2022; 23:503. [PMID: 36434499 PMCID: PMC9700916 DOI: 10.1186/s12859-022-05055-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/16/2022] [Accepted: 11/14/2022] [Indexed: 11/26/2022] Open
Abstract
BACKGROUND Building biological networks with a certain function is a challenge in systems biology. For the functionality of small (less than ten nodes) biological networks, most methods are implemented by exhausting all possible network topological spaces. This exhaustive approach is difficult to scale to large-scale biological networks. And regulatory relationships are complex and often nonlinear or non-monotonic, which makes inference using linear models challenging. RESULTS In this paper, we propose a multi-layer perceptron-based differential equation method, which operates by training a fully connected neural network (NN) to simulate the transcription rate of genes in traditional differential equations. We verify whether the regulatory network constructed by the NN method can continue to achieve the expected biological function by verifying the degree of overlap between the regulatory network discovered by NN and the regulatory network constructed by the Hill function. And we validate our approach by adapting to noise signals, regulator knockout, and constructing large-scale gene regulatory networks using link-knockout techniques. We apply a real dataset (the mesoderm inducer Xenopus Brachyury expression) to construct the core topology of the gene regulatory network and find that Xbra is only strongly expressed at moderate levels of activin signaling. CONCLUSION We have demonstrated from the results that this method has the ability to identify the underlying network topology and functional mechanisms, and can also be applied to larger and more complex gene network topologies.
Collapse
Affiliation(s)
- Guo Mao
- grid.412110.70000 0000 9548 2110Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Deya Road, Changsha, 410073 China
| | - Ruigeng Zeng
- grid.412110.70000 0000 9548 2110Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Deya Road, Changsha, 410073 China
| | - Jintao Peng
- grid.412110.70000 0000 9548 2110Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Deya Road, Changsha, 410073 China
| | - Ke Zuo
- grid.412110.70000 0000 9548 2110Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Deya Road, Changsha, 410073 China
| | - Zhengbin Pang
- grid.412110.70000 0000 9548 2110Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Deya Road, Changsha, 410073 China
| | - Jie Liu
- grid.412110.70000 0000 9548 2110Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Deya Road, Changsha, 410073 China ,grid.412110.70000 0000 9548 2110Laboratory of Software Engineering for Complex System, National University of Defense Technology, Deya Road, Changsha, 410073 China
| |
Collapse
|
24
|
Yang B, Bao W, Chen B. PGRNIG: novel parallel gene regulatory network identification algorithm based on GPU. Brief Funct Genomics 2022; 21:441-454. [PMID: 36064791 DOI: 10.1093/bfgp/elac028] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/09/2022] [Revised: 07/30/2022] [Accepted: 08/03/2022] [Indexed: 12/14/2022] Open
Abstract
Molecular biology has revealed that complex life phenomena can be treated as the result of many gene interactions. Investigating these interactions and understanding the intrinsic mechanisms of biological systems using gene expression data have attracted a lot of attention. As a typical gene regulatory network (GRN) inference method, the S-system has been utilized to deal with small-scale network identification. However, it is extremely difficult to optimize it to infer medium-to-large networks. This paper proposes a novel parallel swarm intelligent algorithm, PGRNIG, to optimize the parameters of the S-system. We employed the clone selection strategy to improve the whale optimization algorithm (CWOA). To enhance the time efficiency of CWOA optimization, we utilized a parallel CWOA (PCWOA) based on the compute unified device architecture (CUDA) platform. Decomposition strategy and L1 regularization were utilized to reduce the search space and complexity of GRN inference. We applied the PGRNIG algorithm on three synthetic datasets and two real time-series expression datasets of the species of Escherichia coli and Saccharomyces cerevisiae. Experimental results show that PGRNIG could infer the gene regulatory network more accurately than other state-of-the-art methods with a convincing computational speed-up. Our findings show that CWOA and PCWOA have faster convergence performances than WOA.
Collapse
Affiliation(s)
- Bin Yang
- School of Information Science and Engineering, Zaozhuang University, Zaozhuang 277160, China
| | - Wenzheng Bao
- School of Information and Electrical Engineering, Xuzhou University of Technology, Xuzhou 221018, China
| | - Baitong Chen
- Xuzhou First People's Hospital, Xuzhou 221000, China
| |
Collapse
|
25
|
Lei J, Cai Z, He X, Zheng W, Liu J. An approach of gene regulatory network construction using mixed entropy optimizing context-related likelihood mutual information. Bioinformatics 2022; 39:6808612. [PMID: 36342190 PMCID: PMC9805593 DOI: 10.1093/bioinformatics/btac717] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/25/2022] [Revised: 09/18/2022] [Accepted: 11/04/2022] [Indexed: 11/09/2022] Open
Abstract
MOTIVATION The question of how to construct gene regulatory networks has long been a focus of biological research. Mutual information can be used to measure nonlinear relationships, and it has been widely used in the construction of gene regulatory networks. However, this method cannot measure indirect regulatory relationships under the influence of multiple genes, which reduces the accuracy of inferring gene regulatory networks. APPROACH This work proposes a method for constructing gene regulatory networks based on mixed entropy optimizing context-related likelihood mutual information (MEOMI). First, two entropy estimators were combined to calculate the mutual information between genes. Then, distribution optimization was performed using a context-related likelihood algorithm to eliminate some indirect regulatory relationships and obtain the initial gene regulatory network. To obtain the complex interaction between genes and eliminate redundant edges in the network, the initial gene regulatory network was further optimized by calculating the conditional mutual inclusive information (CMI2) between gene pairs under the influence of multiple genes. The network was iteratively updated to reduce the impact of mutual information on the overestimation of the direct regulatory intensity. RESULTS The experimental results show that the MEOMI method performed better than several other kinds of gene network construction methods on DREAM challenge simulated datasets (DREAM3 and DREAM5), three real Escherichia coli datasets (E.coli SOS pathway network, E.coli SOS DNA repair network and E.coli community network) and two human datasets. AVAILABILITY AND IMPLEMENTATION Source code and dataset are available at https://github.com/Dalei-Dalei/MEOMI/ and http://122.205.95.139/MEOMI/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Jimeng Lei
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China,Key Laboratory of Smart Farming for Agricultural Animals, Huazhong Agricultural University, Wuhan 430070, China,College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
| | - Zongheng Cai
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China,Key Laboratory of Smart Farming for Agricultural Animals, Huazhong Agricultural University, Wuhan 430070, China,College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
| | - Xinyi He
- College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
| | - Wanting Zheng
- College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
| | | |
Collapse
|
26
|
Chen G, Liu ZP. Inferring causal gene regulatory network via GreyNet: From dynamic grey association to causation. Front Bioeng Biotechnol 2022; 10:954610. [PMID: 36237217 PMCID: PMC9551017 DOI: 10.3389/fbioe.2022.954610] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/27/2022] [Accepted: 08/15/2022] [Indexed: 11/23/2022] Open
Abstract
Gene regulatory network (GRN) provides abundant information on gene interactions, which contributes to demonstrating pathology, predicting clinical outcomes, and identifying drug targets. Existing high-throughput experiments provide rich time-series gene expression data to reconstruct the GRN to further gain insights into the mechanism of organisms responding to external stimuli. Numerous machine-learning methods have been proposed to infer gene regulatory networks. Nevertheless, machine learning, especially deep learning, is generally a “black box,” which lacks interpretability. The causality has not been well recognized in GRN inference procedures. In this article, we introduce grey theory integrated with the adaptive sliding window technique to flexibly capture instant gene–gene interactions in the uncertain regulatory system. Then, we incorporate generalized multivariate Granger causality regression methods to transform the dynamic grey association into causation to generate directional regulatory links. We evaluate our model on the DREAM4 in silico benchmark dataset and real-world hepatocellular carcinoma (HCC) time-series data. We achieved competitive results on the DREAM4 compared with other state-of-the-art algorithms and gained meaningful GRN structure on HCC data respectively.
Collapse
Affiliation(s)
- Guangyi Chen
- Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, Shandong, China
| | - Zhi-Ping Liu
- Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, Shandong, China
- Center for Intelligent Medicine, Shandong University, Jinan, Shandong, China
- *Correspondence: Zhi-Ping Liu,
| |
Collapse
|
27
|
Gan Y, Hu X, Zou G, Yan C, Xu G. Inferring Gene Regulatory Networks From Single-Cell Transcriptomic Data Using Bidirectional RNN. Front Oncol 2022; 12:899825. [PMID: 35692809 PMCID: PMC9178250 DOI: 10.3389/fonc.2022.899825] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/19/2022] [Accepted: 04/22/2022] [Indexed: 11/30/2022] Open
Abstract
Accurate inference of gene regulatory rules is critical to understanding cellular processes. Existing computational methods usually decompose the inference of gene regulatory networks (GRNs) into multiple subproblems, rather than detecting potential causal relationships simultaneously, which limits the application to data with a small number of genes. Here, we propose BiRGRN, a novel computational algorithm for inferring GRNs from time-series single-cell RNA-seq (scRNA-seq) data. BiRGRN utilizes a bidirectional recurrent neural network to infer GRNs. The recurrent neural network is a complex deep neural network that can capture complex, non-linear, and dynamic relationships among variables. It maps neurons to genes, and maps the connections between neural network layers to the regulatory relationship between genes, providing an intuitive solution to model GRNs with biological closeness and mathematical flexibility. Based on the deep network, we transform the inference of GRNs into a regression problem, using the gene expression data at previous time points to predict the gene expression data at the later time point. Furthermore, we adopt two strategies to improve the accuracy and stability of the algorithm. Specifically, we utilize a bidirectional structure to integrate the forward and reverse inference results and exploit an incomplete set of prior knowledge to filter out some candidate inferences of low confidence. BiRGRN is applied to four simulated datasets and three real scRNA-seq datasets to verify the proposed method. We perform comprehensive comparisons between our proposed method with other state-of-the-art techniques. These experimental results indicate that BiRGRN is capable of inferring GRN simultaneously from time-series scRNA-seq data. Our method BiRGRN is implemented in Python using the TensorFlow machine-learning library, and it is freely available at https://gitee.com/DHUDBLab/bi-rgrn.
Collapse
Affiliation(s)
- Yanglan Gan
- School of Computer Science and Technology, Donghua University, Shanghai, China
| | - Xin Hu
- School of Computer Science and Technology, Donghua University, Shanghai, China
| | - Guobing Zou
- School of Computer Engineering and Science, Shanghai University, Shanghai, China
| | - Cairong Yan
- School of Computer Science and Technology, Donghua University, Shanghai, China
| | - Guangwei Xu
- School of Computer Science and Technology, Donghua University, Shanghai, China
| |
Collapse
|
28
|
Jiang X, Zhang X. RSNET: inferring gene regulatory networks by a redundancy silencing and network enhancement technique. BMC Bioinformatics 2022; 23:165. [PMID: 35524190 PMCID: PMC9074326 DOI: 10.1186/s12859-022-04696-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/14/2021] [Accepted: 04/25/2022] [Indexed: 11/29/2022] Open
Abstract
Background Current gene regulatory network (GRN) inference methods are notorious for a great number of indirect interactions hidden in the predictions. Filtering out the indirect interactions from direct ones remains an important challenge in the reconstruction of GRNs. To address this issue, we developed a redundancy silencing and network enhancement technique (RSNET) for inferring GRNs. Results To assess the performance of RSNET method, we implemented the experiments on several gold-standard networks by using simulation study, DREAM challenge dataset and Escherichia coli network. The results show that RSNET method performed better than the compared methods in sensitivity and accuracy. As a case of study, we used RSNET to construct functional GRN for apple fruit ripening from gene expression data. Conclusions In the proposed method, the redundant interactions including weak and indirect connections are silenced by recursive optimization adaptively, and the highly dependent nodes are constrained in the model to keep the real interactions. This study provides a useful tool for inferring clean networks. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-022-04696-w.
Collapse
Affiliation(s)
- Xiaohan Jiang
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, China.,Center of Economic Botany, Core Botanical Gardens, Chinese Academy of Sciences, Wuhan, 430074, China.,University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Xiujun Zhang
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, China. .,Center of Economic Botany, Core Botanical Gardens, Chinese Academy of Sciences, Wuhan, 430074, China.
| |
Collapse
|
29
|
Saremi M, Amirmazlaghani M. Reconstruction of Gene Regulatory Networks Using Multiple Datasets. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:1827-1839. [PMID: 33539303 DOI: 10.1109/tcbb.2021.3057241] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/12/2023]
Abstract
MOTIVATION Laboratory gene regulatory data for a species are sporadic. Despite the abundance of gene regulatory network algorithms that employ single data sets, few algorithms can combine the vast but disperse sources of data and extract the potential information. With a motivation to compensate for this shortage, we developed an algorithm called GENEREF that can accumulate information from multiple types of data sets in an iterative manner, with each iteration boosting the performance of the prediction results. RESULTS The algorithm is examined extensively on data extracted from the quintuple DREAM4 networks and DREAM5's Escherichia coli and Saccharomyces cerevisiae networks and sub-networks. Many single-dataset and multi-dataset algorithms were compared to test the performance of the algorithm. Results show that GENEREF surpasses non-ensemble state-of-the-art multi-perturbation algorithms on the selected networks and is competitive to present multiple-dataset algorithms. Specifically, it outperforms dynGENIE3 and is on par with iRafNet. Also, we argued that a scoring method solely based on the AUPR criterion would be more trustworthy than the traditional score. AVAILABILITY The Python implementation along with the data sets and results can be downloaded from github.com/msaremi/GENEREF.
Collapse
|
30
|
Feng H, Zheng R, Wang J, Wu FX, Li M. NIMCE: A Gene Regulatory Network Inference Approach Based on Multi Time Delays Causal Entropy. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:1042-1049. [PMID: 33035155 DOI: 10.1109/tcbb.2020.3029846] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
Gene regulatory networks (GRNs)are involved in various biological processes, such as cell cycle, differentiation and apoptosis. The existing large amount of expression data, especially the time-series expression data, provide a chance to infer GRNs by computational methods. These data can reveal the dynamics of gene expression and imply the regulatory relationships among genes. However, identify the indirect regulatory links is still a big challenge as most studies treat time points as independent observations, while ignoring the influences of time delays. In this study, we propose a GRN inference method based on information-theory measure, called NIMCE. NIMCE incorporates the transfer entropy to measure the regulatory links between each pair of genes, then applies the causation entropy to filter indirect relationships. In addition, NIMCE applies multi time delays to identify indirect regulatory relationships from candidate genes. Experiments on simulated and colorectal cancer data show NIMCE outperforms than other competing methods. All data and codes used in this study are publicly available at https://github.com/CSUBioGroup/NIMCE.
Collapse
|
31
|
Emerging Machine Learning Techniques for Modelling Cellular Complex Systems in Alzheimer's Disease. ADVANCES IN EXPERIMENTAL MEDICINE AND BIOLOGY 2022; 1338:199-208. [PMID: 34973026 DOI: 10.1007/978-3-030-78775-2_24] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Subscribe] [Scholar Register] [Indexed: 10/19/2022]
Abstract
We live in the big data era in the biomedical field, where machine learning has a very important contribution to the interpretation of complex biological processes and diseases, since it has the potential to create predictive models from multidimensional data sets. Part of the application of machine learning in biomedical science is to study and model complex cellular systems such as biological networks. In this context, the study of complex diseases, such as Alzheimer's diseases (AD), benefits from established methodologies of network science and machine learning as they offer algorithmic tools and techniques that can address the limitations and challenges of modeling and studying cellular AD-related networks. In this paper we analyze the opportunities and challenges at the intersection of machine learning and network biology and whether this can affect the biological interpretation and clarification of diseases. Specifically, we focus on GRN techniques which through omics data and the use of machine learning techniques can construct a network that captures all the information at the molecular level for the disease under study. We record the emerging machine learning techniques that are focus on ensemble tree-based techniques in the area of classification and regression. Their potential for unraveling the complexity of model cellular systems in complex diseases, such as AD, offers the opportunity for novel machine learning methodologies to decipher the mechanisms of the various AD processes.
Collapse
|
32
|
Staem5: A novel computational approachfor accurate prediction of m5C site. MOLECULAR THERAPY. NUCLEIC ACIDS 2021; 26:1027-1034. [PMID: 34786208 PMCID: PMC8571400 DOI: 10.1016/j.omtn.2021.10.012] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 07/05/2021] [Revised: 08/27/2021] [Accepted: 10/06/2021] [Indexed: 12/25/2022]
Abstract
5-Methylcytosine (m5C) is an important post-transcriptional modification that has been extensively found in multiple types of RNAs. Many studies have shown that m5C plays vital roles in many biological functions, such as RNA structure stability and metabolism. Computational approaches act as an efficient way to identify m5C sites from high-throughput RNA sequence data and help interpret the functional mechanism of this important modification. This study proposed a novel species-specific computational approach, Staem5, to accurately predict RNA m5C sites in Mus musculus and Arabidopsis thaliana. Staem5 was developed by employing feature fusion tactics to leverage informatic sequence profiles, and a stacking ensemble learning framework combined five popular machine learning algorithms. Extensive benchmarking tests demonstrated that Staem5 outperformed state-of-the-art approaches in both cross-validation and independent tests. We provide the source code of Staem5, which is publicly available at https://github.com/Cxd-626/Staem5.git.
Collapse
|
33
|
Lin CX, Li HD, Deng C, Guan Y, Wang J. TissueNexus: a database of human tissue functional gene networks built with a large compendium of curated RNA-seq data. Nucleic Acids Res 2021; 50:D710-D718. [PMID: 34850130 PMCID: PMC8728275 DOI: 10.1093/nar/gkab1133] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/12/2021] [Revised: 10/10/2021] [Accepted: 11/18/2021] [Indexed: 01/02/2023] Open
Abstract
Mapping gene interactions within tissues/cell types plays a crucial role in understanding the genetic basis of human physiology and disease. Tissue functional gene networks (FGNs) are essential models for mapping complex gene interactions. We present TissueNexus, a database of 49 human tissue/cell line FGNs constructed by integrating heterogeneous genomic data. We adopted an advanced machine learning approach for data integration because Bayesian classifiers, which is the main approach used for constructing existing tissue gene networks, cannot capture the interaction and nonlinearity of genomic features well. A total of 1,341 RNA-seq datasets containing 52,087 samples were integrated for all of these networks. Because the tissue label for RNA-seq data may be annotated with different names or be missing, we performed intensive hand-curation to improve quality. We further developed a user-friendly database for network search, visualization, and functional analysis. We illustrate the application of TissueNexus in prioritizing disease genes. The database is publicly available at https://www.diseaselinks.com/TissueNexus/.
Collapse
Affiliation(s)
- Cui-Xiang Lin
- School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, P.R. China.,Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, Hunan 410083, P.R. China
| | - Hong-Dong Li
- School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, P.R. China.,Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, Hunan 410083, P.R. China
| | - Chao Deng
- School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, P.R. China.,Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, Hunan 410083, P.R. China
| | - Yuanfang Guan
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
| | - Jianxin Wang
- School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, P.R. China.,Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, Hunan 410083, P.R. China
| |
Collapse
|
34
|
Zhao Y, Fang ZY, Lin CX, Deng C, Xu YP, Li HD. RFCell: A Gene Selection Approach for scRNA-seq Clustering Based on Permutation and Random Forest. Front Genet 2021; 12:665843. [PMID: 34386033 PMCID: PMC8354212 DOI: 10.3389/fgene.2021.665843] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/09/2021] [Accepted: 04/01/2021] [Indexed: 11/13/2022] Open
Abstract
In recent years, the application of single cell RNA-seq (scRNA-seq) has become more and more popular in fields such as biology and medical research. Analyzing scRNA-seq data can discover complex cell populations and infer single-cell trajectories in cell development. Clustering is one of the most important methods to analyze scRNA-seq data. In this paper, we focus on improving scRNA-seq clustering through gene selection, which also reduces the dimensionality of scRNA-seq data. Studies have shown that gene selection for scRNA-seq data can improve clustering accuracy. Therefore, it is important to select genes with cell type specificity. Gene selection not only helps to reduce the dimensionality of scRNA-seq data, but also can improve cell type identification in combination with clustering methods. Here, we proposed RFCell, a supervised gene selection method, which is based on permutation and random forest classification. We first use RFCell and three existing gene selection methods to select gene sets on 10 scRNA-seq data sets. Then, three classical clustering algorithms are used to cluster the cells obtained by these gene selection methods. We found that the gene selection performance of RFCell was better than other gene selection methods.
Collapse
Affiliation(s)
- Yuan Zhao
- Hunan Provincial Key Laboratory on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, China
| | - Zhao-Yu Fang
- School of Mathematics and Statistics, Central South University, Changsha, China
| | - Cui-Xiang Lin
- Hunan Provincial Key Laboratory on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, China
| | - Chao Deng
- Hunan Provincial Key Laboratory on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, China
| | - Yun-Pei Xu
- Hunan Provincial Key Laboratory on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, China
| | - Hong-Dong Li
- Hunan Provincial Key Laboratory on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, China
| |
Collapse
|
35
|
Gan Y, Xin Y, Hu X, Zou G. Inferring gene regulatory network from single-cell transcriptomic data by integrating multiple prior networks. Comput Biol Chem 2021; 93:107512. [PMID: 34044202 DOI: 10.1016/j.compbiolchem.2021.107512] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/02/2021] [Accepted: 05/12/2021] [Indexed: 11/29/2022]
Abstract
Gene regulatory network models the interactions between transcription factors and target genes. Reconstructing gene regulation network is critically important to understand gene function in a particular cellular context, providing key insights into complex biological systems. We develop a new computational method, named iMPRN, which integrates multiple prior networks to infer regulatory network. Based on the network component analysis model, iMPRN adopts linear regression, graph embedding, and elastic networks to optimize each prior network in line with specific biological context. For each rewired prior networks, iMPRN evaluate the confidence of the regulatory edges in each network based on B scores and finally integrated these optimized networks. We validate the effectiveness of iMPRN by comparing it with four widely-used gene regulatory network reconstruction algorithms on a simulation data set. The results show that iMPRN can infer the gene regulatory network more accurately. Further, on a real scRNA-seq dataset, iMPRN is respectively applied to reconstruct gene regulatory networks for malignant and nonmalignant head and neck tumor cells, demonstrating distinctive differences in their corresponding regulatory networks.
Collapse
Affiliation(s)
- Yanglan Gan
- School of Computer Science and Technology, Donghua University, Shanghai, China
| | - Yongchang Xin
- School of Computer Science and Technology, Donghua University, Shanghai, China
| | - Xin Hu
- School of Computer Science and Technology, Donghua University, Shanghai, China
| | - Guobing Zou
- School of Computer Engineering and Science, Shanghai University, Shanghai, China
| |
Collapse
|
36
|
He W, Tang J, Zou Q, Guo F. MMFGRN: a multi-source multi-model fusion method for gene regulatory network reconstruction. Brief Bioinform 2021; 22:6261916. [PMID: 33939795 DOI: 10.1093/bib/bbab166] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/26/2021] [Revised: 03/08/2021] [Accepted: 04/08/2021] [Indexed: 01/05/2023] Open
Abstract
Lots of biological processes are controlled by gene regulatory networks (GRNs), such as growth and differentiation of cells, occurrence and development of the diseases. Therefore, it is important to persistently concentrate on the research of GRN. The determination of the gene-gene relationships from gene expression data is a complex issue. Since it is difficult to efficiently obtain the regularity behind the gene-gene relationship by only relying on biochemical experimental methods, thus various computational methods have been used to construct GRNs, and some achievements have been made. In this paper, we propose a novel method MMFGRN (for "Multi-source Multi-model Fusion for Gene Regulatory Network reconstruction") to reconstruct the GRN. In order to make full use of the limited datasets and explore the potential regulatory relationships contained in different data types, we construct the MMFGRN model from three perspectives: single time series data model, single steady-data model and time series and steady-data joint model. And, we utilize the weighted fusion strategy to get the final global regulatory link ranking. Finally, MMFGRN model yields the best performance on the DREAM4 InSilico_Size10 data, outperforming other popular inference algorithms, with an overall area under receiver operating characteristic score of 0.909 and area under precision-recall (AUPR) curves score of 0.770 on the 10-gene network. Additionally, as the network scale increases, our method also has certain advantages with an overall AUPR score of 0.335 on the DREAM4 InSilico_Size100 data. These results demonstrate the good robustness of MMFGRN on different scales of networks. At the same time, the integration strategy proposed in this paper provides a new idea for the reconstruction of the biological network model without prior knowledge, which can help researchers to decipher the elusive mechanism of life.
Collapse
Affiliation(s)
| | | | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
| | - Fei Guo
- College of Intelligence and Computing, Tianjin University, Tianjin, China
| |
Collapse
|
37
|
Xu Y, Li HD, Pan Y, Luo F, Wu FX, Wang J. A Gene Rank Based Approach for Single Cell Similarity Assessment and Clustering. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:431-442. [PMID: 31369384 DOI: 10.1109/tcbb.2019.2931582] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
Single-cell RNA sequencing (scRNA-seq) technology provides quantitative gene expression profiles at single-cell resolution. As a result, researchers have established new ways to explore cell population heterogeneity and genetic variability of cells. One of the current research directions for scRNA-seq data is to identify different cell types accurately through unsupervised clustering methods. However, scRNA-seq data analysis is challenging because of their high noise level, high dimensionality and sparsity. Moreover, the impact of multiple latent factors on gene expression heterogeneity and on the ability to accurately identify cell types remains unclear. How to overcome these challenges to reveal the biological difference between cell types has become the key to analyze scRNA-seq data. For these reasons, the unsupervised learning for cell population discovery based on scRNA-seq data analysis has become an important research area. A cell similarity assessment method plays a significant role in cell clustering. Here, we present BioRank, a new cell similarity assessment method based on annotated gene sets and gene ranks. To evaluate the performances, we cluster cells by two classical clustering algorithms based on the similarity between cells obtained by BioRank. In addition, BioRank can be used by any clustering algorithm that requires a similarity matrix. Applying BioRank to 12 public scRNA-seq datasets, we show that it is better than or at least as well as several popular similarity assessment methods for single cell clustering.
Collapse
|
38
|
Zhang Y, Chang X, Liu X. Inference of gene regulatory networks using pseudo-time series data. Bioinformatics 2021; 37:2423-2431. [PMID: 33576787 DOI: 10.1093/bioinformatics/btab099] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/04/2020] [Revised: 01/18/2021] [Accepted: 02/10/2021] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION Inferring gene regulatory networks (GRNs) from high-throughput data is an important and challenging problem in systems biology. Although numerous GRN methods have been developed, most have focused on the verification of the specific data set. However, it is difficult to establish directed topological networks that are both suitable for time-series and non-time-series datasets due to the complexity and diversity of biological networks. RESULTS Here, we proposed a novel method, GNIPLR (Gene networks inference based on projection and lagged regression) to infer GRNs from time-series or non-time-series gene expression data. GNIPLR projected gene data twice using the LASSO projection (LSP) algorithm and the linear projection (LP) approximation to produce a linear and monotonous pseudo-time series, and then determined the direction of regulation in combination with lagged regression analyses. The proposed algorithm was validated using simulated and real biological data. Moreover, we also applied the GNIPLR algorithm to the liver hepatocellular carcinoma (LIHC) and bladder urothelial carcinoma (BLCA) cancer expression datasets. These analyses revealed significantly higher accuracy and AUC values than other popular methods. AVAILABILITY The GNIPLR tool is freely available at https://github.com/zyllluck/GNIPLR. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Yuelei Zhang
- Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou, 310012, China.,Institute of Statistics and Applied Mathematics, Anhui University of Finance and Economics, Bengbu, 233030, China.,School of Mathematics and Statistics, Shandong University, Weihai, Shandong, 264209, China
| | - Xiao Chang
- Institute of Statistics and Applied Mathematics, Anhui University of Finance and Economics, Bengbu, 233030, China
| | - Xiaoping Liu
- Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou, 310012, China.,School of Mathematics and Statistics, Shandong University, Weihai, Shandong, 264209, China
| |
Collapse
|
39
|
Zhao M, He W, Tang J, Zou Q, Guo F. A comprehensive overview and critical evaluation of gene regulatory network inference technologies. Brief Bioinform 2021; 22:6128842. [PMID: 33539514 DOI: 10.1093/bib/bbab009] [Citation(s) in RCA: 40] [Impact Index Per Article: 10.0] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/28/2020] [Revised: 12/11/2020] [Accepted: 01/06/2021] [Indexed: 12/12/2022] Open
Abstract
Gene regulatory network (GRN) is the important mechanism of maintaining life process, controlling biochemical reaction and regulating compound level, which plays an important role in various organisms and systems. Reconstructing GRN can help us to understand the molecular mechanism of organisms and to reveal the essential rules of a large number of biological processes and reactions in organisms. Various outstanding network reconstruction algorithms use specific assumptions that affect prediction accuracy, in order to deal with the uncertainty of processing. In order to study why a certain method is more suitable for specific research problem or experimental data, we conduct research from model-based, information-based and machine learning-based method classifications. There are obviously different types of computational tools that can be generated to distinguish GRNs. Furthermore, we discuss several classical, representative and latest methods in each category to analyze core ideas, general steps, characteristics, etc. We compare the performance of state-of-the-art GRN reconstruction technologies on simulated networks and real networks under different scaling conditions. Through standardized performance metrics and common benchmarks, we quantitatively evaluate the stability of various methods and the sensitivity of the same algorithm applying to different scaling networks. The aim of this study is to explore the most appropriate method for a specific GRN, which helps biologists and medical scientists in discovering potential drug targets and identifying cancer biomarkers.
Collapse
Affiliation(s)
- Mengyuan Zhao
- School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
| | - Wenying He
- School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
| | - Jijun Tang
- University of South Carolina, Tianjin, China
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
| | - Fei Guo
- School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
| |
Collapse
|
40
|
Zheng R, Li M, Chen X, Zhao S, Wu FX, Pan Y, Wang J. An Ensemble Method to Reconstruct Gene Regulatory Networks Based on Multivariate Adaptive Regression Splines. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:347-354. [PMID: 30794516 DOI: 10.1109/tcbb.2019.2900614] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
Gene regulatory networks (GRNs) play a key role in biological processes. However, GRNs are diverse under different biological conditions. Reconstructing gene regulatory networks (GRNs) from gene expression has become an important opportunity and challenge in the past decades. Although there are a lot of existing methods to infer the topology of GRNs, such as mutual information, random forest, and partial least squares, the accuracy is still low due to the noise and high dimension of the expression data. In this paper, we introduce an ensemble Multivariate Adaptive Regression Splines (MARS) based method to reconstruct the directed GRNs from multifactorial gene expression data, called PBMarsNet. PBMarsNet incorporates part mutual information (PMI) to pre-weight the candidate regulatory genes and then uses MARS to detect the nonlinear regulatory links. Moreover, we apply bootstrap to run the MARS multiple times and average the outputs of each MARS as the final score of regulatory links. The results on DREAM4 challenge and DREAM5 challenge datasets show PBMarsNet has a superior performance and generalization over other state-of-the-art methods.
Collapse
|
41
|
Yu B, Chen C, Qi R, Zheng R, Skillman-Lawrence PJ, Wang X, Ma A, Gu H. scGMAI: a Gaussian mixture model for clustering single-cell RNA-Seq data based on deep autoencoder. Brief Bioinform 2020; 22:6029147. [PMID: 33300547 DOI: 10.1093/bib/bbaa316] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/24/2020] [Revised: 10/19/2020] [Indexed: 01/01/2023] Open
Abstract
The rapid development of single-cell RNA sequencing (scRNA-Seq) technology provides strong technical support for accurate and efficient analyzing single-cell gene expression data. However, the analysis of scRNA-Seq is accompanied by many obstacles, including dropout events and the curse of dimensionality. Here, we propose the scGMAI, which is a new single-cell Gaussian mixture clustering method based on autoencoder networks and the fast independent component analysis (FastICA). Specifically, scGMAI utilizes autoencoder networks to reconstruct gene expression values from scRNA-Seq data and FastICA is used to reduce the dimensions of reconstructed data. The integration of these computational techniques in scGMAI leads to outperforming results compared to existing tools, including Seurat, in clustering cells from 17 public scRNA-Seq datasets. In summary, scGMAI is an effective tool for accurately clustering and identifying cell types from scRNA-Seq data and shows the great potential of its applicative power in scRNA-Seq data analysis. The source code is available at https://github.com/QUST-AIBBDRC/scGMAI/.
Collapse
Affiliation(s)
- Bin Yu
- College of Mathematics and Physics, Qingdao University of Science and Technolog, China
| | - Chen Chen
- College of Mathematics and Physics, Qingdao University of Science and Technology, China
| | - Ren Qi
- College of Intelligence and Computing, Tianjin University, China
| | - Ruiqing Zheng
- School of Computer Science and Engineering, Central South University, China
| | | | - Xiaolin Wang
- College of Mathematics and Physics, Qingdao University of Science and Technology, China
| | - Anjun Ma
- Department of Biomedical Informatics, The Ohio State University, USA
| | - Haiming Gu
- College of Mathematics and Physics, Qingdao University of Science and Technology, China
| |
Collapse
|
42
|
Ma B, Fang M, Jiao X. Inference of gene regulatory networks based on nonlinear ordinary differential equations. Bioinformatics 2020; 36:4885-4893. [DOI: 10.1093/bioinformatics/btaa032] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/10/2019] [Revised: 12/30/2019] [Accepted: 01/15/2020] [Indexed: 01/05/2023] Open
Abstract
Abstract
Motivation
Gene regulatory networks (GRNs) capture the regulatory interactions between genes, resulting from the fundamental biological process of transcription and translation. In some cases, the topology of GRNs is not known, and has to be inferred from gene expression data. Most of the existing GRNs reconstruction algorithms are either applied to time-series data or steady-state data. Although time-series data include more information about the system dynamics, steady-state data imply stability of the underlying regulatory networks.
Results
In this article, we propose a method for inferring GRNs from time-series and steady-state data jointly. We make use of a non-linear ordinary differential equations framework to model dynamic gene regulation and an importance measurement strategy to infer all putative regulatory links efficiently. The proposed method is evaluated extensively on the artificial DREAM4 dataset and two real gene expression datasets of yeast and Escherichia coli. Based on public benchmark datasets, the proposed method outperforms other popular inference algorithms in terms of overall score. By comparing the performance on the datasets with different scales, the results show that our method still keeps good robustness and accuracy at a low computational complexity.
Availability and implementation
The proposed method is written in the Python language, and is available at: https://github.com/lab319/GRNs_nonlinear_ODEs
Supplementary information
Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Baoshan Ma
- College of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
| | - Mingkun Fang
- College of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
| | - Xiangtian Jiao
- College of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
| |
Collapse
|
43
|
Riggs K, Chen HS, Rotunno M, Li B, Simonds NI, Mechanic LE, Peng B. On the application, reporting, and sharing of in silico simulations for genetic studies. Genet Epidemiol 2020; 45:131-141. [PMID: 33063887 PMCID: PMC7984380 DOI: 10.1002/gepi.22362] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/10/2020] [Revised: 09/11/2020] [Accepted: 09/14/2020] [Indexed: 12/31/2022]
Abstract
In silico simulations play an indispensable role in the development and application of statistical models and methods for genetic studies. Simulation tools allow for the evaluation of methods and investigation of models in a controlled manner. With the growing popularity of evolutionary models and simulation‐based statistical methods, genetic simulations have been applied to a wide variety of research disciplines such as population genetics, evolutionary genetics, genetic epidemiology, ecology, and conservation biology. In this review, we surveyed 1409 articles from five journals that publish on major application areas of genetic simulations. We identified 432 papers in which genetic simulations were used and examined the targets and applications of simulation studies and how these simulation methods and simulated data sets are reported and shared. Whereas a large proportion (30%) of the surveyed articles reported the use of genetic simulations, only 28% of these genetic simulation studies used existing simulation software, 2% used existing simulated data sets, and 19% and 12% made source code and simulated data sets publicly available, respectively. Moreover, 15% of articles provided no information on how simulation studies were performed. These findings suggest a need to encourage sharing and reuse of existing simulation software and data sets, as well as providing more information regarding the performance of simulations.
Collapse
Affiliation(s)
- Kaleigh Riggs
- Department of Statistics, Rice University, Houston, Texas, USA
| | - Huann-Sheng Chen
- Division of Cancer Control and Population Sciences, Statistical Research and Applications Branch, Surveillance Research Program, National Cancer Institute (NCI), National Institutes of Health (NIH), Bethesda, Maryland, USA
| | - Melissa Rotunno
- Division of Cancer Control and Population Sciences, Genomic Epidemiology Branch, Epidemiology and Genomics Research Program, NCI, NIH, Bethesda, Maryland, USA
| | - Bing Li
- Department of Biostatistics, Brown University, Providence, Rhode Island, USA
| | | | - Leah E Mechanic
- Division of Cancer Control and Population Sciences, Genomic Epidemiology Branch, Epidemiology and Genomics Research Program, NCI, NIH, Bethesda, Maryland, USA
| | - Bo Peng
- Department of Medicine, Baylor College of Medicine, Houston, Texas, USA
| |
Collapse
|
44
|
Che D, Guo S, Jiang Q, Chen L. PFBNet: a priori-fused boosting method for gene regulatory network inference. BMC Bioinformatics 2020; 21:308. [PMID: 32664870 PMCID: PMC7362553 DOI: 10.1186/s12859-020-03639-7] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/29/2019] [Accepted: 07/02/2020] [Indexed: 02/08/2023] Open
Abstract
BACKGROUND Inferring gene regulatory networks (GRNs) from gene expression data remains a challenge in system biology. In past decade, numerous methods have been developed for the inference of GRNs. It remains a challenge due to the fact that the data is noisy and high dimensional, and there exists a large number of potential interactions. RESULTS We present a novel method, namely priori-fused boosting network inference method (PFBNet), to infer GRNs from time-series expression data by using the non-linear model of Boosting and the prior information (e.g., the knockout data) fusion scheme. Specifically, PFBNet first calculates the confidences of the regulation relationships using the boosting-based model, where the information about the accumulation impact of the gene expressions at previous time points is taken into account. Then, a newly defined strategy is applied to fuse the information from the prior data by elevating the confidences of the regulation relationships from the corresponding regulators. CONCLUSIONS The experiments on the benchmark datasets from DREAM challenge as well as the E.coli datasets show that PFBNet achieves significantly better performance than other state-of-the-art methods (Jump3, GEINE3-lag, HiDi, iRafNet and BiXGBoost).
Collapse
Affiliation(s)
- Dandan Che
- Shenzhen Key Lab for High Performance Data Mining, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518000 China
| | - Shun Guo
- Shenzhen Key Lab for High Performance Data Mining, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518000 China
| | - Qingshan Jiang
- Shenzhen Key Lab for High Performance Data Mining, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518000 China
| | - Lifei Chen
- School of Mathematics and Computer Science, Fujian Normal University, Fujian, 350117 China
| |
Collapse
|
45
|
Li F, Chen J, Ge Z, Wen Y, Yue Y, Hayashida M, Baggag A, Bensmail H, Song J. Computational prediction and interpretation of both general and specific types of promoters in Escherichia coli by exploiting a stacked ensemble-learning framework. Brief Bioinform 2020; 22:2126-2140. [PMID: 32363397 DOI: 10.1093/bib/bbaa049] [Citation(s) in RCA: 53] [Impact Index Per Article: 10.6] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/11/2019] [Revised: 02/25/2020] [Accepted: 03/11/2020] [Indexed: 12/12/2022] Open
Abstract
Promoters are short consensus sequences of DNA, which are responsible for transcription activation or the repression of all genes. There are many types of promoters in bacteria with important roles in initiating gene transcription. Therefore, solving promoter-identification problems has important implications for improving the understanding of their functions. To this end, computational methods targeting promoter classification have been established; however, their performance remains unsatisfactory. In this study, we present a novel stacked-ensemble approach (termed SELECTOR) for identifying both promoters and their respective classification. SELECTOR combined the composition of k-spaced nucleic acid pairs, parallel correlation pseudo-dinucleotide composition, position-specific trinucleotide propensity based on single-strand, and DNA strand features and using five popular tree-based ensemble learning algorithms to build a stacked model. Both 5-fold cross-validation tests using benchmark datasets and independent tests using the newly collected independent test dataset showed that SELECTOR outperformed state-of-the-art methods in both general and specific types of promoter prediction in Escherichia coli. Furthermore, this novel framework provides essential interpretations that aid understanding of model success by leveraging the powerful Shapley Additive exPlanation algorithm, thereby highlighting the most important features relevant for predicting both general and specific types of promoters and overcoming the limitations of existing 'Black-box' approaches that are unable to reveal causal relationships from large amounts of initially encoded features.
Collapse
Affiliation(s)
- Fuyi Li
- Northwest A&F University, China.,Department of Biochemistry and Molecular Biology and the Infection and Immunity Program, Biomedicine Discovery Institute, Monash University, Australia
| | - Jinxiang Chen
- Biomedicine Discovery Institute and the Department of Biochemistry and Molecular Biology, Monash University from the College of Information Engineering, Northwest A&F University, China
| | - Zongyuan Ge
- Monash University and also serves as a Deep Learning Specialist at NVIDIA AI Technology Centre. Before joining Monash, he was a research scientist at IBM Research Australia doing research in medical AI during 2016-2018. His research interests are AI, computer vision, medical image, robotics and deep learning
| | - Ya Wen
- computer technology from Ningxia University, China
| | - Yanwei Yue
- medical science from Southern Medical University, China
| | - Morihiro Hayashida
- informatics from Kyoto University, Japan, in 2005. He is an Assistant Professor in the Department of Electrical Engineering and Computer Science, National Institute of Technology, Matsue College, Japan
| | - Abdelkader Baggag
- computer science from the University of Minnesota. He is a Senior Scientist at the Qatar Computing Research Institute (QCRI) and has a joint appointment as an Associate Professor at Hamad Bin Khalifa University (HBKU) in the Division of Information and Computing Technology. His research interests include data mining, linear algebra and machine learning
| | - Halima Bensmail
- University of Pierre & Marie Currie (Paris 6) in France. She is currently a Principal Scientist at QCRI-HBKU and a joint Associate Professor at the College of Computer and Science Engineering, HBKU
| | - Jiangning Song
- Monash Biomedicine Discovery Institute, Monash University, Australia. He is also affiliated with the Monash Centre for Data Science, Faculty of Information Technology, Monash University. His research interests include bioinformatics, computational biology, machine learning, data mining, and pattern recognition
| |
Collapse
|
46
|
Zheng R, Liang Z, Chen X, Tian Y, Cao C, Li M. An Adaptive Sparse Subspace Clustering for Cell Type Identification. Front Genet 2020; 11:407. [PMID: 32425984 PMCID: PMC7212354 DOI: 10.3389/fgene.2020.00407] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/23/2019] [Accepted: 03/31/2020] [Indexed: 01/04/2023] Open
Abstract
The rapid development of single-cell transcriptome sequencing technology has provided us with a cell-level perspective to study biological problems. Identification of cell types is one of the fundamental issues in computational analysis of single-cell data. Due to the large amount of noise from single-cell technologies and high dimension of expression profiles, traditional clustering methods are not so applicable to solve it. To address the problem, we have designed an adaptive sparse subspace clustering method, called AdaptiveSSC, to identify cell types. AdaptiveSSC is based on the assumption that the expression of cells with the same type lies in the same subspace; one cell can be expressed as a linear combination of the other cells. Moreover, it uses a data-driven adaptive sparse constraint to construct the similarity matrix. The comparison results of 10 scRNA-seq datasets show that AdaptiveSSC outperforms original subspace clustering and other state-of-art methods in most cases. Moreover, the learned similarity matrix can also be integrated with a modified t-SNE to obtain an improved visualization result.
Collapse
Affiliation(s)
- Ruiqing Zheng
- School of Computer Science and Engineering, Central South University, Changsha, China
| | - Zhenlan Liang
- School of Computer Science and Engineering, Central South University, Changsha, China
| | - Xiang Chen
- School of Computer Science and Engineering, Central South University, Changsha, China
| | - Yu Tian
- School of Computer Science and Engineering, Central South University, Changsha, China
| | - Chen Cao
- Departments of Biochemistry & Molecular Biology and Medical Genetics, Alberta Children's Hospital Research Institute, University of Calgary, Calgary, AB, Canada
| | - Min Li
- School of Computer Science and Engineering, Central South University, Changsha, China
| |
Collapse
|
47
|
Li X, Li W, Zeng M, Zheng R, Li M. Network-based methods for predicting essential genes or proteins: a survey. Brief Bioinform 2020; 21:566-583. [PMID: 30776072 DOI: 10.1093/bib/bbz017] [Citation(s) in RCA: 67] [Impact Index Per Article: 13.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/31/2018] [Revised: 01/21/2019] [Accepted: 01/22/2019] [Indexed: 01/03/2025] Open
Abstract
Genes that are thought to be critical for the survival of organisms or cells are called essential genes. The prediction of essential genes and their products (essential proteins) is of great value in exploring the mechanism of complex diseases, the study of the minimal required genome for living cells and the development of new drug targets. As laboratory methods are often complicated, costly and time-consuming, a great many of computational methods have been proposed to identify essential genes/proteins from the perspective of the network level with the in-depth understanding of network biology and the rapid development of biotechnologies. Through analyzing the topological characteristics of essential genes/proteins in protein-protein interaction networks (PINs), integrating biological information and considering the dynamic features of PINs, network-based methods have been proved to be effective in the identification of essential genes/proteins. In this paper, we survey the advanced methods for network-based prediction of essential genes/proteins and present the challenges and directions for future research.
Collapse
Affiliation(s)
- Xingyi Li
- School of Computer Science and Engineering, Central South University, Changsha, Hunan, China
| | - Wenkai Li
- School of Computer Science and Engineering, Central South University, Changsha, Hunan, China
| | - Min Zeng
- School of Computer Science and Engineering, Central South University, Changsha, Hunan, China
| | - Ruiqing Zheng
- School of Computer Science and Engineering, Central South University, Changsha, Hunan, China
| | - Min Li
- School of Computer Science and Engineering, Central South University, Changsha, Hunan, China
| |
Collapse
|
48
|
Xiao Q, Zhang N, Luo J, Dai J, Tang X. Adaptive multi-source multi-view latent feature learning for inferring potential disease-associated miRNAs. Brief Bioinform 2020; 22:2043-2057. [PMID: 32186712 DOI: 10.1093/bib/bbaa028] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/13/2019] [Revised: 02/16/2020] [Accepted: 01/14/2020] [Indexed: 12/13/2022] Open
Abstract
Accumulating evidence has shown that microRNAs (miRNAs) play crucial roles in different biological processes, and their mutations and dysregulations have been proved to contribute to tumorigenesis. In silico identification of disease-associated miRNAs is a cost-effective strategy to discover those most promising biomarkers for disease diagnosis and treatment. The increasing available omics data sources provide unprecedented opportunities to decipher the underlying relationships between miRNAs and diseases by computational models. However, most existing methods are biased towards a single representation of miRNAs or diseases and are also not capable of discovering unobserved associations for new miRNAs or diseases without association information. In this study, we present a novel computational method with adaptive multi-source multi-view latent feature learning (M2LFL) to infer potential disease-associated miRNAs. First, we adopt multiple data sources to obtain similarity profiles and capture different latent features according to the geometric characteristic of miRNA and disease spaces. Then, the multi-modal latent features are projected to a common subspace to discover unobserved miRNA-disease associations in both miRNA and disease views, and an adaptive joint graph regularization term is developed to preserve the intrinsic manifold structures of multiple similarity profiles. Meanwhile, the Lp,q-norms are imposed into the projection matrices to ensure the sparsity and improve interpretability. The experimental results confirm the superior performance of our proposed method in screening reliable candidate disease miRNAs, which suggests that M2LFL could be an efficient tool to discover diagnostic biomarkers for guiding laborious clinical trials.
Collapse
|
49
|
A Stable, Unified Density Controlled Memetic Algorithm for Gene Regulatory Network Reconstruction Based on Sparse Fuzzy Cognitive Maps. Neural Process Lett 2019. [DOI: 10.1007/s11063-019-10056-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
|
50
|
Zhang X, Li T, Wang J, Li J, Chen L, Liu C. Identification of Cancer-Related Long Non-Coding RNAs Using XGBoost With High Accuracy. Front Genet 2019; 10:735. [PMID: 31456817 PMCID: PMC6701491 DOI: 10.3389/fgene.2019.00735] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/24/2019] [Accepted: 07/12/2019] [Indexed: 11/14/2022] Open
Abstract
In the past decade, hundreds of long noncoding RNAs (lncRNAs) have been identified as significant players in diverse types of cancer; however, the functions and mechanisms of most lncRNAs in cancer remain unclear. Several computational methods have been developed to detect associations between cancer and lncRNAs, yet those approaches have limitations in both sensitivity and specificity. With the goal of improving the prediction accuracy for associations of lncRNA with cancer, we upgraded our previously developed cancer-related lncRNA classifier, CRlncRC, to generate CRlncRC2. CRlncRC2 is an eXtreme Gradient Boosting (XGBoost) machine learning framework, including Synthetic Minority Over-sampling Technique (SMOTE)-based over-sampling, along with Laplacian Score-based feature selection. Ten-fold cross-validation showed that the AUC value of CRlncRC2 for identification of cancer-related lncRNAs is much higher than previously reported by CRlncRC and others. Compared with CRlncRC, the number of features used by CRlncRC2 dropped from 85 to 51. Finally, we identified 439 cancer-related lncRNA candidates using CRlncRC2. To evaluate the accuracy of the predictions, we first consulted the cancer-related long non-coding RNA database Lnc2Cancer v2.0 and relevant literature for supporting information, then conducted statistical analysis of somatic mutations, distance from cancer genes, and differential expression in tumor tissues, using various data sets. The results showed that our approach was highly reliable for identifying cancer-related lncRNA candidates. Notably, the highest ranked candidate, lncRNA AC074117.1, has not been reported previously; however, integrated multi-omics analyses demonstrate that it is the target of multiple cancer-related miRNAs and interacts with adjacent protein-coding genes, suggesting that it may act as a cancer-related competing endogenous RNA, which warrants further investigation. In conclusion, CRlncRC2 is an effective and accurate method for identification of cancer-related lncRNAs, and has potential to contribute to the functional annotation of lncRNAs and guide cancer therapy.
Collapse
Affiliation(s)
- Xuan Zhang
- CAS Key Laboratory of Tropical Plant Resources and Sustainable Use, Xishuangbanna Tropical Botanical Garden, Chinese Academy of Sciences, Kunming, China.,University of Chinese Academy of Sciences, Beijing, China
| | - Tianjun Li
- Department of Computer and Information Science, Faculty of Science and Technology, University of Macau, Macau, China
| | - Jun Wang
- Institute of Medical Sciences, Xiangya Hospital, Central South University, Changsha, China
| | - Jing Li
- CAS Key Laboratory of Tropical Plant Resources and Sustainable Use, Xishuangbanna Tropical Botanical Garden, Chinese Academy of Sciences, Kunming, China
| | - Long Chen
- Department of Computer and Information Science, Faculty of Science and Technology, University of Macau, Macau, China
| | - Changning Liu
- CAS Key Laboratory of Tropical Plant Resources and Sustainable Use, Xishuangbanna Tropical Botanical Garden, Chinese Academy of Sciences, Kunming, China.,University of Chinese Academy of Sciences, Beijing, China.,Department of Computer and Information Science, Faculty of Science and Technology, University of Macau, Macau, China.,Institute of Medical Sciences, Xiangya Hospital, Central South University, Changsha, China
| |
Collapse
|