1
|
Pan Q, Ding L, Hladyshau S, Yao X, Zhou J, Yan L, Dhungana Y, Shi H, Qian C, Dong X, Burdyshaw C, Veloso JP, Khatamian A, Xie Z, Risch I, Yang X, Yang J, Huang X, Fang J, Jain A, Jain A, Rusch M, Brewer M, Peng J, Yan KK, Chi H, Yu J. scMINER: a mutual information-based framework for clustering and hidden driver inference from single-cell transcriptomics data. Nat Commun 2025; 16:4305. [PMID: 40341143 PMCID: PMC12062461 DOI: 10.1038/s41467-025-59620-6] [Received: 01/17/2025] [Accepted: 04/28/2025] [Indexed: 05/10/2025]
Abstract
Single-cell transcriptomics data present challenges due to their inherent stochasticity and sparsity, complicating both cell clustering and cell type-specific network inference. To address these challenges, we introduce scMINER (single-cell Mutual Information-based Network Engineering Ranger), an integrative framework for unsupervised cell clustering, transcription factor and signaling protein network inference, and identification of hidden drivers from single-cell transcriptomic data. scMINER demonstrates superior accuracy in cell clustering, outperforming five state-of-the-art algorithms and excelling in distinguishing closely related cell populations. For network inference, scMINER outperforms three established methods, as validated by ATAC-seq and CROP-seq. In particular, it surpasses SCENIC in revealing key transcription factor drivers involved in T cell exhaustion and Treg tissue specification. Moreover, scMINER enables the inference of signaling protein networks and drivers with high accuracy, which presents an advantage in multimodal single cell data analysis. In addition, we establish scMINER Portal, an interactive visualization tool to facilitate exploration of scMINER results.
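The abstract above builds on mutual information as its similarity measure. As a minimal sketch of that quantity only, assuming a simple equal-width binning estimator (the function name `mutual_information` and the `bins` parameter are hypothetical and are not taken from scMINER), the mutual information between two expression vectors can be estimated as follows:

```python
import math
from collections import Counter


def mutual_information(x, y, bins=4):
    """Estimate mutual information (in bits) between two equal-length
    numeric sequences using equal-width binning.

    Illustrative estimator only; it is an assumption of this sketch and
    not the estimator used by any particular tool.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")

    def discretize(values):
        lo, hi = min(values), max(values)
        if hi == lo:
            # A constant vector falls into a single bin.
            return [0] * len(values)
        width = (hi - lo) / bins
        # Clamp the maximum value into the last bin.
        return [min(int((v - lo) / width), bins - 1) for v in values]

    bx, by = discretize(x), discretize(y)
    n = len(x)
    px, py, pxy = Counter(bx), Counter(by), Counter(zip(bx, by))

    mi = 0.0
    for (i, j), count in pxy.items():
        # p(i,j) * log2( p(i,j) / (p(i) * p(j)) ), written with raw counts.
        mi += (count / n) * math.log2(count * n / (px[i] * py[j]))
    return mi


# Perfectly dependent vectors share 1 bit; independent ones share 0 bits.
print(mutual_information([0, 0, 1, 1], [0, 0, 1, 1], bins=2))  # 1.0
print(mutual_information([0, 0, 1, 1], [0, 1, 0, 1], bins=2))  # 0.0
```

Binning is the simplest choice here; sparse single-cell counts usually call for a more careful estimator, which this sketch does not attempt.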
Affiliation(s)
- Qingfei Pan
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
| | - Liang Ding
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
| | - Siarhei Hladyshau
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
| | - Xiangyu Yao
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
| | - Jiayu Zhou
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
| | - Lei Yan
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
| | - Yogesh Dhungana
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Graduate School of Biomedical Sciences, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
| | - Hao Shi
- Department of Immunology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
| | - Chenxi Qian
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
| | - Xinran Dong
- Center for Molecular Medicine, Children's Hospital of Fudan University, Shanghai, 201102, P.R. China
| | - Chad Burdyshaw
- Department of Information Services, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
| | - Joao Pedro Veloso
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
| | - Alireza Khatamian
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
| | - Zhen Xie
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Department of Physiology, University of Tennessee Health Science Center, Memphis, TN, 38163, USA
| | - Isabel Risch
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Department of Immunology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
| | - Xu Yang
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
| | - Jiyuan Yang
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
| | - Xin Huang
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Precision Research Center for Refractory Diseases, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, 201620, China
| | - Jason Fang
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
| | - Anuj Jain
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
| | - Arihant Jain
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
| | - Michael Rusch
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
| | - Michael Brewer
- Department of Information Services, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
| | - Junmin Peng
- Department of Structural Biology and Developmental Neurobiology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
| | - Koon-Kiu Yan
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
| | - Hongbo Chi
- Graduate School of Biomedical Sciences, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Department of Immunology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
| | - Jiyang Yu
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA.
- Graduate School of Biomedical Sciences, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA.
| |
|
2
|
Vinoth S, Kingston SL, Srinivasan S, Kumarasamy S, Kapitaniak T. Extreme events in gene regulatory networks with time-delays. Sci Rep 2025; 15:13064. [PMID: 40240448 PMCID: PMC12003715 DOI: 10.1038/s41598-025-97268-w] [Received: 07/31/2024] [Accepted: 04/03/2025] [Indexed: 04/18/2025]
Abstract
This work explores the complex dynamics of a simplified two-node coupled gene regulatory network with multiple delays, consisting of two self-inhibitory and mutually activating genes. We have identified the emergence of extreme events within a specific range of system parameter values. A detailed analysis of the time delay-induced emergence of extreme events is illustrated using bifurcation analysis, two-parameter phase diagrams, return maps, temporal plots, and probability density functions. The reasons behind the advent of extreme events are discussed in detail, with possible analogies to simplified two-node gene regulatory networks. The occasional large-amplitude bursting originated in the system via interior crisis-induced intermittency, Pomeau-Manneville intermittency, and the breakdown of quasiperiodic intermittency routes. Additionally, we have used various recurrence quantification statistical measures, such as mean recurrence time, determinism, and recurrence time entropy, to describe the transition from periodic or chaotic dynamics to unforeseen large deviations. Our approach shows that the sudden surge of variance and mean recurrence time at the transition points can be used as a new metric to detect the critical transitions of distinct extreme bursting events. This comprehensive overview of the interaction between gene regulatory networks, with insights into the formation of unusual dynamics, may help in understanding different neuronal diseases.
Affiliation(s)
- S Vinoth
- Center for Nonlinear and Complex Networks, SRM Institute of Science and Technology, Ramapuram, Chennai, 600 089, India
- Center for Research, SRM TRP Engineering College, Tiruchirappalli, Tamil Nadu, India
| | - S Leo Kingston
- Division of Dynamics, Lodz University of Technology, Stefanowskiego 1/15, 90-924, Lodz, Poland.
| | - Sabarathinam Srinivasan
- Department of Molecular Analytics, Saveetha School of Engineering, Saveetha Institute of Medical and Technical Sciences (SIMATS), Chennai, Tamilnadu, India
| | - Suresh Kumarasamy
- Centre for Artificial Intelligence, Easwari Engineering College, Chennai, 600 089, India.
- Center for Cognitive Science, Trichy SRM Medical College Hospital and Research Center, Trichy, India.
| | - Tomasz Kapitaniak
- Division of Dynamics, Lodz University of Technology, Stefanowskiego 1/15, 90-924, Lodz, Poland
| |
|
3
|
Dong H, Ma B, Meng Y, Wu Y, Liu Y, Zeng T, Huang J. GRNMOPT: Inference of gene regulatory networks based on a multi-objective optimization approach. Comput Biol Chem 2024; 113:108223. [PMID: 39340962 DOI: 10.1016/j.compbiolchem.2024.108223] [Received: 06/17/2024] [Revised: 08/21/2024] [Accepted: 09/20/2024] [Indexed: 09/30/2024]
Abstract
BACKGROUND AND OBJECTIVE The reconstruction of gene regulatory networks (GRNs) stands as a vital approach in deciphering complex biological processes. The application of nonlinear ordinary differential equation (ODE) models has demonstrated considerable efficacy in predicting GRNs. Notably, the decay rate and time delay are pivotal in authentic gene regulation, yet their systematic determination in ODE models remains underexplored. The development of a comprehensive optimization framework for the effective estimation of these key parameters is essential for accurate GRN inference. METHOD This study introduces GRNMOPT, an innovative methodology for inferring GRNs from time-series and steady-state data. GRNMOPT combines decay rate and time delay in constructing ODE models to authentically represent gene regulatory processes. It incorporates a multi-objective optimization approach, optimizing decay rate and time delay concurrently to derive Pareto optimal sets for these factors, thereby maximizing accuracy metrics such as AUROC (Area Under the Receiver Operating Characteristic curve) and AUPR (Area Under the Precision-Recall curve). Additionally, the use of XGBoost for calculating feature importance aids in identifying potential regulatory gene links. RESULTS Comprehensive experimental evaluations on two simulated datasets from DREAM4 and three real gene expression datasets (Yeast, In vivo Reverse-engineering and Modeling Assessment [IRMA], and Escherichia coli [E. coli]) reveal that GRNMOPT performs commendably across varying network scales. Furthermore, cross-validation experiments substantiate the robustness of GRNMOPT. CONCLUSION We propose a novel approach called GRNMOPT to infer GRNs based on a multi-objective optimization framework, which effectively improves inference accuracy and provides a powerful tool for GRN inference.
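The abstract above evaluates inferred networks with AUROC and AUPR. The following is a minimal sketch of how these two metrics can be computed for a ranked list of candidate edges, assuming binary gold-standard labels (1 for a true edge, 0 otherwise). The function names are hypothetical, and AUPR is approximated here by average precision, a common step-wise estimate; this is not the evaluation code of GRNMOPT.

```python
def auroc(scores, labels):
    """AUROC as the probability that a true edge outscores a false one
    (ties count as one half). Pairwise form, fine for small inputs."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def average_precision(scores, labels):
    """Average precision: mean of precision@k over the ranks k at which
    a true edge is retrieved. Tied scores keep their input order."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for k, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / k
    if hits == 0:
        raise ValueError("need at least one positive label")
    return total / hits


# Four candidate edges with confidence scores and gold-standard labels.
edge_scores = [0.9, 0.8, 0.3, 0.1]
edge_labels = [1, 0, 1, 0]
print(auroc(edge_scores, edge_labels))              # 0.75
print(average_precision(edge_scores, edge_labels))  # 0.8333...
```

Because true edges are rare in most gold-standard networks, AUPR is often the more informative of the two, which is why benchmarks tend to report both.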
Affiliation(s)
- Heng Dong
- School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
| | - Baoshan Ma
- School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China.
| | - Yangyang Meng
- School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
| | - Yiming Wu
- School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
| | - Yongjing Liu
- Biomedical big data center, the First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou 310003, China; Zhejiang Provincial Key Laboratory of Pancreatic Disease, Zhejiang University School of Medicine First Affiliated Hospital, Hangzhou 310003, China; Zhejiang University Cancer Center, Zhejiang University, Hangzhou 310058, China
| | - Tao Zeng
- Biomedical big data center, the First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou 310003, China; Zhejiang Provincial Key Laboratory of Pancreatic Disease, Zhejiang University School of Medicine First Affiliated Hospital, Hangzhou 310003, China; Zhejiang University Cancer Center, Zhejiang University, Hangzhou 310058, China
| | - Jinyan Huang
- Biomedical big data center, the First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou 310003, China; Zhejiang Provincial Key Laboratory of Pancreatic Disease, Zhejiang University School of Medicine First Affiliated Hospital, Hangzhou 310003, China; Zhejiang University Cancer Center, Zhejiang University, Hangzhou 310058, China.
| |
|
4
|
Karamveer, Uzun Y. Approaches for Benchmarking Single-Cell Gene Regulatory Network Methods. Bioinform Biol Insights 2024; 18:11779322241287120. [PMID: 39502448 PMCID: PMC11536393 DOI: 10.1177/11779322241287120] [Received: 05/17/2024] [Accepted: 09/10/2024] [Indexed: 11/08/2024]
Abstract
Gene regulatory networks are powerful tools for modeling genetic interactions that control the expression of genes driving cell differentiation, and single-cell sequencing offers a unique opportunity to build these networks with high-resolution genomic data. There are many proposed computational methods to build these networks using single-cell data, and different approaches are used to benchmark these methods. However, a comprehensive discussion specifically focusing on benchmarking approaches is missing. In this article, we lay out the GRN terminology, present an overview of common gold-standard studies and data sets, and define the performance metrics for benchmarking network construction methodologies. We also point out the advantages and limitations of different benchmarking approaches, suggest alternative ground truth data sets that can be used for benchmarking, and specify additional considerations in this context.
Affiliation(s)
- Karamveer
- Department of Pediatrics, The Pennsylvania State University College of Medicine, Hershey, PA, USA
| | - Yasin Uzun
- Department of Pediatrics, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Penn State Cancer Institute, The Pennsylvania State University College of Medicine, Hershey, PA, USA
| |
|
5
|
Wang Y, Zheng P, Cheng YC, Wang Z, Aravkin A. WENDY: Covariance dynamics based gene regulatory network inference. Math Biosci 2024; 377:109284. [PMID: 39168402 DOI: 10.1016/j.mbs.2024.109284] [Received: 03/25/2024] [Revised: 06/25/2024] [Accepted: 08/16/2024] [Indexed: 08/23/2024]
Abstract
Determining gene regulatory network (GRN) structure is a central problem in biology, with a variety of inference methods available for different types of data. For a widely prevalent and challenging use case, namely single-cell gene expression data measured after intervention at multiple time points with unknown joint distributions, there is only one known specifically developed method, which does not fully utilize the rich information contained in this data type. We develop an inference method for the GRN in this case, netWork infErence by covariaNce DYnamics, dubbed WENDY. The core idea of WENDY is to model the dynamics of the covariance matrix, and solve this dynamics as an optimization problem to determine the regulatory relationships. To evaluate its effectiveness, we compare WENDY with other inference methods using synthetic data and experimental data. Our results demonstrate that WENDY performs well across different data sets.
Affiliation(s)
- Yue Wang
- Irving Institute for Cancer Dynamics and Department of Statistics, Columbia University, New York, 10027, NY, USA.
| | - Peng Zheng
- Institute for Health Metrics and Evaluation, Seattle, 98195, WA, USA; Department of Health Metrics Sciences, University of Washington, Seattle, 98195, WA, USA
| | - Yu-Chen Cheng
- Department of Data Science, Dana-Farber Cancer Institute, Boston, 02215, MA, USA; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, 02115, MA, USA; Center for Cancer Evolution, Dana-Farber Cancer Institute, Boston, 02215, MA, USA; Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, 02138, MA, USA
| | - Zikun Wang
- Laboratory of Genetics, The Rockefeller University, New York, 10065, NY, USA
| | - Aleksandr Aravkin
- Department of Applied Mathematics, University of Washington, Seattle, 98195, WA, USA
| |
|
6
|
Emadi M, Boroujeni FZ, Pirgazi J. Improved Fuzzy Cognitive Maps for Gene Regulatory Networks Inference Based on Time Series Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2024; 21:1816-1829. [PMID: 38963747 DOI: 10.1109/tcbb.2024.3423383] [Indexed: 07/06/2024]
Abstract
Microarray data provide a wealth of information regarding gene expression levels. Due to the large amount of such data, their analysis requires efficient computational methods for identifying and analyzing gene regulation networks; however, researchers in this field face numerous challenges, such as the large number of genes to consider, the limited number of samples, and the noisy nature of the data. In this paper, a hybrid method based on fuzzy cognitive maps and compressed sensing is used to identify interactions between genes. For this purpose, in the inference of the gene regulation network, Ensemble Kalman filtered compressed sensing is used to learn the fuzzy cognitive map. Using the Ensemble Kalman filter and compressed sensing makes the fuzzy cognitive map robust against noise. The proposed algorithm is evaluated using several metrics and compared with several well-known methods such as LASSOFCM, KFRegular, and CMI2NI. The experimental results show that the proposed method outperforms methods proposed in recent years in terms of SSmean, Data Error, and accuracy.
|
7
|
Segura-Ortiz A, García-Nieto J, Aldana-Montes JF, Navas-Delgado I. Multi-objective context-guided consensus of a massive array of techniques for the inference of Gene Regulatory Networks. Comput Biol Med 2024; 179:108850. [PMID: 39013340 DOI: 10.1016/j.compbiomed.2024.108850] [Received: 02/08/2024] [Revised: 07/03/2024] [Accepted: 07/03/2024] [Indexed: 07/18/2024]
Abstract
BACKGROUND AND OBJECTIVE Gene Regulatory Network (GRN) inference is a fundamental task in biology and medicine, as it enables a deeper understanding of the intricate mechanisms of gene expression present in organisms. This bioinformatics problem has been addressed in the literature through multiple computational approaches. Techniques developed for inferring from expression data have employed Bayesian networks, ordinary differential equations (ODEs), machine learning, information theory measures and neural networks, among others. The diversity of implementations and their respective customization have led to the emergence of many tools and multiple specialized domains derived from them, understood as subsets of networks with specific characteristics that are challenging to detect a priori. This specialization has introduced significant uncertainty when choosing the most appropriate technique for a particular dataset. This proposal, named MO-GENECI, builds upon the basic idea of the previous proposal GENECI and optimizes consensus among different inference techniques, through a carefully refined multi-objective evolutionary algorithm guided by various objective functions, linked to the biological context at hand. METHODS MO-GENECI has been tested on an extensive and diverse academic benchmark of 106 gene regulatory networks from multiple sources and sizes. The evaluation of MO-GENECI compared its performance to individual techniques using key metrics (AUROC and AUPR) for gene regulatory network inference. Friedman's statistical ranking provided an ordered classification, followed by non-parametric Holm tests to determine statistical significance. RESULTS MO-GENECI's Pareto front approximation facilitates easy selection of an appropriate solution based on generic input data characteristics. 
The best solution consistently emerged as the winner in all statistical tests, and in many cases, the median precision solution showed no statistically significant difference compared to the winner. CONCLUSIONS MO-GENECI has not only achieved more accurate results than individual techniques, but has also overcome the uncertainty associated with the initial choice of technique thanks to its flexibility and adaptability. It is shown to intelligently select the most suitable techniques for each case. The source code is hosted in a public repository at GitHub under MIT license: https://github.com/AdrianSeguraOrtiz/MO-GENECI. Moreover, to facilitate its installation and use, the software associated with this implementation has been encapsulated in a Python package available at PyPI: https://pypi.org/project/geneci/.
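The abstract above relies on a Pareto front of candidate solutions. As a minimal sketch of what such a front is, assuming every objective is to be maximized, the non-dominated solutions can be filtered as shown below. The function name `pareto_front` and the example objective values are hypothetical; this is not the evolutionary algorithm used by MO-GENECI, only the dominance test that defines its output.

```python
def pareto_front(points):
    """Return the points not dominated by any other point.

    A point a dominates b when a is at least as good as b on every
    objective and strictly better on at least one (all objectives are
    maximized in this sketch).
    """
    def dominates(a, b):
        return (all(x >= y for x, y in zip(a, b))
                and any(x > y for x, y in zip(a, b)))

    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (objective_1, objective_2) values for four candidate
# consensus networks; (0.6, 0.4) is dominated by (0.7, 0.5).
candidates = [(0.8, 0.3), (0.7, 0.5), (0.6, 0.4), (0.9, 0.1)]
print(pareto_front(candidates))  # [(0.8, 0.3), (0.7, 0.5), (0.9, 0.1)]
```

The front leaves the final trade-off to the user: no solution on it can be improved on one objective without losing on another.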
Affiliation(s)
- Adrián Segura-Ortiz
- Department de Lenguajes y Ciencias de la Computación, ITIS Software, Universidad de Málaga, Málaga, 29071, Spain.
| | - José García-Nieto
- Department de Lenguajes y Ciencias de la Computación, ITIS Software, Universidad de Málaga, Málaga, 29071, Spain; Biomedical Research Institute of Málaga (IBIMA), Universidad de Málaga, Málaga, Spain
| | - José F Aldana-Montes
- Department de Lenguajes y Ciencias de la Computación, ITIS Software, Universidad de Málaga, Málaga, 29071, Spain; Biomedical Research Institute of Málaga (IBIMA), Universidad de Málaga, Málaga, Spain
| | - Ismael Navas-Delgado
- Department de Lenguajes y Ciencias de la Computación, ITIS Software, Universidad de Málaga, Málaga, 29071, Spain; Biomedical Research Institute of Málaga (IBIMA), Universidad de Málaga, Málaga, Spain
| |
|
8
|
Wei PJ, Bao JJ, Gao Z, Tan JY, Cao RF, Su Y, Zheng CH, Deng L. MEFFGRN: Matrix enhancement and feature fusion-based method for reconstructing the gene regulatory network of epithelioma papulosum cyprini cells by spring viremia of carp virus infection. Comput Biol Med 2024; 179:108835. [PMID: 38996550 DOI: 10.1016/j.compbiomed.2024.108835] [Received: 03/01/2024] [Revised: 06/05/2024] [Accepted: 06/29/2024] [Indexed: 07/14/2024]
Abstract
Gene regulatory networks (GRNs) are crucial for understanding organismal molecular mechanisms and processes. Construction of GRN in the epithelioma papulosum cyprini (EPC) cells of cyprinid fish by spring viremia of carp virus (SVCV) infection helps understand the immune regulatory mechanisms that enhance the survival capabilities of cyprinid fish. Although many computational methods have been used to infer GRNs, specialized approaches for predicting the GRN of EPC cells following SVCV infection are lacking. In addition, most existing methods focus primarily on gene expression features, neglecting the valuable network structural information in known GRNs. In this study, we propose a novel supervised deep neural network, named MEFFGRN (Matrix Enhancement- and Feature Fusion-based method for Gene Regulatory Network inference), to accurately predict the GRN of EPC cells following SVCV infection. MEFFGRN considers both gene expression data and network structure information of known GRN and introduces a matrix enhancement method to address the sparsity issue of known GRN, extracting richer network structure information. To optimize the benefits of CNN (Convolutional Neural Network) in image processing, gene expression and enhanced GRN data were transformed into histogram images for each gene pair respectively. Subsequently, these histograms were separately fed into CNNs for training to obtain the corresponding gene expression and network structural features. Furthermore, a feature fusion mechanism was introduced to comprehensively integrate the gene expression and network structural features. This integration considers the specificity of each feature and their interactive information, resulting in a more comprehensive and precise feature representation during the fusion process. Experimental results from both real-world and benchmark datasets demonstrate that MEFFGRN achieves competitive performance compared with state-of-the-art computational methods. 
Furthermore, study findings from SVCV-infected EPC cells suggest that MEFFGRN can predict novel gene regulatory relationships.
Affiliation(s)
- Pi-Jing Wei
- Key Laboratory of Intelligent Computing & Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China
| | - Jin-Jin Bao
- Key Laboratory of Intelligent Computing & Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China
| | - Zhen Gao
- Key Laboratory of Intelligent Computing & Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China
| | - Jing-Yun Tan
- Shenzhen Key Laboratory of Microbial Genetic Engineering, College of Life Sciences and Oceanology, Shenzhen University, Shenzhen, 518055, Guangdong, China
| | - Rui-Fen Cao
- Key Laboratory of Intelligent Computing & Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China
| | - Yansen Su
- School of Artificial Intelligence, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China
| | - Chun-Hou Zheng
- School of Artificial Intelligence, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China.
| | - Li Deng
- Shenzhen Key Laboratory of Microbial Genetic Engineering, College of Life Sciences and Oceanology, Shenzhen University, Shenzhen, 518055, Guangdong, China.
| |
|
9
|
Peng H, Xu J, Liu K, Liu F, Zhang A, Zhang X. EIEPCF: accurate inference of functional gene regulatory networks by eliminating indirect effects from confounding factors. Brief Funct Genomics 2024; 23:373-383. [PMID: 37642217 DOI: 10.1093/bfgp/elad040] [Received: 05/05/2023] [Revised: 07/07/2023] [Accepted: 08/14/2023] [Indexed: 08/31/2023]
Abstract
Reconstructing functional gene regulatory networks (GRNs) is a primary prerequisite for understanding pathogenic mechanisms and curing diseases in animals, and it also provides an important foundation for cultivating vegetable and fruit varieties that are resistant to diseases and corrosion in plants. Many computational methods have been developed to infer GRNs, but most of the regulatory relationships between genes obtained by these methods are biased. Eliminating indirect effects in GRNs remains a significant challenge for researchers. In this work, we propose a novel approach for inferring functional GRNs, named EIEPCF (eliminating indirect effects produced by confounding factors), which eliminates indirect effects caused by confounding factors. This method eliminates the influence of confounding factors on regulatory factors and target genes by measuring the similarity between their residuals. Validation of the EIEPCF method on simulation studies, the gold-standard networks provided by the DREAM3 Challenge, and the real gene networks of Escherichia coli demonstrates that it achieves significantly higher accuracy than other popular computational methods for inferring GRNs. As a case study, we utilized the EIEPCF method to reconstruct the cold resistance-specific GRN from gene expression data of cold-resistant Arabidopsis thaliana. The source code and data are available at https://github.com/zhanglab-wbgcas/EIEPCF.
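The abstract above removes confounder effects by comparing residuals. The general idea can be sketched as follows, assuming a single confounder, ordinary least-squares regression, and Pearson correlation as the similarity measure. All three choices and all function names are assumptions made for illustration; the abstract does not specify them, and this is not the EIEPCF implementation.

```python
import math


def residuals(values, confounder):
    """Residuals of `values` after least-squares regression on `confounder`."""
    n = len(values)
    mean_v = sum(values) / n
    mean_z = sum(confounder) / n
    szz = sum((z - mean_z) ** 2 for z in confounder)
    szv = sum((z - mean_z) * (v - mean_v) for z, v in zip(confounder, values))
    slope = szv / szz if szz else 0.0
    return [(v - mean_v) - slope * (z - mean_z)
            for v, z in zip(values, confounder)]


def pearson(a, b):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def residual_correlation(x, y, z):
    """Correlation between x and y once the linear effect of z is removed."""
    return pearson(residuals(x, z), residuals(y, z))


# Regulator x and target y are both driven by confounder z (x = 3z + e1,
# y = 3z + e2), with e1 and e2 chosen to be uncorrelated with each other.
z = [-2, -1, 0, 1, 2]
x = [-5, -5, 2, 1, 7]
y = [-7, -1, 0, 1, 7]
print(round(pearson(x, y), 3))        # 0.883 (spurious, induced by z)
print(residual_correlation(x, y, z))  # 0.0 (no direct association left)
```

The raw correlation suggests an edge between x and y, while the residual correlation shows the association comes entirely from the shared confounder.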
Affiliation(s)
- Huixiang Peng
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074 China
- University of Chinese Academy of Sciences, Beijing 100049 China
| | - Jing Xu
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074 China
- University of Chinese Academy of Sciences, Beijing 100049 China
| | - Kangchen Liu
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074 China
- University of Chinese Academy of Sciences, Beijing 100049 China
| | - Fang Liu
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074 China
| | - Aidi Zhang
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074 China
| | - Xiujun Zhang
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074 China
- Center of Economic Botany, Core Botanical Gardens, Chinese Academy of Sciences, Wuhan 430074, China
| |
|
10
|
Moeckel C, Mouratidis I, Chantzi N, Uzun Y, Georgakopoulos-Soares I. Advances in computational and experimental approaches for deciphering transcriptional regulatory networks: Understanding the roles of cis-regulatory elements is essential, and recent research utilizing MPRAs, STARR-seq, CRISPR-Cas9, and machine learning has yielded valuable insights. Bioessays 2024; 46:e2300210. [PMID: 38715516 PMCID: PMC11444527 DOI: 10.1002/bies.202300210] [Received: 10/31/2023] [Revised: 04/22/2024] [Accepted: 04/23/2024] [Indexed: 05/16/2024]
Abstract
Understanding the influence of cis-regulatory elements on gene regulation poses numerous challenges given complexities stemming from variations in transcription factor (TF) binding, chromatin accessibility, structural constraints, and cell-type differences. This review discusses the role of gene regulatory networks in enhancing understanding of transcriptional regulation and covers construction methods ranging from expression-based approaches to supervised machine learning. Additionally, key experimental methods, including MPRAs and CRISPR-Cas9-based screening, which have significantly contributed to understanding TF binding preferences and cis-regulatory element functions, are explored. Lastly, the potential of machine learning and artificial intelligence to unravel cis-regulatory logic is analyzed. These computational advances have far-reaching implications for precision medicine, therapeutic target discovery, and the study of genetic variations in health and disease.
Affiliation(s)
- Camille Moeckel
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Ioannis Mouratidis
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA, USA
- Nikol Chantzi
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Yasin Uzun
- Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA, USA
- Department of Pediatrics, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Ilias Georgakopoulos-Soares
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA, USA
|
11
|
Wei PJ, Guo Z, Gao Z, Ding Z, Cao RF, Su Y, Zheng CH. Inference of gene regulatory networks based on directed graph convolutional networks. Brief Bioinform 2024; 25:bbae309. [PMID: 38935070 PMCID: PMC11209731 DOI: 10.1093/bib/bbae309] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/26/2023] [Revised: 05/17/2024] [Indexed: 06/28/2024] Open
Abstract
Inferring gene regulatory networks (GRNs) is an important challenge in systems biology, and many computational methods have been proposed; however, challenges remain, especially on real datasets. In this study, we propose a Directed Graph Convolutional neural network-based method for GRN inference (DGCGRN). To better understand and process the directed graph structure of a GRN, a directed graph convolutional neural network is constructed that retains the structural information of the directed graph while also making full use of neighbor node features. A local augmentation strategy is adopted in the graph neural network to address the poor prediction accuracy caused by the large number of low-degree nodes in a GRN. In addition, for real data such as E. coli, sequence features are obtained by extracting hidden features using Bi-GRU and calculating the statistical physicochemical characteristics of the gene sequence. At the training stage, a dynamic update strategy is used to convert the obtained edge prediction scores into edge weights that guide the subsequent training of the model. The results on synthetic benchmark datasets and real datasets show that the prediction performance of DGCGRN is significantly better than that of existing models. Furthermore, case studies on bladder uroepithelial carcinoma and lung cancer cells also illustrate the performance of the proposed model.
Affiliation(s)
- Pi-Jing Wei
- Key Laboratory of Intelligent Computing & Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, 111 Jiulong Road, 230601, Anhui, China
- Ziqiang Guo
- Key Laboratory of Intelligent Computing & Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, 111 Jiulong Road, 230601, Anhui, China
- Zhen Gao
- Key Laboratory of Intelligent Computing & Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, 111 Jiulong Road, 230601, Anhui, China
- Zheng Ding
- Key Laboratory of Intelligent Computing & Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, 111 Jiulong Road, 230601, Anhui, China
- Rui-Fen Cao
- Key Laboratory of Intelligent Computing & Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, 111 Jiulong Road, 230601, Anhui, China
- Yansen Su
- Key Laboratory of Intelligent Computing & Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, 111 Jiulong Road, 230601, Anhui, China
- Chun-Hou Zheng
- Key Laboratory of Intelligent Computing & Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, 111 Jiulong Road, 230601, Anhui, China
|
12
|
Gao Z, Su Y, Xia J, Cao RF, Ding Y, Zheng CH, Wei PJ. DeepFGRN: inference of gene regulatory network with regulation type based on directed graph embedding. Brief Bioinform 2024; 25:bbae143. [PMID: 38581416 PMCID: PMC10998536 DOI: 10.1093/bib/bbae143] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2024] [Revised: 03/02/2024] [Accepted: 03/15/2024] [Indexed: 04/08/2024] Open
Abstract
The inference of gene regulatory networks (GRNs) from gene expression profiles has been a key issue in systems biology, prompting many researchers to develop diverse computational methods. However, most of these methods do not reconstruct directed GRNs with regulatory types because of the lack of benchmark datasets or defects in the computational methods. Here, we collect benchmark datasets and propose a deep learning-based model, DeepFGRN, for reconstructing fine gene regulatory networks (FGRNs) with both regulation types and directions. In addition, the GRNs of real species are always large graphs with direction and high sparsity, which impede the advancement of GRN inference. Therefore, DeepFGRN builds a node bidirectional representation module to capture the directed graph embedding representation of the GRN. Specifically, the source and target generators are designed to learn the low-dimensional dense embedding of the source and target neighbors of a gene, respectively. An adversarial learning strategy is applied to iteratively learn the real neighbors of each gene. In addition, because the expression profiles of genes with regulatory associations are correlative, a correlation analysis module is designed. Specifically, this module not only fully extracts gene expression features, but also captures the correlation between regulators and target genes. Experimental results show that DeepFGRN has a competitive capability for both GRN and FGRN inference. Potential biomarkers and therapeutic drugs for breast cancer, liver cancer, lung cancer and coronavirus disease 2019 are identified based on the candidate FGRNs, providing a possible opportunity to advance our knowledge of disease treatments.
Affiliation(s)
- Zhen Gao
- The Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China
- Yansen Su
- The Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China
- Junfeng Xia
- Information Materials and Intelligent Sensing Laboratory of Anhui Province, Institute of Physical Science and Information Technology, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China
- Rui-Fen Cao
- The Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China
- Yun Ding
- The Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China
- Chun-Hou Zheng
- The Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China
- Pi-Jing Wei
- Information Materials and Intelligent Sensing Laboratory of Anhui Province, Institute of Physical Science and Information Technology, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China
|
13
|
Gorin G, Vastola JJ, Pachter L. Studying stochastic systems biology of the cell with single-cell genomics data. Cell Syst 2023; 14:822-843.e22. [PMID: 37751736 PMCID: PMC10725240 DOI: 10.1016/j.cels.2023.08.004] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 05/29/2023] [Revised: 08/16/2023] [Accepted: 08/25/2023] [Indexed: 09/28/2023]
Abstract
Recent experimental developments in genome-wide RNA quantification hold considerable promise for systems biology. However, rigorously probing the biology of living cells requires a unified mathematical framework that accounts for single-molecule biological stochasticity in the context of technical variation associated with genomics assays. We review models for a variety of RNA transcription processes, as well as the encapsulation and library construction steps of microfluidics-based single-cell RNA sequencing, and present a framework to integrate these phenomena by the manipulation of generating functions. Finally, we use simulated scenarios and biological data to illustrate the implications and applications of the approach.
Affiliation(s)
- Gennady Gorin
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, CA 91125, USA
- John J Vastola
- Department of Neurobiology, Harvard Medical School, Boston, MA 02115, USA
- Lior Pachter
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA; Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA 91125, USA
|
14
|
Gao Z, Tang J, Xia J, Zheng CH, Wei PJ. CNNGRN: A Convolutional Neural Network-Based Method for Gene Regulatory Network Inference From Bulk Time-Series Expression Data. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:2853-2861. [PMID: 37267145 DOI: 10.1109/tcbb.2023.3282212] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/04/2023]
Abstract
Gene regulatory networks (GRNs) participate in many biological processes, and reconstructing them plays an important role in systems biology. Although many advanced methods have been proposed for GRN reconstruction, their predictive performance is far from the ideal standard, so it is urgent to design a more effective method to reconstruct GRN. Moreover, most methods only consider the gene expression data, ignoring the network structure information contained in GRN. In this study, we propose a supervised model named CNNGRN, which infers GRN from bulk time-series expression data via convolutional neural network (CNN) model, with a more informative feature. Bulk time series gene expression data imply the intricate regulatory associations between genes, and the network structure feature of ground-truth GRN contains rich neighbor information. Hence, CNNGRN integrates the above two features as model inputs. In addition, CNN is adopted to extract intricate features of genes and infer the potential associations between regulators and target genes. Moreover, feature importance visualization experiments are implemented to seek the key features. Experimental results show that CNNGRN achieved competitive performance on benchmark datasets compared to the state-of-the-art computational methods. Finally, hub genes identified based on CNNGRN have been confirmed to be involved in biological processes through literature.
|
15
|
Gorin G, Vastola JJ, Pachter L. Studying stochastic systems biology of the cell with single-cell genomics data. bioRxiv 2023:2023.05.17.541250. [PMID: 37292934 PMCID: PMC10245677 DOI: 10.1101/2023.05.17.541250] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
Recent experimental developments in genome-wide RNA quantification hold considerable promise for systems biology. However, rigorously probing the biology of living cells requires a unified mathematical framework that accounts for single-molecule biological stochasticity in the context of technical variation associated with genomics assays. We review models for a variety of RNA transcription processes, as well as the encapsulation and library construction steps of microfluidics-based single-cell RNA sequencing, and present a framework to integrate these phenomena by the manipulation of generating functions. Finally, we use simulated scenarios and biological data to illustrate the implications and applications of the approach.
Affiliation(s)
- Gennady Gorin
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, CA, 91125
- John J. Vastola
- Department of Neurobiology, Harvard Medical School, Boston, MA, 02115
- Lior Pachter
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, 91125
- Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA, 91125
|
16
|
Saint-Antoine M, Singh A. Benchmarking Gene Regulatory Network Inference Methods on Simulated and Experimental Data. bioRxiv 2023:2023.05.12.540581. [PMID: 37215029 PMCID: PMC10197678 DOI: 10.1101/2023.05.12.540581] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/24/2023]
Abstract
Although the challenge of gene regulatory network inference has been studied for more than a decade, it is still unclear how well network inference methods work when applied to real data. Attempts to benchmark these methods on experimental data have yielded mixed results, in which sometimes even the best methods fail to outperform random guessing, and in other cases they perform reasonably well. So, one of the most valuable contributions one can currently make to the field of network inference is to benchmark methods on experimental data for which the true underlying network is already known, and report the results so that we can get a clearer picture of their efficacy. In this paper, we report results from the first, to our knowledge, benchmarking of network inference methods on single cell E. coli transcriptomic data. We report a moderate level of accuracy for the methods, better than random chance but still far from perfect. We also find that some methods that were quite strong and accurate on microarray and bulk RNA-seq data did not perform as well on the single cell data. Additionally, we benchmark a simple network inference method (Pearson correlation), on data generated through computer simulations in order to draw conclusions about general best practices in network inference studies. We predict that network inference would be more accurate using proteomic data rather than transcriptomic data, which could become relevant if high-throughput proteomic experimental methods are developed in the future. We also show through simulations that using a simplified model of gene expression that skips the mRNA step tends to substantially overestimate the accuracy of network inference methods, and advise against using this model for future in silico benchmarking studies.
Affiliation(s)
- Michael Saint-Antoine
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE USA 19716
- Abhyudai Singh
- Department of Electrical and Computer Engineering, Biomedical Engineering, Mathematical Sciences, Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE USA 19716
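
The abstract above names Pearson correlation as the simple network inference method benchmarked on simulated data. A minimal sketch of that idea, assuming a cells-by-genes expression matrix and ranking gene pairs by absolute correlation, might look like the following. The function name `pearson_network`, the `top_k` parameter, and the toy data are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def pearson_network(expr, top_k):
    """Rank undirected gene pairs by |Pearson r| (hypothetical helper).

    expr: array of shape (n_cells, n_genes).
    Returns the top_k pairs as (gene_i, gene_j, |r|) tuples.
    Note: a gene with zero variance would produce NaN correlations.
    """
    corr = np.corrcoef(expr, rowvar=False)          # genes x genes matrix
    rows, cols = np.triu_indices(corr.shape[0], k=1)  # each pair once
    scores = np.abs(corr[rows, cols])
    order = np.argsort(scores)[::-1][:top_k]        # strongest pairs first
    return [(int(rows[i]), int(cols[i]), float(scores[i])) for i in order]


# Toy data (an assumption for illustration): gene 1 tracks gene 0,
# gene 2 is independent of both.
rng = np.random.default_rng(0)
g0 = rng.normal(size=200)
g1 = 2.0 * g0 + 0.1 * rng.normal(size=200)
g2 = rng.normal(size=200)
expr = np.column_stack([g0, g1, g2])

edges = pearson_network(expr, top_k=1)
print(edges)
```

Correlation is symmetric, so a ranking like this yields undirected candidate edges only; it says nothing about which gene regulates which.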
|
17
|
Franchini M, Pellecchia S, Viscido G, Gambardella G. Single-cell gene set enrichment analysis and transfer learning for functional annotation of scRNA-seq data. NAR Genom Bioinform 2023; 5:lqad024. [PMID: 36879897 PMCID: PMC9985338 DOI: 10.1093/nargab/lqad024] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 10/19/2022] [Revised: 01/16/2023] [Accepted: 02/20/2023] [Indexed: 03/07/2023] Open
Abstract
Although an essential step, cell functional annotation often proves particularly challenging from single-cell transcriptional data. Several methods have been developed to accomplish this task. However, in most cases, these rely on techniques initially developed for bulk RNA sequencing or simply make use of marker genes identified from cell clustering followed by supervised annotation. To overcome these limitations and automatize the process, we have developed two novel methods, the single-cell gene set enrichment analysis (scGSEA) and the single-cell mapper (scMAP). scGSEA combines latent data representations and gene set enrichment scores to detect coordinated gene activity at single-cell resolution. scMAP uses transfer learning techniques to re-purpose and contextualize new cells into a reference cell atlas. Using both simulated and real datasets, we show that scGSEA effectively recapitulates recurrent patterns of pathways' activity shared by cells from different experimental conditions. At the same time, we show that scMAP can reliably map and contextualize new single-cell profiles on a breast cancer atlas we recently released. Both tools are provided in an effective and straightforward workflow providing a framework to determine cell function and significantly improve annotation and interpretation of scRNA-seq data.
Affiliation(s)
- Melania Franchini
- Telethon Institute of Genetics and Medicine, Pozzuoli 80078 Naples, Italy; Department of Electrical Engineering and Information Technologies, University of Naples Federico II, 80125 Naples, Italy
- Simona Pellecchia
- Telethon Institute of Genetics and Medicine, Pozzuoli 80078 Naples, Italy
- Gaetano Viscido
- Telethon Institute of Genetics and Medicine, Pozzuoli 80078 Naples, Italy
- Gennaro Gambardella
- Telethon Institute of Genetics and Medicine, Pozzuoli 80078 Naples, Italy; Department of Chemical Materials and Industrial Engineering, University of Naples Federico II, 80125 Naples, Italy
|
18
|
Yan J, Wang X. Machine learning bridges omics sciences and plant breeding. Trends Plant Sci 2023; 28:199-210. [PMID: 36153276 DOI: 10.1016/j.tplants.2022.08.018] [Citation(s) in RCA: 32] [Impact Index Per Article: 16.0] [Received: 03/08/2022] [Revised: 08/15/2022] [Accepted: 08/23/2022] [Indexed: 06/16/2023]
Abstract
Some of the biological knowledge obtained from fundamental research will be implemented in applied plant breeding. To bridge basic research and breeding practice, machine learning (ML) holds great promise to translate biological knowledge and omics data into precision-designed plant breeding. Here, we review ML for multi-omics analysis in plants, including data dimensionality reduction, inference of gene-regulation networks, and gene discovery and prioritization. These applications will facilitate understanding trait regulation mechanisms and identifying target genes potentially applicable to knowledge-driven molecular design breeding. We also highlight applications of deep learning in plant phenomics and ML in genomic selection-assisted breeding, such as various ML algorithms that model the correlations among genotypes (genes), phenotypes (traits), and environments, to ultimately achieve data-driven genomic design breeding.
Affiliation(s)
- Jun Yan
- National Maize Improvement Center, College of Agronomy and Biotechnology, China Agricultural University, Beijing 100094, China; Frontiers Science Center for Molecular Design Breeding, China Agricultural University, Beijing 100094, China
- Xiangfeng Wang
- National Maize Improvement Center, College of Agronomy and Biotechnology, China Agricultural University, Beijing 100094, China; Frontiers Science Center for Molecular Design Breeding, China Agricultural University, Beijing 100094, China
|
19
|
Liu Q, Li J, Dong M, Liu M, Chai Y. Identification of Gene Regulatory Networks Using Variational Bayesian Inference in the Presence of Missing Data. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:399-409. [PMID: 35061589 DOI: 10.1109/tcbb.2022.3144418] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
The identification of gene regulatory networks (GRNs) from gene expression time-series data is a challenging open problem in systems biology. This paper considers structure inference of GRNs from incomplete and noisy gene expression data, an issue that has not been well studied in GRN inference. In this paper, the dynamical behavior of the gene expression process is described by a stochastic nonlinear state-space model with unknown noise information. A variational Bayesian (VB) framework is proposed to estimate the parameters and gene expression levels simultaneously. One advantage of this method is that it can easily handle missing observations by generating predicted values. Considering the sparsity of GRNs, the smoothed gene data are modeled by an extreme gradient boosting tree, and the regulatory interactions among genes are identified by importance scores based on the tree model. The proposed method is tested on the artificial DREAM4 datasets and one real gene expression dataset of yeast. The comparative results show that the proposed method can effectively recover the regulatory interactions of a GRN in the presence of missing observations and outperforms existing methods for GRN identification.
|
20
|
Inference of gene regulatory networks based on the Light Gradient Boosting Machine. Comput Biol Chem 2022; 101:107769. [DOI: 10.1016/j.compbiolchem.2022.107769] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2022] [Revised: 08/12/2022] [Accepted: 09/06/2022] [Indexed: 11/23/2022]
|
21
|
Cai M, Vesely A, Chen X, Li L, Goeman JJ. NetTDP: permutation-based true discovery proportions for differential co-expression network analysis. Brief Bioinform 2022; 23:6754043. [PMID: 36209415 DOI: 10.1093/bib/bbac417] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2022] [Revised: 08/23/2022] [Accepted: 08/28/2022] [Indexed: 12/14/2022] Open
Abstract
Existing methods for differential network analysis could only infer whether two networks of interest have differences between two groups of samples, but could not quantify and localize network differences. In this work, a novel method, permutation-based Network True Discovery Proportions (NetTDP), is proposed to quantify the number of edges (correlations) or nodes (genes) for which the co-expression networks are different. In the NetTDP method, we propose an edge-level statistic and a node-level statistic, and detect true discoveries of edges and nodes in the sense of differential co-expression network, respectively, by the permutation-based sumSome method. Furthermore, the NetTDP method could further localize the differences by inferring the TDPs for edge or gene subsets of interest, which can be selected post hoc. Our NetTDP method allows inference on data-driven modules or biology-driven gene sets, and remains valid even when these sub-networks are optimized using the same data. Experimental results on both simulation data sets and five real data sets show the effectiveness of the proposed method in inferring the quantification and localization of differential co-expression networks. The R code is available at https://github.com/LiminLi-xjtu/NetTDP.
Affiliation(s)
- Menglan Cai
- School of Mathematics and Statistics, Xi'an Jiaotong University, Xianning West, 710049, Shaanxi, China
- Anna Vesely
- Department of Statistical Sciences, University of Padova, Italy
- Xu Chen
- Department of Biomedical Data Sciences, Leiden University Medical Center, Postbus 9600, 2300 RC Leiden, The Netherlands
- Limin Li
- School of Mathematics and Statistics, Xi'an Jiaotong University, Xianning West, 710049, Shaanxi, China
- Jelle J Goeman
- Department of Biomedical Data Sciences, Leiden University Medical Center, Postbus 9600, 2300 RC Leiden, The Netherlands
|
22
|
Yang B, Bao W, Chen B. PGRNIG: novel parallel gene regulatory network identification algorithm based on GPU. Brief Funct Genomics 2022; 21:441-454. [PMID: 36064791 DOI: 10.1093/bfgp/elac028] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/09/2022] [Revised: 07/30/2022] [Accepted: 08/03/2022] [Indexed: 12/14/2022] Open
Abstract
Molecular biology has revealed that complex life phenomena can be treated as the result of many gene interactions. Investigating these interactions and understanding the intrinsic mechanisms of biological systems using gene expression data have attracted a lot of attention. As a typical gene regulatory network (GRN) inference method, the S-system has been utilized to deal with small-scale network identification. However, it is extremely difficult to optimize it to infer medium-to-large networks. This paper proposes a novel parallel swarm intelligent algorithm, PGRNIG, to optimize the parameters of the S-system. We employed the clone selection strategy to improve the whale optimization algorithm (CWOA). To enhance the time efficiency of CWOA optimization, we utilized a parallel CWOA (PCWOA) based on the compute unified device architecture (CUDA) platform. Decomposition strategy and L1 regularization were utilized to reduce the search space and complexity of GRN inference. We applied the PGRNIG algorithm on three synthetic datasets and two real time-series expression datasets of the species of Escherichia coli and Saccharomyces cerevisiae. Experimental results show that PGRNIG could infer the gene regulatory network more accurately than other state-of-the-art methods with a convincing computational speed-up. Our findings show that CWOA and PCWOA have faster convergence performances than WOA.
Affiliation(s)
- Bin Yang
- School of Information Science and Engineering, Zaozhuang University, Zaozhuang 277160, China
- Wenzheng Bao
- School of Information and Electrical Engineering, Xuzhou University of Technology, Xuzhou 221018, China
- Baitong Chen
- Xuzhou First People's Hospital, Xuzhou 221000, China
|
23
|
Chen G, Liu ZP. Inferring causal gene regulatory network via GreyNet: From dynamic grey association to causation. Front Bioeng Biotechnol 2022; 10:954610. [PMID: 36237217 PMCID: PMC9551017 DOI: 10.3389/fbioe.2022.954610] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/27/2022] [Accepted: 08/15/2022] [Indexed: 11/23/2022] Open
Abstract
Gene regulatory network (GRN) provides abundant information on gene interactions, which contributes to demonstrating pathology, predicting clinical outcomes, and identifying drug targets. Existing high-throughput experiments provide rich time-series gene expression data to reconstruct the GRN to further gain insights into the mechanism of organisms responding to external stimuli. Numerous machine-learning methods have been proposed to infer gene regulatory networks. Nevertheless, machine learning, especially deep learning, is generally a “black box,” which lacks interpretability. The causality has not been well recognized in GRN inference procedures. In this article, we introduce grey theory integrated with the adaptive sliding window technique to flexibly capture instant gene–gene interactions in the uncertain regulatory system. Then, we incorporate generalized multivariate Granger causality regression methods to transform the dynamic grey association into causation to generate directional regulatory links. We evaluate our model on the DREAM4 in silico benchmark dataset and real-world hepatocellular carcinoma (HCC) time-series data. We achieved competitive results on the DREAM4 compared with other state-of-the-art algorithms and gained meaningful GRN structure on HCC data respectively.
Affiliation(s)
- Guangyi Chen
- Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, Shandong, China
- Zhi-Ping Liu
- Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, Shandong, China
- Center for Intelligent Medicine, Shandong University, Jinan, Shandong, China
- *Correspondence: Zhi-Ping Liu,
|
24
|
Caligola S, De Sanctis F, Canè S, Ugel S. Breaking the Immune Complexity of the Tumor Microenvironment Using Single-Cell Technologies. Front Genet 2022; 13:867880. [PMID: 35651929 PMCID: PMC9149246 DOI: 10.3389/fgene.2022.867880] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 02/01/2022] [Accepted: 04/27/2022] [Indexed: 12/31/2022] Open
Abstract
Tumors are not a simple aggregate of transformed cells but rather a complicated ecosystem containing various components, including infiltrating immune cells, tumor-related stromal cells, endothelial cells, soluble factors, and extracellular matrix proteins. Profiling the immune contexture of this intricate framework is now mandatory to develop more effective cancer therapies and precise immunotherapeutic approaches by identifying exact targets or predictive biomarkers, respectively. Conventional technologies are limited in reaching this goal because they lack high resolution. Recent developments in single-cell technologies, such as single-cell RNA transcriptomics, mass cytometry, and multiparameter immunofluorescence, have revolutionized the cancer immunology field, capturing the heterogeneity of tumor-infiltrating immune cells and the dynamic complexity of tenets that regulate cell networks in the tumor microenvironment. In this review, we describe some of the current single-cell technologies and computational techniques applied for immune-profiling the cancer landscape and discuss future directions of how integrating multi-omics data can guide a new "precision oncology" advancement.
Affiliation(s)
- Stefano Ugel
- Immunology Section, Department of Medicine, University of Verona, Verona, Italy
|
25
|
Saremi M, Amirmazlaghani M. Reconstruction of Gene Regulatory Networks Using Multiple Datasets. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:1827-1839. [PMID: 33539303 DOI: 10.1109/tcbb.2021.3057241] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 06/12/2023]
Abstract
MOTIVATION: Laboratory gene regulatory data for a species are sporadic. Despite the abundance of gene regulatory network algorithms that employ single data sets, few algorithms can combine the vast but dispersed sources of data and extract the potential information. To compensate for this shortage, we developed an algorithm called GENEREF that can accumulate information from multiple types of data sets in an iterative manner, with each iteration boosting the performance of the prediction results. RESULTS: The algorithm is examined extensively on data extracted from the quintuple DREAM4 networks and DREAM5's Escherichia coli and Saccharomyces cerevisiae networks and sub-networks. It was compared against many single-dataset and multi-dataset algorithms. Results show that GENEREF surpasses non-ensemble state-of-the-art multi-perturbation algorithms on the selected networks and is competitive with current multiple-dataset algorithms. Specifically, it outperforms dynGENIE3 and is on par with iRafNet. We also argue that a scoring method based solely on the AUPR criterion would be more trustworthy than the traditional score. AVAILABILITY: The Python implementation along with the data sets and results can be downloaded from github.com/msaremi/GENEREF.
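The AUPR criterion mentioned above can be illustrated with a minimal sketch. It assumes a ranked list of candidate edges with gold-standard labels and uses step-wise summation (average precision) as the area estimate; the function name `average_precision` is hypothetical and this is not GENEREF's evaluation code.

```python
def average_precision(scores, labels):
    """Area under the precision-recall curve via step-wise summation.

    scores: predicted confidence for each candidate edge.
    labels: 1 if the edge is in the gold-standard network, else 0.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one true edge")
    # Rank candidate edges from most to least confident.
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    hits = 0
    ap = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            ap += hits / rank  # precision at each recovered true edge
    return ap / total_pos
```

Because precision is sampled only where a true edge is recovered, the score stays informative when true edges are rare, which is the usual situation in regulatory network benchmarks.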
|
26
|
Zhang Y, He Y, Chen Q, Yang Y, Gong M. Fusion prior gene network for high reliable single-cell gene regulatory network inference. Comput Biol Med 2022; 143:105279. [PMID: 35134605 DOI: 10.1016/j.compbiomed.2022.105279] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2021] [Revised: 01/25/2022] [Accepted: 01/29/2022] [Indexed: 11/03/2022]
Abstract
Single-cell RNA sequencing technology provides an opportunity to discover gene regulatory networks (GRNs) that control cell differentiation and drive cell type transformation. However, inference is challenged by the high data loss and high noise of sequencing data, and the inferred networks contain many pseudo-connections. To solve these problems, we propose a framework called Fusion prior gene network for Gene Regulatory Network inference Accuracy Enhancement (FGRNAE) to infer a highly reliable gene regulatory network. Specifically, based on the Single-Cell RNA-sequencing Network Propagation and network Fusion (scNPF) preprocessing framework, we employ Random Walk with Restart on the prior gene network to interpolate the missing data. Furthermore, we infer the network from the interpolated data using the Random Forest algorithm. In addition, we apply data from the Co-Function Network to build a meta-gene network and select regulatory connections with a Markov Random Field. Extensive experiments on datasets from BEELINE validate the effectiveness of our framework for improving inference accuracy.
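Random Walk with Restart, the propagation step named above, can be sketched generically as follows. This is a minimal illustration on a small adjacency matrix, not the FGRNAE or scNPF imputation procedure; the function name and the `restart`, `tol` and `max_iter` parameters are assumptions made for the example.

```python
def random_walk_with_restart(adj, seed, restart=0.5, tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities of a walker that restarts at `seed`.

    adj: square adjacency matrix (list of lists) of the prior gene network.
    seed: index of the node the walker returns to with probability `restart`.
    """
    n = len(adj)
    # Column-normalise so each column is a transition distribution.
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    w = [[adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
         for i in range(n)]
    p0 = [0.0] * n
    p0[seed] = 1.0
    p = p0[:]
    for _ in range(max_iter):
        nxt = [(1 - restart) * sum(w[i][j] * p[j] for j in range(n))
               + restart * p0[i] for i in range(n)]
        # Stop once the distribution no longer changes (L1 norm).
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p
```

The resulting probabilities act as network-aware proximity weights: nodes close to the seed in the prior network receive more mass, which is what allows information to be borrowed from neighbouring genes.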
Affiliation(s)
- Yongqing Zhang
- School of Computer Science, Chengdu University of Information Technology, Chengdu, 610225, China; School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, 610054, China
- Yuchen He
- School of Computer Science, Chengdu University of Information Technology, Chengdu, 610225, China
- Qingyuan Chen
- School of Computer Science, Chengdu University of Information Technology, Chengdu, 610225, China
- Yihan Yang
- International College, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China
- Meiqin Gong
- West China Second University Hospital, Sichuan University, Chengdu, 610041, China.
|
27
|
Feng H, Zheng R, Wang J, Wu FX, Li M. NIMCE: A Gene Regulatory Network Inference Approach Based on Multi Time Delays Causal Entropy. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:1042-1049. [PMID: 33035155 DOI: 10.1109/tcbb.2020.3029846] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/11/2023]
Abstract
Gene regulatory networks (GRNs) are involved in various biological processes, such as the cell cycle, differentiation and apoptosis. The large amount of existing expression data, especially time-series expression data, provides a chance to infer GRNs by computational methods. These data can reveal the dynamics of gene expression and imply the regulatory relationships among genes. However, identifying indirect regulatory links is still a big challenge, as most studies treat time points as independent observations and ignore the influence of time delays. In this study, we propose a GRN inference method based on information-theoretic measures, called NIMCE. NIMCE uses transfer entropy to measure the regulatory link between each pair of genes, then applies causation entropy to filter indirect relationships. In addition, NIMCE applies multiple time delays to identify indirect regulatory relationships from candidate genes. Experiments on simulated and colorectal cancer data show that NIMCE outperforms other competing methods. All data and code used in this study are publicly available at https://github.com/CSUBioGroup/NIMCE.
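Transfer entropy, the pairwise measure named above, can be sketched with a plug-in estimator for discretised series. This is an illustrative assumption-laden example, not the NIMCE implementation: the function name, the single `lag` parameter and the use of raw frequency counts are all choices made here for brevity.

```python
from collections import Counter
from math import log2

def transfer_entropy(x, y, lag=1):
    """Plug-in estimate of TE(X -> Y) in bits for discrete series.

    Measures how much x[t + 1 - lag] improves the prediction of y[t + 1]
    beyond what y[t] already provides.
    """
    if lag < 1:
        raise ValueError("lag must be >= 1")
    triples = Counter()  # counts of (y_next, y_now, x_past)
    for t in range(lag - 1, len(y) - 1):
        triples[(y[t + 1], y[t], x[t + 1 - lag])] += 1
    n = sum(triples.values())
    if n == 0:
        raise ValueError("series too short for this lag")
    next_now, now_past, now = Counter(), Counter(), Counter()
    for (y_next, y_now, x_past), c in triples.items():
        next_now[(y_next, y_now)] += c
        now_past[(y_now, x_past)] += c
        now[y_now] += c
    te = 0.0
    for (y_next, y_now, x_past), c in triples.items():
        with_x = c / now_past[(y_now, x_past)]           # p(y_next | y_now, x_past)
        without_x = next_now[(y_next, y_now)] / now[y_now]  # p(y_next | y_now)
        te += (c / n) * log2(with_x / without_x)
    return te
```

The measure is asymmetric, so evaluating it in both directions for a gene pair gives a hint about which gene is the regulator; scanning several values of `lag` is one simple way to account for different time delays.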
|
28
|
Deshpande A, Chu LF, Stewart R, Gitter A. Network inference with Granger causality ensembles on single-cell transcriptomics. Cell Rep 2022; 38:110333. [PMID: 35139376 PMCID: PMC9093087 DOI: 10.1016/j.celrep.2022.110333] [Citation(s) in RCA: 50] [Impact Index Per Article: 16.7] [Received: 02/05/2019] [Revised: 02/19/2021] [Accepted: 01/12/2022] [Indexed: 12/20/2022] Open
Abstract
Cellular gene expression changes throughout a dynamic biological process, such as differentiation. Pseudotimes estimate cells' progress along a dynamic process based on their individual gene expression states. Ordering the expression data by pseudotime provides information about the underlying regulator-gene interactions. Because the pseudotime distribution is not uniform, many standard mathematical methods are inapplicable for analyzing the ordered gene expression states. Here we present single-cell inference of networks using Granger ensembles (SINGE), an algorithm for gene regulatory network inference from ordered single-cell gene expression data. SINGE uses kernel-based Granger causality regression to smooth irregular pseudotimes and missing expression values. It aggregates predictions from an ensemble of regression analyses to compile a ranked list of candidate interactions between transcriptional regulators and target genes. In two mouse embryonic stem cell differentiation datasets, SINGE outperforms other contemporary algorithms. However, a more detailed examination reveals caveats about poor performance for individual regulators and uninformative pseudotimes.
Affiliation(s)
- Atul Deshpande
- Department of Electrical and Computer Engineering, University of Wisconsin - Madison, Madison, WI 53706, USA; Morgridge Institute for Research, Madison, WI 53715, USA
- Li-Fang Chu
- Morgridge Institute for Research, Madison, WI 53715, USA
- Ron Stewart
- Morgridge Institute for Research, Madison, WI 53715, USA
- Anthony Gitter
- Morgridge Institute for Research, Madison, WI 53715, USA; Department of Biostatistics and Medical Informatics, University of Wisconsin - Madison, Madison, WI 53792, USA.
|
29
|
Shrivastava H, Zhang X, Song L, Aluru S. GRNUlar: A Deep Learning Framework for Recovering Single-Cell Gene Regulatory Networks. J Comput Biol 2022; 29:27-44. [PMID: 35050715 DOI: 10.1089/cmb.2021.0437] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 12/20/2022] Open
Abstract
We propose GRNUlar, a novel deep learning framework for supervised learning of gene regulatory networks (GRNs) from single-cell RNA-Sequencing (scRNA-Seq) data. Our framework incorporates two intertwined models. First, we leverage the expressive ability of neural networks to capture complex dependencies between transcription factors and the corresponding genes they regulate, by developing a multitask learning framework. Second, to capture sparsity of GRNs observed in the real world, we design an unrolled algorithm technique for our framework. Our deep architecture requires supervision for training, for which we repurpose existing synthetic data simulators that generate scRNA-Seq data guided by an underlying GRN. Experimental results demonstrate that GRNUlar outperforms state-of-the-art methods on both synthetic and real data sets. Our study also demonstrates the novel and successful use of expression data simulators for supervised learning of GRN inference.
Affiliation(s)
- Harsh Shrivastava
- Department of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia, USA
- Xiuwei Zhang
- Department of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia, USA
- Le Song
- Department of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia, USA
- Srinivas Aluru
- Department of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia, USA
|
30
|
Davis-Marcisak EF, Deshpande A, Stein-O'Brien GL, Ho WJ, Laheru D, Jaffee EM, Fertig EJ, Kagohara LT. From bench to bedside: Single-cell analysis for cancer immunotherapy. Cancer Cell 2021; 39:1062-1080. [PMID: 34329587 PMCID: PMC8406623 DOI: 10.1016/j.ccell.2021.07.004] [Citation(s) in RCA: 64] [Impact Index Per Article: 16.0] [Received: 03/02/2021] [Revised: 06/16/2021] [Accepted: 07/02/2021] [Indexed: 01/04/2023]
Abstract
Single-cell technologies are emerging as powerful tools for cancer research. These technologies characterize the molecular state of each cell within a tumor, enabling new exploration of tumor heterogeneity, microenvironment cell-type composition, and cell state transitions that affect therapeutic response, particularly in the context of immunotherapy. Analyzing clinical samples has great promise for precision medicine but is technically challenging. Successfully identifying predictors of response requires well-coordinated, multi-disciplinary teams to ensure adequate sample processing for high-quality data generation and computational analysis for data interpretation. Here, we review current approaches to sample processing and computational analysis regarding their application to translational cancer immunotherapy research.
Affiliation(s)
- Emily F Davis-Marcisak
- McKusick-Nathans Institute of the Department of Genetic Medicine, Johns Hopkins School of Medicine, 550 N Broadway, Suite 1101E, Baltimore, MD 21205, USA; Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, 1650 Orleans Street, Room 485, Baltimore, MD 21287, USA; Convergence Institute, Johns Hopkins University, Baltimore, MD, USA; Bloomberg-Kimmel Immunotherapy Institute, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Atul Deshpande
- Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, 1650 Orleans Street, Room 485, Baltimore, MD 21287, USA; Convergence Institute, Johns Hopkins University, Baltimore, MD, USA; Bloomberg-Kimmel Immunotherapy Institute, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Genevieve L Stein-O'Brien
- McKusick-Nathans Institute of the Department of Genetic Medicine, Johns Hopkins School of Medicine, 550 N Broadway, Suite 1101E, Baltimore, MD 21205, USA; Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, 1650 Orleans Street, Room 485, Baltimore, MD 21287, USA; Convergence Institute, Johns Hopkins University, Baltimore, MD, USA; Bloomberg-Kimmel Immunotherapy Institute, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Won J Ho
- Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, 1650 Orleans Street, Room 485, Baltimore, MD 21287, USA; Convergence Institute, Johns Hopkins University, Baltimore, MD, USA; Bloomberg-Kimmel Immunotherapy Institute, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Daniel Laheru
- Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, 1650 Orleans Street, Room 485, Baltimore, MD 21287, USA; Convergence Institute, Johns Hopkins University, Baltimore, MD, USA; Bloomberg-Kimmel Immunotherapy Institute, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Elizabeth M Jaffee
- Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, 1650 Orleans Street, Room 485, Baltimore, MD 21287, USA; Convergence Institute, Johns Hopkins University, Baltimore, MD, USA; Bloomberg-Kimmel Immunotherapy Institute, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Elana J Fertig
- McKusick-Nathans Institute of the Department of Genetic Medicine, Johns Hopkins School of Medicine, 550 N Broadway, Suite 1101E, Baltimore, MD 21205, USA; Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, 1650 Orleans Street, Room 485, Baltimore, MD 21287, USA; Convergence Institute, Johns Hopkins University, Baltimore, MD, USA; Bloomberg-Kimmel Immunotherapy Institute, Johns Hopkins University School of Medicine, Baltimore, MD, USA; Department of Applied Mathematics and Statistics, Johns Hopkins University Whiting School of Engineering, Baltimore, MD, USA; Department of Biomedical Engineering, Johns Hopkins University School of Medicine, Baltimore, MD, USA.
- Luciane T Kagohara
- Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, 1650 Orleans Street, Room 485, Baltimore, MD 21287, USA; Convergence Institute, Johns Hopkins University, Baltimore, MD, USA; Bloomberg-Kimmel Immunotherapy Institute, Johns Hopkins University School of Medicine, Baltimore, MD, USA.
|
31
|
Mousavi R, Konuru SH, Lobo D. Inference of dynamic spatial GRN models with multi-GPU evolutionary computation. Brief Bioinform 2021; 22:6217729. [PMID: 33834216 DOI: 10.1093/bib/bbab104] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 12/10/2020] [Revised: 02/15/2021] [Accepted: 03/09/2021] [Indexed: 02/06/2023] Open
Abstract
Reverse engineering mechanistic gene regulatory network (GRN) models with a specific dynamic spatial behavior is an inverse problem without analytical solutions in general. Instead, heuristic machine learning algorithms have been proposed to infer the structure and parameters of a system of equations able to recapitulate a given gene expression pattern. However, these algorithms are computationally intensive as they need to simulate millions of candidate models, which limits their applicability and requires high computational resources. Graphics processing unit (GPU) computing is an affordable alternative for accelerating large-scale scientific computation, yet no method is currently available to exploit GPU technology for the reverse engineering of mechanistic GRNs from spatial phenotypes. Here we present an efficient methodology to parallelize evolutionary algorithms using GPU computing for the inference of mechanistic GRNs that can develop a given gene expression pattern in a multicellular tissue area or cell culture. The proposed approach is based on multiple CPU threads running the lightweight crossover, mutation and selection operators and launching GPU kernels asynchronously. Kernels can run in parallel on a single GPU or on multiple GPUs, and each kernel simulates and scores the error of a model using the thread parallelism of the GPU. We tested this methodology for the inference of spatiotemporal mechanistic GRNs, including topology and parameters, that can develop a given 2D gene expression pattern. The results show a 700-fold speedup with respect to a single CPU implementation. This approach can streamline the extraction of knowledge from biological and medical datasets and accelerate the automatic design of GRNs for synthetic biology applications.
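The division of labour described above (cheap crossover, mutation and selection operators around an expensive scoring step) can be sketched in a few lines. This is a single-threaded, CPU-only illustration under stated assumptions: the function `evolve`, its parameters, and the real-valued parameter encoding are hypothetical, and the `score` callback merely stands in for the model simulation that the paper offloads to GPU kernels.

```python
import random

def evolve(score, n_params, pop_size=30, generations=60, sigma=0.3, seed=0):
    """Minimise `score` over real-valued parameter vectors.

    score: callable returning the error of one candidate model (lower is better).
    """
    rng = random.Random(seed)
    pop = [[rng.uniform(-1, 1) for _ in range(n_params)] for _ in range(pop_size)]
    for _ in range(generations):
        # Scoring is the expensive step; here it runs serially on the CPU.
        pop.sort(key=score)
        parents = pop[: pop_size // 2]  # selection: keep the lower-error half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_params) if n_params > 1 else 0
            child = a[:cut] + b[cut:]                          # one-point crossover
            child = [g + rng.gauss(0, sigma) for g in child]   # Gaussian mutation
            children.append(child)
        pop = parents + children
    return min(pop, key=score)
```

Because the parents survive unchanged into the next generation, the best error never increases, and all of the per-candidate cost sits inside `score`, which is why that step is the natural one to parallelise.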
Affiliation(s)
- Reza Mousavi
- Department of Biological Sciences at the University of Maryland, Baltimore, MD 21250, USA
- Sri Harsha Konuru
- Department of Biological Sciences at the University of Maryland, Baltimore, MD 21250, USA
- Daniel Lobo
- Department of Biological Sciences at the University of Maryland, Baltimore, MD 21250, USA
|
33
|
Pirgazi J, Olyaee MH, Khanteymoori A. KFGRNI: A robust method to inference gene regulatory network from time-course gene data based on ensemble Kalman filter. J Bioinform Comput Biol 2021; 19:2150002. [PMID: 33657986 DOI: 10.1142/s0219720021500025] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/18/2022]
Abstract
A central problem of systems biology is the reconstruction of Gene Regulatory Networks (GRNs) from time series data. Although many attempts have been made to design an efficient method for GRN inference, providing the best solution is still a challenging task. Noise, a low number of samples, and a high number of nodes are the main reasons for the poor performance of existing methods. The present study applies the ensemble Kalman filter algorithm to model a GRN from gene time series data. The inference of a GRN with p genes is decomposed into p subproblems. In each subproblem, the ensemble Kalman filter algorithm identifies the weights of the interactions for one target gene: the expression pattern of the target gene is predicted from the expression patterns of all the remaining genes. The proposed method is compared with several well-known approaches. The results of the evaluation indicate that the proposed method improves inference accuracy and recovers regulatory relations more reliably from noisy data.
Affiliation(s)
- Jamshid Pirgazi
- Department of Electrical and Computer Engineering, University of Science and Technology of Mazandaran, Behshahr, Iran
- Mohammad Hossein Olyaee
- Department of Computer Engineering, Engineering Faculty, University of Gonabad, Gonabad, Iran
- Alireza Khanteymoori
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Germany; Department of Computer Engineering, Engineering Faculty, University of Zanjan, Zanjan Province, Iran
|
34
|
Sun X, Zhang J, Nie Q. Inferring latent temporal progression and regulatory networks from cross-sectional transcriptomic data of cancer samples. PLoS Comput Biol 2021; 17:e1008379. [PMID: 33667222 PMCID: PMC7968745 DOI: 10.1371/journal.pcbi.1008379] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 10/03/2020] [Revised: 03/17/2021] [Accepted: 02/15/2021] [Indexed: 12/19/2022] Open
Abstract
Unraveling molecular regulatory networks underlying disease progression is critically important for understanding disease mechanisms and identifying drug targets. The existing methods for inferring gene regulatory networks (GRNs) rely mainly on time-course gene expression data. However, most available omics data from cross-sectional studies of cancer patients lack sufficient temporal information, leading to a key challenge for GRN inference. Through quantifying the latent progression using random walks-based manifold distance, we propose a latent-temporal progression-based Bayesian method, PROB, for inferring GRNs from the cross-sectional transcriptomic data of tumor samples. The robustness of PROB to measurement variability in the data is mathematically proved and numerically verified. Performance evaluation on real data indicates that PROB outperforms other methods in both pseudotime inference and GRN inference. Applications to bladder cancer and breast cancer demonstrate that our method is effective in identifying key regulators of cancer progression or drug targets. The identified ACSS1 is experimentally validated to promote epithelial-to-mesenchymal transition of bladder cancer cells, and the predicted FOXM1-target interactions are verified and are predictive of relapse in breast cancer. Our study suggests new, effective ways of modeling clinical transcriptomic data to characterize cancer progression and facilitates the translation of regulatory network-based approaches into precision medicine.
Affiliation(s)
- Xiaoqiang Sun
- Key Laboratory of Tropical Disease Control, Chinese Ministry of Education; Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
- School of Mathematics, Sun Yat-sen University, Guangzhou, China
- Ji Zhang
- State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University Cancer Center, Guangzhou, Guangdong, China
- Qing Nie
- Department of Mathematics and Department of Developmental & Cell Biology, NSF-Simons Center for Multiscale Cell Fate Research, University of California Irvine, Irvine, California, United States of America
|
35
|
Zhao M, He W, Tang J, Zou Q, Guo F. A comprehensive overview and critical evaluation of gene regulatory network inference technologies. Brief Bioinform 2021; 22:6128842. [PMID: 33539514 DOI: 10.1093/bib/bbab009] [Citation(s) in RCA: 40] [Impact Index Per Article: 10.0] [Received: 09/28/2020] [Revised: 12/11/2020] [Accepted: 01/06/2021] [Indexed: 12/12/2022] Open
Abstract
Gene regulatory networks (GRNs) are an important mechanism for maintaining life processes, controlling biochemical reactions and regulating compound levels, and they play an important role in various organisms and systems. Reconstructing GRNs can help us to understand the molecular mechanisms of organisms and to reveal the essential rules of a large number of biological processes and reactions. To deal with uncertainty, network reconstruction algorithms rely on specific assumptions that affect prediction accuracy. To study why a certain method is more suitable for a specific research problem or type of experimental data, we survey methods in three classes: model-based, information-based and machine learning-based. These classes yield distinctly different types of computational tools for reconstructing GRNs. Furthermore, we discuss several classical, representative and recent methods in each category, analyzing their core ideas, general steps and characteristics. We compare the performance of state-of-the-art GRN reconstruction technologies on simulated and real networks of different scales. Through standardized performance metrics and common benchmarks, we quantitatively evaluate the stability of the various methods and the sensitivity of each algorithm to network scale. The aim of this study is to identify the most appropriate method for a specific GRN, helping biologists and medical scientists discover potential drug targets and identify cancer biomarkers.
Affiliation(s)
- Mengyuan Zhao
- School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
- Wenying He
- School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
- Jijun Tang
- University of South Carolina, Tianjin, China
- Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Fei Guo
- School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
|
37
|
Zheng R, Li M, Chen X, Zhao S, Wu FX, Pan Y, Wang J. An Ensemble Method to Reconstruct Gene Regulatory Networks Based on Multivariate Adaptive Regression Splines. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:347-354. [PMID: 30794516 DOI: 10.1109/tcbb.2019.2900614] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 06/09/2023]
Abstract
Gene regulatory networks (GRNs) play a key role in biological processes. However, GRNs are diverse under different biological conditions. Reconstructing GRNs from gene expression data has been an important opportunity and challenge in the past decades. Although many methods exist to infer the topology of GRNs, such as those based on mutual information, random forest, and partial least squares, their accuracy is still low due to the noise and high dimensionality of the expression data. In this paper, we introduce an ensemble method based on Multivariate Adaptive Regression Splines (MARS), called PBMarsNet, to reconstruct directed GRNs from multifactorial gene expression data. PBMarsNet incorporates part mutual information (PMI) to pre-weight the candidate regulatory genes and then uses MARS to detect the nonlinear regulatory links. Moreover, we apply bootstrap to run MARS multiple times and average the outputs of the runs as the final score of each regulatory link. The results on the DREAM4 and DREAM5 challenge datasets show that PBMarsNet has superior performance and generalization compared with other state-of-the-art methods.
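The bootstrap-and-average step described above can be sketched independently of the base learner. In this minimal example the absolute Pearson correlation is a deliberately simple stand-in for MARS, and the function names and the `n_boot` and `seed` parameters are assumptions; it is not the PBMarsNet implementation.

```python
import random
from statistics import mean

def pearson(u, v):
    """Pearson correlation; returns 0.0 when either sample is constant."""
    mu, mv = mean(u), mean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    if var_u == 0 or var_v == 0:
        return 0.0
    return cov / (var_u * var_v) ** 0.5

def bootstrap_edge_score(regulator, target, n_boot=200, seed=0):
    """Average a link score over bootstrap resamples of the samples."""
    rng = random.Random(seed)
    n = len(target)
    scores = []
    for _ in range(n_boot):
        # Resample sample indices with replacement, keeping pairs aligned.
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(abs(pearson([regulator[i] for i in idx],
                                  [target[i] for i in idx])))
    return mean(scores)
```

Averaging over resamples damps the influence of any single noisy sample on a link score, which is the motivation for running the base learner many times rather than once.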
|
38
|
Lu J, Dumitrascu B, McDowell IC, Jo B, Barrera A, Hong LK, Leichter SM, Reddy TE, Engelhardt BE. Causal network inference from gene transcriptional time-series response to glucocorticoids. PLoS Comput Biol 2021; 17:e1008223. [PMID: 33513136 PMCID: PMC7875426 DOI: 10.1371/journal.pcbi.1008223] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 07/04/2019] [Revised: 02/10/2021] [Accepted: 08/07/2020] [Indexed: 11/19/2022] Open
Abstract
Gene regulatory network inference is essential to uncover complex relationships among gene pathways and inform downstream experiments, ultimately enabling regulatory network re-engineering. Network inference from transcriptional time-series data requires accurate, interpretable, and efficient determination of causal relationships among thousands of genes. Here, we develop Bootstrap Elastic net regression from Time Series (BETS), a statistical framework based on Granger causality for the recovery of a directed gene network from transcriptional time-series data. BETS uses elastic net regression and stability selection from bootstrapped samples to infer causal relationships among genes. BETS is highly parallelized, enabling efficient analysis of large transcriptional data sets. We show competitive accuracy on a community benchmark, the DREAM4 100-gene network inference challenge, where BETS is one of the fastest among methods of similar performance and additionally infers whether causal effects are activating or inhibitory. We apply BETS to transcriptional time-series data of differentially-expressed genes from A549 cells exposed to glucocorticoids over a period of 12 hours. We identify a network of 2768 genes and 31,945 directed edges (FDR ≤ 0.2). We validate inferred causal network edges using two external data sources: Overexpression experiments on the same glucocorticoid system, and genetic variants associated with inferred edges in primary lung tissue in the Genotype-Tissue Expression (GTEx) v6 project. BETS is available as an open source software package at https://github.com/lujonathanh/BETS.
Affiliation(s)
- Jonathan Lu
- Department of Computer Science, Princeton University, Princeton, New Jersey, United States of America
- Bianca Dumitrascu
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
- Ian C. McDowell
- Element Genomics, A UCB Company, Durham, North Carolina, United States of America
- Brian Jo
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
- Alejandro Barrera
- Center for Genomic and Computational Biology, Duke University, Durham, North Carolina, United States of America
- Department of Biostatistics and Bioinformatics, Duke University Medical Center, Durham, North Carolina, United States of America
- Linda K. Hong
- Center for Genomic and Computational Biology, Duke University, Durham, North Carolina, United States of America
- Sarah M. Leichter
- Center for Genomic and Computational Biology, Duke University, Durham, North Carolina, United States of America
- Timothy E. Reddy
- Department of Genome Sciences, Duke University, Durham, North Carolina, United States of America
- Barbara E. Engelhardt
- Department of Computer Science, Princeton University, Princeton, New Jersey, United States of America
- Center for Statistics and Machine Learning, Princeton University, Princeton, New Jersey, United States of America
|
39
|
Dai H, Jin QQ, Li L, Chen LN. Reconstructing gene regulatory networks in single-cell transcriptomic data analysis. Zool Res 2020; 41:599-604. [PMID: 33124218 PMCID: PMC7671911 DOI: 10.24272/j.issn.2095-8137.2020.215] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 08/03/2020] [Accepted: 10/20/2020] [Indexed: 11/07/2022] Open
Abstract
Gene regulatory networks play pivotal roles in our understanding of biological processes/mechanisms at the molecular level. Many studies have developed sample-specific or cell-type-specific gene regulatory networks from single-cell transcriptomic data based on a large amount of cell samples. Here, we review the state-of-the-art computational algorithms and describe various applications of gene regulatory networks in biological studies.
Affiliation(s)
- Hao Dai
- Key Laboratory of Systems Biology, Shanghai Institute of Biochemistry and Cell Biology, Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, Shanghai 200031, China
- Institute of Brain-Intelligence Technology, Zhangjiang Laboratory, Shanghai 201210, China
- Qi-Qi Jin
- Key Laboratory of Systems Biology, Shanghai Institute of Biochemistry and Cell Biology, Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, Shanghai 200031, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- School of Life Science and Technology, ShanghaiTech University, Shanghai 201210, China
- Lin Li
- Key Laboratory of Systems Biology, Shanghai Institute of Biochemistry and Cell Biology, Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, Shanghai 200031, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Luo-Nan Chen
- Key Laboratory of Systems Biology, Shanghai Institute of Biochemistry and Cell Biology, Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, Shanghai 200031, China
- School of Life Science and Technology, ShanghaiTech University, Shanghai 201210, China
- Key Laboratory of Systems Biology, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Hangzhou, Zhejiang 310024, China
- Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, Yunnan 650223, China
|
40
|
Liu W, Sun X, Peng L, Zhou L, Lin H, Jiang Y. RWRNET: A Gene Regulatory Network Inference Algorithm Using Random Walk With Restart. Front Genet 2020; 11:591461. [PMID: 33101398 PMCID: PMC7545090 DOI: 10.3389/fgene.2020.591461] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 08/04/2020] [Accepted: 09/02/2020] [Indexed: 11/30/2022] Open
Abstract
Inferring gene regulatory networks from expression data is essential in identifying complex regulatory relationships among genes and revealing the mechanism of certain diseases. Various computation methods have been developed for inferring gene regulatory networks. However, these methods focus on the local topology of the network rather than on the global topology. From a network optimisation standpoint, emphasising the global topology of the network also reduces redundant regulatory relationships. In this study, we propose a novel network inference algorithm using Random Walk with Restart (RWRNET) that combines local and global topology relationships. The method first captures the local topology through three elements of random walk and then combines the local topology with the global topology by Random Walk with Restart. The Markov Blanket discovery algorithm is then used to deal with isolated genes. The proposed method is compared with several state-of-the-art methods on the basis of six benchmark datasets. Experimental results demonstrated the effectiveness of the proposed method.
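The abstract above builds on Random Walk with Restart. As a rough illustration of that generic technique only (not the RWRNET implementation, whose details the abstract does not give), the sketch below iterates p = (1 - r) W p + r e on a small toy graph. The function name, the restart probability of 0.3, and the four-node path graph are all assumptions made for this example.

```python
def random_walk_with_restart(adj, seed, restart=0.3, tol=1e-10, max_iter=1000):
    """Generic RWR on a weighted adjacency matrix (list of lists).

    Illustrative sketch; names and defaults are hypothetical, not from RWRNET.
    """
    n = len(adj)
    # Column-normalise so that W[i][j] is the probability of stepping from j to i.
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    W = [[adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
         for i in range(n)]
    e = [1.0 if i == seed else 0.0 for i in range(n)]  # restart distribution
    p = e[:]
    for _ in range(max_iter):
        new_p = [(1 - restart) * sum(W[i][j] * p[j] for j in range(n)) + restart * e[i]
                 for i in range(n)]
        # Stop once the L1 change between iterations is negligible.
        if sum(abs(a - b) for a, b in zip(new_p, p)) < tol:
            return new_p
        p = new_p
    return p


# Toy path graph 0 - 1 - 2 - 3 (hypothetical data), walker restarting at node 0.
path_graph = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = random_walk_with_restart(path_graph, seed=0)
```

In this toy case the stationary scores fall off with distance from the seed node, which is the sense in which RWR mixes a node's local neighbourhood with the global structure of the graph.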
Affiliation(s)
- Wei Liu
  - School of Computer Science, Xiangtan University, Xiangtan, China
  - Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Xiangtan, China
- Xingen Sun
  - School of Computer Science, Xiangtan University, Xiangtan, China
- Li Peng
  - School of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan, China
- Lili Zhou
  - School of Computer Science, Xiangtan University, Xiangtan, China
- Hui Lin
  - School of Computer Science, Xiangtan University, Xiangtan, China
- Yi Jiang
  - School of Computer Science, Xiangtan University, Xiangtan, China
41
Ma B, Fang M, Jiao X. Inference of gene regulatory networks based on nonlinear ordinary differential equations. Bioinformatics 2020; 36:4885-4893. [DOI: 10.1093/bioinformatics/btaa032] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 09/10/2019] [Revised: 12/30/2019] [Accepted: 01/15/2020] [Indexed: 01/05/2023] Open
Abstract
Motivation
Gene regulatory networks (GRNs) capture the regulatory interactions between genes, resulting from the fundamental biological process of transcription and translation. In some cases, the topology of GRNs is not known, and has to be inferred from gene expression data. Most of the existing GRNs reconstruction algorithms are either applied to time-series data or steady-state data. Although time-series data include more information about the system dynamics, steady-state data imply stability of the underlying regulatory networks.
Results
In this article, we propose a method for inferring GRNs from time-series and steady-state data jointly. We make use of a non-linear ordinary differential equations framework to model dynamic gene regulation and an importance measurement strategy to infer all putative regulatory links efficiently. The proposed method is evaluated extensively on the artificial DREAM4 dataset and two real gene expression datasets of yeast and Escherichia coli. Based on public benchmark datasets, the proposed method outperforms other popular inference algorithms in terms of overall score. By comparing the performance on the datasets with different scales, the results show that our method still keeps good robustness and accuracy at a low computational complexity.
Availability and implementation
The proposed method is written in the Python language, and is available at: https://github.com/lab319/GRNs_nonlinear_ODEs
Supplementary information
Supplementary data are available at Bioinformatics online.
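To make the idea of a non-linear ODE model of gene regulation concrete, here is a minimal sketch that integrates a single target gene driven by one regulator through a Hill activation term, using forward Euler. It is an assumption-laden toy, not the authors' method: the Hill form, the parameter values, and the function names are all hypothetical, and the abstract does not specify the model actually used.

```python
def hill_activation(u, K=1.0, n=2):
    """Hill activation term u^n / (K^n + u^n); K and n are illustrative values."""
    return u ** n / (K ** n + u ** n)


def simulate_target(u, alpha=2.0, decay=1.0, x0=0.0, dt=0.01, steps=2000):
    """Forward-Euler integration of dx/dt = alpha * hill(u) - decay * x.

    u is the (constant) regulator level; all parameters are hypothetical.
    """
    x = x0
    for _ in range(steps):
        dx = alpha * hill_activation(u) - decay * x
        x += dt * dx
    return x


# With u = K the Hill term equals 0.5, so the analytical steady state is
# alpha * 0.5 / decay = 1.0; the simulation should approach that value.
steady_state = simulate_target(u=1.0)
```

Time-series data constrain the trajectory of such a model, while steady-state data constrain the point where production and decay balance, which is why the two data types can be used jointly.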
Affiliation(s)
- Baoshan Ma
  - College of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
- Mingkun Fang
  - College of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
- Xiangtian Jiao
  - College of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
42
Ko DK, Brandizzi F. Network-based approaches for understanding gene regulation and function in plants. Plant J 2020; 104:302-317. [PMID: 32717108 PMCID: PMC8922287 DOI: 10.1111/tpj.14940] [Citation(s) in RCA: 39] [Impact Index Per Article: 7.8] [Received: 04/12/2020] [Accepted: 07/14/2020] [Indexed: 05/03/2023]
Abstract
Expression reprogramming directed by transcription factors is a primary gene regulation underlying most aspects of the biology of any organism. Our views of how gene regulation is coordinated are dramatically changing thanks to the advent and constant improvement of high-throughput profiling and transcriptional network inference methods: from activities of individual genes to functional interactions across genes. These technical and analytical advances can reveal the topology of transcriptional networks in which hundreds of genes are hierarchically regulated by multiple transcription factors at systems level. Here we review the state of the art of experimental and computational methods used in plant biology research to obtain large-scale datasets and model transcriptional networks. Examples of direct use of these network models and perspectives on their limitations and future directions are also discussed.
Affiliation(s)
- Dae Kwan Ko
  - MSU-DOE Plant Research Lab, Michigan State University, East Lansing, MI 48824, USA
  - Great Lakes Bioenergy Research Center, Michigan State University, East Lansing, MI 48824, USA
- Federica Brandizzi
  - MSU-DOE Plant Research Lab, Michigan State University, East Lansing, MI 48824, USA
  - Great Lakes Bioenergy Research Center, Michigan State University, East Lansing, MI 48824, USA
  - Department of Plant Biology, Michigan State University, East Lansing, MI 48824, USA
43
Møller AF, Natarajan KN. Predicting gene regulatory networks from cell atlases. Life Sci Alliance 2020; 3:e202000658. [PMID: 32958603 PMCID: PMC7536823 DOI: 10.26508/lsa.202000658] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 01/23/2020] [Revised: 08/24/2020] [Accepted: 08/31/2020] [Indexed: 12/17/2022] Open
Abstract
Integrated single-cell gene regulatory network from three mouse cell atlases captures global and cell type–specific regulatory modules and crosstalk, important for cellular identity. Recent single-cell RNA-sequencing atlases have surveyed and identified major cell types across different mouse tissues. Here, we computationally reconstruct gene regulatory networks from three major mouse cell atlases to capture functional regulators critical for cell identity, while accounting for a variety of technical differences, including sampled tissues, sequencing depth, and author assigned cell type labels. Extracting the regulatory crosstalk from mouse atlases, we identify and distinguish global regulons active in multiple cell types from specialised cell type–specific regulons. We demonstrate that regulon activities accurately distinguish individual cell types, despite differences between individual atlases. We generate an integrated network that further uncovers regulon modules with coordinated activities critical for cell types, and validate modules using available experimental data. Inferring regulatory networks during myeloid differentiation from wild-type and Irf8 KO cells, we uncover functional contribution of Irf8 regulon activity and composition towards monocyte lineage. Our analysis provides an avenue to further extract and integrate the regulatory crosstalk from single-cell expression data.
Affiliation(s)
- Andreas Fønss Møller
  - Department of Biochemistry and Molecular Biology, Functional Genomics and Metabolism Unit, University of Southern Denmark, Odense, Denmark
- Kedar Nath Natarajan
  - Department of Biochemistry and Molecular Biology, Functional Genomics and Metabolism Unit, University of Southern Denmark, Odense, Denmark
  - Danish Institute of Advanced Study, University of Southern Denmark, Odense, Denmark
44
Li Y, Ma A, Mathé EA, Li L, Liu B, Ma Q. Elucidation of Biological Networks across Complex Diseases Using Single-Cell Omics. Trends Genet 2020; 36:951-966. [PMID: 32868128 DOI: 10.1016/j.tig.2020.08.004] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 05/17/2020] [Revised: 07/29/2020] [Accepted: 08/04/2020] [Indexed: 12/14/2022]
Abstract
Single-cell multimodal omics (scMulti-omics) technologies have made it possible to trace cellular lineages during differentiation and to identify new cell types in heterogeneous cell populations. The derived information is especially promising for computing cell-type-specific biological networks encoded in complex diseases and improving our understanding of the underlying gene regulatory mechanisms. The integration of these networks could, therefore, give rise to a heterogeneous regulatory landscape (HRL) in support of disease diagnosis and drug therapeutics. In this review, we provide an overview of this field and pay particular attention to how diverse biological networks can be inferred in a specific cell type based on integrative methods. Then, we discuss how HRL can advance our understanding of regulatory mechanisms underlying complex diseases and aid in the prediction of prognosis and therapeutic responses. Finally, we outline challenges and future trends that will be central to bringing the field of HRL in complex diseases forward.
Affiliation(s)
- Yang Li
  - Department of Biomedical Informatics, The Ohio State University, Columbus, OH, 43210, USA
- Anjun Ma
  - Department of Biomedical Informatics, The Ohio State University, Columbus, OH, 43210, USA
- Ewy A Mathé
  - Division of Preclinical Innovation, National Center for Advancing Translational Sciences, National Institutes of Health (NIH), Rockville, MD, 20892, USA
- Lang Li
  - Department of Biomedical Informatics, The Ohio State University, Columbus, OH, 43210, USA
- Bingqiang Liu
  - School of Mathematics, Shandong University, Jinan, Shandong, 250100, China
- Qin Ma
  - Department of Biomedical Informatics, The Ohio State University, Columbus, OH, 43210, USA
45
Che D, Guo S, Jiang Q, Chen L. PFBNet: a priori-fused boosting method for gene regulatory network inference. BMC Bioinformatics 2020; 21:308. [PMID: 32664870 PMCID: PMC7362553 DOI: 10.1186/s12859-020-03639-7] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 12/29/2019] [Accepted: 07/02/2020] [Indexed: 02/08/2023] Open
Abstract
BACKGROUND Inferring gene regulatory networks (GRNs) from gene expression data remains a challenge in systems biology. In the past decade, numerous methods have been developed for the inference of GRNs. It remains a challenge because the data are noisy and high dimensional, and there exists a large number of potential interactions.
RESULTS We present a novel method, namely priori-fused boosting network inference method (PFBNet), to infer GRNs from time-series expression data by using the non-linear model of Boosting and the prior information (e.g., the knockout data) fusion scheme. Specifically, PFBNet first calculates the confidences of the regulation relationships using the boosting-based model, where the information about the accumulation impact of the gene expressions at previous time points is taken into account. Then, a newly defined strategy is applied to fuse the information from the prior data by elevating the confidences of the regulation relationships from the corresponding regulators.
CONCLUSIONS The experiments on the benchmark datasets from DREAM challenge as well as the E. coli datasets show that PFBNet achieves significantly better performance than other state-of-the-art methods (Jump3, GENIE3-lag, HiDi, iRafNet and BiXGBoost).
Affiliation(s)
- Dandan Che
  - Shenzhen Key Lab for High Performance Data Mining, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518000 China
- Shun Guo
  - Shenzhen Key Lab for High Performance Data Mining, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518000 China
- Qingshan Jiang
  - Shenzhen Key Lab for High Performance Data Mining, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518000 China
- Lifei Chen
  - School of Mathematics and Computer Science, Fujian Normal University, Fujian, 350117 China
46
Saint-Antoine MM, Singh A. Network inference in systems biology: recent developments, challenges, and applications. Curr Opin Biotechnol 2020; 63:89-98. [PMID: 31927423 PMCID: PMC7308210 DOI: 10.1016/j.copbio.2019.12.002] [Citation(s) in RCA: 47] [Impact Index Per Article: 9.4] [Received: 10/31/2019] [Accepted: 12/03/2019] [Indexed: 12/12/2022]
Abstract
One of the most interesting, difficult, and potentially useful topics in computational biology is the inference of gene regulatory networks (GRNs) from expression data. Although researchers have been working on this topic for more than a decade and much progress has been made, it remains an unsolved problem and even the most sophisticated inference algorithms are far from perfect. In this paper, we review the latest developments in network inference, including state-of-the-art algorithms like PIDC, Phixer, and more. We also discuss unsolved computational challenges, including the optimal combination of algorithms, integration of multiple data sources, and pseudo-temporal ordering of static expression data. Lastly, we discuss some exciting applications of network inference in cancer research, and provide a list of useful software tools for researchers hoping to conduct their own network inference analyses.
Affiliation(s)
- Michael M Saint-Antoine
  - Center for Bioinformatics and Computational Biology, University of Delaware, Newark, Delaware 19716, USA
- Abhyudai Singh
  - Electrical and Computer Engineering, University of Delaware, Newark, Delaware 19716, USA
47
Tsai MJ, Wang JR, Ho SJ, Shu LS, Huang WL, Ho SY. GREMA: modelling of emulated gene regulatory networks with confidence levels based on evolutionary intelligence to cope with the underdetermined problem. Bioinformatics 2020; 36:3833-3840. [DOI: 10.1093/bioinformatics/btaa267] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 04/11/2019] [Revised: 04/14/2020] [Accepted: 05/09/2020] [Indexed: 11/12/2022] Open
Abstract
Motivation
Non-linear ordinary differential equation (ODE) models that contain numerous parameters are suitable for inferring an emulated gene regulatory network (eGRN). However, the number of experimental measurements is usually far smaller than the number of parameters of the eGRN model that leads to an underdetermined problem. There is no unique solution to the inference problem for an eGRN using insufficient measurements.
Results
This work proposes an evolutionary modelling algorithm (EMA) that is based on evolutionary intelligence to cope with the underdetermined problem. EMA uses an intelligent genetic algorithm to solve the large-scale parameter optimization problem. An EMA-based method, GREMA, infers a novel type of gene regulatory network with confidence levels for every inferred regulation. The higher the confidence level is, the more accurate the inferred regulation is. GREMA gradually determines the regulations of an eGRN with confidence levels in descending order using either an S-system or a Hill function-based ODE model. The experimental results showed that the regulations with high-confidence levels are more accurate and robust than regulations with low-confidence levels. Evolutionary intelligence enhanced the mean accuracy of GREMA by 19.2% when using the S-system model with benchmark datasets. An increase in the number of experimental measurements may increase the mean confidence level of the inferred regulations. GREMA performed well compared with existing methods that have been previously applied to the same S-system, DREAM4 challenge and SOS DNA repair benchmark datasets.
Availability and implementation
All of the datasets that were used and the GREMA-based tool are freely available at https://nctuiclab.github.io/GREMA.
Supplementary information
Supplementary data are available at Bioinformatics online.
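For readers unfamiliar with the two model families named in this abstract, the block below writes out their commonly used textbook forms. The exact parameterisation used by GREMA is not stated in the abstract, so the symbols here (rate constants, kinetic orders, Hill coefficient n, threshold K) are assumptions for illustration only.

```latex
% S-system: production and degradation are each a product of power laws
% over the N gene expression levels x_1, ..., x_N.
\frac{dx_i}{dt} = \alpha_i \prod_{j=1}^{N} x_j^{g_{ij}} - \beta_i \prod_{j=1}^{N} x_j^{h_{ij}},
\qquad i = 1, \dots, N

% Hill functions: saturating activation and repression by a regulator x_j.
H^{+}(x_j) = \frac{x_j^{n}}{K^{n} + x_j^{n}},
\qquad
H^{-}(x_j) = \frac{K^{n}}{K^{n} + x_j^{n}}
```

In the S-system form each gene contributes 2N kinetic orders plus two rate constants, so a network of N genes has 2N(N + 1) parameters. This illustrates why the number of parameters can easily exceed the number of measurements, which is the underdetermined problem the abstract describes.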
Affiliation(s)
- Ming-Ju Tsai
  - Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu 300, Taiwan
- Jyun-Rong Wang
  - Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu 300, Taiwan
- Shinn-Jang Ho
  - Department of Automation Engineering, National Formosa University, Yunlin 632, Taiwan
- Li-Sun Shu
  - Department of Information Management, Overseas Chinese University, Taichung 407, Taiwan
- Wen-Lin Huang
  - Department of Industrial Engineering and Management, Minghsin University of Science and Technology, Xinfeng 304, Taiwan
- Shinn-Ying Ho
  - Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu 300, Taiwan
  - Department of Biological Science and Technology
  - Center For Intelligent Drug Systems and Smart Bio-devices (IDS2B), National Chiao Tung University, Hsinchu 300, Taiwan
48
Zhang Y, Zhang X, Lane AN, Fan TWM, Liu J. Inferring Gene Regulatory Networks of Metabolic Enzymes Using Gradient Boosted Trees. IEEE J Biomed Health Inform 2020; 24:1528-1536. [DOI: 10.1109/jbhi.2019.2931997] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 11/07/2022]
49
Wang H, Lian Y, Li C, Ma Y, Yan Z, Dong C. SIN-KNO: A method of gene regulatory network inference using single-cell transcription and gene knockout data. J Bioinform Comput Biol 2020; 17:1950035. [PMID: 32019417 DOI: 10.1142/s0219720019500355] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
As a tool for interpreting and analyzing genetic data, a gene regulatory network (GRN) can reveal regulatory relationships among genes, proteins, and small molecules, and can help explain physiological activities and functions within cells, interactions in pathways, and how changes arise in the organism. Traditional GRN research analyzes regulatory relationships through the average of cellular gene expression, which makes it difficult to identify cell-to-cell heterogeneity in gene expression. Existing methods for inferring GRNs from single-cell transcriptional data lack expression information for genes at steady state, and the high dimensionality of single-cell data leads to high time and space complexity. To address these problems (missing cellular heterogeneity information, single-cell data complexity, and missing steady-state information), we propose a method for GRN inference using single-cell transcription and gene knockout data, called SINgle-cell transcription data-KNOckout data (SIN-KNO), which combines the dynamic and steady-state information about regulatory relationships contained in gene expression. Capturing cell heterogeneity information helps reveal gene expression differences between cells, so that gene expression changes can be observed more accurately. Gene knockout data record the steady-state expression levels of all other genes when one gene is knocked out. Classifying the genes before analyzing the single-cell data rules out a large number of non-existent regulations, greatly reducing the number of regulations that must be inferred. To demonstrate its efficiency, the proposed method was compared with several typical methods in this area, including GENIE3, JUMP3, and SINCERITIES. The results of the evaluation indicate that the proposed method can analyze the diversified information contained in the two types of data, establish a more accurate gene regulatory network, and improve computational efficiency. The method offers a new way of handling the large datasets and high computational complexity of single-cell data in GRN inference.
Affiliation(s)
- Huiqing Wang
  - College of Information and Computer, Taiyuan University of Technology, Taiyuan, Shanxi, China
- Yuanyuan Lian
  - College of Information and Computer, Taiyuan University of Technology, Taiyuan, Shanxi, China
- Chun Li
  - College of Information and Computer, Taiyuan University of Technology, Taiyuan, Shanxi, China
- Yue Ma
  - College of Information and Computer, Taiyuan University of Technology, Taiyuan, Shanxi, China
- Zhiliang Yan
  - College of Information and Computer, Taiyuan University of Technology, Taiyuan, Shanxi, China
- Chunlin Dong
  - Dryland Agriculture Research Center, Shanxi Academy of Agricultural Sciences, Taiyuan, Shanxi, China
50
Pratapa A, Jalihal AP, Law JN, Bharadwaj A, Murali TM. Benchmarking algorithms for gene regulatory network inference from single-cell transcriptomic data. Nat Methods 2020; 17:147-154. [PMID: 31907445 PMCID: PMC7098173 DOI: 10.1038/s41592-019-0690-6] [Citation(s) in RCA: 387] [Impact Index Per Article: 77.4] [Received: 06/04/2019] [Accepted: 11/22/2019] [Indexed: 01/10/2023]
Abstract
We present a systematic evaluation of state-of-the-art algorithms for inferring gene regulatory networks from single-cell transcriptional data. As the ground truth for assessing accuracy, we use synthetic networks with predictable trajectories, literature-curated Boolean models and diverse transcriptional regulatory networks. We develop a strategy to simulate single-cell transcriptional data from synthetic and Boolean networks that avoids pitfalls of previously used methods. Furthermore, we collect networks from multiple experimental single-cell RNA-seq datasets. We develop an evaluation framework called BEELINE. We find that the area under the precision-recall curve and early precision of the algorithms are moderate. The methods are better in recovering interactions in synthetic networks than Boolean models. The algorithms with the best early precision values for Boolean models also perform well on experimental datasets. Techniques that do not require pseudotime-ordered cells are generally more accurate. Based on these results, we present recommendations to end users. BEELINE will aid the development of gene regulatory network inference algorithms.
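The two accuracy measures named in this abstract can be illustrated on a toy ranked edge list. The sketch below is not taken from BEELINE: it assumes early precision means the fraction of true edges among the top-k predictions, with k equal to the number of ground-truth edges, and it uses average precision as a simple step-wise stand-in for the area under the precision-recall curve. The function names and the example edges are hypothetical.

```python
def early_precision(ranked_edges, true_edges):
    """Fraction of true edges among the top-k predictions, k = len(true_edges)."""
    k = len(true_edges)
    top_k = ranked_edges[:k]
    return sum(1 for edge in top_k if edge in true_edges) / k


def average_precision(ranked_edges, true_edges):
    """Mean of the precision values at the rank of each recovered true edge.

    Used here as a step-wise approximation of the area under the PR curve.
    """
    hits = 0
    total = 0.0
    for rank, edge in enumerate(ranked_edges, start=1):
        if edge in true_edges:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / len(true_edges)


# Hypothetical predictions, ordered from most to least confident.
ranked = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "A")]
truth = {("A", "B"), ("B", "C")}

ep = early_precision(ranked, truth)    # top 2 contain 1 true edge
ap = average_precision(ranked, truth)  # (1/1 + 2/3) / 2
```

Early precision only looks at the most confident predictions, so it rewards methods whose top-ranked edges are reliable, whereas the area under the precision-recall curve summarises the whole ranking.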
Affiliation(s)
- Aditya Pratapa
  - Department of Computer Science, Virginia Tech, Blacksburg, VA, USA
- Amogh P Jalihal
  - Genetics, Bioinformatics, and Computational Biology Ph.D. Program, Virginia Tech, Blacksburg, VA, USA
- Jeffrey N Law
  - Genetics, Bioinformatics, and Computational Biology Ph.D. Program, Virginia Tech, Blacksburg, VA, USA
- Aditya Bharadwaj
  - Department of Computer Science, Virginia Tech, Blacksburg, VA, USA
- T M Murali
  - Department of Computer Science, Virginia Tech, Blacksburg, VA, USA