1. Yang B, Bao W, Chen B, Song D. Single_cell_GRN: gene regulatory network identification based on supervised learning method and Single-cell RNA-seq data. BioData Min 2022; 15:13. [PMID: 35690842 PMCID: PMC9188720 DOI: 10.1186/s13040-022-00297-8] [Received: 02/09/2022] [Accepted: 05/22/2022]
Abstract
Single-cell RNA-seq overcomes the shortcomings of conventional transcriptome sequencing technology and can provide a powerful tool for distinguishing the transcriptome characteristics of various cell types in biological tissues and for comprehensively revealing the heterogeneity of gene expression between cells. Many intelligent computing methods have been proposed to infer gene regulatory networks (GRNs) from single-cell RNA-seq data. In this paper, we investigate the performance of seven classifiers, namely support vector machine (SVM), random forest (RF), naive Bayes (NB), gradient boosting decision tree (GBDT), logistic regression (LR), decision tree (DT) and K-nearest neighbor (KNN), for solving the binary classification problem of GRN inference with single-cell RNA-seq data (Single_cell_GRN). For SVM, three kernel functions (linear, polynomial and radial basis function) are used. Three real single-cell RNA-seq datasets from mouse and human are used. The experimental results show that, in most cases, the supervised learning methods (SVM, RF, NB, GBDT, LR, DT and KNN) perform better than the unsupervised learning method (GENIE3) in terms of AUC. SVM, RF and KNN perform better than the other four classifiers. For SVM, the linear and polynomial kernels are better suited to modeling single-cell RNA-seq data.
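The classifiers in this study are compared by AUC, the area under the ROC curve. As a minimal sketch of that metric (not the authors' evaluation code), AUC equals the probability that a randomly chosen positive example, here assumed to be a true regulatory gene pair, is scored above a randomly chosen negative one, with ties counted as one half (the Mann-Whitney formulation). The function name `auc` and the toy labels and scores are illustrative assumptions.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for an (assumed) true regulatory edge, 0 otherwise.
    scores: classifier confidence that the gene pair is an edge.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correctly ranked pair
    return wins / (len(pos) * len(neg))


# Toy example: 8 of the 9 positive/negative pairs are ranked correctly.
print(round(auc([1, 1, 0, 0, 1, 0], [0.9, 0.8, 0.3, 0.5, 0.4, 0.1]), 3))  # 0.889
```

The pairwise loop costs O(n_pos × n_neg); for large sets of candidate gene pairs, a rank-based formula or a library routine such as scikit-learn's `roc_auc_score` would be the more practical choice.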
Affiliation(s)
- Bin Yang: School of Information Science and Engineering, Zaozhuang University, Zaozhuang, 277160, China
- Wenzheng Bao: School of Information and Electrical Engineering, Xuzhou University of Technology, Xuzhou, 221018, China
- Baitong Chen: Xuzhou First People's Hospital, Xuzhou, 221000, China
- Dan Song: School of Information Science and Engineering, Zaozhuang University, Zaozhuang, 277160, China
2. Alali M, Imani M. Inference of regulatory networks through temporally sparse data. Front Control Eng 2022; 3:1017256. [PMID: 36582942 PMCID: PMC9795458 DOI: 10.3389/fcteg.2022.1017256]
Abstract
A major goal in genomics is to properly capture the complex dynamical behavior of gene regulatory networks (GRNs). This includes inferring the complex interactions between genes, which can be used for a wide range of genomics analyses, including diagnosis or prognosis of diseases and finding effective treatments for chronic diseases such as cancer. Boolean networks have emerged as a successful class of models for capturing the behavior of GRNs. In most practical settings, GRNs must be inferred from limited and temporally sparse genomics data. The large number of genes in GRNs leads to a large space of candidate topologies, which often cannot be searched exhaustively because of limited computational resources. This paper develops a scalable and efficient topology inference method for GRNs using Bayesian optimization and kernel-based methods. Rather than exhaustively searching over possible topologies, the proposed method constructs a Gaussian process (GP) with a topology-inspired kernel function to account for correlation in the likelihood function. Then, using the posterior distribution of the GP model, Bayesian optimization efficiently searches for the topology with the highest likelihood value by optimally balancing exploration and exploitation. The performance of the proposed method is demonstrated through comprehensive numerical experiments using a well-known mammalian cell-cycle network.
3. Karbalayghareh A, Qian X, Dougherty ER. Optimal Bayesian Transfer Learning for Count Data. IEEE/ACM Trans Comput Biol Bioinform 2021; 18:644-655. [PMID: 31180899 DOI: 10.1109/tcbb.2019.2920981]
Abstract
Often only a limited amount of omics data is available for designing predictive models in biomedicine. Because these omics data come from underlying processes that may share common pathways and disease mechanisms, it may be beneficial, when designing a more accurate and reliable predictor for a target domain of interest that lacks labeled data, to leverage the data available in relevant source domains. Here, we focus on developing Bayesian transfer learning methods for analyzing next-generation sequencing (NGS) data to help improve predictions in the target domain. We formulate transfer learning in a fully Bayesian framework and define relatedness through a joint prior distribution of the model parameters of the source and target domains. The joint prior acts as a bridge across domains, through which related knowledge from the source data is transferred to the target domain. We focus on RNA-seq discrete count data, which are often overdispersed. To model them appropriately, we consider the Negative Binomial model and propose an Optimal Bayesian Transfer Learning (OBTL) classifier that minimizes the expected classification error in the target domain. We evaluate the performance of the OBTL classifier on both synthetic data and cancer data from The Cancer Genome Atlas (TCGA).
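Overdispersion means the variance of the counts exceeds their mean, which a Poisson model (variance equal to the mean) cannot represent. The sketch below illustrates only the Negative Binomial (NB) count model, not the OBTL classifier itself. It assumes the common mean/dispersion parameterization, with mean `mu`, dispersion (size) `r`, and variance mu + mu²/r; the function names and parameter values are hypothetical. It evaluates the NB log-pmf with `math.lgamma` and checks the first two moments numerically.

```python
import math


def nb_logpmf(k, mu, r):
    """Log-probability of count k under an NB with mean mu and dispersion (size) r."""
    if k < 0 or mu <= 0 or r <= 0:
        raise ValueError("require k >= 0, mu > 0, r > 0")
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(r / (r + mu)) + k * math.log(mu / (r + mu)))


def moments(mu, r, kmax=5000):
    """Total mass, mean and variance summed over k = 0..kmax.

    Assumes the probability mass beyond kmax is negligible for the chosen mu, r.
    """
    probs = [math.exp(nb_logpmf(k, mu, r)) for k in range(kmax + 1)]
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return sum(probs), mean, var


# With mu = 20 and r = 2 the variance should be 20 + 20**2 / 2 = 220, far above the mean.
total, mean, var = moments(20.0, 2.0)
print(round(total, 6), round(mean, 6), round(var, 6))  # 1.0 20.0 220.0
```

As `r` grows, the mu²/r term vanishes and the NB approaches a Poisson distribution with the same mean, so `r` directly controls how much extra variance the model allows.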
4. Garte S, Albert A. Genotype Components as Predictors of Phenotype in Model Gene Regulatory Networks. Acta Biotheor 2019; 67:299-320. [PMID: 31286303 DOI: 10.1007/s10441-019-09350-2] [Received: 11/12/2018] [Accepted: 07/04/2019]
Abstract
Models of gene regulatory networks (GRNs) have proven useful for understanding many aspects of the highly complex behavior of biological control networks. Randomly generated non-Boolean networks were used in simulation experiments to generate data on dynamic phenotypes as a function of several genotypic parameters. We found that predictive relationships between some phenotypes and quantitative genotypic parameters, such as the number of network genes, interaction density, and initial condition, could be derived depending on the strength of the effect of the topological (positional) genotype on specific phenotypes. We quantified the strength of the topological genotype effect (TGE) on a number of phenotypes in multi-gene networks. For phenotypes with a low influence of topological genotype, derived and empirical relationships using quantitative genotype parameters accurately predicted phenotypic outcomes. We found a number of dynamic network properties, including oscillation behaviors, that were largely dependent on genotype topology and for which no such general quantitative relationships could be determined. It remains to be determined whether these results are applicable to biological gene regulatory networks.
5.
Abstract
Background: Missing values frequently arise in modern biomedical studies for various reasons, including missing tests or complex profiling technologies for different omics measurements. Missing values complicate the application of clustering algorithms, whose goal is to group points according to some similarity criterion. A common practice for handling missing values in clustering is to first impute them and then apply the clustering algorithm to the completed data. Results: We consider missing values in the context of optimal clustering, which finds an optimal clustering operator with reference to an underlying random labeled point process (RLPP). We show how the missing-value problem fits neatly into the overall framework of optimal clustering by incorporating the missing-value mechanism into the random labeled point process and then marginalizing out the missing-value process. In particular, we demonstrate the proposed framework for the Gaussian model with arbitrary covariance structures. Comprehensive experimental studies on both synthetic and real-world RNA-seq data show the superior performance of the proposed optimal clustering with missing values compared with various clustering approaches. Conclusion: Optimal clustering with missing values obviates the need for imputation-based pre-processing of the data, while also achieving smaller clustering errors. Electronic supplementary material: The online version of this article (10.1186/s12859-019-2832-3) contains supplementary material, which is available to authorized users.
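For a Gaussian model, marginalizing out missing coordinates has a closed form: the marginal of a multivariate Gaussian over its observed coordinates is Gaussian, with the matching sub-vector of the mean and sub-matrix of the covariance. The sketch below uses that fact in a much simpler setting than the paper's optimal clustering operator: cluster parameters are assumed known, and each point is assigned independently to the cluster with the highest posterior score computed from its observed entries only, with `None` marking a missing value. The helper names and the two example clusters are hypothetical.

```python
import math


def cholesky(a):
    """Lower-triangular L with L L^T = a (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


def observed_loglik(x, mean, cov):
    """Gaussian log-density of x marginalized onto its observed (non-None) coordinates."""
    obs = [i for i, v in enumerate(x) if v is not None]
    if not obs:
        return 0.0  # nothing observed: the marginal over zero coordinates carries no information
    d = [x[i] - mean[i] for i in obs]
    sub = [[cov[i][j] for j in obs] for i in obs]  # covariance of the observed coordinates
    L = cholesky(sub)
    # Forward substitution solves L z = d, so z^T z = d^T sub^{-1} d (Mahalanobis term).
    z = []
    for i in range(len(obs)):
        z.append((d[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    maha = sum(v * v for v in z)
    logdet = 2.0 * sum(math.log(L[i][i]) for i in range(len(obs)))
    return -0.5 * (maha + logdet + len(obs) * math.log(2.0 * math.pi))


def assign(x, clusters):
    """Index of the cluster (mean, cov, weight) with the highest log posterior score."""
    scores = [math.log(w) + observed_loglik(x, m, c) for m, c, w in clusters]
    return max(range(len(clusters)), key=scores.__getitem__)


clusters = [([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], 0.5),
            ([3.0, 3.0], [[1.0, -0.5], [-0.5, 1.0]], 0.5)]
print(assign([None, 2.9], clusters))  # 1
print(assign([0.1, None], clusters))  # 0
```

No imputed value is ever created: a point with a missing coordinate is simply scored under the lower-dimensional marginal, which is the same idea of marginalizing out the missing-value process, applied here to a deliberately simplified per-point assignment.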
Affiliation(s)
- Shahin Boluki: Department of Electrical and Computer Engineering, Texas A&M University, MS3128 TAMU, College Station, 77843, TX, USA
- Siamak Zamani Dadaneh: Department of Electrical and Computer Engineering, Texas A&M University, MS3128 TAMU, College Station, 77843, TX, USA
- Xiaoning Qian: Department of Electrical and Computer Engineering, Texas A&M University, MS3128 TAMU, College Station, 77843, TX, USA; TEES-AgriLife Center for Bioinformatics & Genomic Systems Engineering, College Station, 77843, TX, USA
- Edward R Dougherty: Department of Electrical and Computer Engineering, Texas A&M University, MS3128 TAMU, College Station, 77843, TX, USA; TEES-AgriLife Center for Bioinformatics & Genomic Systems Engineering, College Station, 77843, TX, USA
6. Hajiramezanali E, Imani M, Braga-Neto U, Qian X, Dougherty ER. Scalable optimal Bayesian classification of single-cell trajectories under regulatory model uncertainty. BMC Genomics 2019; 20:435. [PMID: 31189480 PMCID: PMC6561847 DOI: 10.1186/s12864-019-5720-3]
Abstract
Background: Single-cell gene expression measurements offer opportunities to derive a mechanistic understanding of complex diseases, including cancer. However, because of the complex regulatory machinery of the cell, gene regulatory network (GRN) model inference based on such data still involves significant uncertainty. Results: The goal of this paper is to develop optimal classification of single-cell trajectories that accounts for potential model uncertainty. Partially-observed Boolean dynamical systems (POBDS) are used to model gene regulatory networks observed through noisy gene-expression data. We derive the exact optimal Bayesian classifier (OBC) for binary classification of single-cell trajectories. Applying the OBC becomes impractical for large GRNs because of its computational and memory requirements. To address this, we introduce a particle-based single-cell classification method that scales to large GRNs with much lower complexity than the optimal solution. Conclusion: The performance of the proposed particle-based method is demonstrated through numerical experiments using a POBDS model of the well-known T-cell large granular lymphocyte (T-LGL) leukemia network with noisy time-series gene-expression data. Electronic supplementary material: The online version of this article (10.1186/s12864-019-5720-3) contains supplementary material, which is available to authorized users.
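A particle-based method replaces exact computation over all 2^n Boolean states with a finite set of sampled states. As a hedged illustration only, not the authors' algorithm, the sketch below assumes a POBDS-style model in which the Boolean state evolves as x_k = f(x_{k-1}) XOR n_k with independent Bernoulli(p) process noise on each gene, each gene is measured with Gaussian noise around its 0/1 value, and the initial state is uniform. A bootstrap particle filter estimates the log-likelihood of an observed trajectory under each candidate network, and the trajectory is assigned to the network with the higher estimate (equal class priors assumed). The toy three-gene networks `net_a` and `net_b`, the noise levels, and all function names are hypothetical.

```python
import math
import random


def step(state, rules, p, rng):
    """One state transition: apply the Boolean rules, then flip each gene with probability p."""
    return tuple(b ^ (rng.random() < p) for b in rules(state))


def obs_loglik(y, state, sigma):
    """Log-density of real-valued measurements y given the Boolean state (Gaussian noise)."""
    const = math.log(sigma * math.sqrt(2.0 * math.pi))
    return sum(-0.5 * ((yi - xi) / sigma) ** 2 - const for yi, xi in zip(y, state))


def pf_loglik(ys, rules, n_genes, p, sigma, n_particles=500, seed=0):
    """Bootstrap particle-filter estimate of log p(y_1, ..., y_T | network)."""
    rng = random.Random(seed)
    particles = [tuple(rng.randint(0, 1) for _ in range(n_genes)) for _ in range(n_particles)]
    total = 0.0
    for y in ys:
        particles = [step(s, rules, p, rng) for s in particles]  # propagate
        logw = [obs_loglik(y, s, sigma) for s in particles]      # weight
        m = max(logw)
        w = [math.exp(lw - m) for lw in logw]                    # stabilized weights
        total += m + math.log(sum(w) / n_particles)              # log of the mean weight
        particles = rng.choices(particles, weights=w, k=n_particles)  # resample
    return total


def classify(ys, networks, **kw):
    """Index of the network with the largest estimated trajectory log-likelihood."""
    scores = [pf_loglik(ys, rules, **kw) for rules in networks]
    return max(range(len(scores)), key=scores.__getitem__)


def simulate(rules, x0, T, p, sigma, seed=1):
    """Generate T noisy observation vectors from the assumed model."""
    rng = random.Random(seed)
    x, ys = x0, []
    for _ in range(T):
        x = step(x, rules, p, rng)
        ys.append([xi + rng.gauss(0.0, sigma) for xi in x])
    return ys


def net_a(s):
    return (s[2], s[0], s[1])  # hypothetical ring: each gene copies its predecessor


def net_b(s):
    return (1 - s[0], s[1], s[2])  # hypothetical: gene 0 toggles, genes 1 and 2 hold


ys = simulate(net_a, (1, 0, 0), T=20, p=0.01, sigma=0.3)
print(classify(ys, [net_a, net_b], n_genes=3, p=0.01, sigma=0.3))
```

The cost per time step grows with the number of particles rather than with the 2^n state space, which is the scalability argument the abstract makes; the exact classifier would instead need to track probabilities for every Boolean state.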
Affiliation(s)
- Ehsan Hajiramezanali: Department of Electrical and Computer Engineering, Texas A&M University, MS3128 TAMU, College Station, 77843, TX, USA
- Mahdi Imani: Department of Electrical and Computer Engineering, Texas A&M University, MS3128 TAMU, College Station, 77843, TX, USA
- Ulisses Braga-Neto: Department of Electrical and Computer Engineering, Texas A&M University, MS3128 TAMU, College Station, 77843, TX, USA
- Xiaoning Qian: Department of Electrical and Computer Engineering, Texas A&M University, MS3128 TAMU, College Station, 77843, TX, USA
- Edward R Dougherty: Department of Electrical and Computer Engineering, Texas A&M University, MS3128 TAMU, College Station, 77843, TX, USA
7. Dougherty ER. A Nonmathematical Review of Optimal Operator and Experimental Design for Uncertain Scientific Models with Application to Genomics. Curr Genomics 2019; 20:16-23. [PMID: 31015788 PMCID: PMC6446484 DOI: 10.2174/1389202919666181213095743] [Received: 10/26/2018] [Revised: 12/05/2018] [Accepted: 12/10/2018]
Abstract
Introduction: The most basic aspect of modern engineering is the design of operators that act on physical systems in an optimal manner relative to a desired objective, for instance, designing a control policy to autonomously direct a system or designing a classifier to make decisions regarding the system. Such problems appear in biomedical science, where physical models are created with the intention of using them to design tools for diagnosis, prognosis, and therapy. Methods: In the classical paradigm, our knowledge regarding the model is certain; in practice, however, especially with complex systems, our knowledge is uncertain and operators must be designed while taking this uncertainty into account. The related concepts of intrinsically Bayesian robust operators and optimal Bayesian operators treat operator design under uncertainty. An objective-based experimental design procedure is naturally related to operator design: we would like to perform an experiment that maximally reduces our uncertainty as it pertains to our objective. Results & Discussion: This paper provides a nonmathematical review of optimal Bayesian operators directed at biomedical scientists. It considers two applications important to genomics: structural intervention in gene regulatory networks and classification. Conclusion: The salient point regarding intrinsically Bayesian operators is that uncertainty is quantified relative to the scientific model, and the prior distribution is placed on the parameters of this model. Optimization has direct physical (biological) meaning. This contrasts with the common practice of placing prior distributions on the parameters of the operator, in which case there is a scientific gap between operator design and the phenomena.
Affiliation(s)
- Edward R Dougherty: Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA
8. Yoon BJ, Qian X, Kahveci T, Pal R. Selected research articles from the 2017 International Workshop on Computational Network Biology: Modeling, Analysis, and Control (CNB-MAC). BMC Bioinformatics 2018; 19:69. [PMID: 29589557 PMCID: PMC5872518 DOI: 10.1186/s12859-018-2058-9]
Affiliation(s)
- Byung-Jun Yoon: Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, 77843, USA; TEES-AgriLife Center for Bioinformatics and Genomic Systems Engineering (CBGSE), College Station, TX, 77845, USA
- Xiaoning Qian: Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, 77843, USA; TEES-AgriLife Center for Bioinformatics and Genomic Systems Engineering (CBGSE), College Station, TX, 77845, USA
- Tamer Kahveci: Department of Computer and Information Science and Engineering, University of Florida, Gainesville, FL, 32611-6125, USA
- Ranadip Pal: Department of Electrical and Computer Engineering, Texas Tech University, Lubbock, TX, 79409-3102, USA