1
Ren L, Wang J, Li W, Guo M, Yu G. Single-cell RNA-seq data clustering by deep information fusion. Brief Funct Genomics 2024; 23:128-137. [PMID: 37208992 DOI: 10.1093/bfgp/elad017] [Received: 08/26/2022] [Revised: 02/13/2023] [Indexed: 05/21/2023]
Abstract
Determining cell types from single-cell transcriptomics data is fundamental for downstream analysis. However, cell clustering and data imputation still face computational challenges due to the high dropout rate, sparsity and dimensionality of single-cell data. Although some deep learning-based solutions have been proposed to handle these challenges, they still cannot leverage gene attribute information and cell topology in a sensible way to obtain a consistent clustering. In this paper, we present scDeepFC, a deep information fusion-based single-cell data clustering method for cell clustering and data imputation. Specifically, scDeepFC uses a deep auto-encoder (DAE) network and a deep graph convolution network to embed high-dimensional gene attribute information and high-order cell-cell topological information into different low-dimensional representations, and then fuses them to generate a more comprehensive and accurate consensus representation via a deep information fusion network. In addition, scDeepFC integrates the zero-inflated negative binomial (ZINB) model into the DAE to model dropout events. By jointly optimizing the ZINB loss and the cell graph reconstruction loss, scDeepFC generates a salient embedding representation for clustering cells and imputing missing data. Extensive experiments on real single-cell datasets show that scDeepFC outperforms other popular single-cell analysis methods. Both the gene attribute and cell topology information can improve the cell clustering.
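The ZINB loss mentioned in the abstract can be illustrated with a small NumPy sketch. This is not the scDeepFC implementation: the function names `nb_log_pmf` and `zinb_nll` are hypothetical, and the mean/dispersion parameterization (`mu`, `theta`) with a dropout probability `pi` is an assumption based on the common way ZINB is written for count data.

```python
import math
import numpy as np

# Element-wise log-gamma using only the standard library (assumption: no SciPy available).
_lgamma = np.vectorize(math.lgamma)

def nb_log_pmf(x, mu, theta):
    """Log-probability of counts x under a negative binomial with mean mu and dispersion theta."""
    x, mu, theta = np.broadcast_arrays(
        np.asarray(x, dtype=float),
        np.asarray(mu, dtype=float),
        np.asarray(theta, dtype=float),
    )
    return (
        _lgamma(x + theta) - _lgamma(theta) - _lgamma(x + 1.0)
        + theta * (np.log(theta) - np.log(theta + mu))
        + x * (np.log(mu) - np.log(theta + mu))
    )

def zinb_nll(x, mu, theta, pi):
    """Mean negative log-likelihood of counts x under a zero-inflated negative binomial.

    pi is the probability that an observation is a structural zero (a dropout).
    """
    x = np.asarray(x, dtype=float)
    log_nb = nb_log_pmf(x, mu, theta)
    # A zero can come from the dropout component or from the NB component.
    zero_case = np.log(pi + (1.0 - pi) * np.exp(log_nb))
    # A positive count can only come from the NB component.
    nonzero_case = np.log1p(-pi) + log_nb
    return float(-np.mean(np.where(x == 0, zero_case, nonzero_case)))

counts = np.array([0, 0, 3, 7, 0, 1])
print(zinb_nll(counts, mu=3.0, theta=2.0, pi=0.3))
```

In an auto-encoder setting, `mu`, `theta` and `pi` would typically be per-gene, per-cell outputs of the decoder rather than the scalars used here.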
Affiliation(s)
- Liangrui Ren
  - School of Software, Shandong University, 250101 Ji'nan, China
- Jun Wang
  - Joint SDU-NTU Centre for Artificial Intelligence Research, Shandong University, 250101 Ji'nan, China
- Wei Li
  - School of Control Science and Engineering, Shandong University, 250061 Ji'nan, China
- Maozu Guo
  - College of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, 100044, Bei'jing, China
- Guoxian Yu
  - School of Software, Shandong University, 250101 Ji'nan, China
2
Wang YM, Sun Y, Wang B, Wu Z, He XY, Zhao Y. Transfer learning for clustering single-cell RNA-seq data crossing-species and batch, case on uterine fibroids. Brief Bioinform 2023; 25:bbad426. [PMID: 37991248 PMCID: PMC10664408 DOI: 10.1093/bib/bbad426] [Received: 09/06/2023] [Revised: 10/12/2023] [Accepted: 10/30/2023] [Indexed: 11/23/2023]
Abstract
The high dimensionality and sparsity of the gene expression matrix in single-cell RNA-sequencing (scRNA-seq) data, coupled with significant noise generated by shallow sequencing, pose a great challenge for cell clustering methods. While numerous computational methods have been proposed, the majority of existing approaches center on processing the target dataset itself. This approach disregards the wealth of knowledge present within other species and batches of scRNA-seq data. In light of this, our paper proposes a novel method named graph-based deep embedding clustering (GDEC) that leverages transfer learning across species and batches. GDEC integrates graph convolutional networks, effectively overcoming the challenges posed by sparse gene expression matrices. Additionally, the incorporation of deep embedding clustering (DEC) in GDEC enables the partitioning of cell clusters within a lower-dimensional space, thereby mitigating the adverse effects of noise on clustering outcomes. GDEC constructs a model based on existing scRNA-seq datasets and then applies transfer learning techniques to fine-tune the model using a limited amount of prior knowledge gleaned from the target dataset. This empowers GDEC to cluster scRNA-seq data across different species and batches. Through cross-species and cross-batch clustering experiments, we conducted a comparative analysis between GDEC and conventional packages. Furthermore, we implemented GDEC on the scRNA-seq data of uterine fibroids. Compared with results obtained from the Seurat package, GDEC unveiled a novel cell type (epithelial cells) and identified a notable number of new pathways among various cell types, thus underscoring the enhanced analytical capabilities of GDEC. Availability and implementation: https://github.com/YuzhiSun/GDEC/tree/main.
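The abstract does not spell out how the DEC component partitions cells in the low-dimensional space. As a rough illustration, the sketch below follows the widely used deep embedding clustering formulation (Student's t soft assignments, a sharpened target distribution, and a KL divergence loss); whether GDEC uses exactly these formulas is an assumption, and the function names are hypothetical.

```python
import numpy as np

def soft_assign(z, centers, alpha=1.0):
    """Student's t similarity between each embedded cell z_i and each cluster center."""
    d2 = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    """Sharpen q by squaring it and normalizing by the soft cluster frequencies."""
    w = q ** 2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)

def kl_clustering_loss(p, q):
    """KL(P || Q), summed over all cells and clusters."""
    return float(np.sum(p * np.log(p / q)))

# Toy embedding: two well-separated groups of cells and two centers.
z = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
centers = np.array([[0.0, 0.0], [5.0, 5.0]])

q = soft_assign(z, centers)
p = target_distribution(q)
print(q.argmax(axis=1), kl_clustering_loss(p, q))
```

In training, the encoder weights and the centers would be updated to reduce this loss, with `p` recomputed periodically; in a transfer-learning setup the same loss could be reused when fine-tuning on the target dataset.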
Affiliation(s)
- Yu Mei Wang
  - Department of Gynecology, Shanghai First Maternity and Infant Hospital, School of Medicine, Tong Ji University, Shanghai, China
  - Shanghai Key Laboratory of Maternal and Fetal Medicine, Shanghai First Maternity and Infant Hospital, Shanghai, China
- Yuzhi Sun
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Beiying Wang
  - Department of Gynecology, Shanghai First Maternity and Infant Hospital, School of Medicine, Tong Ji University, Shanghai, China
  - Shanghai Key Laboratory of Maternal and Fetal Medicine, Shanghai First Maternity and Infant Hospital, Shanghai, China
- Zhiping Wu
  - Department of Gynecology, Shanghai First Maternity and Infant Hospital, School of Medicine, Tong Ji University, Shanghai, China
  - Shanghai Key Laboratory of Maternal and Fetal Medicine, Shanghai First Maternity and Infant Hospital, Shanghai, China
- Xiao Ying He
  - Department of Gynecology, Shanghai First Maternity and Infant Hospital, School of Medicine, Tong Ji University, Shanghai, China
  - Shanghai Key Laboratory of Maternal and Fetal Medicine, Shanghai First Maternity and Infant Hospital, Shanghai, China
- Yuansong Zhao
  - University of Texas Health Science Center at Houston, 77030-5400, USA
3
Wang LP, Liu JX, Shang JL, Kong XZ, Guan BX, Wang J. KGLRR: A low-rank representation K-means with graph regularization constraint method for Single-cell type identification. Comput Biol Chem 2023; 104:107862. [PMID: 37031647 DOI: 10.1016/j.compbiolchem.2023.107862] [Received: 10/27/2022] [Revised: 02/26/2023] [Accepted: 03/30/2023] [Indexed: 04/05/2023]
Abstract
Single-cell RNA sequencing technology provides a tremendous opportunity for studying disease mechanisms at the single-cell level. Cell type identification is a key step in the research of disease mechanisms. Many clustering algorithms have been proposed to identify cell types. Most clustering algorithms perform similarity calculation before cell clustering. Because clustering and similarity calculation are independent, a low-rank matrix obtained only by similarity calculation may be unable to fully reveal the patterns in single-cell data. In this study, to capture accurate single-cell clustering information, we propose a novel method based on a low-rank representation model, called KGLRR, that combines the low-rank representation approach with K-means clustering. The cluster centroid is updated as the cell dimension decreases to better form new clusters and improve the quality of clustering information. In addition, the low-rank representation model ignores local geometric information, so a graph regularization constraint is introduced. KGLRR is tested on both simulated and real single-cell datasets to validate the effectiveness of the new method. The experimental results show that KGLRR is more robust and accurate in cell type identification than other advanced algorithms.
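The graph regularization constraint mentioned above is usually expressed through a graph Laplacian. The sketch below is a generic illustration rather than the KGLRR objective: it assumes a symmetric k-nearest-neighbor graph with binary weights and a representation matrix `z` whose rows are cells, and all function names are hypothetical. It shows that the trace term tr(Zᵀ L Z) equals half the weighted sum of squared distances between neighboring cells, which is why minimizing it preserves local geometry.

```python
import numpy as np

def knn_affinity(x, k=2):
    """Symmetric binary k-nearest-neighbor affinity matrix for the rows of x."""
    d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)            # a cell is not its own neighbor
    idx = np.argsort(d2, axis=1)[:, :k]     # indices of the k closest cells
    w = np.zeros_like(d2)
    rows = np.repeat(np.arange(x.shape[0]), k)
    w[rows, idx.ravel()] = 1.0
    return np.maximum(w, w.T)               # keep an edge if either cell selects the other

def graph_laplacian(w):
    """Unnormalized Laplacian L = D - W."""
    return np.diag(w.sum(axis=1)) - w

def graph_regularizer(z, lap):
    """tr(Z^T L Z): small when neighboring cells have similar representations."""
    return float(np.trace(z.T @ lap @ z))

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 3))   # toy expression profiles (6 cells, 3 genes)
z = rng.normal(size=(6, 2))   # toy low-dimensional representation
w = knn_affinity(x, k=2)
lap = graph_laplacian(w)
print(graph_regularizer(z, lap))
```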
Affiliation(s)
- Lin-Ping Wang
  - School of Computer Science, Qufu Normal University, Rizhao 276826, China
- Jin-Xing Liu
  - School of Computer Science, Qufu Normal University, Rizhao 276826, China
- Jun-Liang Shang
  - School of Computer Science, Qufu Normal University, Rizhao 276826, China
- Xiang-Zhen Kong
  - School of Computer Science, Qufu Normal University, Rizhao 276826, China
- Bo-Xin Guan
  - School of Computer Science, Qufu Normal University, Rizhao 276826, China
- Juan Wang
  - School of Computer Science, Qufu Normal University, Rizhao 276826, China
4
Ding Q, Yang W, Luo M, Xu C, Xu Z, Pang F, Cai Y, Anashkina AA, Su X, Chen N, Jiang Q. CBLRR: a cauchy-based bounded constraint low-rank representation method to cluster single-cell RNA-seq data. Brief Bioinform 2022; 23:6649282. [DOI: 10.1093/bib/bbac300] [Received: 03/31/2022] [Revised: 06/17/2022] [Accepted: 07/02/2022] [Indexed: 11/14/2022]
Abstract
The rapid development of single-cell RNA sequencing (scRNA-seq) technology provides unprecedented opportunities for exploring biological phenomena at the single-cell level. The discovery of cell types is one of the major applications for researchers to explore the heterogeneity of cells. Some computational methods have been proposed to solve the problem of scRNA-seq data clustering. However, unavoidable technical noise and notorious dropouts reduce the accuracy of clustering methods. Here, we propose the Cauchy-based bounded constraint low-rank representation (CBLRR), a low-rank representation-based method that introduces a Cauchy loss function (CLF) and bounded nuclear norm regulation to alleviate these issues. Specifically, as an effective loss function, the CLF is proven to enhance the robustness of the identification of cell types. Then, we adopt the bounded constraint to keep the entry values of single-cell data within a restricted interval. Finally, the performance of CBLRR is evaluated on 15 scRNA-seq datasets and compared with other state-of-the-art methods. The experimental results demonstrate that CBLRR performs accurately and robustly on clustering scRNA-seq data. Furthermore, CBLRR is an effective tool to cluster cells and provides great potential for downstream analysis of single-cell data. The source code of CBLRR is available online at https://github.com/Ginnay/CBLRR.
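To make the robustness claim for the Cauchy loss concrete, the sketch below compares a common form of the Cauchy loss, (c²/2)·log(1 + (r/c)²), with the ordinary squared loss on residuals that contain one large outlier. The exact form and scale parameter used in CBLRR are not given in the abstract, so the formula, the parameter `c`, and the function names here are assumptions for illustration only.

```python
import numpy as np

def cauchy_loss(residual, c=1.0):
    """Sum of (c^2 / 2) * log(1 + (r / c)^2) over all residuals r."""
    r = np.asarray(residual, dtype=float)
    return float(np.sum(0.5 * c ** 2 * np.log1p((r / c) ** 2)))

def squared_loss(residual):
    """Sum of r^2 / 2 over all residuals r."""
    r = np.asarray(residual, dtype=float)
    return float(0.5 * np.sum(r ** 2))

# Small residuals plus one gross outlier (e.g. a noisy or dropout-affected entry).
residuals = np.array([0.1, -0.2, 0.05, 100.0])

print("squared:", squared_loss(residuals))
print("cauchy: ", cauchy_loss(residuals))
```

For small residuals the two losses nearly coincide, while the outlier's contribution grows only logarithmically under the Cauchy loss, so a single corrupted entry has far less influence on the fit.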
Affiliation(s)
- Qian Ding
  - School of Life Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Wenyi Yang
  - School of Life Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Meng Luo
  - School of Life Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Chang Xu
  - School of Life Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Zhaochun Xu
  - School of Life Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Fenglan Pang
  - School of Life Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Yideng Cai
  - School of Life Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Anastasia A Anashkina
  - Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
- Xi Su
  - Foshan Maternity & Child Healthcare Hospital, Southern Medical University, Foshan, Guangdong, China
- Na Chen
  - Department of Hematology, Shandong Provincial Hospital Affiliated to Shandong First Medical University, Jinan, Shandong, China
- Qinghua Jiang
  - School of Life Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
5
Intelligent Algorithm-Based Ultrasound Image for Evaluating the Effect of Comprehensive Nursing Scheme on Patients with Diabetic Kidney Disease. Computational and Mathematical Methods in Medicine 2022; 2022:6440138. [PMID: 35309831 PMCID: PMC8930247 DOI: 10.1155/2022/6440138] [Received: 01/16/2022] [Revised: 02/20/2022] [Accepted: 02/22/2022] [Indexed: 11/17/2022]
Abstract
This study aimed to explore the effect of a comprehensive nursing scheme on patients with diabetic kidney disease (DKD), evaluated using ultrasound images processed with artificial intelligence algorithms. Forty-four patients diagnosed with DKD were randomly divided into two groups: group A (no nursing intervention) and group B (comprehensive nursing). In the same period, 32 healthy volunteers were selected as the control group. Ultrasonographic images based on the non-local-means (KNL-Means) filtering algorithm were used to perform imaging examinations in healthy people and DKD patients before and after care. The results suggested that, compared with those of the SAE reconstruction algorithm and the KAVD reconstruction algorithm, the peak signal-to-noise ratio (PSNR) of the image reconstructed by the artificial bee colony algorithm was higher and the mean squared error (MSE) was lower. The resistive index (RI) of DKD patients in group B after nursing was […], clearly different from the RI of the healthy people (controls) in the same group ([…]) and the RI of DKD patients in group A ([…]) ([…]). The incidence rate of complications in DKD patients in group B was notably lower than that in group A. After comprehensive nursing intervention (CNI), the scores of all dimensions of quality of life (QoL) in DKD patients in group B were clearly higher than those in DKD patients in group A. This suggests that implementing nursing intervention for DKD patients can effectively help patients improve and control the level of renal function, while ultrasound images based on an intelligent algorithm can dynamically detect changes in the level of renal function in patients, which has value for clinical promotion.
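The abstract compares reconstruction algorithms by PSNR and MSE without defining them. A minimal sketch of the two metrics follows, using the standard definitions; the 8-bit peak value of 255 and the toy images are assumptions, and the function names are hypothetical.

```python
import numpy as np

def mse(reference, reconstructed):
    """Mean squared error between two images of the same shape."""
    ref = np.asarray(reference, dtype=float)
    rec = np.asarray(reconstructed, dtype=float)
    return float(np.mean((ref - rec) ** 2))

def psnr(reference, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in decibels; infinite for identical images."""
    err = mse(reference, reconstructed)
    if err == 0.0:
        return float("inf")
    return float(10.0 * np.log10(max_value ** 2 / err))

# Toy 4x4 "images": a flat reference and two reconstructions with different error levels.
reference = np.full((4, 4), 100.0)
mild = reference + 2.0
heavy = reference + 10.0

print(mse(reference, mild), psnr(reference, mild))
print(mse(reference, heavy), psnr(reference, heavy))
```

A better reconstruction has a lower MSE and therefore a higher PSNR, which matches the direction of the comparison reported in the abstract.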