1
Wei X, Wu J, Li G, Liu J, Wu X, He C. scPEDSSC: proximity enhanced deep sparse subspace clustering method for scRNA-seq data. PLoS Comput Biol 2025; 21:e1012924. [PMID: 40294099 PMCID: PMC12036905 DOI: 10.1371/journal.pcbi.1012924] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2024] [Accepted: 03/03/2025] [Indexed: 04/30/2025] Open
Abstract
Identifying cell types by clustering single-cell RNA sequencing (scRNA-seq) data is a significant step in single-cell analysis. However, substantial challenges remain because of the inherent high dimensionality, noise, and sparsity of scRNA-seq data. This study proposes scPEDSSC, a deep sparse subspace clustering method based on proximity enhancement. The self-expression matrix (SEM), learned by a deep auto-encoder with a two-part generalized gamma (TPGG) distribution, is used together with its second power to generate the similarity matrix. In experiments against eight state-of-the-art single-cell clustering methods on twelve real biological datasets, scPEDSSC achieves superior performance on most datasets.
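The idea of building a similarity matrix from a self-expression matrix and its second power can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the exact combination rule, so the function name `proximity_enhanced_similarity`, the mixing weight `alpha`, and the max-scaling step are all hypothetical, and the matrix `C` is random rather than learned by an auto-encoder.

```python
import numpy as np

def proximity_enhanced_similarity(C, alpha=0.5):
    """Hypothetical helper: mix a self-expression matrix with its second power.

    C     : (n_cells, n_cells) self-expression coefficients (assumed given).
    alpha : assumed mixing weight between first- and second-order proximity.
    """
    # Symmetric, non-negative first-order affinity between cells.
    A = (np.abs(C) + np.abs(C).T) / 2.0
    np.fill_diagonal(A, 0.0)

    # Second power: affinity through shared neighbours (two-step proximity).
    A2 = A @ A
    np.fill_diagonal(A2, 0.0)

    def scale(M):
        # Bring both terms to a comparable range before mixing.
        m = M.max()
        return M / m if m > 0 else M

    return (1.0 - alpha) * scale(A) + alpha * scale(A2)

# Toy sparse coefficient matrix standing in for a learned SEM.
rng = np.random.default_rng(0)
C = rng.normal(size=(6, 6)) * (rng.random((6, 6)) < 0.4)
S = proximity_enhanced_similarity(C)
```

The resulting `S` is symmetric and non-negative, so it could be passed to a spectral clustering routine that accepts a precomputed affinity matrix.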
Affiliation(s)
- Xiaopeng Wei
  - Guangxi Key Lab of Multi-source Information Mining & Security, Guangxi Normal University, Guilin, Guangxi, China
  - College of Computer Science and Engineering, Guangxi Normal University, Guilin, Guangxi, China
- Jingli Wu
  - Guangxi Key Lab of Multi-source Information Mining & Security, Guangxi Normal University, Guilin, Guangxi, China
  - College of Computer Science and Engineering, Guangxi Normal University, Guilin, Guangxi, China
  - Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin, Guangxi, China
- Gaoshi Li
  - Guangxi Key Lab of Multi-source Information Mining & Security, Guangxi Normal University, Guilin, Guangxi, China
  - College of Computer Science and Engineering, Guangxi Normal University, Guilin, Guangxi, China
- Jiafei Liu
  - Guangxi Key Lab of Multi-source Information Mining & Security, Guangxi Normal University, Guilin, Guangxi, China
  - College of Computer Science and Engineering, Guangxi Normal University, Guilin, Guangxi, China
- Xi Wu
  - Guangxi Key Lab of Multi-source Information Mining & Security, Guangxi Normal University, Guilin, Guangxi, China
  - College of Computer Science and Engineering, Guangxi Normal University, Guilin, Guangxi, China
- Chang He
  - Guangxi Key Lab of Multi-source Information Mining & Security, Guangxi Normal University, Guilin, Guangxi, China
  - College of Computer Science and Engineering, Guangxi Normal University, Guilin, Guangxi, China
2
Gong L, Cui X, Liu Y, Lin C, Gao Z. SinCWIm: An imputation method for single-cell RNA sequence dropouts using weighted alternating least squares. Comput Biol Med 2024; 171:108225. [PMID: 38442556 DOI: 10.1016/j.compbiomed.2024.108225] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2023] [Revised: 01/28/2024] [Accepted: 02/25/2024] [Indexed: 03/07/2024]
Abstract
BACKGROUND AND OBJECTIVES Single-cell RNA sequencing (scRNA-seq) provides a powerful tool for exploring cellular heterogeneity, discovering novel or rare cell types, distinguishing tissue-specific cellular composition, and understanding cell differentiation during development. However, due to technological limitations, dropout events in scRNA-seq can mistakenly convert some entries in the real data to zero, which is equivalent to introducing noise into the gene expression entries of cells. This contamination degrades downstream analyses, including clustering, cell annotation, and differential gene expression analysis. It is therefore crucial to accurately determine which zeros are due to dropout events and to impute them. METHODS Considering that different zeros in the gene expression matrix carry different confidence levels, this paper proposes SinCWIm, a method for handling dropout events in scRNA-seq based on weighted alternating least squares (WALS). The method uses the Pearson correlation coefficient and hierarchical clustering to quantify the confidence of zero entries, and then combines these confidences with WALS for matrix decomposition. Outlier removal and data correction bring the imputed result close to the actual values. RESULTS Eight single-cell sequencing datasets were used in comparative experiments to demonstrate the overall superiority of SinCWIm over state-of-the-art models. When data imputed by SinCWIm were clustered and evaluated with the adjusted Rand index, the Usoskin, Pollen and Bladder datasets scored 94.46%, 96.48% and 76.74%, respectively. In addition, significant improvements were made in the retention of differentially expressed genes and in visualization. CONCLUSIONS SinCWIm provides a valuable imputation method for handling dropout events in single-cell sequencing data. In comparison to advanced methods, SinCWIm demonstrates excellent performance in clustering, visualization and other aspects. It is applicable to various single-cell sequencing datasets.
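As a rough illustration of the weighted alternating least squares step described above, the sketch below factorizes an expression matrix while down-weighting zero entries. It is not the SinCWIm implementation: the function names `wals` and `weighted_loss`, the rank, the regularization `lam`, and the fixed weight of 0.1 for zeros are assumptions, and the confidence weights are set by hand rather than derived from Pearson correlation and hierarchical clustering.

```python
import numpy as np

def weighted_loss(X, W, U, V, lam):
    """Regularized weighted squared error of the factorization X ~ U @ V.T."""
    resid = X - U @ V.T
    return float(np.sum(W * resid**2) + lam * (np.sum(U**2) + np.sum(V**2)))

def wals(X, W, rank=2, lam=0.1, n_iter=20, seed=0):
    """Weighted ALS sketch: alternate exact ridge solves for rows of U and V."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    U = rng.normal(scale=0.1, size=(n, rank))
    V = rng.normal(scale=0.1, size=(m, rank))
    reg = lam * np.eye(rank)
    for _ in range(n_iter):
        for i in range(n):                      # update cell factors
            Vw = V * W[i][:, None]              # rows of V scaled by weights
            U[i] = np.linalg.solve(Vw.T @ V + reg, Vw.T @ X[i])
        for j in range(m):                      # update gene factors
            Uw = U * W[:, j][:, None]
            V[j] = np.linalg.solve(Uw.T @ U + reg, Uw.T @ X[:, j])
    return U, V

# Toy data: a low-rank matrix with about 30% of entries zeroed as "dropouts".
rng = np.random.default_rng(1)
X_true = rng.random((8, 2)) @ rng.random((2, 5))
X = np.where(rng.random(X_true.shape) < 0.3, 0.0, X_true)

# Assumed confidences: observed values trusted fully, zeros only weakly.
W = np.where(X > 0, 1.0, 0.1)

U, V = wals(X, W)
# Impute only the zero entries; observed values are left untouched.
X_imputed = np.where(X > 0, X, U @ V.T)
```

Because each row update is an exact minimizer of the regularized weighted objective, the loss cannot increase from one sweep to the next, which makes `weighted_loss` a convenient convergence check.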
Affiliation(s)
- Lejun Gong
  - Jiangsu Key Lab of Big Data Security & Intelligent Processing, School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing, China
- Xiong Cui
  - Jiangsu Key Lab of Big Data Security & Intelligent Processing, School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing, China
- Yang Liu
  - Jiangsu Key Lab of Big Data Security & Intelligent Processing, School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing, China
- Cai Lin
  - Department of Burn, Wound Repair and Regenerative Medicine Center, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, Zhejiang, 325000, China
- Zhihong Gao
  - Zhejiang Engineering Research Center of Intelligent Medicine, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, 325000, China
3
Fang Z, Zheng R, Li M. scMAE: a masked autoencoder for single-cell RNA-seq clustering. Bioinformatics 2024; 40:btae020. [PMID: 38230824 PMCID: PMC10832357 DOI: 10.1093/bioinformatics/btae020] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 09/14/2023] [Revised: 01/07/2024] [Accepted: 01/12/2024] [Indexed: 01/18/2024] Open
Abstract
MOTIVATION Single-cell RNA sequencing has emerged as a powerful technology for studying gene expression at the individual cell level. Clustering individual cells into distinct subpopulations is fundamental in scRNA-seq data analysis, facilitating the identification of cell types and exploration of cellular heterogeneity. Despite the recent development of many deep learning-based single-cell clustering methods, few have effectively exploited the correlations among genes, resulting in suboptimal clustering outcomes. RESULTS Here, we propose a novel masked autoencoder-based method, scMAE, for cell clustering. scMAE perturbs gene expression and employs a masked autoencoder to reconstruct the original data, learning robust and informative cell representations. The masked autoencoder introduces a masking predictor, which captures relationships among genes by predicting whether gene expression values are masked. By integrating this masking mechanism, scMAE effectively captures latent structures and dependencies in the data, enhancing clustering performance. We conducted extensive comparative experiments using various clustering evaluation metrics on 15 scRNA-seq datasets from different sequencing platforms. Experimental results indicate that scMAE outperforms other state-of-the-art methods on these datasets. In addition, scMAE accurately identifies rare cell types, which are challenging to detect due to their low abundance. Furthermore, biological analyses confirm the biological significance of the identified cell subpopulations. AVAILABILITY AND IMPLEMENTATION The source code of scMAE is available at: https://zenodo.org/records/10465991.
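The perturb-and-predict setup described in this abstract can be illustrated with a small data-preparation sketch. It only builds the perturbed input and the binary target that a masking predictor would be trained on; the function name `mask_and_perturb`, the masking rate, and the choice to replace masked values by shuffling each gene across cells are assumptions for illustration, not details confirmed by the abstract.

```python
import numpy as np

def mask_and_perturb(X, mask_rate=0.3, seed=0):
    """Hypothetical helper: corrupt a cells-by-genes matrix and return the mask.

    Masked entries are replaced by a value of the same gene taken from
    another (randomly chosen) cell; all other entries are left unchanged.
    """
    rng = np.random.default_rng(seed)
    mask = rng.random(X.shape) < mask_rate
    # Shuffle every gene column independently across cells.
    shuffled = rng.permuted(X, axis=0)
    X_perturbed = np.where(mask, shuffled, X)
    return X_perturbed, mask.astype(np.float32)

# Toy count matrix: 5 cells, 4 genes.
rng = np.random.default_rng(42)
X = rng.poisson(2.0, size=(5, 4)).astype(np.float32)
X_perturbed, mask = mask_and_perturb(X)
```

In a full model, an autoencoder would take `X_perturbed` as input, reconstruct `X`, and a second head would be trained to predict `mask`; those network components are omitted here.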
Affiliation(s)
- Zhaoyu Fang
  - School of Computer Science and Engineering, Central South University, 932 South Lushan Road, Yuelu District, Changsha 410083, China
- Ruiqing Zheng
  - School of Computer Science and Engineering, Central South University, 932 South Lushan Road, Yuelu District, Changsha 410083, China
- Min Li
  - School of Computer Science and Engineering, Central South University, 932 South Lushan Road, Yuelu District, Changsha 410083, China
4
Fang Z, Liu T, Zheng R, A J, Yin M, Li M. stAA: adversarial graph autoencoder for spatial clustering task of spatially resolved transcriptomics. Brief Bioinform 2023; 25:bbad500. [PMID: 38189544 PMCID: PMC10772985 DOI: 10.1093/bib/bbad500] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2023] [Revised: 11/22/2023] [Accepted: 12/11/2023] [Indexed: 01/09/2024] Open
Abstract
With the development of spatially resolved transcriptomics technologies, it is now possible to explore the gene expression profiles of single cells while preserving their spatial context. Spatial clustering plays a key role in spatial transcriptome data analysis. In the past 2 years, several graph neural network-based methods have emerged that significantly improve the accuracy of spatial clustering. However, accurately identifying the boundaries of spatial domains remains a challenging task. In this article, we propose stAA, an adversarial variational graph autoencoder, to identify spatial domains. stAA generates cell embeddings from gene expression and spatial information using graph neural networks and constrains the distribution of the cell embeddings toward a prior distribution through the Wasserstein distance. The adversarial training process makes the cell embeddings more robust and better at capturing spatial domain information. Moreover, stAA incorporates global graph information into the cell embeddings using labels generated by pre-clustering. Our experimental results show that stAA outperforms state-of-the-art methods and achieves better clustering results across different profiling platforms and various resolutions. We also conducted numerous biological analyses and found that stAA can identify fine-grained structures in tissues, recognize different functional subtypes within tumors and accurately identify developmental trajectories.
Affiliation(s)
- Zhaoyu Fang
  - School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
- Teng Liu
  - Clinical Research Center (CRC), Medical Pathology Center (MPC), Cancer Early Detection and Treatment Center (CEDTC), Chongqing University Three Gorges Hospital, Chongqing University, Chongqing 404031, China
  - Translational Medicine Research Center (TMRC), School of Medicine, Chongqing University, Chongqing 401331, China
- Ruiqing Zheng
  - School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
- Jin A
  - School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
- Mingzhu Yin
  - Clinical Research Center (CRC), Medical Pathology Center (MPC), Cancer Early Detection and Treatment Center (CEDTC), Chongqing University Three Gorges Hospital, Chongqing University, Chongqing 404031, China
  - Translational Medicine Research Center (TMRC), School of Medicine, Chongqing University, Chongqing 401331, China
- Min Li
  - School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
5
Qi Y, Han S, Tang L, Liu L. Imputation method for single-cell RNA-seq data using neural topic model. Gigascience 2022; 12:giad098. [PMID: 38000911 PMCID: PMC10673642 DOI: 10.1093/gigascience/giad098] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2023] [Revised: 09/02/2023] [Accepted: 10/23/2023] [Indexed: 11/26/2023] Open
Abstract
Single-cell RNA sequencing (scRNA-seq) technology enables the study of the transcriptome and of cell-to-cell differences at single-cell resolution and from different perspectives. Despite the advantage of high capture efficiency, downstream functional analysis of scRNA-seq data is made difficult by the excess of zero values (i.e., the dropout phenomenon). To address this problem, we introduce scNTImpute, an imputation framework based on a neural topic model. A neural network encoder extracts underlying topic features of single-cell transcriptome data to infer high-quality cell similarity. At the same time, a mixture model learned by the neural network determines which transcriptome data are affected by the dropout phenomenon. On the basis of stable cell similarity, information for the same gene in other similar cells is borrowed to impute only the missing expression values. In evaluations on real data, scNTImpute accurately and efficiently identifies dropout values and imputes them. In addition, the clustering of cell subsets is improved, and the original biological information in cell clustering, which is obscured by technical noise, is recovered. The source code for the scNTImpute module is available as open source at https://github.com/qiyueyang-7/scNTImpute.git.
Affiliation(s)
- Yueyang Qi
  - Yunnan Normal University, School of Information, Kunming 650500, China
- Shuangkai Han
  - Yunnan Normal University, School of Information, Kunming 650500, China
- Lin Tang
  - Yunnan Normal University, Faculty of Education, Kunming 650500, China
- Lin Liu
  - Yunnan Normal University, School of Information, Kunming 650500, China