Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Chen L, Wang W, Zhai Y, Deng M. Deep soft K-means clustering with self-training for single-cell RNA sequence data. NAR Genom Bioinform 2020;2:lqaa039. [PMID: 33575592 PMCID: PMC7671315 DOI: 10.1093/nargab/lqaa039] [Citation(s) in RCA: 47] [Impact Index Per Article: 9.4] [Received: 01/31/2020] [Revised: 04/24/2020] [Accepted: 05/14/2020] [Indexed: 12/18/2022] Open

Cited by Other Article(s)

Arya A, Tripathi P, Dubey N, Aier I, Kumar Varadwaj P. Navigating single-cell RNA-sequencing: protocols, tools, databases, and applications. Genomics Inform 2025;23:13. [PMID: 40382658 DOI: 10.1186/s44342-025-00044-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/22/2025] [Accepted: 04/07/2025] [Indexed: 05/20/2025] Open

Li T, Wang Z, Liu Y, He S, Zou Q, Zhang Y. An overview of computational methods in single-cell transcriptomic cell type annotation. Brief Bioinform 2025;26:bbaf207. [PMID: 40347979 PMCID: PMC12065632 DOI: 10.1093/bib/bbaf207] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2024] [Revised: 03/14/2025] [Accepted: 04/01/2025] [Indexed: 05/14/2025] Open

Wu W, Wang S, Zhang K, Li H, Qiao S, Zhang Y, Pang S. scMDCL: A Deep Collaborative Contrastive Learning Framework for Matched Single-Cell Multiomics Data Clustering. J Chem Inf Model 2025;65:3048-3063. [PMID: 40068854 DOI: 10.1021/acs.jcim.4c02114] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/25/2025]

Wang L, Zhang H, Yi B, Xie W, Yu K, Li W, Li K, Zhao D. FactVAE: a factorized variational autoencoder for single-cell multi-omics data integration analysis. Brief Bioinform 2025;26:bbaf157. [PMID: 40211981 PMCID: PMC11986350 DOI: 10.1093/bib/bbaf157] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/15/2025] [Revised: 03/02/2025] [Accepted: 03/21/2025] [Indexed: 04/14/2025] Open

Liang DM, Du PF. scMUG: deep clustering analysis of single-cell RNA-seq data on multiple gene functional modules. Brief Bioinform 2025;26:bbaf138. [PMID: 40188497 PMCID: PMC11972635 DOI: 10.1093/bib/bbaf138] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2024] [Revised: 02/11/2025] [Accepted: 03/09/2025] [Indexed: 04/08/2025] Open

Li B, Zhao Y, Hu J, Zhang S, Zhang X. scSAMAC: saliency-adjusted masking induced attention contrastive learning for single-cell clustering. Brief Bioinform 2025;26:bbaf128. [PMID: 40131310 PMCID: PMC11934584 DOI: 10.1093/bib/bbaf128] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2024] [Revised: 02/23/2025] [Accepted: 03/01/2025] [Indexed: 03/26/2025] Open

Tang B, Chen Y. scFTAT: a novel cell annotation method integrating FFT and transformer. BMC Bioinformatics 2025;26:62. [PMID: 39994539 PMCID: PMC11853718 DOI: 10.1186/s12859-025-06061-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2024] [Accepted: 01/22/2025] [Indexed: 02/26/2025] Open

Abstract

BACKGROUND

Advancements in high-throughput sequencing and deep learning have boosted single-cell RNA studies. However, current methods for annotating single-cell data face challenges due to high data sparsity and tedious manual annotation on large-scale data.

RESULTS

We propose a novel annotation model, named scFTAT, that integrates FFT (Fast Fourier Transform) and an enhanced Transformer. It first reduces data sparsity using LDA (Linear Discriminant Analysis). Automatic cell annotation is then achieved through a proposed module integrating FFT and an enhanced Transformer. Moreover, the model is fine-tuned to improve training performance by incorporating techniques such as kernel approximation, position encoding enhancement, and attention enhancement modules. Compared to existing popular annotation tools, scFTAT maintains high accuracy and robustness on six typical datasets. Specifically, the model achieves an accuracy of 0.93 on the human kidney data, with an F1 score of 0.84, precision of 0.96, recall of 0.80, and Matthews correlation coefficient of 0.89. The highest accuracy among the compared methods is 0.92, with an F1 score of 0.71, precision of 0.75, recall of 0.73, and Matthews correlation coefficient of 0.85. The compiled code and supplementary materials are available at: https://github.com/gladex/scFTAT.

CONCLUSION

In summary, the proposed scFTAT effectively integrates FFT and an enhanced Transformer for automatic feature learning, addressing the challenges of high sparsity and tedious manual annotation in single-cell profiling data. Experiments on six typical scRNA-seq datasets from human and mouse tissues evaluate the model using five metrics: accuracy, F1 score, precision, recall, and Matthews correlation coefficient. Performance comparisons with existing methods further demonstrate the efficiency and robustness of the proposed method.
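
The five evaluation metrics named in this abstract can be illustrated with a minimal sketch. It assumes a binary (one cell type versus the rest) confusion matrix; the function name `binary_metrics` and the example counts are hypothetical and are not taken from the scFTAT code base, which evaluates multi-class annotation.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, F1 and MCC from confusion counts.

    tp/fp/fn/tn are the true-positive, false-positive, false-negative and
    true-negative counts for one cell type treated as the positive class.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # Matthews correlation coefficient: uses all four cells of the matrix.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical counts for illustration only.
scores = binary_metrics(tp=8, fp=2, fn=2, tn=8)
```

Because MCC uses all four cells of the confusion matrix, it can stay low even when accuracy looks high on imbalanced cell-type data, which is presumably why annotation papers report it alongside accuracy.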

Zhang Y, Feng X, Wang Y, Shi K. Deep learning powered single-cell clustering framework with enhanced accuracy and stability. Sci Rep 2025;15:4107. [PMID: 39900656 PMCID: PMC11791198 DOI: 10.1038/s41598-025-87672-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2024] [Accepted: 01/21/2025] [Indexed: 02/05/2025] Open

Abstract

Single-cell RNA sequencing (scRNA-seq) has revolutionized the field of cellular diversity research. Unsupervised clustering, a key technique in this exploration, allows for the identification of distinct cell types within a population. Graph-based deep clustering methods have shown promise in preserving the structural relationships between cells (nodes) within the data. However, these methods often neglect the inherent distribution of nodes in the graph, leading to incomplete representations of cell populations. Additionally, conventional graph convolutional networks (GCNs) can suffer from oversmoothing, a phenomenon where the network loses the ability to differentiate between samples with similar expression profiles. To address these limitations, we proposed scG-cluster, an innovative deep structural clustering method. This method incorporates two key innovations: (1) Dual-topology adjacency graph: scG-cluster integrates information about node distribution into the traditional adjacency graph used by GCNs. This enriches the graph representation by capturing the spatial relationships between cells in addition to their pairwise similarities. (2) Dual-topology adaptive graph convolutional network (TAGCN): The framework employs a TAGCN architecture with residual concatenation. This network utilizes an attention mechanism to dynamically weight features within the graph, focusing on the most informative aspects for clustering. Additionally, residual connections are implemented to combat oversmoothing, ensuring the network retains the ability to distinguish between subtle differences in cell expression profiles. Furthermore, scG-cluster iteratively refines the clustering centers, leading to enhanced stability and accuracy in the final cluster assignments. Extensive evaluations on six diverse scRNA-seq datasets demonstrate that scG-cluster consistently outperforms existing state-of-the-art methods in terms of both clustering accuracy and scalability. 
Ablation studies are also conducted to validate the significant contributions of both the residual connections and the attention mechanism to the overall performance of the model. The source code for scG-cluster is publicly available at https://github.com/xixi-wq/scG-cluster.

Hozumi Y, Wei GW. Analyzing scRNA-seq data by CCP-assisted UMAP and tSNE. PLoS One 2024;19:e0311791. [PMID: 39671349 PMCID: PMC11642954 DOI: 10.1371/journal.pone.0311791] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2024] [Accepted: 09/24/2024] [Indexed: 12/15/2024] Open

Tian S, Ji C, Ni J, Wang Y, Zheng C. Using Multi-Encoder Semi-Implicit Graph Variational Autoencoder to Analyze Single-Cell RNA Sequencing Data. IEEE/ACM Trans Comput Biol Bioinform 2024;21:2280-2291. [PMID: 39255084 DOI: 10.1109/tcbb.2024.3458170] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/12/2024]

Shu Z, Xia M, Tan K, Zhang Y, Yu Z. Multi-level multi-view network based on structural contrastive learning for scRNA-seq data clustering. Brief Bioinform 2024;25:bbae562. [PMID: 39494609 PMCID: PMC11532661 DOI: 10.1093/bib/bbae562] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2024] [Revised: 09/23/2024] [Accepted: 10/18/2024] [Indexed: 11/05/2024] Open

Liu T, Jia C, Bi Y, Guo X, Zou Q, Li F. scDFN: enhancing single-cell RNA-seq clustering with deep fusion networks. Brief Bioinform 2024;25:bbae486. [PMID: 39373051 PMCID: PMC11456827 DOI: 10.1093/bib/bbae486] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/29/2024] [Revised: 08/07/2024] [Accepted: 09/17/2024] [Indexed: 10/08/2024] Open

Abstract

Single-cell ribonucleic acid sequencing (scRNA-seq) technology can be used to perform high-resolution analysis of the transcriptomes of individual cells. Therefore, its application has gained popularity for accurately analyzing the ever-increasing content of heterogeneous single-cell datasets. Central to interpreting scRNA-seq data is the clustering of cells to decipher transcriptomic diversity and infer cell behavior patterns. However, its complexity necessitates the application of advanced methodologies capable of resolving the inherent heterogeneity and limited gene expression characteristics of single-cell data. Herein, we introduce a novel deep learning-based algorithm for single-cell clustering, designated scDFN, which can significantly enhance the clustering of scRNA-seq data through a fusion network strategy. The scDFN algorithm applies a dual mechanism involving an autoencoder to extract attribute information and an improved graph autoencoder to capture topological nuances, integrated via a cross-network information fusion mechanism complemented by a triple self-supervision strategy. This fusion is optimized through a holistic consideration of four distinct loss functions. A comparative analysis with five leading scRNA-seq clustering methodologies across multiple datasets revealed the superiority of scDFN, as measured by higher Normalized Mutual Information (NMI) and Adjusted Rand Index (ARI) scores. Additionally, scDFN demonstrated robust multi-cluster dataset performance and exceptional resilience to batch effects. Ablation studies highlighted the key roles of the autoencoder and the improved graph autoencoder components, along with the critical contribution of the four joint loss functions to the overall efficacy of the algorithm. Through these advancements, scDFN set a new benchmark in single-cell clustering and can be used as an effective tool for the nuanced analysis of single-cell transcriptomics.
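
The Adjusted Rand Index (ARI) mentioned in this abstract compares a predicted clustering with reference cell-type labels while correcting for chance agreement. The following is a minimal pure-Python sketch of the standard pair-counting formula; the function name `adjusted_rand_index` and the toy label lists are illustrative assumptions, not part of scDFN.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two labelings of the same cells."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label lists must have the same length")
    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two cells: nothing to disagree on

    # Contingency table and its marginals, stored sparsely.
    joint = Counter(zip(labels_true, labels_pred))
    true_sizes = Counter(labels_true)
    pred_sizes = Counter(labels_pred)

    # Pairs of cells placed together in both, in truth, and in prediction.
    sum_joint = sum(comb(c, 2) for c in joint.values())
    sum_true = sum(comb(c, 2) for c in true_sizes.values())
    sum_pred = sum(comb(c, 2) for c in pred_sizes.values())

    expected = sum_true * sum_pred / total_pairs  # chance-level agreement
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both
    return (sum_joint - expected) / (max_index - expected)


# Cluster ids are arbitrary: a relabeled but identical partition scores 1.0.
perfect = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
```

ARI is invariant to how cluster ids are named, so no label matching step is needed before scoring a clustering against annotated cell types.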

Yao Z, Li B, Lu Y, Yau ST. Single-cell analysis via manifold fitting: A framework for RNA clustering and beyond. Proc Natl Acad Sci U S A 2024;121:e2400002121. [PMID: 39226348 PMCID: PMC11406302 DOI: 10.1073/pnas.2400002121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2024] [Accepted: 07/19/2024] [Indexed: 09/05/2024] Open

Gao H, Shen W, Li R, Liu C, Wu S. Collaborative Structure-Preserved Missing Data Imputation for Single-Cell RNA-Seq Clustering. IEEE/ACM Trans Comput Biol Bioinform 2024;21:1480-1491. [PMID: 38776196 DOI: 10.1109/tcbb.2024.3404013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/24/2024]

Qin L, Zhang G, Zhang S, Chen Y. Deep Batch Integration and Denoise of Single-Cell RNA-Seq Data. Adv Sci (Weinh) 2024;11:e2308934. [PMID: 38778573 PMCID: PMC11304254 DOI: 10.1002/advs.202308934] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2023] [Revised: 03/14/2024] [Indexed: 05/25/2024]

Alsaggaf I, Buchan D, Wan C. Improving cell type identification with Gaussian noise-augmented single-cell RNA-seq contrastive learning. Brief Funct Genomics 2024;23:441-451. [PMID: 38242863 DOI: 10.1093/bfgp/elad059] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2023] [Revised: 12/14/2023] [Accepted: 12/18/2023] [Indexed: 01/21/2024] Open

Xie J, Ruan S, Tu M, Yuan Z, Hu J, Li H, Li S. Clustering single-cell RNA sequencing data via iterative smoothing and self-supervised discriminative embedding. Oncogene 2024;43:2279-2292. [PMID: 38834657 DOI: 10.1038/s41388-024-03074-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/21/2024] [Revised: 05/22/2024] [Accepted: 05/28/2024] [Indexed: 06/06/2024]

Xiong J, Gong F, Ma L, Wan L. scVIC: deep generative modeling of heterogeneity for scRNA-seq data. Bioinform Adv 2024;4:vbae086. [PMID: 39027640 PMCID: PMC11256938 DOI: 10.1093/bioadv/vbae086] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2024] [Revised: 05/15/2024] [Accepted: 06/12/2024] [Indexed: 07/20/2024]

Zhang T, Ren J, Li L, Wu Z, Zhang Z, Dong G, Wang G. scZAG: Integrating ZINB-Based Autoencoder with Adaptive Data Augmentation Graph Contrastive Learning for scRNA-seq Clustering. Int J Mol Sci 2024;25:5976. [PMID: 38892162 PMCID: PMC11172799 DOI: 10.3390/ijms25115976] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2024] [Revised: 04/08/2024] [Accepted: 05/28/2024] [Indexed: 06/21/2024] Open

Abstract

Single-cell RNA sequencing (scRNA-seq) is widely used to interpret cellular states, detect cell subpopulations, and study disease mechanisms. In scRNA-seq data analysis, cell clustering is a key step that can identify cell types. However, scRNA-seq data are characterized by high dimensionality and significant sparsity, presenting considerable challenges for clustering. In the high-dimensional gene expression space, cells may form complex topological structures. Many conventional scRNA-seq data analysis methods focus on identifying cell subgroups rather than exploring these potential high-dimensional structures in detail. Although some methods have begun to consider the topological structures within the data, many still overlook the continuity and complex topology present in single-cell data. We propose a deep learning framework that begins by employing a zero-inflated negative binomial (ZINB) model to denoise the highly sparse and over-dispersed scRNA-seq data. Next, scZAG uses an adaptive graph contrastive representation learning approach that combines approximate personalized propagation of neural predictions graph convolution (APPNPGCN) with graph contrastive learning methods. By using APPNPGCN as the encoder for graph contrastive learning, we ensure that each cell's representation reflects not only its own features but also its position in the graph and its relationships with other cells. Graph contrastive learning exploits the relationships between nodes to capture the similarity among cells, better representing the data's underlying continuity and complex topology. Finally, the learned low-dimensional latent representations are clustered using Kullback-Leibler divergence. We validated the superior clustering performance of scZAG on 10 common scRNA-seq datasets in comparison to existing state-of-the-art clustering methods.
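
To make the zero-inflated negative binomial (ZINB) model in this abstract concrete, here is a minimal sketch of its log-probability and the resulting negative log-likelihood. The parameterization (mean `mu`, dispersion `theta`, zero-inflation probability `pi`) is a common convention assumed here for illustration; the function names are hypothetical and the abstract does not specify how scZAG implements its loss.

```python
import math


def zinb_log_prob(x, mu, theta, pi):
    """Log-probability of a count x under a ZINB distribution.

    mu > 0 is the negative binomial mean, theta > 0 its dispersion, and
    0 <= pi < 1 the probability of an extra ("dropout") zero.
    """
    # Negative binomial log-pmf in the mean/dispersion parameterization.
    log_nb = (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )
    if x == 0:
        # A zero arises either from dropout or from the NB component.
        return math.log(pi + (1.0 - pi) * math.exp(log_nb))
    # A positive count can only come from the NB component.
    return math.log(1.0 - pi) + log_nb


def zinb_nll(counts, mu, theta, pi):
    """Mean negative log-likelihood of observed counts (shared parameters)."""
    return -sum(zinb_log_prob(x, mu, theta, pi) for x in counts) / len(counts)


# Hypothetical sparse expression vector for one gene across six cells.
loss = zinb_nll([0, 0, 3, 0, 1, 7], mu=2.0, theta=1.5, pi=0.3)
```

In an autoencoder setting, `mu`, `theta`, and `pi` would typically be predicted per gene and per cell by the decoder, and this quantity would serve as the reconstruction loss; the sketch uses shared scalar parameters only to keep the formula readable.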

Qiu Y, Yang L, Jiang H, Zou Q. scTPC: a novel semisupervised deep clustering model for scRNA-seq data. Bioinformatics 2024;40:btae293. [PMID: 38684178 PMCID: PMC11091743 DOI: 10.1093/bioinformatics/btae293] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 01/04/2024] [Revised: 04/14/2024] [Accepted: 04/26/2024] [Indexed: 05/02/2024] Open

Manousidaki A, Little A, Xie Y. Clustering and visualization of single-cell RNA-seq data using path metrics. PLoS Comput Biol 2024;20:e1012014. [PMID: 38809943 PMCID: PMC11164391 DOI: 10.1371/journal.pcbi.1012014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2023] [Revised: 06/10/2024] [Accepted: 03/21/2024] [Indexed: 05/31/2024] Open

Zhang W, Yu R, Xu Z, Li J, Gao W, Jiang M, Dai Q. scCompressSA: dual-channel self-attention based deep autoencoder model for single-cell clustering by compressing gene-gene interactions. BMC Genomics 2024;25:423. [PMID: 38684946 PMCID: PMC11059774 DOI: 10.1186/s12864-024-10286-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Accepted: 04/04/2024] [Indexed: 05/02/2024] Open

Lee J, Yun S, Kim Y, Chen T, Kellis M, Park C. Single-cell RNA sequencing data imputation using bi-level feature propagation. Brief Bioinform 2024;25:bbae209. [PMID: 38706317 PMCID: PMC11070731 DOI: 10.1093/bib/bbae209] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2023] [Revised: 04/08/2024] [Accepted: 04/19/2024] [Indexed: 05/07/2024] Open

Zhai Y, Chen L, Deng M. scBOL: a universal cell type identification framework for single-cell and spatial transcriptomics data. Brief Bioinform 2024;25:bbae188. [PMID: 38678389 PMCID: PMC11056022 DOI: 10.1093/bib/bbae188] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2024] [Revised: 03/11/2024] [Accepted: 04/14/2024] [Indexed: 04/30/2024] Open

Abstract

MOTIVATION

Over the past decade, single-cell transcriptomic technologies have experienced remarkable advancements, enabling the simultaneous profiling of gene expression across thousands of individual cells. Cell type identification plays an essential role in exploring tissue heterogeneity and characterizing cell state differences. With more and more well-annotated reference data becoming available, numerous automatic identification methods have sprung up to simplify the annotation process on unlabeled target data by transferring cell type knowledge. However, in practice, the target data often include some novel cell types that are not in the reference data. Most existing works classify these private cells as one generic 'unassigned' group and learn the features of known and novel cell types in a coupled way. They are susceptible to potential batch effects and fail to explore the fine-grained semantic knowledge of novel cell types, thus hurting the model's discrimination ability. Additionally, emerging spatial transcriptomic technologies, such as in situ hybridization, sequencing and multiplexed imaging, present a novel challenge to current cell type identification strategies that predominantly neglect spatial organization. Consequently, it is imperative to develop a versatile method that can proficiently annotate single-cell transcriptomics data, encompassing both spatial and non-spatial dimensions.

RESULTS

To address these issues, we propose a new, challenging yet realistic task called universal cell type identification for single-cell and spatial transcriptomics data. In this task, we aim to give semantic labels to target cells from known cell types and cluster labels to those from novel ones. To tackle this problem, instead of designing a suboptimal two-stage approach, we propose an end-to-end algorithm called scBOL from the perspective of Bipartite prototype alignment. Firstly, we identify the mutual nearest clusters in reference and target data as their potential common cell types. On this basis, we mine the cycle-consistent semantic anchor cells to build the intrinsic structure association between two data. Secondly, we design a neighbor-aware prototypical learning paradigm to strengthen the inter-cluster separability and intra-cluster compactness within each data, thereby inspiring the discriminative feature representations. Thirdly, driven by the semantic-aware prototypical learning framework, we can align the known cell types and separate the private cell types from them among reference and target data. Such an algorithm can be seamlessly applied to various data types modeled by different foundation models that can generate the embedding features for cells. Specifically, for non-spatial single-cell transcriptomics data, we use the autoencoder neural network to learn latent low-dimensional cell representations, and for spatial single-cell transcriptomics data, we apply the graph convolution network to capture molecular and spatial similarities of cells jointly. Extensive results on our carefully designed evaluation benchmarks demonstrate the superiority of scBOL over various state-of-the-art cell type identification methods. To our knowledge, we are the pioneers in presenting this pragmatic annotation task, as well as in devising a comprehensive algorithmic framework aimed at resolving this challenge across varied types of single-cell data. 
Finally, scBOL is implemented in Python using the PyTorch machine-learning library, and it is freely available at https://github.com/aimeeyaoyao/scBOL.

Ren L, Wang J, Li W, Guo M, Yu G. Single-cell RNA-seq data clustering by deep information fusion. Brief Funct Genomics 2024;23:128-137. [PMID: 37208992 DOI: 10.1093/bfgp/elad017] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 08/26/2022] [Revised: 02/13/2023] [Indexed: 05/21/2023] Open

Hu D, Liang K, Dong Z, Wang J, Zhao Y, He K. Effective multi-modal clustering method via skip aggregation network for parallel scRNA-seq and scATAC-seq data. Brief Bioinform 2024;25:bbae102. [PMID: 38493338 PMCID: PMC10944573 DOI: 10.1093/bib/bbae102] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2023] [Revised: 01/06/2024] [Accepted: 02/16/2024] [Indexed: 03/18/2024] Open

Fang Z, Zheng R, Li M. scMAE: a masked autoencoder for single-cell RNA-seq clustering. Bioinformatics 2024;40:btae020. [PMID: 38230824 PMCID: PMC10832357 DOI: 10.1093/bioinformatics/btae020] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Received: 09/14/2023] [Revised: 01/07/2024] [Accepted: 01/12/2024] [Indexed: 01/18/2024] Open

Wang Z, Xie X, Liu S, Ji Z. scFseCluster: a feature selection-enhanced clustering for single-cell RNA-seq data. Life Sci Alliance 2023;6:e202302103. [PMID: 37788907 PMCID: PMC10547911 DOI: 10.26508/lsa.202302103] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2023] [Revised: 09/21/2023] [Accepted: 09/22/2023] [Indexed: 10/05/2023] Open

Tian SW, Ni JC, Wang YT, Zheng CH, Ji CM. scGCC: Graph Contrastive Clustering With Neighborhood Augmentations for scRNA-Seq Data Analysis. IEEE J Biomed Health Inform 2023;27:6133-6143. [PMID: 37751336 DOI: 10.1109/jbhi.2023.3319551] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/28/2023]

Zhan Y, Liu J, Ou-Yang L. scMIC: A Deep Multi-Level Information Fusion Framework for Clustering Single-Cell Multi-Omics Data. IEEE J Biomed Health Inform 2023;27:6121-6132. [PMID: 37725723 DOI: 10.1109/jbhi.2023.3317272] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 09/21/2023]

Liu J, Zeng W, Kan S, Li M, Zheng R. CAKE: a flexible self-supervised framework for enhancing cell visualization, clustering and rare cell identification. Brief Bioinform 2023;25:bbad475. [PMID: 38145950 PMCID: PMC10749894 DOI: 10.1093/bib/bbad475] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2023] [Revised: 11/13/2023] [Accepted: 11/30/2023] [Indexed: 12/27/2023] Open

Wang L, Li W, Xie W, Wang R, Yu K. Dual-GCN-based deep clustering with triplet contrast for ScRNA-seq data analysis. Comput Biol Chem 2023;106:107924. [PMID: 37487251 DOI: 10.1016/j.compbiolchem.2023.107924] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2023] [Revised: 06/08/2023] [Accepted: 07/12/2023] [Indexed: 07/26/2023]

Lei T, Chen R, Zhang S, Chen Y. Self-supervised deep clustering of single-cell RNA-seq data to hierarchically detect rare cell populations. Brief Bioinform 2023;24:bbad335. [PMID: 37769630 PMCID: PMC10539043 DOI: 10.1093/bib/bbad335] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2023] [Revised: 09/05/2023] [Accepted: 09/06/2023] [Indexed: 10/02/2023] Open

Pan W, Long F, Pan J. ScInfoVAE: interpretable dimensional reduction of single cell transcription data with variational autoencoders and extended mutual information regularization. BioData Min 2023;16:17. [PMID: 37301826 DOI: 10.1186/s13040-023-00333-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2022] [Accepted: 06/05/2023] [Indexed: 06/12/2023] Open

Zhang S, Li X, Lin J, Lin Q, Wong KC. Review of single-cell RNA-seq data clustering for cell-type identification and characterization. RNA 2023;29:517-530. [PMID: 36737104 PMCID: PMC10158997 DOI: 10.1261/rna.078965.121] [Citation(s) in RCA: 38] [Impact Index Per Article: 19.0] [Received: 03/27/2022] [Accepted: 01/03/2023] [Indexed: 05/06/2023]

Feng X, Zhang H, Lin H, Long H. Single-cell RNA-seq data analysis based on directed graph neural network. Methods 2023;211:48-60. [PMID: 36804214 DOI: 10.1016/j.ymeth.2023.02.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2022] [Revised: 12/09/2022] [Accepted: 02/13/2023] [Indexed: 02/17/2023] Open

Yu X, Xu X, Zhang J, Li X. Batch alignment of single-cell transcriptomics data using deep metric learning. Nat Commun 2023;14:960. [PMID: 36810607 PMCID: PMC9944958 DOI: 10.1038/s41467-023-36635-5] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 09/09/2022] [Accepted: 02/10/2023] [Indexed: 02/24/2023] Open

Wang Y, Yu Z, Li S, Bian C, Liang Y, Wong KC, Li X. scBGEDA: deep single-cell clustering analysis via a dual denoising autoencoder with bipartite graph ensemble clustering. Bioinformatics 2023;39:7025496. [PMID: 36734596 PMCID: PMC9925104 DOI: 10.1093/bioinformatics/btad075] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 06/07/2022] [Revised: 12/08/2022] [Accepted: 02/02/2023] [Indexed: 02/04/2023] Open

Abstract

MOTIVATION

Single-cell RNA sequencing (scRNA-seq) is an increasingly popular technique for transcriptomic analysis of gene expression at the single-cell level. Cell-type clustering is the first crucial task in the analysis of scRNA-seq data, facilitating accurate identification of cell types and the study of the characteristics of their transcripts. Recently, several computational models based on deep autoencoders and ensemble clustering have been developed to analyze scRNA-seq data. However, current deep autoencoders are not sufficient to learn the latent representations of scRNA-seq data, and obtaining consensus partitions from these feature representations remains under-explored.

RESULTS

To address this challenge, we propose scBGEDA, a single-cell deep clustering model that combines a dual denoising autoencoder with bipartite graph ensemble clustering to identify specific cell populations in single-cell transcriptome profiles. First, a single-cell dual denoising autoencoder network is proposed to project the data into a compressed low-dimensional space and to learn feature representations by jointly optimizing the zero-inflated negative binomial reconstruction loss and the denoising reconstruction loss. Then, a bipartite graph ensemble clustering algorithm is designed to exploit the relationships between cells and the learned latent embedded space by means of a graph-based consensus function. Multiple comparison experiments were conducted on 20 scRNA-seq datasets from different sequencing platforms using a variety of clustering metrics. The experimental results indicated that scBGEDA outperforms other state-of-the-art methods on these datasets, and also demonstrated its scalability to large-scale scRNA-seq datasets. Moreover, scBGEDA was able to identify cell-type specific marker genes and provide functional genomic analysis by quantifying the influence of genes on cell clusters, bringing new insights into identifying cell types and characterizing the scRNA-seq data from different perspectives.

AVAILABILITY AND IMPLEMENTATION

The source code of scBGEDA is available at https://github.com/wangyh082/scBGEDA. The software and the supporting data can be downloaded from https://figshare.com/articles/software/scBGEDA/19657911.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
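
The zero-inflated negative binomial (ZINB) reconstruction loss named in the abstract above can be illustrated with a minimal NumPy sketch. It uses the common ZINB parameterization (mean `mu`, dispersion `theta`, dropout probability `pi`); the function name `zinb_nll` and this parameterization are assumptions for illustration and are not taken from the scBGEDA source code. The denoising loss and the autoencoder itself are not shown.

```python
import math
import numpy as np

# Element-wise log-gamma built from the standard library (avoids a SciPy dependency).
_lgamma = np.vectorize(math.lgamma)

def zinb_nll(x, mu, theta, pi, eps=1e-10):
    """Mean negative log-likelihood of counts x under ZINB(mu, theta, pi).

    Hypothetical helper: mu is the NB mean, theta the NB dispersion,
    pi the probability of an excess (dropout) zero.
    """
    x, mu, theta, pi = (np.asarray(a, dtype=float) for a in (x, mu, theta, pi))
    log_theta_ratio = np.log(theta + eps) - np.log(theta + mu + eps)
    # Log-probability of x under the plain negative binomial component.
    log_nb = (
        _lgamma(x + theta) - _lgamma(theta) - _lgamma(x + 1.0)
        + theta * log_theta_ratio
        + x * (np.log(mu + eps) - np.log(theta + mu + eps))
    )
    # A zero can come from dropout (pi) or from the NB component itself.
    nb_zero = np.exp(theta * log_theta_ratio)
    log_zero = np.log(pi + (1.0 - pi) * nb_zero + eps)
    # A non-zero count can only come from the NB component.
    log_nonzero = np.log(1.0 - pi + eps) + log_nb
    return float(-np.mean(np.where(x == 0, log_zero, log_nonzero)))
```

In an autoencoder setting, `mu`, `theta` and `pi` would typically be produced by three output heads of the decoder, and this quantity would be minimized alongside the other loss terms.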


Yu Z, Su Y, Lu Y, Yang Y, Wang F, Zhang S, Chang Y, Wong KC, Li X. Topological identification and interpretation for single-cell gene regulation elucidation across multiple platforms using scMGCA. Nat Commun 2023;14:400. [PMID: 36697410 PMCID: PMC9877026 DOI: 10.1038/s41467-023-36134-7] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/21/2022] [Accepted: 01/16/2023] [Indexed: 01/26/2023] Open

Wang J, Xia J, Wang H, Su Y, Zheng CH. scDCCA: deep contrastive clustering for single-cell RNA-seq data based on auto-encoder network. Brief Bioinform 2023;24:6984787. [PMID: 36631401 DOI: 10.1093/bib/bbac625] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/29/2022] [Revised: 12/12/2022] [Accepted: 12/19/2022] [Indexed: 01/13/2023] Open

Abstract

Advances in single-cell ribonucleic acid sequencing (scRNA-seq) allow researchers to explore cellular heterogeneity and human diseases at single-cell resolution. Cell clustering is a prerequisite in scRNA-seq analysis since it can recognize cell identities. However, the high dimensionality, noise and significant sparsity of scRNA-seq data make clustering a major challenge. Although many methods have emerged, they still fail to fully explore the intrinsic properties of cells and the relationships among cells, which seriously affects the downstream clustering performance. Here, we propose a new deep contrastive clustering algorithm called scDCCA. It integrates a denoising auto-encoder and a dual contrastive learning module into a deep clustering framework to extract valuable features and realize cell clustering. Specifically, to better characterize and learn data representations robustly, scDCCA utilizes a denoising Zero-Inflated Negative Binomial model-based auto-encoder to extract low-dimensional features. Meanwhile, scDCCA incorporates a dual contrastive learning module to capture the pairwise proximity of cells. By increasing the similarities between positive pairs and the differences between negative ones, the contrasts at both the instance and the cluster level help the model learn more discriminative features and achieve better cell segregation. Furthermore, scDCCA joins feature learning with clustering, which realizes representation learning and cell clustering in an end-to-end manner. Experimental results on 14 real datasets validate that scDCCA outperforms eight state-of-the-art methods in terms of accuracy, generalizability, scalability and efficiency. Cell visualization and biological analysis demonstrate that scDCCA significantly improves clustering and facilitates downstream analysis for scRNA-seq data. The code is available at https://github.com/WJ319/scDCCA.
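
The idea of pulling positive pairs together and pushing negative pairs apart can be sketched with a generic instance-level contrastive loss (the NT-Xent form). This is a stand-in chosen for illustration, not the loss defined in the scDCCA paper or repository; the function name `nt_xent`, the temperature value, and the use of two augmented views per cell are assumptions. The cluster-level contrast is not shown.

```python
import numpy as np

def nt_xent(z_a, z_b, temperature=0.5):
    """Instance-level contrastive loss over two views of the same n cells.

    z_a[i] and z_b[i] are embeddings of two augmentations of cell i
    (a positive pair); every other embedding in the batch is a negative.
    """
    z_a = np.asarray(z_a, dtype=float)
    z_b = np.asarray(z_b, dtype=float)
    n = z_a.shape[0]
    z = np.concatenate([z_a, z_b], axis=0)
    z /= np.linalg.norm(z, axis=1, keepdims=True)  # cosine similarity via unit vectors
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)                 # a sample is never its own negative
    # Index of the positive partner for each of the 2n rows.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # Row-wise log-softmax, stabilized by subtracting the row maximum.
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    return float(-log_prob[np.arange(2 * n), pos].mean())
```

The loss is small when each cell's two views are closer to each other than to any other cell in the batch, and grows when the views are mismatched.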


Lin X, Tian T, Wei Z, Hakonarson H. Clustering of single-cell multi-omics data with a multimodal deep learning method. Nat Commun 2022;13:7705. [PMID: 36513636 PMCID: PMC9748135 DOI: 10.1038/s41467-022-35031-9] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/28/2021] [Accepted: 11/16/2022] [Indexed: 12/15/2022] Open

Feng X, Fang F, Long H, Zeng R, Yao Y. Single-cell RNA-seq data analysis using graph autoencoders and graph attention networks. Front Genet 2022;13:1003711. [PMID: 36568390 PMCID: PMC9780469 DOI: 10.3389/fgene.2022.1003711] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/26/2022] [Accepted: 11/21/2022] [Indexed: 12/13/2022] Open

Shan Y, Yang J, Li X, Zhong X, Chang Y. GLAE: A Graph-learnable Auto-encoder for Single-cell RNA-seq Analysis. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.11.049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]

Brendel M, Su C, Bai Z, Zhang H, Elemento O, Wang F. Application of Deep Learning on Single-cell RNA Sequencing Data Analysis: A Review. Genomics Proteomics Bioinformatics 2022;20:814-835. [PMID: 36528240 PMCID: PMC10025684 DOI: 10.1016/j.gpb.2022.11.011] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 01/23/2022] [Revised: 08/17/2022] [Accepted: 11/24/2022] [Indexed: 12/23/2022]

Mondal AK, Asnani H, Singla P, Ap P. scRAE: Deterministic Regularized Autoencoders With Flexible Priors for Clustering Single-Cell Gene Expression Data. IEEE/ACM Trans Comput Biol Bioinform 2022;19:2996-3007. [PMID: 34288873 DOI: 10.1109/tcbb.2021.3098394] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]

Li Z, Zhou X. BASS: multi-scale and multi-sample analysis enables accurate cell type clustering and spatial domain detection in spatial transcriptomic studies. Genome Biol 2022;23:168. [PMID: 35927760 PMCID: PMC9351148 DOI: 10.1186/s13059-022-02734-7] [Citation(s) in RCA: 55] [Impact Index Per Article: 18.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/31/2022] [Accepted: 07/21/2022] [Indexed: 02/08/2023] Open

Ding Q, Yang W, Luo M, Xu C, Xu Z, Pang F, Cai Y, Anashkina AA, Su X, Chen N, Jiang Q. CBLRR: a cauchy-based bounded constraint low-rank representation method to cluster single-cell RNA-seq data. Brief Bioinform 2022;23:6649282. [DOI: 10.1093/bib/bbac300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/31/2022] [Revised: 06/17/2022] [Accepted: 07/02/2022] [Indexed: 11/14/2022] Open

Liu Q, Luo X, Li J, Wang G. scESI: evolutionary sparse imputation for single-cell transcriptomes from nearest neighbor cells. Brief Bioinform 2022;23:6580519. [PMID: 35512331 DOI: 10.1093/bib/bbac144] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/24/2022] [Revised: 03/14/2022] [Accepted: 03/31/2022] [Indexed: 02/01/2023] Open

Wan H, Chen L, Deng M. scNAME: neighborhood contrastive clustering with ancillary mask estimation for scRNA-seq data. Bioinformatics 2022;38:1575-1583. [PMID: 34999761 DOI: 10.1093/bioinformatics/btac011] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/07/2021] [Revised: 11/28/2021] [Accepted: 01/05/2022] [Indexed: 02/03/2023] Open

Abstract

MOTIVATION

The rapid development of single-cell RNA sequencing (scRNA-seq) makes it possible to study the heterogeneity of individual cell characteristics. Cell clustering is a vital procedure in scRNA-seq analysis, providing insight into complex biological phenomena. However, the noisy, high-dimensional and large-scale nature of scRNA-seq data introduces challenges in clustering analysis. Up to now, many deep learning-based methods have emerged to learn underlying feature representations while clustering. However, these methods are inefficient at identifying rare cell types and are barely able to make full, integrated use of gene dependencies or cell similarity. As a result, they cannot detect a clear cell type structure, which is required for clustering accuracy as well as downstream analysis.

RESULTS

Here, we propose a novel scRNA-seq clustering algorithm called scNAME, which incorporates a mask estimation task for gene pertinence mining and a neighborhood contrastive learning framework for cell intrinsic structure exploitation. The pattern learned through mask estimation helps reveal uncorrupted data structure and denoise the original single-cell data. In addition, the randomly created augmented data introduced in contrastive learning not only helps improve the robustness of clustering, but also increases the sample size in each cluster for better data capacity. Beyond this, we also introduce a neighborhood contrastive paradigm with an offline memory bank, global in scope, which can inspire discriminative feature representation and achieve intra-cluster compactness and inter-cluster separation. The combination of the mask estimation task, neighborhood contrastive learning and the global memory bank designed in scNAME is conducive to rare cell type detection. The experimental results on both simulated and real data confirm that our method is accurate, robust and scalable. We also implement biological analysis, including marker gene identification, gene ontology and pathway enrichment analysis, to validate the biological significance of our method. To the best of our knowledge, we are among the first to introduce a gene relationship exploration strategy, as well as a global cellular similarity repository, in the single-cell field.

AVAILABILITY AND IMPLEMENTATION

An implementation of scNAME is available from https://github.com/aster-ww/scNAME.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.


Ciortan M, Defrance M. GNN-based embedding for clustering scRNA-seq data. Bioinformatics 2022;38:1037-1044. [PMID: 34850828 DOI: 10.1093/bioinformatics/btab787] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/16/2021] [Revised: 10/15/2021] [Accepted: 11/15/2021] [Indexed: 02/03/2023] Open

Abstract

MOTIVATION

Single-cell RNA sequencing (scRNA-seq) provides transcriptomic profiling for individual cells, allowing researchers to study the heterogeneity of tissues, recognize rare cell identities and discover new cellular subtypes. Clustering analysis is usually used to predict cell class assignments and infer cell identities. However, the high sparsity of scRNA-seq data, accentuated by dropout events, generates challenges that have motivated the development of numerous dedicated clustering methods. Nevertheless, there is still no consensus on the best-performing method.

RESULTS

graph-sc is a new method leveraging a graph autoencoder network to create embeddings for scRNA-seq cell data. While this work analyzes the performance of clustering the embeddings with various clustering algorithms, other downstream tasks can also be performed. A broad experimental study has been performed on both simulated and scRNA-seq datasets. The results indicate that although there is no consistently best method across all the analyzed datasets, graph-sc compares favorably to competing techniques across all types of datasets. Furthermore, the proposed method is stable across consecutive runs, robust to input down-sampling, generally insensitive to changes in the network architecture or training parameters and more computationally efficient than other competing methods based on neural networks. Modeling the data as a graph provides increased flexibility to define custom features characterizing the genes, the cells and their interactions. Moreover, external data (e.g. gene network) can easily be integrated into the graph and used seamlessly under the same optimization task.

AVAILABILITY AND IMPLEMENTATION

https://github.com/ciortanmadalina/graph-sc.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
