Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Duan T, Pinto JP, Xie X. Parallel clustering of single cell transcriptomic data with split-merge sampling on Dirichlet process mixtures. Bioinformatics 2019;35:953-961. [PMID: 30165477 DOI: 10.1093/bioinformatics/bty702] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 02/27/2018] [Revised: 07/20/2018] [Accepted: 08/22/2018] [Indexed: 11/14/2022] Open

Cited by Other Article(s)

Tadi AA, Alhadidi D, Rueda L. PPPCT: Privacy-Preserving framework for Parallel Clustering Transcriptomics data. Comput Biol Med 2024;173:108351. [PMID: 38520921 DOI: 10.1016/j.compbiomed.2024.108351] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2023] [Revised: 03/18/2024] [Accepted: 03/18/2024] [Indexed: 03/25/2024]

Abstract

Single-cell transcriptomics data provides crucial insights into patients' health, yet poses significant privacy concerns. Genomic data privacy attacks can have deep implications, compromising not only patients' health information but also that of their families. Moreover, leaked data is permanent, which makes retraction impossible. While extensive efforts have been directed towards clustering single-cell transcriptomics data, addressing critical challenges, especially in the realm of privacy, remains pivotal. This paper introduces an efficient, fast, privacy-preserving approach for clustering single-cell RNA-sequencing (scRNA-seq) datasets. The key contributions include ensuring data privacy, achieving high-quality clustering, accommodating the high dimensionality inherent in the datasets, and maintaining reasonable computation time for large-scale datasets. Our proposed approach utilizes the map-reduce scheme to parallelize clustering, addressing intensive calculation challenges. Intel Software Guard eXtension (SGX) processors are used to ensure the security of sensitive code and data during processing. Additionally, the approach incorporates a logarithm transformation as a preprocessing step, employs non-negative matrix factorization for dimensionality reduction, and utilizes parallel k-means for clustering. The approach fully leverages the computing capabilities of all processing resources within a secure private cloud environment. Experimental results demonstrate the efficacy of our approach in preserving patient privacy while surpassing state-of-the-art methods in both clustering quality and computation time. Our method consistently achieves a minimum of 7% higher Adjusted Rand Index (ARI) than existing approaches, contingent on dataset size. Additionally, due to parallel computations and dimensionality reduction, our approach exhibits efficiency, converging to very good results in less than 10 seconds for a scRNA-seq dataset with 5000 genes and 6000 cells when prioritizing privacy and under two seconds without privacy considerations.

Availability and implementation: Code and datasets are available at https://github.com/University-of-Windsor/PPPCT.
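The map-reduce parallelization of k-means mentioned in this abstract can be illustrated with a minimal sketch. It is not the PPPCT implementation: the function names (`log_transform`, `map_chunk`, `reduce_partials`, `parallel_kmeans`) are hypothetical, a thread pool stands in for the workers of a private cloud, log(1 + x) is assumed as the logarithm transformation, and both the non-negative matrix factorization step and the SGX enclave are left out.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def log_transform(matrix):
    """Apply log(1 + x) to every count (assumed form of the log preprocessing)."""
    return [[math.log1p(v) for v in row] for row in matrix]


def nearest(point, centroids):
    """Return the index of the closest centroid (squared Euclidean distance)."""
    dists = [sum((p - c) ** 2 for p, c in zip(point, cen)) for cen in centroids]
    return dists.index(min(dists))


def map_chunk(chunk, centroids):
    """Map step: per-cluster partial sums and counts for one chunk of cells."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for point in chunk:
        k = nearest(point, centroids)
        counts[k] += 1
        for j, v in enumerate(point):
            sums[k][j] += v
    return sums, counts


def reduce_partials(partials, centroids):
    """Reduce step: merge the partial results and recompute each centroid."""
    dim = len(centroids[0])
    updated = []
    for k, old in enumerate(centroids):
        total = sum(counts[k] for _, counts in partials)
        if total == 0:
            updated.append(list(old))  # empty cluster: keep its previous centroid
            continue
        updated.append(
            [sum(sums[k][j] for sums, _ in partials) / total for j in range(dim)]
        )
    return updated


def parallel_kmeans(points, centroids, n_chunks=2, n_iter=10):
    """Run a fixed number of map-reduce k-means iterations over n_chunks workers."""
    chunks = [points[i::n_chunks] for i in range(n_chunks)]
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        for _ in range(n_iter):
            partials = list(pool.map(lambda c: map_chunk(c, centroids), chunks))
            centroids = reduce_partials(partials, centroids)
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels
```

As a usage example, calling `parallel_kmeans(data, [data[0], data[2]])` on `data = log_transform([[0, 1], [1, 0], [100, 90], [90, 100]])` groups the two low-count cells together and the two high-count cells together. Each worker only returns per-cluster sums and counts, so the reduce step never needs the raw cells, which is what makes the assignment step easy to distribute.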

Nwizu C, Hughes M, Ramseier ML, Navia AW, Shalek AK, Fusi N, Raghavan S, Winter PS, Amini AP, Crawford L. Scalable nonparametric clustering with unified marker gene selection for single-cell RNA-seq data. bioRxiv 2024:2024.02.11.579839. [PMID: 38405697 PMCID: PMC10888887 DOI: 10.1101/2024.02.11.579839] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/27/2024]

Affiliation(s)

Chibuikem Nwizu: Center for Computational Molecular Biology, Brown University, Providence, RI, USA; Warren Alpert Medical School of Brown University, Providence, RI, USA
Madeline Hughes: Microsoft Research, Cambridge, MA, USA
Michelle L. Ramseier: Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA, USA; Broad Institute of MIT and Harvard, Cambridge, MA, USA
Andrew W. Navia: Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA, USA; Broad Institute of MIT and Harvard, Cambridge, MA, USA; Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA, USA; Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA, USA; Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
Alex K. Shalek: Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA, USA; Broad Institute of MIT and Harvard, Cambridge, MA, USA; Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA, USA; Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA, USA; Harvard Medical School, Boston, MA, USA; Ragon Institute of MGH, MIT, and Harvard, Cambridge, MA, USA
Nicolo Fusi: Microsoft Research, Cambridge, MA, USA
Srivatsan Raghavan: Broad Institute of MIT and Harvard, Cambridge, MA, USA; Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA; Harvard Medical School, Boston, MA, USA; Department of Medicine, Brigham and Women’s Hospital, Boston, MA, USA
Peter S. Winter: Broad Institute of MIT and Harvard, Cambridge, MA, USA; Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
Ava P. Amini: Microsoft Research, Cambridge, MA, USA
Lorin Crawford: Center for Computational Molecular Biology, Brown University, Providence, RI, USA; Microsoft Research, Cambridge, MA, USA; Department of Biostatistics, Brown University, Providence, RI, USA

Wilson T, Vo DHT, Thorne T. Identifying Subpopulations of Cells in Single-Cell Transcriptomic Data: A Bayesian Mixture Modeling Approach to Zero Inflation of Counts. J Comput Biol 2023;30:1059-1074. [PMID: 37871291 DOI: 10.1089/cmb.2022.0273] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2023] Open

Wade S. Bayesian cluster analysis. Philos Trans A Math Phys Eng Sci 2023;381:20220149. [PMID: 36970819 PMCID: PMC10041359 DOI: 10.1098/rsta.2022.0149] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/22/2022] [Accepted: 01/03/2023] [Indexed: 06/18/2023]

Baghdadi A, Manouchehri N, Patterson Z, Fan W, Bouguila N. Hierarchical Dirichlet and Pitman–Yor process mixtures of shifted‐scaled Dirichlet distributions for proportional data modeling. Comput Intell 2022. [DOI: 10.1111/coin.12558] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]

Mirzal A. Statistical Analysis of Microarray Data Clustering using NMF, Spectral Clustering, Kmeans, and GMM. IEEE/ACM Trans Comput Biol Bioinform 2022;19:1173-1192. [PMID: 32956065 DOI: 10.1109/tcbb.2020.3025486] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 06/11/2023]

Ganguli I, Sil J, Sengupta N. Nonparametric method of topic identification using granularity concept and graph-based modeling. Neural Comput Appl 2021. [DOI: 10.1007/s00521-020-05662-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/29/2022]

Hie B, Peters J, Nyquist SK, Shalek AK, Berger B, Bryson BD. Computational Methods for Single-Cell RNA Sequencing. Annu Rev Biomed Data Sci 2020. [DOI: 10.1146/annurev-biodatasci-012220-100601] [Citation(s) in RCA: 46] [Impact Index Per Article: 9.2] [Indexed: 11/09/2022]

Zeng T, Dai H. Single-Cell RNA Sequencing-Based Computational Analysis to Describe Disease Heterogeneity. Front Genet 2019;10:629. [PMID: 31354786 PMCID: PMC6640157 DOI: 10.3389/fgene.2019.00629] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 01/17/2019] [Accepted: 06/17/2019] [Indexed: 12/25/2022] Open

Chen S, Hua K, Cui H, Jiang R. VPAC: Variational projection for accurate clustering of single-cell transcriptomic data. BMC Bioinformatics 2019;20:0. [PMID: 31074382 PMCID: PMC6509870 DOI: 10.1186/s12859-019-2742-4] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 12/30/2022] Open

Abstract

Background

Single-cell RNA-sequencing (scRNA-seq) technologies have advanced rapidly in recent years and enabled quantitative characterization at a microscopic resolution. With the exponential growth in the number of cells profiled in individual scRNA-seq experiments, identifying putative cell types from the data has become a great challenge that calls for novel computational methods. Although a variety of algorithms have recently been proposed for single-cell clustering, limitations such as low accuracy, inferior robustness, and inadequate stability greatly restrict the scope of application of these methods.

Results

We propose a novel model-based algorithm, named VPAC, for accurate clustering of single-cell transcriptomic data through variational projection, which assumes that single-cell samples follow a Gaussian mixture distribution in a latent space. Through comprehensive validation experiments, we demonstrate that VPAC can not only be applied to datasets of discrete counts and normalized continuous data, but also scales well across data dimensionalities, dataset sizes, and levels of data sparsity. We further illustrate the ability of VPAC to detect genes with strong unique signatures of a specific cell type, which may shed light on studies in systems biology. We have released a user-friendly Python package of VPAC on GitHub (https://github.com/ShengquanChen/VPAC). Users can directly import our VPAC class and conduct clustering without tedious installation of dependency packages.

Conclusions

VPAC enables highly accurate clustering of single-cell transcriptomic data via a statistical model. We expect our method to be widely applied not only to transcriptome studies aimed at fully understanding cell identity and functionality, but also to the clustering of more general data.