Yan F, Zhao Z, Simon LM. EmptyNN: A neural network based on positive and unlabeled learning to remove cell-free droplets and recover lost cells in scRNA-seq data.
Patterns (N Y) 2021;2:100311. [PMID: 34430929; PMCID: PMC8369248; DOI: 10.1016/j.patter.2021.100311]
Received: 02/08/2021; Revised: 06/04/2021; Accepted: 06/18/2021; Indexed: 11/26/2022
Abstract
Droplet-based single-cell RNA sequencing (scRNA-seq) has significantly increased the number of cells profiled per experiment and revolutionized the study of individual transcriptomes. However, to maximize the biological signal, robust computational methods are needed to distinguish cell-free from cell-containing droplets. Here, we introduce a novel cell-calling algorithm called EmptyNN, which trains a neural network based on positive-unlabeled learning for improved filtering of barcodes. For benchmarking purposes, we leveraged cell hashing and genetic variation to provide ground truth. EmptyNN accurately removed cell-free droplets while recovering lost cell clusters, and achieved an area under the receiver operating characteristic curve (AUROC) of 94.73% and 96.30%, respectively. Comparisons with current state-of-the-art cell-calling algorithms demonstrated the superior performance of EmptyNN. EmptyNN was further applied to a single-nucleus RNA sequencing (snRNA-seq) dataset and showed good performance. Therefore, EmptyNN represents a powerful tool to enhance both scRNA-seq and snRNA-seq quality control analyses.
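The abstract does not spell out how positive-unlabeled (PU) learning is applied to barcodes, so the following is only a minimal sketch of the general PU idea under stated assumptions, not EmptyNN's implementation. Logistic regression stands in for the neural network; barcodes below a hypothetical `low_count_cutoff` are treated as the known cell-free (positive) class; all other barcodes are unlabeled; and a single hypothetical per-barcode feature, `ambient_fraction`, stands in for the full expression profile. The toy data are synthetic. The scores are then compared with hard-coded ground-truth labels through a rank-based AUROC, the metric reported above (the paper derived its ground truth from cell hashing and genetic variation).

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic_1d(xs, ys, lr=1.0, epochs=2000):
    """Fit p(label=1 | x) = sigmoid(w*x + b) by batch gradient descent on mean log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def pu_cell_calling(total_counts, ambient_fraction, low_count_cutoff, threshold=0.5):
    """Toy positive-unlabeled (PU) cell calling (hypothetical, not EmptyNN's code).

    Positive set: barcodes with total counts below low_count_cutoff, assumed cell-free.
    Unlabeled set: all remaining barcodes, a mix of real cells and cell-free droplets.
    A positive-vs-unlabeled classifier is trained on the hypothetical feature
    ambient_fraction; unlabeled barcodes whose predicted probability of belonging
    to the positive class falls below threshold are called cells.
    """
    labels = [1 if c < low_count_cutoff else 0 for c in total_counts]
    w, b = fit_logistic_1d(ambient_fraction, labels)
    p_free = [sigmoid(w * x + b) for x in ambient_fraction]
    # Only unlabeled barcodes can be called cells; labeled positives stay cell-free.
    calls = [lab == 0 and p < threshold for lab, p in zip(labels, p_free)]
    return calls, p_free


def auroc(scores, truth):
    """Rank-based AUROC: fraction of (cell, non-cell) pairs where the cell scores higher."""
    pos = [s for s, t in zip(scores, truth) if t]
    neg = [s for s, t in zip(scores, truth) if not t]
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0 for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


# Synthetic toy data (not from the paper): 4 low-count empty droplets,
# 2 high-count but ambient-like empty droplets, and 3 real cells.
totals = [10, 12, 9, 11, 500, 450, 800, 650, 700]
ambient = [0.8, 0.9, 0.8, 0.9, 0.85, 0.85, 0.1, 0.2, 0.1]
truth = [False] * 6 + [True] * 3

calls, p_free = pu_cell_calling(totals, ambient, low_count_cutoff=100)
print(calls)
print(auroc([1.0 - p for p in p_free], truth))
```

A PU classifier trained this way estimates the probability that a barcode is *labeled*, not that it is cell-free, so ambient-like unlabeled barcodes receive roughly the labeling frequency (about 4/6 here) rather than a probability near 1. The fixed 0.5 cutoff therefore works in this toy only because labeled cell-free barcodes outnumber the ambient-like ones hidden in the unlabeled pool; the classic Elkan and Noto correction instead divides the score by that labeling frequency.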
The novel cell-calling algorithm EmptyNN improves the quality of scRNA-seq datasets
EmptyNN accurately removes cell-free droplets and recovers genuine cells
Benchmarking analyses leverage cell hashing information and genetic variation
Advances in measuring gene expression at the cellular level at high throughput have been fueled by the advent of droplet-based single-cell RNA sequencing (scRNA-seq) platforms. Droplet-based scRNA-seq platforms profile a large number of cells per experiment and accelerate our understanding of biology. Accurate classification of cell-free and cell-containing droplets will maximize biological signal and facilitate downstream analysis. Here, we present a novel cell-calling algorithm called EmptyNN, which trains a neural network based on positive-unlabeled learning for improved filtering of barcodes. Our results indicate that EmptyNN outperforms existing cell-calling methods and, thus, represents a powerful tool to enhance both scRNA-seq and single-nucleus RNA sequencing quality control analyses.