1
Blampey Q, Bercovici N, Dutertre CA, Pic I, Ribeiro JM, André F, Cournède PH. A biology-driven deep generative model for cell-type annotation in cytometry. Brief Bioinform 2023; 24:bbad260. [PMID: 37497716 DOI: 10.1093/bib/bbad260] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2023] [Revised: 06/20/2023] [Accepted: 06/27/2023] [Indexed: 07/28/2023]
Abstract
Cytometry enables precise single-cell phenotyping within heterogeneous populations. These cell types are traditionally annotated via manual gating, but this method lacks reproducibility and is sensitive to batch effects. Moreover, the most recent cytometers (spectral flow or mass cytometers) create rich, high-dimensional data whose analysis via manual gating becomes challenging and time-consuming. To tackle these limitations, we introduce Scyan (https://github.com/MICS-Lab/scyan), a Single-cell Cytometry Annotation Network that automatically annotates cell types using only prior expert knowledge about the cytometry panel. To do so, it uses a normalizing flow, a type of deep generative model, that maps protein expressions into a biologically relevant latent space. We demonstrate that Scyan significantly outperforms the related state-of-the-art models on multiple public datasets while being faster and interpretable. In addition, Scyan handles several complementary tasks, such as batch-effect correction, debarcoding and population discovery. Overall, this model accelerates and eases cell population characterization, quantification and discovery in cytometry.
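The abstract states only that Scyan relies on a normalizing flow. As a rough illustration of what makes such a model invertible, the sketch below implements one generic affine coupling layer in plain Python. It is not taken from the Scyan code base: the function names, the toy `toy_scale` and `toy_shift` conditioners, and the input vector are all assumptions made for this example.

```python
import math

def coupling_forward(x, scale_fn, shift_fn):
    """Affine coupling: keep the first half of x, transform the second half."""
    half = len(x) // 2
    x_a, x_b = x[:half], x[half:]
    s = scale_fn(x_a)  # log-scales, one per element of x_b
    t = shift_fn(x_a)  # shifts, one per element of x_b
    z_b = [xb * math.exp(si) + ti for xb, si, ti in zip(x_b, s, t)]
    log_det = sum(s)   # log |det Jacobian| of the transformation
    return x_a + z_b, log_det

def coupling_inverse(z, scale_fn, shift_fn):
    """Exact inverse: the untouched half gives back the same s and t."""
    half = len(z) // 2
    z_a, z_b = z[:half], z[half:]
    s = scale_fn(z_a)
    t = shift_fn(z_a)
    x_b = [(zb - ti) * math.exp(-si) for zb, si, ti in zip(z_b, s, t)]
    return z_a + x_b

# Hypothetical conditioners; a real flow would use small neural networks here.
def toy_scale(x_a):
    return [0.5 * math.tanh(v) for v in x_a]

def toy_shift(x_a):
    return [2.0 * v for v in x_a]

x = [0.3, -1.2, 0.8, 2.0]  # made-up marker expressions for one cell
z, log_det = coupling_forward(x, toy_scale, toy_shift)
x_back = coupling_inverse(z, toy_scale, toy_shift)
```

Because the first half of the vector passes through unchanged, the inverse can recompute the same scales and shifts, and the log-determinant is simply the sum of the log-scales. Stacking several such layers with alternating halves is a common way to build a flow with a tractable likelihood.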
Affiliation(s)
- Quentin Blampey
  - Université Paris-Saclay, CentraleSupélec, Laboratory of Mathematics and Computer Science (MICS), 3 rue Joliot Curie, 91190, Gif-sur-Yvette, France
- Nadège Bercovici
  - Université Paris-Saclay, Gustave Roussy, Inserm U981, 114 Rue Edouard Vaillant, 94805, Villejuif, France
  - Université Paris Cité, Institut Cochin, CNRS, Inserm, 22 Rue Méchain, 75014, Paris, France
- Charles-Antoine Dutertre
  - Université Paris-Saclay, Gustave Roussy, Inserm U1015, 114 Rue Edouard Vaillant, 94805, Villejuif, France
- Isabelle Pic
  - Université Paris-Saclay, Gustave Roussy, Inserm U981, 114 Rue Edouard Vaillant, 94805, Villejuif, France
- Joana Mourato Ribeiro
  - Université Paris-Saclay, Gustave Roussy, Inserm U981, 114 Rue Edouard Vaillant, 94805, Villejuif, France
  - Gustave Roussy, Département de Médecine Oncologique, 114 Rue Edouard Vaillant, 94805, Villejuif, France
- Fabrice André
  - Université Paris-Saclay, Gustave Roussy, Inserm U981, 114 Rue Edouard Vaillant, 94805, Villejuif, France
  - Gustave Roussy, Département de Médecine Oncologique, 114 Rue Edouard Vaillant, 94805, Villejuif, France
- Paul-Henry Cournède
  - Université Paris-Saclay, CentraleSupélec, Laboratory of Mathematics and Computer Science (MICS), 3 rue Joliot Curie, 91190, Gif-sur-Yvette, France
2
Wan H, Chen L, Deng M. scEMAIL: Universal and Source-free Annotation Method for scRNA-seq Data with Novel Cell-type Perception. Genomics Proteomics Bioinformatics 2022; 20:939-958. [PMID: 36608843 PMCID: PMC10025768 DOI: 10.1016/j.gpb.2022.12.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2022] [Revised: 11/30/2022] [Accepted: 12/11/2022] [Indexed: 01/05/2023]
Abstract
Current cell-type annotation tools for single-cell RNA sequencing (scRNA-seq) data mainly utilize well-annotated source data to help identify cell types in target data. However, for privacy reasons, their requirement for raw source data cannot always be satisfied. In this case, explicit feature alignment between source and target data is impossible. Additionally, these methods are barely able to discover the presence of novel cell types, and users often have to select a subjective threshold to detect novel cells. We propose a universal annotation framework for scRNA-seq data called scEMAIL, which automatically detects novel cell types without accessing source data during adaptation. For new cell-type identification, a novel cell-type perception module is designed with three steps. First, an expert ensemble system measures the uncertainty of each cell from three complementary aspects. Second, based on this measurement, bimodality tests are applied to detect the presence of new cell types. Third, once their presence is confirmed, an adaptive threshold via manifold mixup partitions target cells into "known" and "unknown" groups. Model adaptation is then conducted to alleviate the batch effect. We gather multi-order neighborhood messages globally and impose local affinity regularizations on "known" cells. These constraints mitigate wrong classifications of the source model via reliable self-supervised information from neighbors. scEMAIL is accurate and robust under various scenarios in both simulated and real data. It is also flexible enough to be applied to challenging single-cell ATAC-seq data without loss of superiority. The source code of scEMAIL can be accessed at https://github.com/aster-ww/scEMAIL and https://ngdc.cncb.ac.cn/biocode/tools/BT007335/releases/v1.0.
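To make the "known" versus "unknown" partition more concrete, the sketch below scores each cell with a single generic uncertainty measure (normalized Shannon entropy of the predicted class probabilities) and splits cells with a fixed cutoff. This is a deliberate simplification and not the scEMAIL procedure: the abstract describes an ensemble of three uncertainty measures, bimodality tests and an adaptive mixup-based threshold, none of which are reproduced here. The function names, the probabilities and the `threshold=0.8` value are hypothetical.

```python
import math

def normalized_entropy(probs):
    """Shannon entropy of a probability vector, scaled to [0, 1]."""
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(len(probs))

def split_known_unknown(prob_rows, threshold):
    """Return indices of low-uncertainty and high-uncertainty cells."""
    known, unknown = [], []
    for i, probs in enumerate(prob_rows):
        if normalized_entropy(probs) > threshold:
            unknown.append(i)  # prediction too flat: candidate novel cell
        else:
            known.append(i)    # confident prediction for a known type
    return known, unknown

# Made-up class probabilities for three cells over three known cell types.
rows = [
    [0.97, 0.02, 0.01],
    [0.34, 0.33, 0.33],
    [0.05, 0.90, 0.05],
]
known, unknown = split_known_unknown(rows, threshold=0.8)
print(known, unknown)  # [0, 2] [1]
```

A fixed cutoff like this is exactly the kind of subjective choice the abstract criticizes; it is used here only to keep the example short.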
Affiliation(s)
- Hui Wan
  - School of Mathematical Sciences, Peking University, Beijing 100871, China
- Liang Chen
  - Huawei Technologies Co., Ltd., Beijing 100080, China
- Minghua Deng
  - School of Mathematical Sciences, Peking University, Beijing 100871, China; Center for Statistical Science, Peking University, Beijing 100871, China; Center for Quantitative Biology, Peking University, Beijing 100871, China
3
Wang C, Sun D, Huang X, Wan C, Li Z, Han Y, Qin Q, Fan J, Qiu X, Xie Y, Meyer CA, Brown M, Tang M, Long H, Liu T, Liu XS. Integrative analyses of single-cell transcriptome and regulome using MAESTRO. Genome Biol 2020; 21:198. [PMID: 32767996 PMCID: PMC7412809 DOI: 10.1186/s13059-020-02116-x] [Citation(s) in RCA: 87] [Impact Index Per Article: 21.8] [Received: 12/06/2019] [Accepted: 07/23/2020] [Indexed: 12/15/2022]
Abstract
We present Model-based AnalysEs of Transcriptome and RegulOme (MAESTRO), a comprehensive open-source computational workflow (http://github.com/liulab-dfci/MAESTRO) for the integrative analyses of single-cell RNA-seq (scRNA-seq) and ATAC-seq (scATAC-seq) data from multiple platforms. MAESTRO provides functions for pre-processing, alignment, quality control, expression and chromatin accessibility quantification, clustering, differential analysis, and annotation. By modeling gene regulatory potential from chromatin accessibilities at the single-cell level, MAESTRO outperforms the existing methods for integrating the cell clusters between scRNA-seq and scATAC-seq. Furthermore, MAESTRO supports automatic cell-type annotation using predefined cell type marker genes and identifies driver regulators from differential scRNA-seq genes and scATAC-seq peaks.
Affiliation(s)
- Chenfei Wang
  - Department of Data Science, Dana-Farber Cancer Institute, Harvard T.H. Chan School of Public Health, Boston, MA, 02215, USA
  - Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Dongqing Sun
  - Clinical Translational Research Center, Shanghai Pulmonary Hospital, School of Life Science and Technology, Tongji University, Shanghai, 200433, China
- Xin Huang
  - Beijing Institute of Radiation Medicine, Beijing, 100850, China
- Changxin Wan
  - Clinical Translational Research Center, Shanghai Pulmonary Hospital, School of Life Science and Technology, Tongji University, Shanghai, 200433, China
- Ziyi Li
  - Clinical Translational Research Center, Shanghai Pulmonary Hospital, School of Life Science and Technology, Tongji University, Shanghai, 200433, China
- Ya Han
  - Clinical Translational Research Center, Shanghai Pulmonary Hospital, School of Life Science and Technology, Tongji University, Shanghai, 200433, China
- Qian Qin
  - Clinical Translational Research Center, Shanghai Pulmonary Hospital, School of Life Science and Technology, Tongji University, Shanghai, 200433, China
- Jingyu Fan
  - Clinical Translational Research Center, Shanghai Pulmonary Hospital, School of Life Science and Technology, Tongji University, Shanghai, 200433, China
- Xintao Qiu
  - Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, 02215, USA
- Yingtian Xie
  - Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, 02215, USA
- Clifford A Meyer
  - Department of Data Science, Dana-Farber Cancer Institute, Harvard T.H. Chan School of Public Health, Boston, MA, 02215, USA
  - Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Myles Brown
  - Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, 02215, USA
- Ming Tang
  - Department of Data Science, Dana-Farber Cancer Institute, Harvard T.H. Chan School of Public Health, Boston, MA, 02215, USA
  - Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Henry Long
  - Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, 02215, USA
- Tao Liu
  - Department of Biostatistics and Bioinformatics, Roswell Park Comprehensive Cancer Center, Buffalo, NY, 14263, USA
- X Shirley Liu
  - Department of Data Science, Dana-Farber Cancer Institute, Harvard T.H. Chan School of Public Health, Boston, MA, 02215, USA
  - Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Boston, MA, 02215, USA