1
Schiebout C, Frost HR. CAraCAl: CAMML with the integration of chromatin accessibility. BMC Bioinformatics 2024; 25:212. [PMID: 38872103 PMCID: PMC11170880 DOI: 10.1186/s12859-024-05833-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2024] [Accepted: 06/10/2024] [Indexed: 06/15/2024] Open
Abstract
BACKGROUND A vital step in analyzing single-cell data is ascertaining which cell types are present in a dataset, and at what abundance. In many diseases, the proportions of varying cell types can have important implications for health and prognosis. Most approaches for cell type annotation have centered around cell typing for single-cell RNA-sequencing (scRNA-seq) and have had promising success. However, reliable methods are lacking for many other single-cell modalities such as single-cell sequencing assay for transposase-accessible chromatin (scATAC-seq), which quantifies the extent to which genes of interest in each cell are epigenetically "open" for expression. RESULTS To leverage the informative potential of scATAC-seq data, we developed CAMML with the integration of chromatin accessibility (CAraCAl), a bioinformatic method that performs cell typing on scATAC-seq data. CAraCAl performs cell typing by scoring each cell for its enrichment of cell type-specific gene sets. These gene sets are composed of the most upregulated or downregulated genes present in each cell type according to projected gene activity. CONCLUSIONS We found that CAraCAl does not improve performance beyond CAMML when scRNA-seq is present, but if only scATAC-seq is available, CAraCAl performs cell typing relatively successfully. As such, we also discuss best practices for cell typing and the strengths and weaknesses of various cell annotation options.
Affiliation(s)
- Courtney Schiebout
- Department of Biomedical Data Science, Dartmouth College, Hanover, NH, 03766, USA.
- H Robert Frost
- Department of Biomedical Data Science, Dartmouth College, Hanover, NH, 03766, USA
2
Hedayat S, Cascione L, Cunningham D, Schirripa M, Lampis A, Hahne JC, Tunariu N, Hong SP, Marchetti S, Khan K, Fontana E, Angerilli V, Delrieux M, Nava Rodrigues D, Procaccio L, Rao S, Watkins D, Starling N, Chau I, Braconi C, Fotiadis N, Begum R, Guppy N, Howell L, Valenti M, Cribbes S, Kolozsvari B, Kirkin V, Lonardi S, Ghidini M, Passalacqua R, Elghadi R, Magnani L, Pinato DJ, Di Maggio F, Ghelardi F, Sottotetti E, Vetere G, Ciracì P, Vlachogiannis G, Pietrantonio F, Cremolini C, Cortellini A, Loupakis F, Fassan M, Valeri N. Circulating microRNA Analysis in a Prospective Co-clinical Trial Identifies MIR652-3p as a Response Biomarker and Driver of Regorafenib Resistance Mechanisms in Colorectal Cancer. Clin Cancer Res 2024; 30:2140-2159. [PMID: 38376926 DOI: 10.1158/1078-0432.ccr-23-2748] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2023] [Revised: 02/15/2024] [Accepted: 02/16/2024] [Indexed: 02/21/2024]
Abstract
PURPOSE The multi-kinase inhibitor (mKi) regorafenib has demonstrated efficacy in chemorefractory patients with metastatic colorectal cancer (mCRC). However, lack of predictive biomarkers and concerns over significant toxicities hamper the use of regorafenib in clinical practice. EXPERIMENTAL DESIGN Serial liquid biopsies were obtained at baseline and monthly until disease progression in chemorefractory patients with mCRC treated with regorafenib in a phase II clinical trial (PROSPECT-R n = 40; NCT03010722) and in a multicentric validation cohort (n = 241). Tissue biopsies collected at baseline, after 2 months and at progression in the PROSPECT-R trial were used to establish patient-derived organoids (PDO) and for molecular analyses. MicroRNA profiling was performed on baseline bloods using the NanoString nCounter platform and results were validated by digital-droplet PCR and/or ISH in paired liquid and tissue biopsies. PDOs co-cultures and PDO-xenotransplants were generated for functional analyses. RESULTS Large-scale microRNA expression analysis in longitudinal matched liquid and tissue biopsies from the PROSPECT-R trial identified MIR652-3p as a biomarker of clinical benefit to regorafenib. These findings were confirmed in an independent validation cohort and in a "control" group of 100 patients treated with lonsurf. Using ex vivo co-culture assays paired with single-cell RNA-sequencing of PDO established pre- and post-treatment, we modeled regorafenib response observed in vivo and in patients, and showed that MIR652-3p controls resistance to regorafenib by impairing regorafenib-induced lethal autophagy and by orchestrating the switch from neo-angiogenesis to vessel co-option. CONCLUSIONS Our results identify MIR652-3p as a potential biomarker and as a driver of cell and non-cell-autonomous mechanisms of resistance to regorafenib.
Affiliation(s)
- Somaieh Hedayat
- Division of Molecular Pathology and Centre for Evolution and Cancer, The Institute of Cancer Research, London, United Kingdom
- Luciano Cascione
- Bioinformatics Core Unit, Institute of Oncology Research (IOR), Faculty of Biomedical Sciences, Università della Svizzera italiana, Bellinzona, Switzerland
- Swiss Institute of Bioinformatics, Bellinzona, Switzerland
- David Cunningham
- Department of Medicine, The Royal Marsden Hospital, London and Sutton, United Kingdom
- Marta Schirripa
- Istituto Oncologico Veneto, Istituto di Ricovero e Cura a Carattere Scientifico, Padua, Italy
- Andrea Lampis
- Division of Molecular Pathology and Centre for Evolution and Cancer, The Institute of Cancer Research, London, United Kingdom
- Jens C Hahne
- Division of Molecular Pathology and Centre for Evolution and Cancer, The Institute of Cancer Research, London, United Kingdom
- Nina Tunariu
- Department of Radiology, The Royal Marsden Hospital, London and Sutton, United Kingdom
- Sung Pil Hong
- Division of Surgery and Cancer, Imperial College London, London, United Kingdom
- Silvia Marchetti
- Division of Molecular Pathology and Centre for Evolution and Cancer, The Institute of Cancer Research, London, United Kingdom
- Khurum Khan
- Division of Molecular Pathology and Centre for Evolution and Cancer, The Institute of Cancer Research, London, United Kingdom
- Department of Medicine, The Royal Marsden Hospital, London and Sutton, United Kingdom
- Elisa Fontana
- Division of Molecular Pathology and Centre for Evolution and Cancer, The Institute of Cancer Research, London, United Kingdom
- Valentina Angerilli
- Division of Molecular Pathology and Centre for Evolution and Cancer, The Institute of Cancer Research, London, United Kingdom
- Department of Medicine, Surgical Pathology Unit, University of Padua, Padua, Italy
- Mia Delrieux
- Division of Molecular Pathology and Centre for Evolution and Cancer, The Institute of Cancer Research, London, United Kingdom
- Daniel Nava Rodrigues
- Division of Molecular Pathology and Centre for Evolution and Cancer, The Institute of Cancer Research, London, United Kingdom
- Letizia Procaccio
- Istituto Oncologico Veneto, Istituto di Ricovero e Cura a Carattere Scientifico, Padua, Italy
- Sheela Rao
- Department of Medicine, The Royal Marsden Hospital, London and Sutton, United Kingdom
- David Watkins
- Department of Medicine, The Royal Marsden Hospital, London and Sutton, United Kingdom
- Naureen Starling
- Department of Medicine, The Royal Marsden Hospital, London and Sutton, United Kingdom
- Ian Chau
- Department of Medicine, The Royal Marsden Hospital, London and Sutton, United Kingdom
- Chiara Braconi
- Department of Medicine, The Royal Marsden Hospital, London and Sutton, United Kingdom
- Division of Cancer Therapeutics, The Institute of Cancer Research, London, United Kingdom
- Institute of Cancer Sciences, University of Glasgow, Glasgow, United Kingdom
- Nicos Fotiadis
- Department of Interventional Radiology, The Royal Marsden Hospital, London, United Kingdom
- Ruwaida Begum
- Department of Medicine, The Royal Marsden Hospital, London and Sutton, United Kingdom
- Naomy Guppy
- Breast Cancer Now Nina Barough Pathology Core Facility, The Institute of Cancer Research, London, United Kingdom
- Louise Howell
- Division of Molecular Pathology and Centre for Evolution and Cancer, The Institute of Cancer Research, London, United Kingdom
- Melanie Valenti
- Division of Cancer Therapeutics, The Institute of Cancer Research, London, United Kingdom
- Vladimir Kirkin
- Division of Cancer Therapeutics, The Institute of Cancer Research, London, United Kingdom
- Sara Lonardi
- Istituto Oncologico Veneto, Istituto di Ricovero e Cura a Carattere Scientifico, Padua, Italy
- Michele Ghidini
- Oncology Unit, Fondazione IRCCS Ca' Granda Ospedale Maggiore Policlinico, Milan, Italy
- Raghad Elghadi
- Division of Surgery and Cancer, Imperial College London, London, United Kingdom
- Luca Magnani
- Division of Surgery and Cancer, Imperial College London, London, United Kingdom
- David J Pinato
- Division of Surgery and Cancer, Imperial College London, London, United Kingdom
- Division of Oncology, Department of Translational Medicine, University of Piemonte Orientale, Novara, Italy
- Federica Di Maggio
- Division of Surgery and Cancer, Imperial College London, London, United Kingdom
- Department of Molecular Medicine and Medical Biotechnologies, University of Naples Federico II, Naples, Italy
- CEINGE-Biotecnologie Avanzate Francesco Salvatore, Via Gaetano Salvatore, Naples, Italy
- Filippo Ghelardi
- Medical Oncology Department, Fondazione IRCCS Istituto Nazionale dei Tumori, Milan, Italy
- Elisa Sottotetti
- Medical Oncology Department, Fondazione IRCCS Istituto Nazionale dei Tumori, Milan, Italy
- Guglielmo Vetere
- Department of Translational Research and New Technologies in Medicine and Surgery, University of Pisa, Pisa, Italy
- Paolo Ciracì
- Department of Translational Research and New Technologies in Medicine and Surgery, University of Pisa, Pisa, Italy
- Georgios Vlachogiannis
- Division of Molecular Pathology and Centre for Evolution and Cancer, The Institute of Cancer Research, London, United Kingdom
- Division of Surgery and Cancer, Imperial College London, London, United Kingdom
- Filippo Pietrantonio
- Medical Oncology Department, Fondazione IRCCS Istituto Nazionale dei Tumori, Milan, Italy
- Chiara Cremolini
- Department of Translational Research and New Technologies in Medicine and Surgery, University of Pisa, Pisa, Italy
- Alessio Cortellini
- Division of Surgery and Cancer, Imperial College London, London, United Kingdom
- Medical Oncology, Fondazione Policlinico Universitario Campus Bio-Medico, Rome, Italy
- Fotios Loupakis
- Istituto Oncologico Veneto, Istituto di Ricovero e Cura a Carattere Scientifico, Padua, Italy
- Matteo Fassan
- Istituto Oncologico Veneto, Istituto di Ricovero e Cura a Carattere Scientifico, Padua, Italy
- Department of Medicine, Surgical Pathology Unit, University of Padua, Padua, Italy
- Nicola Valeri
- Division of Molecular Pathology and Centre for Evolution and Cancer, The Institute of Cancer Research, London, United Kingdom
- Department of Medicine, The Royal Marsden Hospital, London and Sutton, United Kingdom
- Division of Surgery and Cancer, Imperial College London, London, United Kingdom
3
He S, Gubin MM, Rafei H, Basar R, Dede M, Jiang X, Liang Q, Tan Y, Kim K, Gillison ML, Rezvani K, Peng W, Haymaker C, Hernandez S, Solis LM, Mohanty V, Chen K. Elucidating immune-related gene transcriptional programs via factorization of large-scale RNA-profiles. bioRxiv: the preprint server for biology 2024:2024.05.10.593433. [PMID: 38798470 PMCID: PMC11118452 DOI: 10.1101/2024.05.10.593433] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/29/2024]
Abstract
Recent developments in immunotherapy, including immune checkpoint blockade (ICB) and adoptive cell therapy, have encountered challenges such as immune-related adverse events and resistance, especially in solid tumors. To advance the field, a deeper understanding of the molecular mechanisms behind treatment responses and resistance is essential. However, the lack of functionally characterized immune-related gene sets has limited data-driven immunological research. To address this gap, we adopted non-negative matrix factorization on 83 human bulk RNA-seq datasets and constructed 28 immune-specific gene sets. After rigorous immunologist-led manual annotations and orthogonal validations across immunological contexts and functional omics data, we demonstrated that these gene sets can be applied to refine pan-cancer immune subtypes, improve ICB response prediction and functionally annotate spatial transcriptomic data. These functional gene sets, informing diverse immune states, will advance our understanding of immunology and cancer research.
4
Ediriwickrema A, Nakauchi Y, Fan AC, Köhnke T, Hu X, Luca BA, Kim Y, Ramakrishnan S, Nakamoto M, Karigane D, Linde MH, Azizi A, Newman AM, Gentles AJ, Majeti R. A single cell framework identifies functionally and molecularly distinct multipotent progenitors in adult human hematopoiesis. bioRxiv: the preprint server for biology 2024:2024.05.07.592983. [PMID: 38766031 PMCID: PMC11100686 DOI: 10.1101/2024.05.07.592983] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/22/2024]
Abstract
Hematopoietic multipotent progenitors (MPPs) regulate blood cell production to appropriately meet the biological demands of the human body. Human MPPs remain ill-defined whereas mouse MPPs have been well characterized with distinct immunophenotypes and lineage potencies. Using multiomic single cell analyses and complementary functional assays, we identified new human MPPs and oligopotent progenitor populations within Lin-CD34+CD38dim/lo adult bone marrow with distinct biomolecular and functional properties. These populations were prospectively isolated based on expression of CD69, CLL1, and CD2 in addition to classical markers like CD90 and CD45RA. We show that within the canonical Lin-CD34+CD38dim/loCD90CD45RA- MPP population, there is a CD69+ MPP with long-term engraftment and multilineage differentiation potential, a CLL1+ myeloid-biased MPP, and a CLL1-CD69- erythroid-biased MPP. We also show that the canonical Lin-CD34+CD38dim/loCD90-CD45RA+ LMPP population can be separated into a CD2+ LMPP with lymphoid and myeloid potential, a CD2- LMPP with high lymphoid potential, and a CLL1+ GMP with minimal lymphoid potential. We used these new HSPC profiles to study human and mouse bone marrow cells and observe limited cell type specific homology between humans and mice and cell type specific changes associated with aging. By identifying and functionally characterizing new adult MPP sub-populations, we provide an updated reference and framework for future studies in human hematopoiesis.
5
Wang Y, Flowers CR, Wang M, Huang X, Li Z. CASi: A framework for cross-timepoint analysis of single-cell RNA sequencing data. Sci Rep 2024; 14:10633. [PMID: 38724550 PMCID: PMC11082156 DOI: 10.1038/s41598-024-58566-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/10/2023] [Accepted: 04/01/2024] [Indexed: 05/12/2024] Open
Abstract
Single-cell RNA sequencing (scRNA-seq) technology has been widely used to study the differences in gene expression at the single cell level, providing insights into the research of cell development, differentiation, and functional heterogeneity. Various pipelines and workflows of scRNA-seq analysis have been developed but few considered multi-timepoint data specifically. In this study, we develop CASi, a comprehensive framework for analyzing multiple timepoints' scRNA-seq data, which provides users with: (1) cross-timepoint cell annotation, (2) detection of potentially novel cell types emerged over time, (3) visualization of cell population evolution, and (4) identification of temporal differentially expressed genes (tDEGs). Through comprehensive simulation studies and applications to a real multi-timepoint single cell dataset, we demonstrate the robust and favorable performance of the proposal versus existing methods serving similar purposes.
Affiliation(s)
- Yizhuo Wang
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, 77030, USA
- Christopher R Flowers
- Department of Lymphoma/Myeloma, The University of Texas MD Anderson Cancer Center, Houston, 77030, USA
- Michael Wang
- Department of Lymphoma/Myeloma, The University of Texas MD Anderson Cancer Center, Houston, 77030, USA
- Xuelin Huang
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, 77030, USA.
- Ziyi Li
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, 77030, USA.
6
Dong S, Deng K, Huang X. Single-cell type annotation with deep learning in 265 cell types for humans. Bioinformatics Advances 2024; 4:vbae054. [PMID: 38645719 PMCID: PMC11031354 DOI: 10.1093/bioadv/vbae054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2023] [Revised: 03/17/2024] [Accepted: 04/05/2024] [Indexed: 04/23/2024]
Abstract
Motivation Annotating cell types is a challenging yet essential task in analyzing single-cell RNA sequencing data. However, due to the lack of a gold standard, it is difficult to evaluate the algorithms fairly and an overfitting algorithm may be favored in benchmarks. To address this challenge, we developed a deep learning-based single-cell type prediction tool that assigns the cell type to 265 different cell types for humans, based on data from approximately five million cells. Results We achieved a median area under the ROC curve (AUC) of 0.93 when evaluated across datasets. We found that inconsistent labeling in the existing database generated by different labs contributed to the mistakes of the model. Therefore, we used cell ontology to correct the annotations and retrained the model, which resulted in 0.971 median AUC. Our study reveals a limiting factor of the accuracy one may achieve with the current database annotation and points to the solutions towards an algorithm-based correction of the gold standard for future automated cell annotation approaches. Availability and implementation The code is available at: https://github.com/SherrySDong/Hierarchical-Correction-Improves-Automated-Single-cell-Type-Annotation. Data used in this study are listed in Supplementary Table S1 and are retrievable at the CZI database.
Affiliation(s)
- Sherry Dong
- Skyline High School, Ann Arbor, MI 48103, United States
- National AI Campus and Department of Computational Biomedicine, Cedars-Sinai Medical Center, West Hollywood, CA 90069, United States
- Kaiwen Deng
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, United States
- Xiuzhen Huang
- National AI Campus and Department of Computational Biomedicine, Cedars-Sinai Medical Center, West Hollywood, CA 90069, United States
7
Huang X, Liu R, Yang S, Chen X, Li H. scAnnoX: an R package integrating multiple public tools for single-cell annotation. PeerJ 2024; 12:e17184. [PMID: 38560451 PMCID: PMC10981883 DOI: 10.7717/peerj.17184] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/04/2023] [Accepted: 03/11/2024] [Indexed: 04/04/2024] Open
Abstract
Background Single-cell annotation plays a crucial role in the analysis of single-cell genomics data. Despite the existence of numerous single-cell annotation algorithms, a comprehensive tool for integrating and comparing these algorithms is also lacking. Methods This study meticulously investigated a plethora of widely adopted single-cell annotation algorithms. Ten single-cell annotation algorithms were selected based on the classification of either reference dataset-dependent or marker gene-dependent approaches. These algorithms included SingleR, Seurat, sciBet, scmap, CHETAH, scSorter, sc.type, cellID, scCATCH, and SCINA. Building upon these algorithms, we developed an R package named scAnnoX for the integration and comparative analysis of single-cell annotation algorithms. Results The development of the scAnnoX software package provides a cohesive framework for annotating cells in scRNA-seq data, enabling researchers to more efficiently perform comparative analyses among the cell type annotations contained in scRNA-seq datasets. The integrated environment of scAnnoX streamlines the testing, evaluation, and comparison processes among various algorithms. Among the ten annotation tools evaluated, SingleR, Seurat, sciBet, and scSorter emerged as top-performing algorithms in terms of prediction accuracy, with SingleR and sciBet demonstrating particularly superior performance, offering guidance for users. Interested parties can access the scAnnoX package at https://github.com/XQ-hub/scAnnoX.
Affiliation(s)
- Xiaoqian Huang
- School of Mathematics and Computer Science, Yunnan Minzu University, Kunming, Yunnan Province, China
- Ruiqi Liu
- School of Mathematics and Computer Science, Yunnan Minzu University, Kunming, Yunnan Province, China
- Shiwei Yang
- School of Mathematics and Computer Science, Yunnan Minzu University, Kunming, Yunnan Province, China
- Xiaozhou Chen
- School of Mathematics and Computer Science, Yunnan Minzu University, Kunming, Yunnan Province, China
- Huamei Li
- Department of Hepatobiliary Surgery, the Affiliated Drum Tower Hospital, Medical School, Nanjing University, Nanjing, Jiangsu Province, China
8
Theunissen L, Mortier T, Saeys Y, Waegeman W. Uncertainty-aware single-cell annotation with a hierarchical reject option. Bioinformatics 2024; 40:btae128. [PMID: 38441258 PMCID: PMC10957513 DOI: 10.1093/bioinformatics/btae128] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2023] [Revised: 02/23/2024] [Accepted: 03/01/2024] [Indexed: 03/23/2024] Open
Abstract
MOTIVATION Automatic cell type annotation methods assign cell type labels to new datasets by extracting relationships from a reference RNA-seq dataset. However, due to the limited resolution of gene expression features, there is always uncertainty present in the label assignment. To enhance the reliability and robustness of annotation, most machine learning methods address this uncertainty by providing a full reject option, i.e. when the predicted confidence score of a cell type label falls below a user-defined threshold, no label is assigned and no prediction is made. As a better alternative, some methods deploy hierarchical models and consider a so-called partial rejection by returning internal nodes of the hierarchy as label assignment. However, because a detailed experimental analysis of various rejection approaches is missing in the literature, there is currently no consensus on best practices. RESULTS We evaluate three annotation approaches (i) full rejection, (ii) partial rejection, and (iii) no rejection for both flat and hierarchical probabilistic classifiers. Our findings indicate that hierarchical classifiers are superior when rejection is applied, with partial rejection being the preferred rejection approach, as it preserves a significant amount of label information. For optimal rejection implementation, the rejection threshold should be determined through careful examination of a method's rejection behavior. Without rejection, flat and hierarchical annotation perform equally well, as long as the cell type hierarchy accurately captures transcriptomic relationships. AVAILABILITY AND IMPLEMENTATION Code is freely available at https://github.com/Latheuni/Hierarchical_reject and https://doi.org/10.5281/zenodo.10697468.
Affiliation(s)
- Lauren Theunissen
- Department of Data Analysis and Mathematical Modelling, Ghent University, Ghent, Belgium
- Data Mining and Modelling for Biomedicine, VIB Center for Inflammation Research, Ghent, Belgium
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium
- Thomas Mortier
- Department of Data Analysis and Mathematical Modelling, Ghent University, Ghent, Belgium
- Yvan Saeys
- Data Mining and Modelling for Biomedicine, VIB Center for Inflammation Research, Ghent, Belgium
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium
- Willem Waegeman
- Department of Data Analysis and Mathematical Modelling, Ghent University, Ghent, Belgium
9
Gong M, Yu Y, Wang Z, Zhang J, Wang X, Fu C, Zhang Y, Wang X. scAuto as a comprehensive framework for single-cell chromatin accessibility data analysis. Comput Biol Med 2024; 171:108230. [PMID: 38442554 DOI: 10.1016/j.compbiomed.2024.108230] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2023] [Revised: 02/06/2024] [Accepted: 02/25/2024] [Indexed: 03/07/2024]
Abstract
Interpreting single-cell chromatin accessibility data is crucial for understanding intercellular heterogeneity regulation. Despite the progress in computational methods for analyzing this data, there is still a lack of a comprehensive analytical framework and a user-friendly online analysis tool. To fill this gap, we developed a pre-trained deep learning-based framework, single-cell auto-correlation transformers (scAuto), to overcome the challenge. Following DNABERT's methodology of pre-training and fine-tuning, scAuto learns a general understanding of DNA sequence's grammar by being pre-trained on unlabeled human genome via self-supervision; it is then transferred to the single-cell chromatin accessibility analysis task of scATAC-seq data for supervised fine-tuning. We extensively validated scAuto on the Buenrostro2018 dataset, demonstrating its superior performance on chromatin accessibility prediction, single-cell clustering, and data denoising. Based on scAuto, we further developed an interactive web server for single-cell chromatin accessibility data analysis. It integrates tutorial-style interfaces for those with limited programming skills. The platform is accessible at http://zhanglab.icaup.cn. To our knowledge, this work is expected to help analyze single-cell chromatin accessibility data and facilitate the development of precision medicine.
Affiliation(s)
- Meiqin Gong
- Department of Obstetrics and Gynecology, West China Second University Hospital, Sichuan University, Chengdu, 610041, China
- Yun Yu
- School of Computer Science, Chengdu University of Information Technology, Chengdu, 610225, China
- Zixuan Wang
- College of Electronics and Information Engineering, Sichuan University, Chengdu, 610065, China
- Junming Zhang
- School of Computer Science, Chengdu University of Information Technology, Chengdu, 610225, China
- Xiongyi Wang
- School of Computer Science, Chengdu University of Information Technology, Chengdu, 610225, China
- Cheng Fu
- School of Computer Science, Chengdu University of Information Technology, Chengdu, 610225, China
- Yongqing Zhang
- School of Computer Science, Chengdu University of Information Technology, Chengdu, 610225, China
- Xiaodong Wang
- Department of Obstetrics and Gynecology, West China Second University Hospital, Sichuan University, Chengdu, 610041, China.
10
Choi JM, Park C, Chae H. moSCminer: a cell subtype classification framework based on the attention neural network integrating the single-cell multi-omics dataset on the cloud. PeerJ 2024; 12:e17006. [PMID: 38426141 PMCID: PMC10903350 DOI: 10.7717/peerj.17006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2023] [Accepted: 02/05/2024] [Indexed: 03/02/2024] Open
Abstract
Single-cell omics sequencing has rapidly advanced, enabling the quantification of diverse omics profiles at a single-cell resolution. To facilitate comprehensive biological insights, such as cellular differentiation trajectories, precise annotation of cell subtypes is essential. Conventional methods involve clustering cells and manually assigning subtypes based on canonical markers, a labor-intensive and expert-dependent process. Hence, an automated computational prediction framework is crucial. While several classification frameworks for predicting cell subtypes from single-cell RNA sequencing datasets exist, these methods solely rely on single-omics data, offering insights at a single molecular level. They often miss inter-omic correlations and a holistic understanding of cellular processes. To address this, the integration of multi-omics datasets from individual cells is essential for accurate subtype annotation. This article introduces moSCminer, a novel framework for classifying cell subtypes that harnesses the power of single-cell multi-omics sequencing datasets through an attention-based neural network operating at the omics level. By integrating three distinct omics datasets (gene expression, DNA methylation, and DNA accessibility) while accounting for their biological relationships, moSCminer excels at learning the relative significance of each omics feature. It then transforms this knowledge into a novel representation for cell subtype classification. Comparative evaluations against standard machine learning-based classifiers demonstrate moSCminer's superior performance, consistently achieving the highest average performance on real datasets. The efficacy of multi-omics integration is further corroborated through an in-depth analysis of the omics-level attention module, which identifies potential markers for cell subtype annotation.
To enhance accessibility and scalability, moSCminer is accessible as a user-friendly web-based platform seamlessly connected to a cloud system, publicly accessible at http://203.252.206.118:5568. Notably, this study marks the pioneering integration of three single-cell multi-omics datasets for cell subtype identification.
Affiliation(s)
- Joung Min Choi
- Department of Computer Science, Virginia Polytechnic Institute and State University (Virginia Tech), Blacksburg, Virginia, United States
- Chaelin Park
- Division of Computer Science, Sookmyung Women’s University, Seoul, South Korea
- Heejoon Chae
- Division of Computer Science, Sookmyung Women’s University, Seoul, South Korea
11
Sun H, Qu H, Duan K, Du W. scMGCN: A Multi-View Graph Convolutional Network for Cell Type Identification in scRNA-seq Data. Int J Mol Sci 2024; 25:2234. [PMID: 38396909 PMCID: PMC10889820 DOI: 10.3390/ijms25042234] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2023] [Revised: 02/07/2024] [Accepted: 02/09/2024] [Indexed: 02/25/2024] Open
Abstract
Single-cell RNA sequencing (scRNA-seq) data reveal the complexity and diversity of cellular ecosystems and molecular interactions in various biomedical research. Hence, identifying cell types from large-scale scRNA-seq data using existing annotations is challenging and requires stable and interpretable methods. However, the current cell type identification methods have limited performance, mainly due to the intrinsic heterogeneity among cell populations and extrinsic differences between datasets. Here, we present a robust graph artificial intelligence model, a multi-view graph convolutional network model (scMGCN) that integrates multiple graph structures from raw scRNA-seq data and applies graph convolutional networks with attention mechanisms to learn cell embeddings and predict cell labels. We evaluate our model on single-dataset, cross-species, and cross-platform experiments and compare it with other state-of-the-art methods. Our results show that scMGCN outperforms the other methods regarding stability, accuracy, and robustness to batch effects. Our main contributions are as follows: Firstly, we introduce multi-view learning and multiple graph construction methods to capture comprehensive cellular information from scRNA-seq data. Secondly, we construct a scMGCN that combines graph convolutional networks with attention mechanisms to extract shared, high-order information from cells. Finally, we demonstrate the effectiveness and superiority of the scMGCN on various datasets.
Affiliation(s)
- Wei Du
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China; (H.S.); (H.Q.); (K.D.)
12
Chen Z, Miao Y, Tan Z, Hu Q, Wu Y, Li X, Guo W, Gu J. scCancer2: data-driven in-depth annotations of the tumor microenvironment at single-level resolution. Bioinformatics 2024; 40:btae028. [PMID: 38243719 PMCID: PMC10868330 DOI: 10.1093/bioinformatics/btae028] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2023] [Revised: 01/10/2024] [Accepted: 01/15/2024] [Indexed: 01/21/2024] Open
Abstract
SUMMARY Single-cell RNA-seq (scRNA-seq) is a powerful technique for decoding the complex cellular compositions in the tumor microenvironment (TME). As previous studies have defined many meaningful cell subtypes in several tumor types, there is a great need to computationally transfer these labels to new datasets. Also, different studies used different approaches or criteria to define the cell subtypes for the same major cell lineages. The relationships between the cell subtypes defined in different studies should be carefully evaluated. In this updated package scCancer2, designed for integrative tumor scRNA-seq data analysis, we developed a supervised machine learning framework to annotate TME cells with annotated cell subtypes from 15 scRNA-seq datasets with 594 samples in total. Based on the trained classifiers, we quantitatively constructed the similarity maps between the cell subtypes defined in different references by testing on all the 15 datasets. Secondly, to improve the identification of malignant cells, we designed a classifier by integrating large-scale pan-cancer TCGA bulk gene expression datasets and scRNA-seq datasets (10 cancer types, 175 samples, 663 857 cells). This classifier shows robust performances when no internal confidential reference cells are available. Thirdly, scCancer2 integrated a module to process the spatial transcriptomic data and analyze the spatial features of TME. AVAILABILITY AND IMPLEMENTATION The package and user documentation are available at http://lifeome.net/software/sccancer2/ and https://doi.org/10.5281/zenodo.10477296.
Affiliation(s)
- Zeyu Chen
- MOE Key Laboratory of Bioinformatics, BNRIST Bioinformatics Division, Institute for Precision Medicine & Department of Automation, Tsinghua University, Beijing 100084, China
- Yuxin Miao
- MOE Key Laboratory of Bioinformatics, BNRIST Bioinformatics Division, Institute for Precision Medicine & Department of Automation, Tsinghua University, Beijing 100084, China
- Zhiyuan Tan
- Department of Finance, Shanghai Advanced Institute of Finance, Shanghai Jiao Tong University, Shanghai 200240, China
- Qifan Hu
- MOE Key Laboratory of Bioinformatics, BNRIST Bioinformatics Division, Institute for Precision Medicine & Department of Automation, Tsinghua University, Beijing 100084, China
- Yanhong Wu
- MOE Key Laboratory of Bioinformatics, BNRIST Bioinformatics Division, Institute for Precision Medicine & Department of Automation, Tsinghua University, Beijing 100084, China
- Xinqi Li
- MOE Key Laboratory of Bioinformatics, BNRIST Bioinformatics Division, Institute for Precision Medicine & Department of Automation, Tsinghua University, Beijing 100084, China
- Wenbo Guo
- MOE Key Laboratory of Bioinformatics, BNRIST Bioinformatics Division, Institute for Precision Medicine & Department of Automation, Tsinghua University, Beijing 100084, China
- Jin Gu
- MOE Key Laboratory of Bioinformatics, BNRIST Bioinformatics Division, Institute for Precision Medicine & Department of Automation, Tsinghua University, Beijing 100084, China
13
Chang JT, Liu LB, Wang PG, An J. Single-cell RNA sequencing to understand host-virus interactions. Virol Sin 2024; 39:1-8. [PMID: 38008383 PMCID: PMC10877424 DOI: 10.1016/j.virs.2023.11.009] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/08/2023] [Accepted: 11/23/2023] [Indexed: 11/28/2023] Open
Abstract
Single-cell RNA sequencing (scRNA-seq) has allowed for the profiling of host and virus transcripts and host-virus interactions at single-cell resolution. This review summarizes the existing scRNA-seq technologies together with their strengths and weaknesses. The applications of scRNA-seq in various virological studies are discussed in depth, which broaden the understanding of the immune atlas, host-virus interactions, and immune repertoire. scRNA-seq can be widely used for virology in the near future to better understand the pathogenic mechanisms and discover more effective therapeutic strategies.
Affiliation(s)
- Jia-Tong Chang
- Department of Microbiology, School of Basic Medical Sciences, Capital Medical University, Beijing 100069, China
- Li-Bo Liu
- Department of Microbiology, School of Basic Medical Sciences, Capital Medical University, Beijing 100069, China
- Pei-Gang Wang
- Department of Microbiology, School of Basic Medical Sciences, Capital Medical University, Beijing 100069, China.
- Jing An
- Department of Microbiology, School of Basic Medical Sciences, Capital Medical University, Beijing 100069, China.
14
Hu Y, Li CY, Lu Q, Kuang Y. Multiplex miRNA reporting platform for real-time profiling of living cells. Cell Chem Biol 2024; 31:150-162.e7. [PMID: 38035883 DOI: 10.1016/j.chembiol.2023.11.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2023] [Revised: 09/15/2023] [Accepted: 11/03/2023] [Indexed: 12/02/2023]
Abstract
Accurately characterizing cell types within complex cell structures provides invaluable information for comprehending the cellular status during biological processes. In this study, we have developed an miRNA-switch cocktail platform capable of reporting and tracking the activities of multiple miRNAs (microRNAs) at the single-cell level, while minimizing disruption to the cell culture. Drawing on the principles of traditional miRNA-sensing mRNA switches, our platform incorporates subcellular tags and employs intelligent engineering to segment three subcellular regions using two fluorescent proteins. These designs enable the quantification of multiple miRNAs within the same cell. Through our experiments, we have demonstrated the platform's ability to track marker miRNA levels during cell differentiation and provide spatial information of heterogeneity on outlier cells exhibiting extreme miRNA levels. Importantly, this platform offers real-time and in situ miRNA reporting, allowing for multidimensional evaluation of cell profile and paving the way for a comprehensive understanding of cellular events during biological processes.
Affiliation(s)
- Yaxin Hu
- Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Kowloon, Hong Kong SAR, China
- Cheuk Yin Li
- Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Kowloon, Hong Kong SAR, China
- Qiuyu Lu
- Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Kowloon, Hong Kong SAR, China
- Yi Kuang
- Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Kowloon, Hong Kong SAR, China.
15
Zhang Y, Sun H, Zhang W, Fu T, Huang S, Mou M, Zhang J, Gao J, Ge Y, Yang Q, Zhu F. CellSTAR: a comprehensive resource for single-cell transcriptomic annotation. Nucleic Acids Res 2024; 52:D859-D870. [PMID: 37855686 PMCID: PMC10767908 DOI: 10.1093/nar/gkad874] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2023] [Revised: 09/12/2023] [Accepted: 09/27/2023] [Indexed: 10/20/2023] Open
Abstract
Large-scale studies of single-cell sequencing and biological experiments have successfully revealed expression patterns that distinguish different cell types in tissues, emphasizing the importance of studying cellular heterogeneity and accurately annotating cell types. Analysis of gene expression profiles in these experiments provides two essential types of data for cell type annotation: annotated references and canonical markers. In this study, the first comprehensive database of single-cell transcriptomic annotation resource (CellSTAR) was thus developed. It is unique in (a) offering the comprehensive expertly annotated reference data for annotating hundreds of cell types for the first time and (b) enabling the collective consideration of reference data and marker genes by incorporating tens of thousands of markers. Given its unique features, CellSTAR is expected to attract broad research interests from the technological innovations in single-cell transcriptomics, the studies of cellular heterogeneity & dynamics, and so on. It is now publicly accessible without any login requirement at: https://idrblab.org/cellstar.
Affiliation(s)
- Ying Zhang
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Huaicheng Sun
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Wei Zhang
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Tingting Fu
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Shijie Huang
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Minjie Mou
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Jinsong Zhang
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Jianqing Gao
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Yichao Ge
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
- Qingxia Yang
- Zhejiang Provincial Key Laboratory of Precision Diagnosis and Therapy for Major Gynecological Diseases, Women's Hospital, Zhejiang University School of Medicine, Hangzhou 310058, China
- Department of Bioinformatics, School of Geographic and Biologic Information, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
- Feng Zhu
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
16
Du ZH, Hu WL, Li JQ, Shang X, You ZH, Chen ZZ, Huang YA. scPML: pathway-based multi-view learning for cell type annotation from single-cell RNA-seq data. Commun Biol 2023; 6:1268. [PMID: 38097699 PMCID: PMC10721875 DOI: 10.1038/s42003-023-05634-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/30/2023] [Accepted: 11/24/2023] [Indexed: 12/17/2023] Open
Abstract
Recent developments in single-cell technology have enabled the exploration of cellular heterogeneity at an unprecedented level, providing invaluable insights into various fields, including medicine and disease research. Cell type annotation is an essential step in single-cell omics research. The mainstream approach is to use well-annotated single-cell data for supervised learning to annotate cell types in new single-cell data. However, existing methods lack good generalization and robustness in cell annotation tasks, partially due to difficulties in dealing with technical differences between datasets, as well as not considering the heterogeneous associations of genes at the level of regulatory mechanisms. Here, we propose the scPML model, which utilizes various gene signaling pathway data to partition the genetic features of cells, thus characterizing different interaction maps between cells. Extensive experiments demonstrate that scPML performs better in cell type annotation and detection of unknown cell types from different species, platforms, and tissues.
Affiliation(s)
- Zhi-Hua Du
- College of Computer Science and Software Engineering, ShenZhen University, 3688 Nanhai Avenue, Shenzhen, China
- Wei-Lin Hu
- College of Computer Science and Software Engineering, ShenZhen University, 3688 Nanhai Avenue, Shenzhen, China
- Jian-Qiang Li
- College of Computer Science and Software Engineering, ShenZhen University, 3688 Nanhai Avenue, Shenzhen, China
- Xuequn Shang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China
- Zhu-Hong You
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China
- Zhuang-Zhuang Chen
- College of Computer Science and Software Engineering, ShenZhen University, 3688 Nanhai Avenue, Shenzhen, China
- Yu-An Huang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China.
17
Molstad AJ, Motwani K. Multiresolution categorical regression for interpretable cell-type annotation. Biometrics 2023; 79:3485-3496. [PMID: 37798600 DOI: 10.1111/biom.13926] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2022] [Accepted: 08/07/2023] [Indexed: 10/07/2023]
Abstract
In many categorical response regression applications, the response categories admit a multiresolution structure. That is, subsets of the response categories may naturally be combined into coarser response categories. In such applications, practitioners are often interested in estimating the resolution at which a predictor affects the response category probabilities. In this paper, we propose a method for fitting the multinomial logistic regression model in high dimensions that addresses this problem in a unified and data-driven way. Our method allows practitioners to identify which predictors distinguish between coarse categories but not fine categories, which predictors distinguish between fine categories, and which predictors are irrelevant. For model fitting, we propose a scalable algorithm that can be applied when the coarse categories are defined by either overlapping or nonoverlapping sets of fine categories. Statistical properties of our method reveal that it can take advantage of this multiresolution structure in a way existing estimators cannot. We use our method to model cell-type probabilities as a function of a cell's gene expression profile (i.e., cell-type annotation). Our fitted model provides novel biological insights which may be useful for future automated and manual cell-type annotation methodology.
Affiliation(s)
- Aaron J Molstad
- School of Statistics, University of Minnesota, Minneapolis, Minnesota, USA
- Keshav Motwani
- Department of Biostatistics, University of Washington, Seattle, Washington, USA
18
Ghaddar B, De S. Hierarchical and automated cell-type annotation and inference of cancer cell of origin with Census. Bioinformatics 2023; 39:btad714. [PMID: 38011649 PMCID: PMC10713118 DOI: 10.1093/bioinformatics/btad714] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2023] [Revised: 10/26/2023] [Accepted: 11/25/2023] [Indexed: 11/29/2023] Open
Abstract
MOTIVATION Cell-type annotation is a time-consuming yet critical first step in the analysis of single-cell RNA-seq data, especially when multiple similar cell subtypes with overlapping marker genes are present. Existing automated annotation methods have a number of limitations, including requiring large reference datasets, high computation time, shallow annotation resolution, and difficulty in identifying cancer cells or their most likely cell of origin. RESULTS We developed Census, a biologically intuitive and fully automated cell-type identification method for single-cell RNA-seq data that can deeply annotate normal cells in mammalian tissues and identify malignant cells and their likely cell of origin. Motivated by the inherently stratified developmental programs of cellular differentiation, Census infers hierarchical cell-type relationships and uses gradient-boosted decision trees that capitalize on nodal cell-type relationships to achieve high prediction speed and accuracy. When benchmarked on 44 atlas-scale normal and cancer, human and mouse tissues, Census significantly outperforms state-of-the-art methods across multiple metrics and naturally predicts the cell-of-origin of different cancers. Census is pretrained on the Tabula Sapiens to classify 175 cell-types from 24 organs; however, users can seamlessly train their own models for customized applications. AVAILABILITY AND IMPLEMENTATION Census is available at Zenodo https://zenodo.org/records/7017103 and on our Github https://github.com/sjdlabgroup/Census.
Affiliation(s)
- Bassel Ghaddar
- Center for Systems and Computational Biology, Rutgers Cancer Institute of New Jersey, Rutgers University, New Brunswick, NJ 08901, United States
- Subhajyoti De
- Center for Systems and Computational Biology, Rutgers Cancer Institute of New Jersey, Rutgers University, New Brunswick, NJ 08901, United States
19
Yin Q, Chen L. CellTICS: an explainable neural network for cell-type identification and interpretation based on single-cell RNA-seq data. Brief Bioinform 2023; 25:bbad449. [PMID: 38061196 PMCID: PMC10703497 DOI: 10.1093/bib/bbad449] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2023] [Revised: 10/30/2023] [Accepted: 11/14/2023] [Indexed: 12/18/2023] Open
Abstract
Identifying cell types is crucial for understanding the functional units of an organism. Machine learning has shown promising performance in identifying cell types, but many existing methods lack biological significance due to poor interpretability. However, it is of the utmost importance to understand what makes cells share the same function and form a specific cell type, motivating us to propose a biologically interpretable method. CellTICS prioritizes marker genes with cell-type-specific expression, using a hierarchy of biological pathways for neural network construction, and applying a multi-predictive-layer strategy to predict cell and sub-cell types. CellTICS usually outperforms existing methods in prediction accuracy. Moreover, CellTICS can reveal pathways that define a cell type or a cell type under specific physiological conditions, such as disease or aging. The nonlinear nature of neural networks enables us to identify many novel pathways. Interestingly, some of the pathways identified by CellTICS exhibit differential expression "variability" rather than differential expression across cell types, indicating that expression stochasticity within a pathway could be an important feature characteristic of a cell type. Overall, CellTICS provides a biologically interpretable method for identifying and characterizing cell types, shedding light on the underlying pathways that define cellular heterogeneity and its role in organismal function. CellTICS is available at https://github.com/qyyin0516/CellTICS.
Affiliation(s)
- Qingyang Yin
- Department of Quantitative and Computational Biology, University of Southern California, 1050 Childs Way, Los Angeles, CA 90089, United States
- Liang Chen
- Department of Quantitative and Computational Biology, University of Southern California, 1050 Childs Way, Los Angeles, CA 90089, United States
20
Heller G, Fuereder T, Grandits AM, Wieser R. New perspectives on biology, disease progression, and therapy response of head and neck cancer gained from single cell RNA sequencing and spatial transcriptomics. Oncol Res 2023; 32:1-17. [PMID: 38188682 PMCID: PMC10767240 DOI: 10.32604/or.2023.044774] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2023] [Accepted: 10/12/2023] [Indexed: 01/09/2024] Open
Abstract
Head and neck squamous cell carcinoma (HNSCC) is one of the most frequent cancers worldwide. The main risk factors are consumption of tobacco products and alcohol, as well as infection with human papilloma virus. Approved therapeutic options comprise surgery, radiation, chemotherapy, targeted therapy through epidermal growth factor receptor inhibition, and immunotherapy, but outcome has remained unsatisfactory due to recurrence rates of ~50% and the frequent occurrence of second primaries. The availability of the human genome sequence at the beginning of the millennium heralded the omics era, in which rapid technological progress has advanced our knowledge of the molecular biology of malignant diseases, including HNSCC, at an unprecedented pace. Initially, microarray-based methods, followed by approaches based on next-generation sequencing, were applied to study the genetics, epigenetics, and gene expression patterns of bulk tumors. More recently, the advent of single-cell RNA sequencing (scRNAseq) and spatial transcriptomics methods has facilitated the investigation of the heterogeneity between and within different cell populations in the tumor microenvironment (e.g., cancer cells, fibroblasts, immune cells, endothelial cells), led to the discovery of novel cell types, and advanced the discovery of cell-cell communication within tumors. This review provides an overview of scRNAseq, spatial transcriptomics, and the associated bioinformatics methods, and summarizes how their application has promoted our understanding of the emergence, composition, progression, and therapy responsiveness of, and intercellular signaling within, HNSCC.
Affiliation(s)
- Gerwin Heller
- Division of Oncology, Department of Medicine I, Medical University of Vienna, Vienna, 1090, Austria
- Thorsten Fuereder
- Division of Oncology, Department of Medicine I, Medical University of Vienna, Vienna, 1090, Austria
- Rotraud Wieser
- Division of Oncology, Department of Medicine I, Medical University of Vienna, Vienna, 1090, Austria
- Ludwig Boltzmann Institute for Hematology and Oncology, Medical University of Vienna, Vienna, 1090, Austria
21
Subudhi AK, Green JL, Satyam R, Salunke RP, Lenz T, Shuaib M, Isaioglou I, Abel S, Gupta M, Esau L, Mourier T, Nugmanova R, Mfarrej S, Shivapurkar R, Stead Z, Rached FB, Ostwal Y, Sougrat R, Dada A, Kadamany AF, Fischle W, Merzaban J, Knuepfer E, Ferguson DJP, Gupta I, Le Roch KG, Holder AA, Pain A. DNA-binding protein PfAP2-P regulates parasite pathogenesis during malaria parasite blood stages. Nat Microbiol 2023; 8:2154-2169. [PMID: 37884813 PMCID: PMC10627835 DOI: 10.1038/s41564-023-01497-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2023] [Accepted: 09/11/2023] [Indexed: 10/28/2023]
Abstract
Malaria-associated pathogenesis such as parasite invasion, egress, host cell remodelling and antigenic variation requires concerted action by many proteins, but the molecular regulation is poorly understood. Here we have characterized an essential Plasmodium-specific Apicomplexan AP2 transcription factor in Plasmodium falciparum (PfAP2-P; pathogenesis) during the blood-stage development with two peaks of expression. An inducible knockout of gene function showed that PfAP2-P is essential for trophozoite development, and critical for var gene regulation, merozoite development and parasite egress. Chromatin immunoprecipitation sequencing data collected at timepoints matching the two peaks of pfap2-p expression demonstrate PfAP2-P binding to promoters of genes controlling trophozoite development, host cell remodelling, antigenic variation and pathogenicity. Single-cell RNA sequencing and fluorescence-activated cell sorting revealed de-repression of most var genes in Δpfap2-p parasites. Δpfap2-p parasites also overexpress early gametocyte marker genes, indicating a regulatory role in sexual stage conversion. We conclude that PfAP2-P is an essential upstream transcriptional regulator at two distinct stages of the intra-erythrocytic development cycle.
Affiliation(s)
- Amit Kumar Subudhi
- Pathogen Genomics Group, Bioscience Program, Biological and Environmental Science and Engineering (BESE) Division, King Abdullah University of Science and Technology, Thuwal, Kingdom of Saudi Arabia
- Judith L Green
- Malaria Parasitology Laboratory, The Francis Crick Institute, London, UK
- Rohit Satyam
- Department of Computer Science, Jamia Millia Islamia, New Delhi, India
- Rahul P Salunke
- Pathogen Genomics Group, Bioscience Program, Biological and Environmental Science and Engineering (BESE) Division, King Abdullah University of Science and Technology, Thuwal, Kingdom of Saudi Arabia
- Todd Lenz
- Department of Molecular, Cell and Systems Biology, University of California Riverside, Riverside, CA, USA
- Muhammad Shuaib
- Pathogen Genomics Group, Bioscience Program, Biological and Environmental Science and Engineering (BESE) Division, King Abdullah University of Science and Technology, Thuwal, Kingdom of Saudi Arabia
- Ioannis Isaioglou
- Cell Migration and Signaling Laboratory, Bioscience Program, BESE Division, King Abdullah University of Science and Technology, Thuwal, Kingdom of Saudi Arabia
- Steven Abel
- Department of Molecular, Cell and Systems Biology, University of California Riverside, Riverside, CA, USA
- Mohit Gupta
- Department of Molecular, Cell and Systems Biology, University of California Riverside, Riverside, CA, USA
- Luke Esau
- KAUST Core Labs, King Abdullah University of Science and Technology, Thuwal, Kingdom of Saudi Arabia
- Tobias Mourier
- Pathogen Genomics Group, Bioscience Program, Biological and Environmental Science and Engineering (BESE) Division, King Abdullah University of Science and Technology, Thuwal, Kingdom of Saudi Arabia
- Raushan Nugmanova
- Pathogen Genomics Group, Bioscience Program, Biological and Environmental Science and Engineering (BESE) Division, King Abdullah University of Science and Technology, Thuwal, Kingdom of Saudi Arabia
- Sara Mfarrej
- Pathogen Genomics Group, Bioscience Program, Biological and Environmental Science and Engineering (BESE) Division, King Abdullah University of Science and Technology, Thuwal, Kingdom of Saudi Arabia
- Rupali Shivapurkar
- Pathogen Genomics Group, Bioscience Program, Biological and Environmental Science and Engineering (BESE) Division, King Abdullah University of Science and Technology, Thuwal, Kingdom of Saudi Arabia
- Zenaida Stead
- Pathogen Genomics Group, Bioscience Program, Biological and Environmental Science and Engineering (BESE) Division, King Abdullah University of Science and Technology, Thuwal, Kingdom of Saudi Arabia
- Fathia Ben Rached
- Pathogen Genomics Group, Bioscience Program, Biological and Environmental Science and Engineering (BESE) Division, King Abdullah University of Science and Technology, Thuwal, Kingdom of Saudi Arabia
- Yogesh Ostwal
- Laboratory of Chromatin Biochemistry, Bioscience Program, BESE Division, King Abdullah University of Science and Technology, Thuwal, Kingdom of Saudi Arabia
- Rachid Sougrat
- KAUST Core Labs, King Abdullah University of Science and Technology, Thuwal, Kingdom of Saudi Arabia
- Ashraf Dada
- Department of Pathology and Laboratory Medicine, King Faisal Specialist Hospital and Research Center, Jeddah, Kingdom of Saudi Arabia
- College of Medicine, Al Faisal University, Riyadh, Saudi Arabia
- Abdullah Fuaad Kadamany
- Department of Pathology and Laboratory Medicine, King Faisal Specialist Hospital and Research Center, Jeddah, Kingdom of Saudi Arabia
- Wolfgang Fischle
- Laboratory of Chromatin Biochemistry, Bioscience Program, BESE Division, King Abdullah University of Science and Technology, Thuwal, Kingdom of Saudi Arabia
- Jasmeen Merzaban
- Cell Migration and Signaling Laboratory, Bioscience Program, BESE Division, King Abdullah University of Science and Technology, Thuwal, Kingdom of Saudi Arabia
- Ellen Knuepfer
- Malaria Parasitology Laboratory, The Francis Crick Institute, London, UK
- Molecular and Cellular Parasitology Laboratory, Department of Pathobiology and Population Sciences, The Royal Veterinary College, Hatfield, UK
- David J P Ferguson
- Nuffield Department of Clinical Laboratory Science, University of Oxford, John Radcliffe Hospital, Oxford, UK
- Department of Biological and Medical Sciences, Faculty of Health and Life Sciences, Oxford Brookes University, Oxford, UK
- Ishaan Gupta
- Department of Biochemical Engineering and Biotechnology, Indian Institute of Technology Delhi, New Delhi, India
- School of Artificial Intelligence, Indian Institute of Technology Delhi, New Delhi, India
- Karine G Le Roch
- Department of Molecular, Cell and Systems Biology, University of California Riverside, Riverside, CA, USA
- Anthony A Holder
- Malaria Parasitology Laboratory, The Francis Crick Institute, London, UK.
- Arnab Pain
- Pathogen Genomics Group, Bioscience Program, Biological and Environmental Science and Engineering (BESE) Division, King Abdullah University of Science and Technology, Thuwal, Kingdom of Saudi Arabia.
- International Institute for Zoonosis Control, Hokkaido University, Sapporo, Japan.
22
Lazaros K, Vlamos P, Vrahatis AG. Methods for cell-type annotation on scRNA-seq data: A recent overview. J Bioinform Comput Biol 2023; 21:2340002. [PMID: 37743364 DOI: 10.1142/s0219720023400024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/26/2023]
Abstract
The evolution of single-cell technology is ongoing, continually generating massive amounts of data that reveal many mysteries surrounding intricate diseases. However, their drawbacks continue to constrain us. Among these, annotating cell types in single-cell gene expressions pose a substantial challenge, despite the myriad of tools at our disposal. The rapid growth in data, resources, and tools has consequently brought about significant alterations in this area over the years. In our study, we spotlight all note-worthy cell type annotation techniques developed over the past four years. We provide an overview of the latest trends in this field, showcasing the most advanced methods in taxonomy. Our research underscores the demand for additional tools that incorporate a biological context and also predicts that the rising trend of graph neural network approaches will likely lead this research field in the coming years.
Affiliation(s)
- Konstantinos Lazaros
- Bioinformatics and Human Electrophysiology Laboratory, Department of Informatics, Ionian University, 49100 Corfu, Greece
- Panagiotis Vlamos
- Bioinformatics and Human Electrophysiology Laboratory, Department of Informatics, Ionian University, 49100 Corfu, Greece
- Aristidis G Vrahatis
- Bioinformatics and Human Electrophysiology Laboratory, Department of Informatics, Ionian University, 49100 Corfu, Greece
23
Ren P, Shi X, Yu Z, Dong X, Ding X, Wang J, Sun L, Yan Y, Hu J, Zhang P, Chen Q, Zhang J, Li T, Wang C. Single-cell assignment using multiple-adversarial domain adaptation network with large-scale references. Cell Rep Methods 2023; 3:100577. [PMID: 37751689 PMCID: PMC10545911 DOI: 10.1016/j.crmeth.2023.100577] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2022] [Revised: 06/11/2023] [Accepted: 08/09/2023] [Indexed: 09/28/2023]
Abstract
The rapid accumulation of single-cell RNA-seq data has provided rich resources to characterize various human cell populations. However, achieving accurate cell-type annotation using public references presents challenges due to inconsistent annotations, batch effects, and rare cell types. Here, we introduce SELINA (single-cell identity navigator), an integrative and automatic cell-type annotation framework based on a pre-curated reference atlas spanning various tissues. SELINA employs a multiple-adversarial domain adaptation network to remove batch effects within the reference dataset. Additionally, it enhances the annotation of less frequent cell types by synthetic minority oversampling and fits query data with the reference data using an autoencoder. SELINA culminates in the creation of a comprehensive and uniform reference atlas, encompassing 1.7 million cells covering 230 distinct human cell types. We substantiate its robustness and superiority across a multitude of human tissues. Notably, SELINA could accurately annotate cells within diverse disease contexts. SELINA provides a complete solution for human single-cell RNA-seq data annotation with both python and R packages.
Affiliation(s)
- Pengfei Ren
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration of Ministry of Education, Department of Orthopedics, Tongji Hospital, School of Life Science and Technology, Tongji University, Shanghai 200092, China; Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China; Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, 100084, China; Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, 100084, China
- Xiaoying Shi
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration of Ministry of Education, Department of Orthopedics, Tongji Hospital, School of Life Science and Technology, Tongji University, Shanghai 200092, China; Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- Zhiguang Yu
- State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources, College of Life Science and Technology, Guangxi University, Guangxi 530004, China
- Xin Dong
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration of Ministry of Education, Department of Orthopedics, Tongji Hospital, School of Life Science and Technology, Tongji University, Shanghai 200092, China; Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- Xuanxin Ding
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration of Ministry of Education, Department of Orthopedics, Tongji Hospital, School of Life Science and Technology, Tongji University, Shanghai 200092, China; Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- Jin Wang
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration of Ministry of Education, Department of Orthopedics, Tongji Hospital, School of Life Science and Technology, Tongji University, Shanghai 200092, China; Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- Liangdong Sun
- Department of Thoracic Surgery, Shanghai Pulmonary Hospital, School of Medicine, Tongji University, Shanghai 200433, China
- Yilv Yan
- Department of Thoracic Surgery, Shanghai Pulmonary Hospital, School of Medicine, Tongji University, Shanghai 200433, China
- Junjie Hu
- Department of Thoracic Surgery, Shanghai Pulmonary Hospital, School of Medicine, Tongji University, Shanghai 200433, China
- Peng Zhang
- Department of Thoracic Surgery, Shanghai Pulmonary Hospital, School of Medicine, Tongji University, Shanghai 200433, China
- Qianming Chen
- State Key Laboratory of Oral Diseases, National Clinical Research Center for Oral Diseases, Research Unit of Oral Carcinogenesis and Management, Chinese Academy of Medical Sciences, West China Hospital of Stomatology, Sichuan University, Chengdu 610041, China; Jiangsu Key Lab of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Cancer Medicine, Nanjing Medical University, Nanjing 211166, China
- Jing Zhang
- Research Center for Translational Medicine, Shanghai East Hospital, School of Life Science and Technology, Tongji University, Shanghai, China.
- Taiwen Li
- State Key Laboratory of Oral Diseases, National Clinical Research Center for Oral Diseases, Research Unit of Oral Carcinogenesis and Management, Chinese Academy of Medical Sciences, West China Hospital of Stomatology, Sichuan University, Chengdu 610041, China; Jiangsu Key Lab of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Cancer Medicine, Nanjing Medical University, Nanjing 211166, China.
- Chenfei Wang
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration of Ministry of Education, Department of Orthopedics, Tongji Hospital, School of Life Science and Technology, Tongji University, Shanghai 200092, China; Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China.
24
Fiannaca A, La Rosa M, La Paglia L, Gaglio S, Urso A. GOWDL: gene ontology-driven wide and deep learning model for cell typing of scRNA-seq data. Brief Bioinform 2023; 24:bbad332. [PMID: 37756593 PMCID: PMC10530315 DOI: 10.1093/bib/bbad332] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2023] [Revised: 08/17/2023] [Accepted: 09/04/2023] [Indexed: 09/29/2023] Open
Abstract
Single-cell RNA-sequencing (scRNA-seq) allows for obtaining genomic and transcriptomic profiles of individual cells. That data make it possible to characterize tissues at the cell level. In this context, one of the main analyses exploiting scRNA-seq data is identifying the cell types within tissue to estimate the quantitative composition of cell populations. Due to the massive amount of available scRNA-seq data, automatic classification approaches for cell typing, based on the most recent deep learning technology, are needed. Here, we present the gene ontology-driven wide and deep learning (GOWDL) model for classifying cell types in several tissues. GOWDL implements a hybrid architecture that considers the functional annotations found in Gene Ontology and the marker genes typical of specific cell types. We performed cross-validation and independent external testing, comparing our algorithm with 12 other state-of-the-art predictors. Classification scores demonstrated that GOWDL reached the best results over five different tissues, except for recall, where we got about 92% versus 97% of the best tool. Finally, we presented a case study on classifying immune cell populations in breast cancer using a hierarchical approach based on GOWDL.
Affiliation(s)
- Antonino Fiannaca
- ICAR-CNR, National Research Council of Italy, Via Ugo La Malfa 153, 90146, Palermo, Italy
- Massimo La Rosa
- ICAR-CNR, National Research Council of Italy, Via Ugo La Malfa 153, 90146, Palermo, Italy
- Laura La Paglia
- ICAR-CNR, National Research Council of Italy, Via Ugo La Malfa 153, 90146, Palermo, Italy
- Salvatore Gaglio
- ICAR-CNR, National Research Council of Italy, Via Ugo La Malfa 153, 90146, Palermo, Italy
- Dipartimento di Ingegneria, Università degli studi di Palermo, Viale Delle Scienze, ed. 6, 90128, Palermo, Italy
- Alfonso Urso
- ICAR-CNR, National Research Council of Italy, Via Ugo La Malfa 153, 90146, Palermo, Italy
25
Lyu P, Zhai Y, Li T, Qian J. CellAnn: a comprehensive, super-fast, and user-friendly single-cell annotation web server. Bioinformatics 2023; 39:btad521. [PMID: 37610325 PMCID: PMC10477937 DOI: 10.1093/bioinformatics/btad521] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2023] [Revised: 07/17/2023] [Accepted: 08/22/2023] [Indexed: 08/24/2023] Open
Abstract
MOTIVATION Single-cell sequencing technology has become a routine in studying many biological problems. A core step of analyzing single-cell data is the assignment of cell clusters to specific cell types. Reference-based methods are proposed for predicting cell types for single-cell clusters. However, the scalability and lack of preprocessed reference datasets prevent them from being practical and easy to use. RESULTS Here, we introduce a reference-based cell annotation web server, CellAnn, which is super-fast and easy to use. CellAnn contains a comprehensive reference database with 204 human and 191 mouse single-cell datasets. These reference datasets cover 32 organs. Furthermore, we developed a cluster-to-cluster alignment method to transfer cell labels from the reference to the query datasets, which is superior to the existing methods with higher accuracy and higher scalability. Finally, CellAnn is an online tool that integrates all the procedures in cell annotation, including reference searching, transferring cell labels, visualizing results, and harmonizing cell annotation labels. Through the user-friendly interface, users can identify the best annotation by cross-validating with multiple reference datasets. We believe that CellAnn can greatly facilitate single-cell sequencing data analysis. AVAILABILITY AND IMPLEMENTATION The web server is available at www.cellann.io, and the source code is available at https://github.com/Pinlyu3/CellAnn_shinyapp.
Affiliation(s)
- Pin Lyu
- Department of Ophthalmology, Johns Hopkins University School of Medicine, Baltimore, MD 21287, United States
- Yijie Zhai
- Department of Ophthalmology, Johns Hopkins University School of Medicine, Baltimore, MD 21287, United States
- Taibo Li
- Department of Biomedical Engineering, Johns Hopkins University School of Medicine, Baltimore, MD 21218, United States
- Jiang Qian
- Department of Ophthalmology, Johns Hopkins University School of Medicine, Baltimore, MD 21287, United States
26
Aran D. Single-Cell RNA Sequencing for Studying Human Cancers. Annu Rev Biomed Data Sci 2023; 6:1-22. [PMID: 37040737 DOI: 10.1146/annurev-biodatasci-020722-091857] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/13/2023]
Abstract
Since the first publication a decade ago describing the use of single-cell RNA sequencing (scRNA-seq) in the context of cancer, over 200 datasets and thousands of scRNA-seq studies have been published in cancer biology. scRNA-seq technologies have been applied across dozens of cancer types and a diverse array of study designs to improve our understanding of tumor biology, the tumor microenvironment, and therapeutic responses, and scRNA-seq is on the verge of being used to improve decision-making in the clinic. Computational methodologies and analytical pipelines are key in facilitating scRNA-seq research. Numerous computational methods utilizing the most advanced tools in data science have been developed to extract meaningful insights. Here, we review the advancements in cancer biology gained by scRNA-seq and discuss the computational challenges of the technology that are specific to cancer research.
Affiliation(s)
- Dvir Aran
- Faculty of Biology, The Taub Faculty of Computer Science, and Lorry I. Lokey Interdisciplinary Center for Life Sciences and Engineering, Technion-Israel Institute of Technology, Haifa, Israel
27
Cheng C, Chen W, Jin H, Chen X. A Review of Single-Cell RNA-Seq Annotation, Integration, and Cell-Cell Communication. Cells 2023; 12:1970. [PMID: 37566049 PMCID: PMC10417635 DOI: 10.3390/cells12151970] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2023] [Revised: 07/10/2023] [Accepted: 07/21/2023] [Indexed: 08/12/2023] Open
Abstract
Single-cell RNA sequencing (scRNA-seq) has emerged as a powerful tool for investigating cellular biology at an unprecedented resolution, enabling the characterization of cellular heterogeneity, identification of rare but significant cell types, and exploration of cell-cell communications and interactions. Its broad applications span both basic and clinical research domains. In this comprehensive review, we survey the current landscape of scRNA-seq analysis methods and tools, focusing on count modeling, cell-type annotation, data integration, including spatial transcriptomics, and the inference of cell-cell communication. We review the challenges encountered in scRNA-seq analysis, including issues of sparsity or low expression, reliability of cell annotation, and assumptions in data integration, and discuss the potential impact of suboptimal clustering and differential expression analysis tools on downstream analyses, particularly in identifying cell subpopulations. Finally, we discuss recent advancements and future directions for enhancing scRNA-seq analysis. Specifically, we highlight the development of novel tools for annotating single-cell data, integrating and interpreting multimodal datasets covering transcriptomics, epigenomics, and proteomics, and inferring cellular communication networks. By elucidating the latest progress and innovation, we provide a comprehensive overview of the rapidly advancing field of scRNA-seq analysis.
Affiliation(s)
- Changde Cheng
- Department of Computational Biology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Wenan Chen
- Center for Applied Bioinformatics, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Hongjian Jin
- Center for Applied Bioinformatics, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Xiang Chen
- Department of Computational Biology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
28
Xiong YX, Wang MG, Chen L, Zhang XF. Cell-type annotation with accurate unseen cell-type identification using multiple references. PLoS Comput Biol 2023; 19:e1011261. [PMID: 37379341 PMCID: PMC10335708 DOI: 10.1371/journal.pcbi.1011261] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/20/2023] [Revised: 07/11/2023] [Accepted: 06/11/2023] [Indexed: 06/30/2023] Open
Abstract
The recent advances in single-cell RNA sequencing (scRNA-seq) techniques have stimulated efforts to identify and characterize the cellular composition of complex tissues. With the advent of various sequencing techniques, automated cell-type annotation using a well-annotated scRNA-seq reference becomes popular. But it relies on the diversity of cell types in the reference, which may not capture all the cell types present in the query data of interest. There are generally unseen cell types in the query data of interest because most data atlases are obtained for different purposes and techniques. Identifying previously unseen cell types is essential for improving annotation accuracy and uncovering novel biological discoveries. To address this challenge, we propose mtANN (multiple-reference-based scRNA-seq data annotation), a new method to automatically annotate query data while accurately identifying unseen cell types with the aid of multiple references. Key innovations of mtANN include the integration of deep learning and ensemble learning to improve prediction accuracy, and the introduction of a new metric that considers three complementary aspects to distinguish between unseen cell types and shared cell types. Additionally, we provide a data-driven method to adaptively select a threshold for identifying previously unseen cell types. We demonstrate the advantages of mtANN over state-of-the-art methods for unseen cell-type identification and cell-type annotation on two benchmark dataset collections, as well as its predictive power on a collection of COVID-19 datasets. The source code and tutorial are available at https://github.com/Zhangxf-ccnu/mtANN.
Affiliation(s)
- Yi-Xuan Xiong
- School of Mathematics and Statistics, Central China Normal University, Wuhan, China
- Key Laboratory of Nonlinear Analysis & Applications (Ministry of Education), Central China Normal University, Wuhan, China
- Meng-Guo Wang
- School of Mathematics and Statistics, Central China Normal University, Wuhan, China
- Key Laboratory of Nonlinear Analysis & Applications (Ministry of Education), Central China Normal University, Wuhan, China
- Luonan Chen
- State Key Laboratory of Cell Biology, Shanghai Institute of Biochemistry and Cell Biology, Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, Shanghai, China
- School of Life Science and Technology, ShanghaiTech University, Shanghai, China
- Key Laboratory of Systems Health Science of Zhejiang Province, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Hangzhou, China
- Guangdong Institute of Intelligence Science and Technology, Hengqin, Zhuhai, Guangdong, China
- Xiao-Fei Zhang
- School of Mathematics and Statistics, Central China Normal University, Wuhan, China
- Key Laboratory of Nonlinear Analysis & Applications (Ministry of Education), Central China Normal University, Wuhan, China
29
Davalos OA, Heydari AA, Fertig EJ, Sindi SS, Hoyer KK. Boosting Single-Cell RNA Sequencing Analysis with Simple Neural Attention. bioRxiv 2023:2023.05.29.542760. [PMID: 37398136 PMCID: PMC10312486 DOI: 10.1101/2023.05.29.542760] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/04/2023]
Abstract
A limitation of current deep learning (DL) approaches for single-cell RNA sequencing (scRNAseq) analysis is the lack of interpretability. Moreover, existing pipelines are designed and trained for specific tasks used disjointly for different stages of analysis. We present scANNA, a novel interpretable DL model for scRNAseq studies that leverages neural attention to learn gene associations. After training, the learned gene importance (interpretability) is used to perform downstream analyses (e.g., global marker selection and cell-type classification) without retraining. ScANNA's performance is comparable to or better than state-of-the-art methods designed and trained for specific standard scRNAseq analyses even though scANNA was not trained for these tasks explicitly. ScANNA enables researchers to discover meaningful results without extensive prior knowledge or training separate task-specific models, saving time and enhancing scRNAseq analyses.
Affiliation(s)
- Oscar A. Davalos
- Quantitative and Systems Biology Graduate Program, University of California, Merced, CA, USA
- A. Ali Heydari
- Department of Applied Mathematics, University of California, Merced, CA, USA
- Health Sciences Research Institute, University of California, Merced, CA, USA
- Elana J. Fertig
- Department of Oncology, Division of Biostatistics and Bioinformatics, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins School of Medicine, Baltimore, MD, USA
- Suzanne S. Sindi
- Department of Applied Mathematics, University of California, Merced, CA, USA
- Health Sciences Research Institute, University of California, Merced, CA, USA
- Katrina K. Hoyer
- Health Sciences Research Institute, University of California, Merced, CA, USA
- Department of Molecular and Cell Biology, School of Natural Sciences, University of California, Merced, CA, USA
30
Subudhi AK, Green JL, Satyam R, Lenz T, Salunke RP, Shuaib M, Isaioglou I, Abel S, Gupta M, Esau L, Mourier T, Nugmanova R, Mfarrej S, Sivapurkar R, Stead Z, Rached FB, Otswal Y, Sougrat R, Dada A, Kadamany AF, Fischle W, Merzaban J, Knuepfer E, Ferguson DJP, Gupta I, Le Roch KG, Holder AA, Pain A. PfAP2-MRP DNA-binding protein is a master regulator of parasite pathogenesis during malaria parasite blood stages. bioRxiv 2023:2023.05.23.541898. [PMID: 37293082 PMCID: PMC10245809 DOI: 10.1101/2023.05.23.541898] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
Malaria pathogenicity results from the parasite's ability to invade, multiply within and then egress from the host red blood cell (RBC). Infected RBCs are remodeled, expressing antigenic variant proteins (such as PfEMP1, coded by the var gene family) for immune evasion and survival. These processes require the concerted actions of many proteins, but the molecular regulation is poorly understood. We have characterized an essential Plasmodium specific Apicomplexan AP2 (ApiAP2) transcription factor in Plasmodium falciparum (PfAP2-MRP; Master Regulator of Pathogenesis) during the intraerythrocytic developmental cycle (IDC). An inducible gene knockout approach showed that PfAP2-MRP is essential for development during the trophozoite stage, and critical for var gene regulation, merozoite development and parasite egress. ChIP-seq experiments performed at 16 hour post invasion (h.p.i.) and 40 h.p.i. matching the two peaks of PfAP2-MRP expression, demonstrate binding of PfAP2-MRP to the promoters of genes controlling trophozoite development and host cell remodeling at 16 h.p.i. and antigenic variation and pathogenicity at 40 h.p.i. Using single-cell RNA-seq and fluorescence-activated cell sorting, we show de-repression of most var genes in Δpfap2-mrp parasites that express multiple PfEMP1 proteins on the surface of infected RBCs. In addition, the Δpfap2-mrp parasites overexpress several early gametocyte marker genes at both 16 and 40 h.p.i., indicating a regulatory role in the sexual stage conversion. Using the Chromosomes Conformation Capture experiment (Hi-C), we demonstrate that deletion of PfAP2-MRP results in significant reduction of both intra-chromosomal and inter-chromosomal interactions in heterochromatin clusters. We conclude that PfAP2-MRP is a vital upstream transcriptional regulator controlling essential processes in two distinct developmental stages during the IDC that include parasite growth, chromatin structure and var gene expression.
Affiliation(s)
- Amit Kumar Subudhi
- Pathogen Genomics Group, Bioscience Program, Biological and Environmental Science and Engineering (BESE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Kingdom of Saudi Arabia
- Judith L Green
- Malaria Parasitology Laboratory, The Francis Crick Institute, London, NW1 1AT, United Kingdom
- Rohit Satyam
- Department of Computer Science, Jamia Millia Islamia, Jamia Nagar, Okhla, New Delhi, Delhi 110025, India
- Todd Lenz
- Department of Molecular, Cell and Systems Biology, University of California Riverside, Riverside, California, United States of America
- Rahul P Salunke
- Pathogen Genomics Group, Bioscience Program, Biological and Environmental Science and Engineering (BESE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Kingdom of Saudi Arabia
- Muhammad Shuaib
- Pathogen Genomics Group, Bioscience Program, Biological and Environmental Science and Engineering (BESE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Kingdom of Saudi Arabia
- Ioannis Isaioglou
- Cell Migration and Signaling Laboratory, Bioscience Program, BESE Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Kingdom of Saudi Arabia
- Steven Abel
- Department of Molecular, Cell and Systems Biology, University of California Riverside, Riverside, California, United States of America
- Mohit Gupta
- Department of Molecular, Cell and Systems Biology, University of California Riverside, Riverside, California, United States of America
- Luke Esau
- KAUST Core Labs, KAUST, Thuwal, 23955-6900, Kingdom of Saudi Arabia
- Tobias Mourier
- Pathogen Genomics Group, Bioscience Program, Biological and Environmental Science and Engineering (BESE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Kingdom of Saudi Arabia
- Raushan Nugmanova
- Pathogen Genomics Group, Bioscience Program, Biological and Environmental Science and Engineering (BESE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Kingdom of Saudi Arabia
- Sara Mfarrej
- Pathogen Genomics Group, Bioscience Program, Biological and Environmental Science and Engineering (BESE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Kingdom of Saudi Arabia
- Rupali Sivapurkar
- Pathogen Genomics Group, Bioscience Program, Biological and Environmental Science and Engineering (BESE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Kingdom of Saudi Arabia
- Zenaida Stead
- Pathogen Genomics Group, Bioscience Program, Biological and Environmental Science and Engineering (BESE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Kingdom of Saudi Arabia
- Fathia Ben Rached
- Pathogen Genomics Group, Bioscience Program, Biological and Environmental Science and Engineering (BESE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Kingdom of Saudi Arabia
- Yogesh Otswal
- Laboratory of Chromatin Biochemistry, Bioscience Program, BESE Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Kingdom of Saudi Arabia
- Rachid Sougrat
- KAUST Core Labs, KAUST, Thuwal, 23955-6900, Kingdom of Saudi Arabia
- Ashraf Dada
- Department of Pathology and Laboratory Medicine, King Faisal Specialist Hospital and Research Center, Jeddah, Kingdom of Saudi Arabia
- Abdullah Fuaad Kadamany
- Department of Pathology and Laboratory Medicine, King Faisal Specialist Hospital and Research Center, Jeddah, Kingdom of Saudi Arabia
- Wolfgang Fischle
- Laboratory of Chromatin Biochemistry, Bioscience Program, BESE Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Kingdom of Saudi Arabia
- Jasmeen Merzaban
- Cell Migration and Signaling Laboratory, Bioscience Program, BESE Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Kingdom of Saudi Arabia
- Ellen Knuepfer
- Malaria Parasitology Laboratory, The Francis Crick Institute, London, NW1 1AT, United Kingdom
- David J P Ferguson
- Nuffield Department of Clinical Laboratory Science, University of Oxford, John Radcliffe Hospital, Oxford OX1 2JD, United Kingdom
- Department of Biological and Medical Sciences, Faculty of Health and Life Sciences, Oxford Brookes University, Oxford, United Kingdom
- Ishaan Gupta
- Department of Biochemical Engineering and Biotechnology, Indian Institute of Technology Delhi, New Delhi, India
- Karine G Le Roch
- Department of Molecular, Cell and Systems Biology, University of California Riverside, Riverside, California, United States of America
- Anthony A Holder
- Malaria Parasitology Laboratory, The Francis Crick Institute, London, NW1 1AT, United Kingdom
- Arnab Pain
- Pathogen Genomics Group, Bioscience Program, Biological and Environmental Science and Engineering (BESE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Kingdom of Saudi Arabia
- International Institute for Zoonosis Control, Hokkaido University, Sapporo, Japan
31
Liu H, Li H, Sharma A, Huang W, Pan D, Gu Y, Lin L, Sun X, Liu H. scAnno: a deconvolution strategy-based automatic cell type annotation tool for single-cell RNA-sequencing data sets. Brief Bioinform 2023; 24:bbad179. [PMID: 37183449 DOI: 10.1093/bib/bbad179] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 02/21/2023] [Revised: 03/29/2023] [Accepted: 04/19/2023] [Indexed: 05/16/2023] Open
Abstract
Undoubtedly, single-cell RNA sequencing (scRNA-seq) has changed the research landscape by providing insights into heterogeneous, complex and rare cell populations. Given that more such data sets will become available in the near future, their accurate assessment with compatible and robust models for cell type annotation is a prerequisite. Considering this, herein, we developed scAnno (scRNA-seq data annotation), an automated annotation tool for scRNA-seq data sets primarily based on the single-cell cluster levels, using a joint deconvolution strategy and logistic regression. We explicitly constructed a reference profile for human (30 cell types and 50 human tissues) and a reference profile for mouse (26 cell types and 50 mouse tissues) to support this novel methodology (scAnno). scAnno offers a possibility to obtain genes with high expression and specificity in a given cell type as cell type-specific genes (marker genes) by combining co-expression genes with seed genes as a core. Of importance, scAnno can accurately identify cell type-specific genes based on cell type reference expression profiles without any prior information. Particularly, in the peripheral blood mononuclear cell data set, the marker genes identified by scAnno showed cell type-specific expression, and the majority of marker genes matched exactly with those included in the CellMarker database. Besides validating the flexibility and interpretability of scAnno in identifying marker genes, we also proved its superiority in cell type annotation over other cell type annotation tools (SingleR, scPred, CHETAH and scmap-cluster) through internal validation of data sets (average annotation accuracy: 99.05%) and cross-platform data sets (average annotation accuracy: 95.56%). Taken together, we established the first novel methodology that utilizes a deconvolution strategy for automated cell typing and is capable of being a significant application in broader scRNA-seq analysis. 
scAnno is available at https://github.com/liuhong-jia/scAnno.
Affiliation(s)
- Hongjia Liu
- State Key Laboratory of Digital Medical Engineering, School of Biological Science & Medical Engineering, Southeast University, Nanjing, 210096, China
- Huamei Li
- Department of General Surgery, Nanjing Drum Tower Hospital, the Affiliated Hospital of Nanjing University Medical School, Nanjing, 210008, PR China
- Amit Sharma
- Department of Neurosurgery, University Hospital Bonn, Bonn, Germany
- Duo Pan
- State Key Laboratory of Digital Medical Engineering, School of Biological Science & Medical Engineering, Southeast University, Nanjing, 210096, China
- Yu Gu
- State Key Laboratory of Digital Medical Engineering, School of Biological Science & Medical Engineering, Southeast University, Nanjing, 210096, China
- Lu Lin
- State Key Laboratory of Digital Medical Engineering, School of Biological Science & Medical Engineering, Southeast University, Nanjing, 210096, China
- Xiao Sun
- State Key Laboratory of Digital Medical Engineering, School of Biological Science & Medical Engineering, Southeast University, Nanjing, 210096, China
- Hongde Liu
- State Key Laboratory of Digital Medical Engineering, School of Biological Science & Medical Engineering, Southeast University, Nanjing, 210096, China
|
32
|
Liu Y, Wei G, Li C, Shen LC, Gasser RB, Song J, Chen D, Yu DJ. TripletCell: a deep metric learning framework for accurate annotation of cell types at the single-cell level. Brief Bioinform 2023; 24:bbad132. [PMID: 37080771 PMCID: PMC10199768 DOI: 10.1093/bib/bbad132] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2022] [Revised: 02/02/2023] [Accepted: 03/14/2023] [Indexed: 04/22/2023] Open
Abstract
Single-cell RNA sequencing (scRNA-seq) has significantly accelerated the experimental characterization of distinct cell lineages and types in complex tissues and organisms. Cell-type annotation is of great importance in most of the scRNA-seq analysis pipelines. However, manual cell-type annotation heavily relies on the quality of scRNA-seq data and marker genes, and therefore can be laborious and time-consuming. Furthermore, the heterogeneity of scRNA-seq datasets poses another challenge for accurate cell-type annotation, such as the batch effect induced by different scRNA-seq protocols and samples. To overcome these limitations, here we propose a novel pipeline, termed TripletCell, for cross-species, cross-protocol and cross-sample cell-type annotation. We developed a cell embedding and dimension-reduction module for the feature extraction (FE) in TripletCell, namely TripletCell-FE, to leverage the deep metric learning-based algorithm for the relationships between the reference gene expression matrix and the query cells. Our experimental studies on 21 datasets (covering nine scRNA-seq protocols, two species and three tissues) demonstrate that TripletCell outperformed state-of-the-art approaches for cell-type annotation. More importantly, regardless of protocols or species, TripletCell can deliver outstanding and robust performance in annotating different types of cells. TripletCell is freely available at https://github.com/liuyan3056/TripletCell. We believe that TripletCell is a reliable computational tool for accurately annotating various cell types using scRNA-seq data and will be instrumental in assisting the generation of novel biological hypotheses in cell biology.
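The abstract does not spell out the training objective, but the name TripletCell and the reference to deep metric learning suggest a triplet-style loss. The following is a minimal sketch of the standard triplet margin loss on toy embeddings, assuming Euclidean distance and a margin of 1.0; the function names and example vectors are hypothetical and are not taken from the TripletCell code base.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss (assumed here, not confirmed by the abstract).

    The loss is zero once the negative sits at least `margin` farther
    from the anchor than the positive does.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Hypothetical 2-D embeddings: anchor and positive share a cell type,
# the negative comes from a different cell type.
anchor, positive, negative = [0.0, 0.0], [0.0, 1.0], [3.0, 4.0]
loss = triplet_margin_loss(anchor, positive, negative)
print(loss)
```

In this toy case the negative is already well separated from the anchor, so the triplet contributes no loss; swapping the positive and negative would produce a positive penalty.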
Affiliation(s)
- Yan Liu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing 210094, China
- Guo Wei
- School of Life Sciences, Nanjing University, Nanjing 210023, China
- Chen Li
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria 3800, Australia
- Long-Chen Shen
- School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing 210094, China
- Robin B Gasser
- Department of Veterinary Biosciences, Melbourne Veterinary School, The University of Melbourne, Parkville, Victoria 3010, Australia
- Jiangning Song
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria 3800, Australia
- Monash Data Futures Institute, Monash University, Melbourne, Victoria 3800, Australia
- Dijun Chen
- School of Life Sciences, Nanjing University, Nanjing 210023, China
- Dong-Jun Yu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing 210094, China
|
33
|
Kim H, Kim HK, Hong D, Kim M, Jang S, Yang CS, Yoon S. Identification of ulcerative colitis-specific immune cell signatures from public single-cell RNA-seq data. Genes Genomics 2023:10.1007/s13258-023-01390-w. [PMID: 37133723 DOI: 10.1007/s13258-023-01390-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/03/2023] [Accepted: 04/13/2023] [Indexed: 05/04/2023]
Abstract
BACKGROUND Single-cell RNA-seq has enabled microscopic studies of the tissue microenvironment in many diseases. Inflammatory bowel disease, an autoimmune disease, involves various dysfunctions of immune cells, for which single-cell RNA-seq may provide deeper insight into the causes and mechanisms of this complex disease. OBJECTIVE In this work, we used public single-cell RNA-seq data to study the tissue microenvironment in ulcerative colitis, an inflammatory bowel disease causing chronic inflammation and ulcers in the large intestine. METHODS Since not all the datasets provide cell-type annotations, we first identified cell identities to select the cell populations of interest. Differential expression and gene set enrichment analyses were then performed to infer the polarization/activation state of macrophages and T cells. Cell-to-cell interaction analysis was also performed to discover distinct interactions in ulcerative colitis. RESULTS Differential expression analysis of the two datasets confirmed the regulation of the CTLA4, IL2RA, and CCL5 genes in the T cell subset and of the S100A8/A9 and CLEC10A genes in macrophages. Cell-to-cell interaction analysis showed that CD4+ T cells and macrophages interact actively with each other. We also identified IL-18 pathway activation in inflammatory macrophages and evidence that CD4+ T cells induce Th1 and Th2 differentiation, and found that macrophages regulate T cell activation through different ligand-receptor pairs, viz. CD86-CTLA4, LGALS9-CD47, SIRPA-CD47, and GRN-TNFRSF1B. CONCLUSION Analysis of these immune cell subsets may suggest novel strategies for the treatment of inflammatory bowel disease.
Affiliation(s)
- Hanbyeol Kim
- Dept of Computer Science, College of SW Convergence, Dankook Univ, Yongin-si, 16890, Korea
- Hyo Keun Kim
- Dept of Molecular and Life Science and Center for Bionano Intelligence Education and Research, Hanyang University, Ansan-si, 15588, Korea
- Dawon Hong
- Dept of Molecular Biology, Graduate Department of Bioconvergence Engineering, Dankook University, Yongin-si, 16890, Korea
- Minsu Kim
- Dept of Computer Science, College of SW Convergence, Dankook Univ, Yongin-si, 16890, Korea
- Sein Jang
- Dept of Molecular and Life Science and Center for Bionano Intelligence Education and Research, Hanyang University, Ansan-si, 15588, Korea
- Chul-Su Yang
- Dept of Medicinal/Molecular and Life Science and Center for Bionano Intelligence Education and Research, Hanyang University, Ansan-si, 15588, Korea
- Seokhyun Yoon
- Dept of Electronics & Electrical Eng, College of Engineering, Dankook Univ, Yongin-si, 16890, Korea.
|
34
|
Ma W, Lu J, Wu H. Cellcano: supervised cell type identification for single cell ATAC-seq data. Nat Commun 2023; 14:1864. [PMID: 37012226 PMCID: PMC10070275 DOI: 10.1038/s41467-023-37439-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2022] [Accepted: 03/15/2023] [Indexed: 04/05/2023] Open
Abstract
Computational cell type identification is a fundamental step in single-cell omics data analysis. Supervised celltyping methods have gained increasing popularity in single-cell RNA-seq data analysis because of their superior performance and the availability of high-quality reference datasets. Recent technological advances in profiling chromatin accessibility at single-cell resolution (scATAC-seq) have brought new insights into epigenetic heterogeneity. With the continuous accumulation of scATAC-seq datasets, a supervised celltyping method specifically designed for scATAC-seq is urgently needed. Here we develop Cellcano, a computational method based on a two-round supervised learning algorithm to identify cell types from scATAC-seq data. The method alleviates the distributional shift between reference and target data and improves the prediction performance. After systematically benchmarking Cellcano on 50 well-designed celltyping tasks from various datasets, we show that Cellcano is accurate, robust, and computationally efficient. Cellcano is well-documented and freely available at https://marvinquiet.github.io/Cellcano/.
Affiliation(s)
- Wenjing Ma
- Department of Computer Science, Emory University, 400 Dowman Drive, Atlanta, GA, 30322, USA
- Jiaying Lu
- Department of Computer Science, Emory University, 400 Dowman Drive, Atlanta, GA, 30322, USA
- Hao Wu
- Faculty of Computer Science and Control Engineering, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, 1068 Xueyuan Avenue, Shenzhen University Town, Shenzhen, 518055, P. R. China.
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, 1518 Clifton Road NE, Atlanta, GA, 30322, USA.
|
35
|
Jiao L, Wang G, Dai H, Li X, Wang S, Song T. scTransSort: Transformers for Intelligent Annotation of Cell Types by Gene Embeddings. Biomolecules 2023; 13:biom13040611. [PMID: 37189359 DOI: 10.3390/biom13040611] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 12/29/2022] [Revised: 03/05/2023] [Accepted: 03/10/2023] [Indexed: 03/31/2023] Open
Abstract
Single-cell transcriptomics is rapidly advancing our understanding of the composition of complex tissues and biological cells, and single-cell RNA sequencing (scRNA-seq) holds great potential for identifying and characterizing the cell composition of complex tissues. Cell type identification by analyzing scRNA-seq data is mostly limited by time-consuming and irreproducible manual annotation. As scRNA-seq technology scales to thousands of cells per experiment, the exponential increase in the number of cell samples makes manual annotation more difficult. In addition, the sparsity of gene transcriptome data remains a major challenge. This paper applies the idea of the transformer to single-cell classification tasks based on scRNA-seq data. We propose scTransSort, a cell-type annotation method pretrained with single-cell transcriptomics data. scTransSort represents genes as gene expression embedding blocks to reduce the sparsity of the data used for cell type identification and to reduce computational complexity. A distinguishing feature of scTransSort is its intelligent information extraction from unordered data: it automatically extracts valid features of cell types without the need for manually labeled features or additional references. In experiments on cells from 35 human and 26 mouse tissues, scTransSort demonstrated high accuracy and performance for cell type identification, as well as high robustness and generalization ability.
|
36
|
Nofech-Mozes I, Soave D, Awadalla P, Abelson S. Pan-cancer classification of single cells in the tumour microenvironment. Nat Commun 2023; 14:1615. [PMID: 36959212 PMCID: PMC10036554 DOI: 10.1038/s41467-023-37353-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 07/06/2022] [Accepted: 03/10/2023] [Indexed: 03/25/2023] Open
Abstract
Single-cell RNA sequencing can reveal valuable insights into cellular heterogeneity within tumour microenvironments (TMEs), paving the way for a deep understanding of cellular mechanisms contributing to cancer. However, high heterogeneity among the same cancer types and low transcriptomic variation in immune cell subsets present challenges for accurate, high-resolution confirmation of cells' identities. Here we present scATOMIC, a modular annotation tool for malignant and non-malignant cells. We trained scATOMIC on >300,000 cancer, immune, and stromal cells defining a pan-cancer reference across 19 common cancers and employed a hierarchical approach, outperforming current classification methods. We extensively confirm scATOMIC's accuracy on 225 tumour biopsies encompassing >350,000 cancer and a variety of TME cells. Lastly, we demonstrate scATOMIC's practical significance to accurately subset breast cancers into clinically relevant subtypes and predict tumours' primary origin across metastatic cancers. Our approach represents a broadly applicable strategy to analyse multicellular cancer TMEs.
Affiliation(s)
- Ido Nofech-Mozes
- Ontario Institute for Cancer Research, Toronto, ON, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- David Soave
- Ontario Institute for Cancer Research, Toronto, ON, Canada
- Department of Mathematics, Wilfrid Laurier University, Waterloo, ON, Canada
- Philip Awadalla
- Ontario Institute for Cancer Research, Toronto, ON, Canada.
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada.
- Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada.
- Sagi Abelson
- Ontario Institute for Cancer Research, Toronto, ON, Canada.
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada.
|
37
|
Lee J, Kim M, Kang K, Yang CS, Yoon S. Hierarchical cell-type identifier accurately distinguishes immune-cell subtypes enabling precise profiling of tissue microenvironment with single-cell RNA-sequencing. Brief Bioinform 2023; 24:6995373. [PMID: 36681937 PMCID: PMC10025442 DOI: 10.1093/bib/bbad006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2022] [Revised: 12/22/2022] [Accepted: 01/02/2023] [Indexed: 01/23/2023] Open
Abstract
Single-cell RNA-seq has enabled in-depth study of the tissue micro-environment and immune profiling, where a crucial step is to annotate cell identity. Immune cells play key roles in many diseases, but their activities are hard to track due to their diverse and highly variable nature. Existing cell-type identifiers have had limited performance for this purpose. We present HiCAT, a hierarchical, marker-based cell-type identifier that uses gene set analysis to compute statistical scores for given markers. It features successive identification of major types, minor types and subsets using subset markers structured in a three-level taxonomy tree. Comparison with manual annotation and a pairwise match test showed that HiCAT outperforms others in major- and minor-type identification. For subsets, we qualitatively evaluated the marker expression profile, demonstrating that HiCAT provides the clearest immune-cell landscape. HiCAT was also used for immune-cell profiling in ulcerative colitis and discovered distinct features of the disease in macrophage and T-cell subsets that could not be identified previously.
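The successive, marker-based identification described in this abstract can be illustrated with a small sketch. It is an assumption-laden toy, not the HiCAT implementation: the score is a plain mean of marker expression standing in for the gene set analysis the authors use, only two of the three taxonomy levels are shown, and the marker lists and function names are illustrative choices.

```python
def mean_marker_score(expression, markers):
    """Average expression of the marker genes; genes absent from the cell count as 0."""
    return sum(expression.get(gene, 0.0) for gene in markers) / len(markers)


def best_label(expression, marker_sets):
    """Return the label whose marker set scores highest in this cell."""
    return max(marker_sets, key=lambda label: mean_marker_score(expression, marker_sets[label]))


# Hypothetical two-level slice of a taxonomy tree (major type -> minor types).
MAJOR = {"T cell": ["CD3D", "CD3E"], "Myeloid cell": ["LYZ", "CD14"]}
MINOR = {
    "T cell": {"CD4+ T cell": ["CD4"], "CD8+ T cell": ["CD8A"]},
    "Myeloid cell": {"Macrophage": ["CD68"], "Monocyte": ["FCGR3A"]},
}


def annotate(expression):
    """Pick the major type first, then choose only among its child minor types."""
    major = best_label(expression, MAJOR)
    minor = best_label(expression, MINOR[major])
    return major, minor


# Toy expression profile for one cell (gene -> normalized expression).
cell = {"CD3D": 5.0, "CD3E": 4.0, "CD8A": 3.0, "LYZ": 0.5}
print(annotate(cell))
```

Restricting the second step to the children of the chosen major type is what makes the procedure hierarchical: a minor-type marker can never override the major-type decision.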
Affiliation(s)
- Joongho Lee
- Dept. of Computer Science, College of SW Convergence, Dankook University, Yongin-si, Korea, 16890
- Minsoo Kim
- Dept. of Computer Science, College of SW Convergence, Dankook University, Yongin-si, Korea, 16890
- Keunsoo Kang
- Dept. of Microbiology, College of Natural Sciences, Dankook University, Cheonan-si, Korea, 31116
- Chul-Su Yang
- Dept. of Molecular and Life Science, Center for Bionano Intelligence Education and Research, Hanyang University, Ansan, Korea, 15588
- Seokhyun Yoon
- Dept. of Electronics & Electrical Eng., College of Engineering, Dankook University, Yongin-si, Korea, 16890
|
38
|
Lee J, Kim H, Kim M, Yoon S, Lee S. Role of lymphoid lineage cells aberrantly expressing alarmins S100A8/A9 in determining the severity of COVID-19. Genes Genomics 2023; 45:337-346. [PMID: 36107397 PMCID: PMC9476394 DOI: 10.1007/s13258-022-01285-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2022] [Accepted: 07/08/2022] [Indexed: 01/09/2023]
Abstract
BACKGROUND Alarmins S100A8 and S100A9 are recognized as hallmarks of severe COVID-19 and are primarily produced in myeloid cells, such as monocytes and neutrophils. Single-cell RNA-sequencing (scRNA-seq) data from patients with COVID-19 have revealed the expression of S100A8/A9 in lymphoid cells in patients with severe COVID-19. OBJECTIVE We investigated the characteristics of lymphoid cells expressing S100A8/A9 in COVID-19 patients. METHODS Publicly available scRNA-seq data from patients with mild (N = 12) or severe (N = 7) COVID-19 were reanalyzed. The data were further divided into the following two groups based on the time of sample collection (from infection-onset): within 6 days (early phase) and after 6 days (late phase). Differential expression and gene set enrichment analyses were performed between S100A8/A9-high and S100A8/A9-low lymphoid cells. Finally, cell-cell interaction analysis was performed to investigate the role of lymphoid cells expressing high levels of S100A8/A9 in COVID-19. RESULTS S100A8/A9 overexpression was observed in lymphoid cells, including B cells, T cells, and NK cells, in patients with severe COVID-19 (compared to patients with mild COVID-19). Cells exhibiting strong interferon/cytokine responses were found to be associated with the severity of COVID-19. Furthermore, differences in S100A8/A9-TLR4/RAGE interactions were confirmed between patients with severe and mild disease. CONCLUSIONS Lymphoid cells overexpressing S100A8/A9 contribute to the dysregulation of the innate immune response in patients with severe COVID-19, specifically during the early phase of infection. This study fosters a better understanding of the hyper-induction of pro-inflammatory cytokine expression and the generation of a cytokine storm in response to COVID-19 infection.
Affiliation(s)
- Joongho Lee
- Department of Computer Science and Engineering, Graduate School, Dankook University, Yongin-si, Republic of Korea
- Hanbyeol Kim
- Department of Computer Science and Engineering, Graduate School, Dankook University, Yongin-si, Republic of Korea
- Minsoo Kim
- Department of Computer Science and Engineering, Graduate School, Dankook University, Yongin-si, Republic of Korea
- Seokhyun Yoon
- Department of Computer Science and Engineering, Graduate School, Dankook University, Yongin-si, Republic of Korea.
- Department of Electronics and Electrical Engineering, College of Engineering, Dankook University, Yongin-si, Republic of Korea.
- Sanghun Lee
- Department of Bioconvergence Engineering, Graduate School, Dankook University, Yongin-si, Republic of Korea.
|
39
|
Bhadani R, Chen Z, An L. Attention-Based Graph Neural Network for Label Propagation in Single-Cell Omics. Genes (Basel) 2023; 14:506. [PMID: 36833434 PMCID: PMC9957137 DOI: 10.3390/genes14020506] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/27/2022] [Revised: 02/13/2023] [Accepted: 02/13/2023] [Indexed: 02/19/2023] Open
Abstract
Single-cell data analysis has been at the forefront of development in biology and medicine since sequencing data became available. An important challenge in single-cell data analysis is the identification of cell types. Several methods have been proposed for cell-type identification. However, these methods do not capture the higher-order topological relationship between different samples. In this work, we propose an attention-based graph neural network that captures the higher-order topological relationship between different samples and performs transductive learning for predicting cell types. The evaluation of our method on both simulated and publicly available datasets demonstrates the superiority of our method, scAGN, in terms of prediction accuracy. In addition, our method works best for highly sparse datasets in terms of F1 score, precision, recall, and Matthews correlation coefficient. Further, our method's runtime is consistently lower than that of other methods.
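For readers unfamiliar with the evaluation metrics named in this abstract, the sketch below computes them from confusion-matrix counts. It assumes a single cell type scored one-vs-rest (a binary setting); the function name and the example counts are hypothetical, and this is not the authors' evaluation code.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Precision, recall, F1 and Matthews correlation coefficient from counts.

    Each ratio falls back to 0.0 when its denominator is zero.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "mcc": mcc}


# Hypothetical counts for one cell type versus all others.
metrics = binary_metrics(tp=8, fp=2, fn=2, tn=8)
print(metrics)
```

Unlike precision, recall and F1, the Matthews correlation coefficient also uses the true negatives, which is why it is often reported alongside them for imbalanced cell-type labels.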
Affiliation(s)
- Rahul Bhadani
- Department of Electrical & Computer Engineering, The University of Arizona, Tucson, AZ 85721, USA
- Interdisciplinary Program in Statistics and Data Science, The University of Arizona, Tucson, AZ 85721, USA
- Zhuo Chen
- Interdisciplinary Program in Statistics and Data Science, The University of Arizona, Tucson, AZ 85721, USA
- Lingling An
- Interdisciplinary Program in Statistics and Data Science, The University of Arizona, Tucson, AZ 85721, USA
- Department of Biosystems Engineering, The University of Arizona, Tucson, AZ 85721, USA
- Department of Epidemiology and Biostatistics, The University of Arizona, Tucson, AZ 85721, USA
|
40
|
Wang K, Li Z, You ZH, Han P, Nie R. Adversarial dense graph convolutional networks for single-cell classification. Bioinformatics 2023; 39:6994183. [PMID: 36661313 PMCID: PMC9919433 DOI: 10.1093/bioinformatics/btad043] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2022] [Revised: 12/30/2022] [Accepted: 01/19/2023] [Indexed: 01/21/2023] Open
Abstract
MOTIVATION In single-cell transcriptomics applications, effective identification of cell types in multicellular organisms and in-depth study of the relationships between genes has become one of the main goals of bioinformatics research. However, data heterogeneity and random noise pose significant difficulties for scRNA-seq data analysis. RESULTS We have proposed an adversarial dense graph convolutional network architecture for single-cell classification. Specifically, to enhance the representation of higher-order features and the organic combination between features, dense connectivity mechanism and attention-based feature aggregation are introduced for feature learning in convolutional neural networks. To preserve the features of the original data, we use a feature reconstruction module to assist the goal of single-cell classification. In addition, HNNVAT uses virtual adversarial training to improve the generalization and robustness. Experimental results show that our model outperforms the existing classical methods in terms of classification accuracy on benchmark datasets. AVAILABILITY AND IMPLEMENTATION The source code of HNNVAT is available at https://github.com/DisscLab/HNNVAT. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Kangwei Wang
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China
- Zhengwei Li
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China
- Zhu-Hong You
- School of Computer Science, Northwestern Polytechnical University, Xi'an 710072, China
- Pengyong Han
- Central Lab, Changzhi Medical College, Changzhi 046000, China
- Ru Nie
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China
|
41
|
Camps J, Noël F, Liechti R, Massenet-Regad L, Rigade S, Götz L, Hoffmann C, Amblard E, Saichi M, Ibrahim MM, Pollard J, Medvedovic J, Roider HG, Soumelis V. Meta-Analysis of Human Cancer Single-Cell RNA-Seq Datasets Using the IMMUcan Database. Cancer Res 2023; 83:363-373. [PMID: 36459564 PMCID: PMC9896021 DOI: 10.1158/0008-5472.can-22-0074] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 01/10/2022] [Revised: 08/15/2022] [Accepted: 11/29/2022] [Indexed: 12/03/2022]
Abstract
The development of single-cell RNA sequencing (scRNA-seq) technologies has greatly contributed to deciphering the tumor microenvironment (TME). A large number of independent scRNA-seq studies have been published, representing a valuable resource that provides opportunities for meta-analysis studies. However, the massive amount of biological information, the marked heterogeneity and variability between studies, and the technical challenges in processing heterogeneous datasets create major bottlenecks for the full exploitation of scRNA-seq data. We have developed IMMUcan scDB (https://immucanscdb.vital-it.ch), a fully integrated scRNA-seq database exclusively dedicated to human cancer and accessible to nonspecialists. IMMUcan scDB encompasses 144 datasets on 56 different cancer types, annotated in 50 fields containing precise clinical, technological, and biological information. A data processing pipeline was developed and organized in four steps: (i) data collection; (ii) data processing (quality control and sample integration); (iii) supervised cell annotation with a cell ontology classifier of the TME; and (iv) an interface to analyze the TME in a cancer type-specific or global manner. This framework was used to explore datasets across tumor locations in a gene-centric (CXCL13) and cell-centric (B cells) manner as well as to conduct meta-analysis studies such as ranking immune cell types and genes correlated to malignant transformation. This integrated, freely accessible, and user-friendly resource represents an unprecedented level of detailed annotation, offering vast possibilities for downstream exploitation of human cancer scRNA-seq data for discovery and validation studies. SIGNIFICANCE The IMMUcan scDB database is an accessible supportive tool to analyze and decipher tumor-associated single-cell RNA sequencing data, allowing researchers to make maximal use of these data to provide new insights into cancer biology.
Affiliation(s)
- Jordi Camps
- Biomedical Data Science, Research & Early Development Oncology, Bayer AG, Berlin, Germany
- Floriane Noël
- Université de Paris, Institut de Recherche Saint-Louis, INSERM U976, Paris, France
- Robin Liechti
- Vital-IT group, SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Lucile Massenet-Regad
- Université de Paris, Institut de Recherche Saint-Louis, INSERM U976, Paris, France
- Université Paris-Saclay, Saint Aubin, France
- Sidwell Rigade
- Université de Paris, Institut de Recherche Saint-Louis, INSERM U976, Paris, France
- Lou Götz
- Vital-IT group, SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Caroline Hoffmann
- Institut Curie, INSERM U932 Research Unit, Department of Surgical Oncology, PSL University, Paris, France
- Elise Amblard
- Université de Paris, Institut de Recherche Saint-Louis, INSERM U976, Paris, France
- Université de Paris, Centre de Recherches Interdisciplinaires, Paris, France
- Melissa Saichi
- Université de Paris, Institut de Recherche Saint-Louis, INSERM U976, Paris, France
- Mahmoud M. Ibrahim
- Biomedical Data Science, Research & Early Development Premedical, Bayer AG, Wuppertal, Germany
- Jack Pollard
- Sanofi Research and Development, Cambridge, Massachusetts
- Jasna Medvedovic
- Université de Paris, Institut de Recherche Saint-Louis, INSERM U976, Paris, France
- Helge G. Roider
- Oncology Precision Medicine, Research & Early Development Oncology, Bayer AG, Berlin, Germany
- Vassili Soumelis
- Université de Paris, Institut de Recherche Saint-Louis, INSERM U976, Paris, France
- Assistance Publique-Hôpitaux de Paris (AP-HP), Hôpital Saint-Louis, Laboratoire d'Immunologie, Paris, France
- Owkin, Paris, France
Corresponding Authors: Vassili Soumelis, Institut de Recherche St Louis (IRSL), Inserm U976, 26 rue d'Ulm, Paris 75005, France. Phone: 677-721-530; and Helge G. Roider, Bayer AG, Müllerstraße 178, Berlin 13353, Germany. Phone: 152-068-42034
|
42
|
Subedi S, Park YP. Single-cell pair-wise relationships untangled by composite embedding model. iScience 2023; 26:106025. [PMID: 36824286 PMCID: PMC9941206 DOI: 10.1016/j.isci.2023.106025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2022] [Revised: 11/24/2022] [Accepted: 01/17/2023] [Indexed: 01/25/2023] Open
Abstract
In multicellular organisms, cell identity and functions are primed and refined through interactions with other surrounding cells. Here, we propose a scalable machine learning method, termed SPRUCE, which is designed to systematically ascertain common cell-cell communication patterns embedded in single-cell RNA-seq data. We applied our approach to investigate tumor microenvironments, consolidating multiple breast cancer datasets, and found seven frequently observed interaction signatures and underlying gene-gene interaction networks. Our results suggest that part of tumor heterogeneity, especially within the same subtype, is better understood through differential interaction patterns than through the static expression of known marker genes.
Affiliation(s)
- Sishir Subedi
- Bioinformatics Graduate Program, University of British Columbia, Vancouver, BC, Canada; BC Cancer Research, Part of Provincial Health Care Authority, Vancouver, BC, Canada
| | - Yongjin P. Park
- BC Cancer Research, Part of Provincial Health Care Authority, Vancouver, BC, Canada; Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, BC, Canada; Department of Statistics, University of British Columbia, Vancouver, BC, Canada; Corresponding author
| |
|
43
|
Chen J, Xu H, Tao W, Chen Z, Zhao Y, Han JDJ. Transformer for one stop interpretable cell type annotation. Nat Commun 2023; 14:223. [PMID: 36641532 PMCID: PMC9840170 DOI: 10.1038/s41467-023-35923-4] [Citation(s) in RCA: 21] [Impact Index Per Article: 21.0] [Received: 07/29/2022] [Accepted: 01/09/2023] [Indexed: 01/15/2023] Open
Abstract
Consistent annotation transfer from reference dataset to query dataset is fundamental to the development and reproducibility of single-cell research. Compared with traditional annotation methods, deep learning based methods are faster and more automated. A series of useful single cell analysis tools based on autoencoder architecture have been developed but these struggle to strike a balance between depth and interpretability. Here, we present TOSICA, a multi-head self-attention deep learning model based on Transformer that enables interpretable cell type annotation using biologically understandable entities, such as pathways or regulons. We show that TOSICA achieves fast and accurate one-stop annotation and batch-insensitive integration while providing biologically interpretable insights for understanding cellular behavior during development and disease progressions. We demonstrate TOSICA's advantages by applying it to scRNA-seq data of tumor-infiltrating immune cells, and CD14+ monocytes in COVID-19 to reveal rare cell types, heterogeneity and dynamic trajectories associated with disease progression and severity.
Affiliation(s)
- Jiawei Chen
- Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Center for Quantitative Biology (CQB), Peking University, Beijing, 100871, China
| | - Hao Xu
- Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Center for Quantitative Biology (CQB), Peking University, Beijing, 100871, China
| | - Wanyu Tao
- Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Center for Quantitative Biology (CQB), Peking University, Beijing, 100871, China
| | - Zhaoxiong Chen
- Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Center for Quantitative Biology (CQB), Peking University, Beijing, 100871, China
| | - Yuxuan Zhao
- Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Center for Quantitative Biology (CQB), Peking University, Beijing, 100871, China
| | - Jing-Dong J Han
- Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Center for Quantitative Biology (CQB), Peking University, Beijing, 100871, China.
| |
|
44
|
Liu Y, Yan H, Shen LC, Yu DJ. Learning Cell Annotation under Multiple Reference Datasets by Multisource Domain Adaptation. J Chem Inf Model 2023; 63:397-405. [PMID: 36579851 DOI: 10.1021/acs.jcim.2c01277] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/30/2022]
Abstract
Accurate and efficient cell type annotation is essential for single-cell sequence analysis. Currently, cell type annotation using well-annotated reference datasets with powerful models has become increasingly popular. However, with the increasing amount of single-cell data, there is an urgent need to develop a novel annotation method that can integrate multiple reference datasets to improve cell type annotation performance. Because of unwanted batch effects between individual reference datasets, integrating multiple reference datasets is still an open challenge. To address this, we proposed scMDR and scMultiR, which use multisource domain adaptation to learn cell type-specific information from multiple reference datasets and query cells. Based on the learned cell type-specific information, scMDR and scMultiR provide the most likely cell types for the query cells. Benchmark experiments demonstrated their state-of-the-art effectiveness for integrative single-cell assignment with multiple reference datasets.
Affiliation(s)
- Yan Liu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing, Jiangsu 210094, China
| | - He Yan
- School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing, Jiangsu 210094, China
| | - Long-Chen Shen
- School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing, Jiangsu 210094, China
| | - Dong-Jun Yu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing, Jiangsu 210094, China
| |
|
45
|
Ren T, Huang S, Liu Q, Wang G. scWECTA: A weighted ensemble classification framework for cell type assignment based on single cell transcriptome. Comput Biol Med 2023; 152:106409. [PMID: 36512878 DOI: 10.1016/j.compbiomed.2022.106409] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/01/2022] [Revised: 11/16/2022] [Accepted: 12/03/2022] [Indexed: 12/12/2022]
Abstract
Rapid advances in single-cell transcriptome analysis provide deeper insights into the study of tissue heterogeneity at the cellular level. Unsupervised clustering can identify potential cell populations in single-cell RNA-sequencing (scRNA-seq) data, but fails to further determine the identity of each cell. Existing automatic annotation methods using scRNA-seq data based on machine learning mainly use a single feature set and a single classifier. In view of this, we propose a Weighted Ensemble classification framework for Cell Type Annotation, named scWECTA, which improves the accuracy of cell type identification. scWECTA uses five informative gene sets and integrates five classifiers in a soft weighted ensemble framework. The ensemble weights are inferred through constrained non-negative least squares. Validated on multiple pairs of scRNA-seq datasets, scWECTA is able to accurately annotate scRNA-seq data across platforms and across tissues, especially for imbalanced data containing rare cell types. Moreover, scWECTA outperforms other comparable methods in balancing the prediction accuracy of common cell types and the unassigned rate of non-common cell types at the same time. The source code of scWECTA is freely available at https://github.com/ttren-sc/scWECTA.
Affiliation(s)
- Tongtong Ren
- School of Computer Science and Technology, Harbin Institute of Technology, No.92 West Dazhi Street, Nangang District, Harbin, Heilongjiang, 150001, PR China
| | - Shan Huang
- Department of Neurology, The Second Affiliated Hospital, Harbin Medical University, No. 246, Xuefu Street, Nangang District, Harbin, Heilongjiang, 150081, PR China
| | - Qiaoming Liu
- School of Computer Science and Technology, Harbin Institute of Technology, No.92 West Dazhi Street, Nangang District, Harbin, Heilongjiang, 150001, PR China
| | - Guohua Wang
- School of Computer Science and Technology, Harbin Institute of Technology, No.92 West Dazhi Street, Nangang District, Harbin, Heilongjiang, 150001, PR China.
| |
|
46
|
Christensen E, Luo P, Turinsky A, Husić M, Mahalanabis A, Naidas A, Diaz-Mejia JJ, Brudno M, Pugh T, Ramani A, Shooshtari P. Evaluation of single-cell RNAseq labelling algorithms using cancer datasets. Brief Bioinform 2022; 24:6965910. [PMID: 36585784 PMCID: PMC9851326 DOI: 10.1093/bib/bbac561] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/02/2022] [Revised: 09/19/2022] [Accepted: 11/01/2022] [Indexed: 01/01/2023] Open
Abstract
Single-cell RNA sequencing (scRNA-seq) clustering and labelling methods are used to determine precise cellular composition of tissue samples. Automated labelling methods rely on either unsupervised, cluster-based approaches or supervised, cell-based approaches to identify cell types. The high complexity of cancer poses a unique challenge, as tumor microenvironments are often composed of diverse cell subpopulations with unique functional effects that may lead to disease progression, metastasis and treatment resistance. Here, we assess 17 cell-based and 9 cluster-based scRNA-seq labelling algorithms using 8 cancer datasets, providing a comprehensive large-scale assessment of such methods in a cancer-specific context. Using several performance metrics, we show that cell-based methods generally achieved higher performance and were faster compared to cluster-based methods. Cluster-based methods more successfully labelled non-malignant cell types, likely because of a lack of gene signatures for relevant malignant cell subpopulations. Larger cell numbers present in some cell types in training data positively impacted prediction scores for cell-based methods. Finally, we examined which methods performed favorably when trained and tested on separate patient cohorts in scenarios similar to clinical applications, and which were able to accurately label particularly small or under-represented cell populations in the given datasets. We conclude that scPred and SVM show the best overall performances with cancer-specific data and provide further suggestions for algorithm selection. Our analysis pipeline for assessing the performance of cell type labelling algorithms is available in https://github.com/shooshtarilab/scRNAseq-Automated-Cell-Type-Labelling.
Affiliation(s)
| | | | - Andrei Turinsky
- Centre for Computational Medicine, The Hospital for Sick Children, Toronto, ON, Canada
| | - Mia Husić
- Centre for Computational Medicine, The Hospital for Sick Children, Toronto, ON, Canada
| | - Alaina Mahalanabis
- Centre for Computational Medicine, The Hospital for Sick Children, Toronto, ON, Canada
| | - Alaine Naidas
- Children’s Health Research Institute, Lawson Research Institute, London, ON, Canada
- Department of Pathology and Lab Medicine, University of Western Ontario, London, ON, Canada
| | | | - Michael Brudno
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
| | - Trevor Pugh
- Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
- Ontario Institute for Cancer Research, Toronto, ON, Canada
- Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada
| | - Arun Ramani
- Centre for Computational Medicine, The Hospital for Sick Children, Toronto, ON, Canada
| | - Parisa Shooshtari
- Corresponding author: Parisa Shooshtari, Department of Pathology and Lab Medicine, University of Western Ontario, London, ON, Canada. Tel.: +1 (519) 685-8500 x55427. E-mail:
| |
|
47
|
Deep transfer learning enables lesion tracing of circulating tumor cells. Nat Commun 2022; 13:7687. [PMID: 36509761 PMCID: PMC9744915 DOI: 10.1038/s41467-022-35296-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 08/22/2022] [Accepted: 11/28/2022] [Indexed: 12/14/2022] Open
Abstract
Liquid biopsy offers great promise for noninvasive cancer diagnostics, while the lack of adequate target characterization and analysis hinders its wide application. Single-cell RNA sequencing (scRNA-seq) is a powerful technology for cell characterization. Integrating scRNA-seq into a CTC-focused liquid biopsy study can perhaps classify CTCs by their original lesions. However, the lack of CTC scRNA-seq data accumulation and prior knowledge hinders further development. Therefore, we design CTC-Tracer, a transfer learning-based algorithm, to correct the distributional shift between primary cancer cells and CTCs to transfer lesion labels from the primary cancer cell atlas to CTCs. The robustness and accuracy of CTC-Tracer are validated by 8 individual standard datasets. We apply CTC-Tracer on a complex dataset consisting of RNA-seq profiles of single CTCs, CTC clusters from a BRCA patient, and two xenografts, and demonstrate that CTC-Tracer has potential in knowledge transfer between different types of RNA-seq data of lesions and CTCs.
|
48
|
Su M, Pan T, Chen QZ, Zhou WW, Gong Y, Xu G, Yan HY, Li S, Shi QZ, Zhang Y, He X, Jiang CJ, Fan SC, Li X, Cairns MJ, Wang X, Li YS. Data analysis guidelines for single-cell RNA-seq in biomedical studies and clinical applications. Mil Med Res 2022; 9:68. [PMID: 36461064 PMCID: PMC9716519 DOI: 10.1186/s40779-022-00434-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 09/27/2022] [Accepted: 11/18/2022] [Indexed: 12/03/2022] Open
Abstract
The application of single-cell RNA sequencing (scRNA-seq) in biomedical research has advanced our understanding of the pathogenesis of disease and provided valuable insights into new diagnostic and therapeutic strategies. With the expansion of capacity for high-throughput scRNA-seq, including clinical samples, the analysis of these huge volumes of data has become a daunting prospect for researchers entering this field. Here, we review the workflow for typical scRNA-seq data analysis, covering raw data processing and quality control, basic data analysis applicable for almost all scRNA-seq data sets, and advanced data analysis that should be tailored to specific scientific questions. While summarizing the current methods for each analysis step, we also provide an online repository of software and wrapped-up scripts to support the implementation. Recommendations and caveats are pointed out for some specific analysis tasks and approaches. We hope this resource will be helpful to researchers engaging with scRNA-seq, in particular for emerging clinical applications.
Affiliation(s)
- Min Su
- State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, 211166, China
| | - Tao Pan
- College of Biomedical Information and Engineering, the First Affiliated Hospital of Hainan Medical University, Hainan Medical University, Haikou, 571199, Hainan, China
| | - Qiu-Zhen Chen
- State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, 211166, China
| | - Wei-Wei Zhou
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, Heilongjiang, China
| | - Yi Gong
- State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, 211166, China; Department of Immunology, Nanjing Medical University, Nanjing, 211166, China
| | - Gang Xu
- College of Biomedical Information and Engineering, the First Affiliated Hospital of Hainan Medical University, Hainan Medical University, Haikou, 571199, Hainan, China
| | - Huan-Yu Yan
- State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, 211166, China
| | - Si Li
- College of Biomedical Information and Engineering, the First Affiliated Hospital of Hainan Medical University, Hainan Medical University, Haikou, 571199, Hainan, China
| | - Qiao-Zhen Shi
- State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, 211166, China
| | - Ya Zhang
- College of Biomedical Information and Engineering, the First Affiliated Hospital of Hainan Medical University, Hainan Medical University, Haikou, 571199, Hainan, China
| | - Xiao He
- Department of Laboratory Medicine, Women and Children's Hospital of Chongqing Medical University, Chongqing, 401174, China
| | | | - Shi-Cai Fan
- Shenzhen Institute for Advanced Study, University of Electronic Science and Technology of China, Shenzhen, 518110, Guangdong, China
| | - Xia Li
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, Heilongjiang, China.
| | - Murray J Cairns
- School of Biomedical Sciences and Pharmacy, Faculty of Health and Medicine, the University of Newcastle, University Drive, Callaghan, NSW, 2308, Australia; Precision Medicine Research Program, Hunter Medical Research Institute, New Lambton Heights, NSW, 2305, Australia.
| | - Xi Wang
- State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, 211166, China.
| | - Yong-Sheng Li
- College of Biomedical Information and Engineering, the First Affiliated Hospital of Hainan Medical University, Hainan Medical University, Haikou, 571199, Hainan, China.
| |
|
49
|
Liu Z, Li H, Dang Q, Weng S, Duo M, Lv J, Han X. Integrative insights and clinical applications of single-cell sequencing in cancer immunotherapy. Cell Mol Life Sci 2022; 79:577. [DOI: 10.1007/s00018-022-04608-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2022] [Revised: 10/12/2022] [Accepted: 10/20/2022] [Indexed: 11/03/2022]
|
50
|
Jiang T, Zhou W, Sheng Q, Yu J, Xie Y, Ding N, Zhang Y, Xu J, Li Y. ImmCluster: an ensemble resource for immunology cell type clustering and annotations in normal and cancerous tissues. Nucleic Acids Res 2022; 51:D1325-D1332. [PMID: 36271790 PMCID: PMC9825417 DOI: 10.1093/nar/gkac922] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 08/14/2022] [Revised: 09/22/2022] [Accepted: 10/06/2022] [Indexed: 01/30/2023] Open
Abstract
Single-cell transcriptome analysis has enabled the transcriptional profiling of thousands of immune cells in complex tissues and cancers. However, subtle transcriptomic differences in immune cell subpopulations and the high dimensionality of transcriptomic data make the clustering and annotation of immune cells challenging. Herein, we introduce ImmCluster (http://bio-bigdata.hrbmu.edu.cn/ImmCluster) for immunology cell type clustering and annotation. We manually curated 346 well-known marker genes from 1163 studies. ImmCluster integrates over 420 000 immune cells from nine healthy tissues and over 648 000 cells from different tumour samples of 17 cancer types to generate stable marker-gene sets and develop context-specific immunology references. In addition, ImmCluster provides cell clustering using seven reference-based and four marker gene-based computational methods, and an ensemble method was developed to provide more consistent cell clustering than individual methods. Five major analytic modules are provided for interactively exploring the annotations of immune cells, including clustering and annotating immune cell clusters, gene expression of markers, functional assignment in cancer hallmarks, cell states and immune pathways, cell-cell communications and the corresponding ligand-receptor interactions, as well as online tools. ImmCluster generates diverse plots and tables, enabling users to identify significant associations in immune cell clusters simultaneously. ImmCluster is a valuable resource for analysing cellular heterogeneity in cancer microenvironments.
Affiliation(s)
| | | | | | | | - Yunjin Xie
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang 150081, China
| | - Na Ding
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang 150081, China
| | - Yunpeng Zhang
- Correspondence may also be addressed to Yunpeng Zhang.
| | - Juan Xu
- Correspondence may also be addressed to Juan Xu.
| | - Yongsheng Li
- To whom correspondence should be addressed. Tel: +86 13604805482;
| |
|