1
Liu Y, Pei W, Chen L, Xia Y, Yan H, Hu X. scCorrect: Cross-modality label transfer from scRNA-seq to scATAC-seq using domain adaptation. Anal Biochem 2025; 702:115847. [PMID: 40154828 DOI: 10.1016/j.ab.2025.115847] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/25/2024] [Revised: 03/10/2025] [Accepted: 03/15/2025] [Indexed: 04/01/2025]
Abstract
Cell type annotation in single-cell chromatin accessibility sequencing (scATAC-seq) is crucial for enabling researchers to identify subpopulations of cells associated with specific diseases, elucidate gene regulatory networks, and discover markers indicative of disease states. The prevailing approach for cell type annotation in single-cell research involves transferring well-delineated cell types from single-cell RNA sequencing (scRNA-seq) data to scATAC-seq data using a label propagation algorithm. However, the inherent modal discrepancies (i.e., biological interpretation) between scRNA-seq and scATAC-seq data, coupled with the intrinsic sparsity and high dimensionality of scATAC-seq data, pose significant challenges to the efficacy of this strategy. To address these challenges, we introduce a novel neural network framework, scCorrect, which operates in two distinct phases. In the first phase, scCorrect aligns the scRNA-seq and scATAC-seq datasets, generating initial annotation results. The second phase involves training a corrective network specifically designed to amend any erroneous annotations produced during the first phase. Empirical tests across multiple datasets have demonstrated that scCorrect consistently achieves superior recognition accuracy, underscoring its significant potential to enhance disease-related research in humans.
Affiliation(s)
- Yan Liu
  - Department of Computer Science, Yangzhou University, Yangzhou, 225100, PR China
- Wenyi Pei
  - Geriatric Department, Shanghai Baoshan District Wusong Central Hospital, Tongtai North Road 101, Shanghai, 200940, PR China
- Li Chen
  - Department of Computer Science, Yangzhou University, Yangzhou, 225100, PR China
- Yu Xia
  - Department of Computer Science, Yangzhou University, Yangzhou, 225100, PR China
- He Yan
  - College of Information Science and Technology, Nanjing Forestry University, Nanjing, 210037, PR China
- Xiaohua Hu
  - Geriatric Department, Shanghai Baoshan District Wusong Central Hospital, Tongtai North Road 101, Shanghai, 200940, PR China; Digital Innovation Laboratory, The First Affiliated Hospital of Naval Medical University, Changhai Road 168, Shanghai, 200433, PR China
2
da Silva JEH, Bernardino HS, de Oliveira IL, Camata JJ. A survey of the methodological process of modeling, inference, and evaluation of gene regulatory networks using scRNA-Seq data. Biosystems 2025; 253:105464. [PMID: 40409400 DOI: 10.1016/j.biosystems.2025.105464] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2024] [Revised: 03/20/2025] [Accepted: 04/17/2025] [Indexed: 05/25/2025]
Abstract
The advent of scRNA-Seq sequencing technology has provided unprecedented resolutions in the analysis of gene regulatory networks (GRNs) at the single-cell level. However, new technical and methodological challenges also emerged. Factors such as the large number of zeros reported in expression levels, the biological variation due to the stochastic nature of gene expression, environmental niche, and effects created by the cell cycle make it difficult to correctly interpret the data obtained in the sequencing stage. On the other hand, the development of methods for the inference of GRNs, specifically using scRNA-Seq technology, proved to be of similar quality to random predictors. The lack of adequate pre-processing of gene expression data, including selection steps for subsets of genes of interest, smoothing, and discretization of gene expression, in addition to the different ways of modeling networks and network motifs, are factors that affect the performance of inference approaches. Finally, the lack of knowledge about the ground-truth network and the non-standardization of appropriate metrics to measure the quality of inferred networks make the process of comparing performance between algorithms a major problem, given the unbalanced nature of the data and the interpretation bias caused by the chosen metric. This article brings these issues to light, aiming to show how these factors influence both the inference process and the performance evaluation of inferred networks, through comparative computational experiments and provides suggestions for a more robust methodological process for researchers dealing with inference of GRNs.
Affiliation(s)
- José Eduardo H da Silva
  - Universidade Federal de Juiz de Fora, Rua José Lourenço Kelmer, s/n, Juiz de Fora, 36036-900, Minas Gerais, Brazil
- Heder S Bernardino
  - Universidade Federal de Juiz de Fora, Rua José Lourenço Kelmer, s/n, Juiz de Fora, 36036-900, Minas Gerais, Brazil
- Itamar L de Oliveira
  - Universidade Federal de Juiz de Fora, Rua José Lourenço Kelmer, s/n, Juiz de Fora, 36036-900, Minas Gerais, Brazil
- José J Camata
  - Universidade Federal de Juiz de Fora, Rua José Lourenço Kelmer, s/n, Juiz de Fora, 36036-900, Minas Gerais, Brazil
3
M A Basher AR, Hallinan C, Lee K. Heterogeneity-preserving discriminative feature selection for disease-specific subtype discovery. Nat Commun 2025; 16:3593. [PMID: 40234411 PMCID: PMC12000357 DOI: 10.1038/s41467-025-58718-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2024] [Accepted: 03/26/2025] [Indexed: 04/17/2025] Open
Abstract
Disease-specific subtype identification can deepen our understanding of disease progression and pave the way for personalized therapies, given the complexity of disease heterogeneity. Large-scale transcriptomic, proteomic, and imaging datasets create opportunities for discovering subtypes but also pose challenges due to their high dimensionality. To mitigate this, many feature selection methods focus on selecting features that distinguish known diseases or cell states, yet often miss features that preserve heterogeneity and reveal new subtypes. To overcome this gap, we develop Preserving Heterogeneity (PHet), a statistical methodology that employs iterative subsampling and differential analysis of interquartile range, in conjunction with Fisher's method, to identify a small set of features that enhance subtype clustering quality. Here, we show that this method can maintain sample heterogeneity while distinguishing known disease/cell states, with a tendency to outperform previous differential expression and outlier-based methods, indicating its potential to advance our understanding of disease mechanisms and cell differentiation.
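The abstract above names Fisher's method as one ingredient of PHet. As a minimal sketch of that single step only (not of PHet itself), independent p-values can be combined through the statistic X = -2 Σ ln p_i, which follows a chi-square distribution with 2k degrees of freedom under the null hypothesis. The function name `fisher_combined_pvalue` is hypothetical and not taken from the paper; the closed-form tail used here holds because the degrees of freedom are always even.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Combine independent p-values with Fisher's method (illustrative sketch)."""
    if not pvalues:
        raise ValueError("need at least one p-value")
    if any(p <= 0.0 or p > 1.0 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(pvalues)
    # Half of the Fisher statistic: X / 2 = -sum(ln p_i).
    half_stat = -sum(math.log(p) for p in pvalues)
    # Chi-square survival function for 2k degrees of freedom:
    # exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return min(1.0, math.exp(-half_stat) * total)
```

With a single p-value the function returns that p-value unchanged, which is a quick sanity check on the formula.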
Affiliation(s)
- Abdur Rahman M A Basher
  - Vascular Biology Program, Boston Children's Hospital, Boston, MA, USA
  - Department of Surgery, Harvard Medical School, Boston, MA, USA
- Caleb Hallinan
  - Vascular Biology Program, Boston Children's Hospital, Boston, MA, USA
- Kwonmoo Lee
  - Vascular Biology Program, Boston Children's Hospital, Boston, MA, USA
  - Department of Surgery, Harvard Medical School, Boston, MA, USA
4
Nitz A, Giraldez Chavez JH, Eliason ZG, Payne SH. Are We There Yet? Assessing the Readiness of Single-Cell Proteomics to Answer Biological Hypotheses. J Proteome Res 2025; 24:1482-1492. [PMID: 38981598 PMCID: PMC11976870 DOI: 10.1021/acs.jproteome.4c00091] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2024] [Revised: 05/02/2024] [Accepted: 06/13/2024] [Indexed: 07/11/2024]
Abstract
Single-cell analysis is an active area of research in many fields of biology. Measurements at single-cell resolution allow researchers to study diverse populations without losing biologically meaningful information to sample averages. Many technologies have been used to study single cells, including mass spectrometry-based single-cell proteomics (SCP). SCP has seen a lot of growth over the past couple of years through improvements in data acquisition and analysis, leading to greater proteomic depth. Because method development has been the main focus in SCP, biological applications have been sprinkled in only as proof-of-concept. However, SCP methods now provide significant coverage of the proteome and have been implemented in many laboratories. Thus, a primary question to address in our community is whether the current state of technology is ready for widespread adoption for biological inquiry. In this Perspective, we examine the potential for SCP in three thematic areas of biological investigation: cell annotation, developmental trajectories, and spatial mapping. We identify that the primary limitation of SCP is sample throughput. As proteome depth has been the primary target for method development to date, we advocate for a change in focus to facilitate measuring tens of thousands of single-cell proteomes to enable biological applications beyond proof-of-concept.
Affiliation(s)
- Alyssa A. Nitz
  - Biology Department, Brigham Young University, Provo, Utah 84602, United States
- Zachary G. Eliason
  - Biology Department, Brigham Young University, Provo, Utah 84602, United States
- Samuel H. Payne
  - Biology Department, Brigham Young University, Provo, Utah 84602, United States
5
Traversa D, Chiara M. Mapping Cell Identity from scRNA-seq: A primer on computational methods. Comput Struct Biotechnol J 2025; 27:1559-1569. [PMID: 40270709 PMCID: PMC12017876 DOI: 10.1016/j.csbj.2025.03.051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2024] [Revised: 03/29/2025] [Accepted: 03/31/2025] [Indexed: 04/25/2025] Open
Abstract
Single cell (sc) technologies mark a conceptual and methodological breakthrough in the way we study cells, the base units of life. Thanks to these technological developments, large-scale initiatives are currently ongoing aimed at mapping all the cell types in the human body, with the ambitious aim to gain a cell-level resolution of physiological development and disease. Owing to its broad applicability and ease of interpretation, scRNA-seq is probably the most common sc-based application. This assay uses high throughput RNA sequencing to capture gene expression profiles at the sc-level. Subsequently, under the assumption that differences in transcriptional programs correspond to distinct cellular identities, ad-hoc computational methods are used to infer cell types from gene expression patterns. A wide array of computational methods has been developed for this task. However, depending on the underlying algorithmic approach and associated computational requirements, each method might have a specific range of application, with implications that are not always clear to the end user. Here we will provide a concise overview of state-of-the-art computational methods for cell identity annotation in scRNA-seq, tailored for new users and non-computational scientists. To this end, we classify existing tools into five main categories, and discuss their key strengths, limitations and range of application.
Affiliation(s)
- Daniele Traversa
  - Department of Biosciences, Università degli Studi di Milano, via Celoria 26, Milan 20133, Italy
- Matteo Chiara
  - Department of Biosciences, Università degli Studi di Milano, via Celoria 26, Milan 20133, Italy
6
Sujana STA, Shahjaman M, Singha AC. Application of bioinformatic tools in cell type classification for single-cell RNA-seq data. Comput Biol Chem 2025; 115:108332. [PMID: 39793515 DOI: 10.1016/j.compbiolchem.2024.108332] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2024] [Revised: 12/06/2024] [Accepted: 12/24/2024] [Indexed: 01/13/2025]
Abstract
The advancements in single-cell RNA sequencing (scRNAseq) technology have significantly transformed genomics research, enabling the handling of thousands of cells in each experiment. As of now, 32,068 research studies have been cataloged in the PubMed database. The primary aim of scRNAseq investigations is to identify cell types, understand the antitumor immune response, and identify new and uncommon cell types. Traditional techniques for identifying cell types include microscopy, histology, and pathological characteristics. However, the complexity of instruments and the need for precise experimental design make it difficult to fully capture the overall heterogeneity. Unsupervised clustering and supervised classification methods have been used to solve this task. Supervised cell type classification methods have gained popularity as large-scale, high-quality, well-annotated and more robust results compared to clustering methods. A recent study showed that support vector machine (SVM) gives a high-quality classification performance in different scenarios. In this article, we compare and evaluate the performance of four different kernels (sigmoid, linear, radial, polynomial) of SVM. The results of the experiments on three standard scRNA-seq datasets indicate that SVM with a linear kernel and SVM with a sigmoid kernel classify the cells more accurately (approx. 99%), and that the linear kernel has a remarkably fast computation time. We also evaluate the results using single-cell-specific evaluation metrics: F1 score, MCC, and AUC. Additionally, this article sheds light on the potential use of SVM kernels to extract the underlying information in single-cell RNA-seq data more effectively.
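The abstract above lists F1 score and MCC among its evaluation metrics. As a rough illustration, and assuming a binary (one-vs-rest) confusion matrix, which the abstract does not specify, both can be computed from the four confusion counts as sketched below. The function name `f1_and_mcc` is hypothetical and not taken from the paper.

```python
import math


def f1_and_mcc(tp, fp, fn, tn):
    """Return (F1 score, Matthews correlation coefficient) from confusion counts."""
    # Precision and recall, guarded against empty denominators.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return f1, mcc
```

Unlike F1, MCC also uses the true-negative count, which is one reason it is often reported alongside F1 for imbalanced cell-type labels.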
Affiliation(s)
- Shah Tania Akter Sujana
  - Bioinformatics Lab, Department of Statistics, Begum Rokeya University, Rangpur 5404, Bangladesh
- Md Shahjaman
  - Bioinformatics Lab, Department of Statistics, Begum Rokeya University, Rangpur 5404, Bangladesh
- Atul Chandra Singha
  - Bioinformatics Lab, Department of Statistics, Begum Rokeya University, Rangpur 5404, Bangladesh
7
Liu T, Lin Y, Luo X, Sun Y, Zhao H. VISTA Uncovers Missing Gene Expression and Spatial-induced Information for Spatial Transcriptomic Data Analysis. bioRxiv 2025:2024.08.26.609718. [PMID: 40166134 PMCID: PMC11957009 DOI: 10.1101/2024.08.26.609718] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/02/2025]
Abstract
Characterizing cell activities within a spatially resolved context is essential to enhance our understanding of spatially-induced cellular states and features. While single-cell RNA-seq (scRNA-seq) offers comprehensive profiling of cells within a tissue, it fails to capture spatial context. Conversely, subcellular spatial transcriptomics (SST) technologies provide high-resolution spatial profiles of gene expression, yet their utility is constrained by the limited number of genes they can simultaneously profile. To address this limitation, we introduce VISTA, a novel approach designed to predict the expression levels of unobserved genes specifically tailored for SST data. VISTA jointly models scRNA-seq data and SST data based on variational inference and geometric deep learning, and incorporates uncertainty quantification. Using four SST datasets, we demonstrate VISTA's superior performance in imputation and in analyzing large-scale SST datasets with satisfactory time efficiency and memory consumption. The imputation of VISTA enables a multitude of downstream applications, including the detection of new spatially variable genes, the discovery of novel ligand-receptor interactions, the inference of spatial RNA velocity, the generation for spatial transcriptomics with in-silico perturbation, and an improved decomposition of spatial and intrinsic variations.
Affiliation(s)
- Tianyu Liu
  - Interdepartmental Program in Computational Biology & Bioinformatics, Yale University, New Haven, 06511, CT, USA
- Yingxin Lin
  - Department of Biostatistics, Yale University, New Haven, 06511, CT, USA
- Xiao Luo
  - Department of Computer Science, University of California, Los Angeles, Los Angeles, 90095, CA, USA
- Yizhou Sun
  - Department of Computer Science, University of California, Los Angeles, Los Angeles, 90095, CA, USA
- Hongyu Zhao
  - Interdepartmental Program in Computational Biology & Bioinformatics, Yale University, New Haven, 06511, CT, USA
  - Department of Biostatistics, Yale University, New Haven, 06511, CT, USA
8
Caron DP, Specht WL, Chen D, Wells SB, Szabo PA, Jensen IJ, Farber DL, Sims PA. Multimodal hierarchical classification of CITE-seq data delineates immune cell states across lineages and tissues. Cell Rep Methods 2025; 5:100938. [PMID: 39814026 PMCID: PMC11840950 DOI: 10.1016/j.crmeth.2024.100938] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/11/2024] [Revised: 08/21/2024] [Accepted: 12/09/2024] [Indexed: 01/18/2025]
Abstract
Single-cell RNA sequencing (scRNA-seq) is invaluable for profiling cellular heterogeneity and transcriptional states, but transcriptomic profiles do not always delineate subsets defined by surface proteins. Cellular indexing of transcriptomes and epitopes (CITE-seq) enables simultaneous profiling of single-cell transcriptomes and surface proteomes; however, accurate cell-type annotation requires a classifier that integrates multimodal data. Here, we describe multimodal classifier hierarchy (MMoCHi), a marker-based approach for accurate cell-type classification across multiple single-cell modalities that does not rely on reference atlases. We benchmark MMoCHi using sorted T lymphocyte subsets and annotate a cross-tissue human immune cell dataset. MMoCHi outperforms leading transcriptome-based classifiers and multimodal unsupervised clustering in its ability to identify immune cell subsets that are not readily resolved and to reveal subset markers. MMoCHi is designed for adaptability and can integrate annotation of cell types and developmental states across diverse lineages, samples, or modalities.
Affiliation(s)
- Daniel P Caron
  - Department of Microbiology and Immunology, Columbia University Irving Medical Center, New York, NY 10032, USA
- William L Specht
  - Department of Microbiology and Immunology, Columbia University Irving Medical Center, New York, NY 10032, USA
- David Chen
  - Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032, USA
- Steven B Wells
  - Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032, USA
- Peter A Szabo
  - Department of Microbiology and Immunology, Columbia University Irving Medical Center, New York, NY 10032, USA
- Isaac J Jensen
  - Department of Microbiology and Immunology, Columbia University Irving Medical Center, New York, NY 10032, USA
- Donna L Farber
  - Department of Microbiology and Immunology, Columbia University Irving Medical Center, New York, NY 10032, USA; Department of Surgery, Columbia University Irving Medical Center, New York, NY 10032, USA
- Peter A Sims
  - Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032, USA; Department of Biochemistry and Molecular Biophysics, Columbia University Irving Medical Center, New York, NY 10032, USA
9
Zhao Y, Shang J, Qin B, Zhang L, He X, Ge D, Ren Q, Liu JX. pscAdapt: Pre-Trained Domain Adaptation Network Based on Structural Similarity for Cell Type Annotation in Single Cell RNA-seq Data. IEEE J Biomed Health Inform 2025; 29:724-732. [PMID: 39325614 DOI: 10.1109/jbhi.2024.3468310] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/28/2024]
Abstract
Cell type annotation refers to the process of categorizing and labeling cells to identify their specific cell types, which is crucial for understanding cell functions and biological processes. Although many methods have been developed for automated cell type annotation, they often encounter challenges such as batch effects due to variations in data distribution across platforms and species, thereby compromising their performance. To address batch effects, in this study, a pre-trained domain adaptation model based on structural similarity, named pscAdapt, is proposed for cell type annotation. Specifically, a pre-training strategy is employed to initialize model parameters to learn the data distribution of the source domain. This strategy is also combined with an adversarial learning strategy to train the domain adaptation network for achieving domain-level alignment and reducing domain discrepancy. Furthermore, to better distinguish different types of cells, a structural similarity loss is designed, aiming to shorten distances between cells of the same type and increase distances between cells of different types in feature space, thus achieving cell-level alignment and enhancing the discriminability of cell types. Comprehensive experiments were conducted on simulated datasets, cross-platform datasets and cross-species datasets to validate the effectiveness of pscAdapt, the results of which demonstrate that pscAdapt outperforms several popular cell type annotation methods.
10
Gulati GS, D'Silva JP, Liu Y, Wang L, Newman AM. Profiling cell identity and tissue architecture with single-cell and spatial transcriptomics. Nat Rev Mol Cell Biol 2025; 26:11-31. [PMID: 39169166 DOI: 10.1038/s41580-024-00768-2] [Citation(s) in RCA: 25] [Impact Index Per Article: 25.0] [Accepted: 07/16/2024] [Indexed: 08/23/2024]
Abstract
Single-cell transcriptomics has broadened our understanding of cellular diversity and gene expression dynamics in healthy and diseased tissues. Recently, spatial transcriptomics has emerged as a tool to contextualize single cells in multicellular neighbourhoods and to identify spatially recurrent phenotypes, or ecotypes. These technologies have generated vast datasets with targeted-transcriptome and whole-transcriptome profiles of hundreds to millions of cells. Such data have provided new insights into developmental hierarchies, cellular plasticity and diverse tissue microenvironments, and spurred a burst of innovation in computational methods for single-cell analysis. In this Review, we discuss recent advancements, ongoing challenges and prospects in identifying and characterizing cell states and multicellular neighbourhoods. We discuss recent progress in sample processing, data integration, identification of subtle cell states, trajectory modelling, deconvolution and spatial analysis. Furthermore, we discuss the increasing application of deep learning, including foundation models, in analysing single-cell and spatial transcriptomics data. Finally, we discuss recent applications of these tools in the fields of stem cell biology, immunology, and tumour biology, and the future of single-cell and spatial transcriptomics in biological research and its translation to the clinic.
Affiliation(s)
- Gunsagar S Gulati
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Yunhe Liu
  - Department of Genomic Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Linghua Wang
  - Department of Genomic Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
  - The University of Texas MD Anderson Cancer Center UTHealth Houston Graduate School of Biomedical Sciences, Houston, TX, USA
- Aaron M Newman
  - Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
  - Institute for Stem Cell Biology and Regenerative Medicine, Stanford University, Stanford, CA, USA
  - Stanford Cancer Institute, Stanford University, Stanford, CA, USA
  - Chan Zuckerberg Biohub - San Francisco, San Francisco, CA, USA
11
Liu T, Li K, Wang Y, Li H, Zhao H. Evaluating the Utilities of Foundation Models in Single-cell Data Analysis. bioRxiv 2024:2023.09.08.555192. [PMID: 38464157 PMCID: PMC10925156 DOI: 10.1101/2023.09.08.555192] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 03/12/2024]
Abstract
Foundation Models (FMs) have made significant strides in both industrial and scientific domains. In this paper, we evaluate the performance of FMs for single-cell sequencing data analysis through comprehensive experiments across eight downstream tasks pertinent to single-cell data. Overall, considering model performance and user accessibility among ten single-cell FMs, the top FMs include scGPT, Geneformer, and CellPLM. However, by comparing these FMs with task-specific methods, we found that single-cell FMs may not consistently outperform task-specific methods in all tasks, which challenges the necessity of developing foundation models for single-cell analysis. In addition, we evaluated the effects of hyper-parameters, initial settings, and stability for training single-cell FMs based on a proposed scEval framework, and provide guidelines for pre-training and fine-tuning to enhance the performance of single-cell FMs. Our work summarizes the current state of single-cell FMs, points to their constraints and avenues for future development, and offers a freely available evaluation pipeline to benchmark new models and improve method development.
12
Chang CJ, Hsu CY, Liu Q, Shyr Y. VICTOR: Validation and inspection of cell type annotation through optimal regression. Comput Struct Biotechnol J 2024; 23:3270-3280. [PMID: 39296808 PMCID: PMC11408377 DOI: 10.1016/j.csbj.2024.08.028] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2024] [Revised: 08/30/2024] [Accepted: 08/31/2024] [Indexed: 09/21/2024] Open
Abstract
Single-cell RNA sequencing provides unprecedented opportunities to explore the heterogeneity and dynamics inherent in cellular biology. An essential step in the data analysis involves the automatic annotation of cells. Despite the development of numerous tools for automated cell annotation, assessing the reliability of predicted annotations remains challenging, particularly for rare and unknown cell types. Here, we introduce VICTOR: Validation and inspection of cell type annotation through optimal regression. VICTOR aims to gauge the confidence of cell annotations by an elastic-net regularized regression with optimal thresholds. We demonstrated that VICTOR performed well in identifying inaccurate annotations, surpassing existing methods in diagnostic ability across various single-cell datasets, including within-platform, cross-platform, cross-studies, and cross-omics settings.
Affiliation(s)
- Chia-Jung Chang
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN 37203, USA
  - Center for Quantitative Sciences, Vanderbilt University Medical Center, Nashville, TN 37203, USA
  - Department of Biomedical Engineering, National Cheng Kung University, Tainan, Taiwan
- Chih-Yuan Hsu
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN 37203, USA
  - Center for Quantitative Sciences, Vanderbilt University Medical Center, Nashville, TN 37203, USA
- Qi Liu
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN 37203, USA
  - Center for Quantitative Sciences, Vanderbilt University Medical Center, Nashville, TN 37203, USA
- Yu Shyr
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN 37203, USA
  - Center for Quantitative Sciences, Vanderbilt University Medical Center, Nashville, TN 37203, USA
13
Bonev B, Castelo-Branco G, Chen F, Codeluppi S, Corces MR, Fan J, Heiman M, Harris K, Inoue F, Kellis M, Levine A, Lotfollahi M, Luo C, Maynard KR, Nitzan M, Ramani V, Satijia R, Schirmer L, Shen Y, Sun N, Green GS, Theis F, Wang X, Welch JD, Gokce O, Konopka G, Liddelow S, Macosko E, Ali Bayraktar O, Habib N, Nowakowski TJ. Opportunities and challenges of single-cell and spatially resolved genomics methods for neuroscience discovery. Nat Neurosci 2024; 27:2292-2309. [PMID: 39627587 PMCID: PMC11999325 DOI: 10.1038/s41593-024-01806-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2023] [Accepted: 09/23/2024] [Indexed: 12/13/2024]
Abstract
Over the past decade, single-cell genomics technologies have allowed scalable profiling of cell-type-specific features, which has substantially increased our ability to study cellular diversity and transcriptional programs in heterogeneous tissues. Yet our understanding of mechanisms of gene regulation or the rules that govern interactions between cell types is still limited. The advent of new computational pipelines and technologies, such as single-cell epigenomics and spatially resolved transcriptomics, has created opportunities to explore two new axes of biological variation: cell-intrinsic regulation of cell states and expression programs and interactions between cells. Here, we summarize the most promising and robust technologies in these areas, discuss their strengths and limitations and discuss key computational approaches for analysis of these complex datasets. We highlight how data sharing and integration, documentation, visualization and benchmarking of results contribute to transparency, reproducibility, collaboration and democratization in neuroscience, and discuss needs and opportunities for future technology development and analysis.
Affiliation(s)
- Boyan Bonev
  - Helmholtz Pioneer Campus, Helmholtz Zentrum München, Neuherberg, Germany
  - Physiological Genomics, Biomedical Center, Ludwig-Maximilians-Universität München, Munich, Germany
- Gonçalo Castelo-Branco
  - Laboratory of Molecular Neurobiology, Department of Medical Biochemistry and Biophysics, Karolinska Institutet, Stockholm, Sweden
- Fei Chen
  - The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- M Ryan Corces
  - Gladstone Institute of Neurological Disease, San Francisco, CA, USA
  - Gladstone Institute of Data Science and Biotechnology, San Francisco, CA, USA
  - Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
- Jean Fan
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Myriam Heiman
  - Department of Brain and Cognitive Sciences, MIT, Cambridge, MA, USA
  - The Picower Institute for Learning and Memory, MIT, Cambridge, MA, USA
- Kenneth Harris
  - UCL Queen Square Institute of Neurology, University College London, London, UK
- Fumitaka Inoue
  - Institute for the Advanced Study of Human Biology (WPI-ASHBi), Kyoto University, Kyoto, Japan
- Manolis Kellis
  - The Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA, USA
- Ariel Levine
  - Spinal Circuits and Plasticity Unit, National Institute of Neurological Disorders and Stroke, Bethesda, MD, USA
- Mo Lotfollahi
  - Institute of Computational Biology, Helmholtz Center Munich - German Research Center for Environmental Health, Neuherberg, Germany
  - Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
- Chongyuan Luo
  - Department of Human Genetics, University of California, Los Angeles, Los Angeles, CA, USA
- Kristen R Maynard
  - Lieber Institute for Brain Development, Baltimore, MD, USA
  - Department of Psychiatry, Johns Hopkins School of Medicine, Baltimore, MD, USA
  - Department of Neuroscience, Johns Hopkins School of Medicine, Baltimore, MD, USA
- Mor Nitzan
  - School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel
  - Racah Institute of Physics, The Hebrew University of Jerusalem, Jerusalem, Israel
  - Faculty of Medicine, The Hebrew University of Jerusalem, Jerusalem, Israel
- Vijay Ramani
  - Gladstone Institute of Data Science and Biotechnology, San Francisco, CA, USA
  - Department of Biochemistry and Biophysics, University of California, San Francisco, San Francisco, CA, USA
  - Bakar Computational Health Sciences Institute, San Francisco, CA, USA
- Rahul Satijia
  - New York Genome Center, New York, NY, USA
  - Center for Genomics and Systems Biology, New York University, New York, NY, USA
- Lucas Schirmer
  - Department of Neurology, Mannheim Center for Translational Neuroscience, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- Yin Shen
- Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
- Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
- Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA
| | - Na Sun
- The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Gilad S Green
- The Edmond and Lily Safra Center for Brain Sciences, The Hebrew University of Jerusalem, Jerusalem, Israel
| | - Fabian Theis
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
- Department of Human Genetics, University of California, Los Angeles, Los Angeles, CA, USA
| | - Xiao Wang
- The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Joshua D Welch
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
| | - Ozgun Gokce
- German Center for Neurodegenerative Diseases (DZNE), Bonn, Germany.
- Department of Neurodegenerative Diseases and Geriatric Psychiatry, University Hospital Bonn, Bonn, Germany.
| | - Genevieve Konopka
- Department of Neuroscience, UT Southwestern Medical Center, Dallas, TX, USA.
- Peter O'Donnell Jr. Brain Institute, UT Southwestern Medical Center, Dallas, TX, USA.
| | - Shane Liddelow
- Neuroscience Institute, NYU Grossman School of Medicine, New York, NY, USA.
- Department of Neuroscience & Physiology, NYU Grossman School of Medicine, New York, NY, USA.
- Parekh Center for Interdisciplinary Neurology, NYU Grossman School of Medicine, New York, NY, USA.
- Department of Ophthalmology, NYU Grossman School of Medicine, New York, NY, USA.
| | - Evan Macosko
- The Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Department of Neurobiology, Harvard Medical School, Boston, MA, USA.
- Department of Psychiatry, Massachusetts General Hospital, Boston, MA, USA.
| | | | - Naomi Habib
- The Edmond and Lily Safra Center for Brain Sciences, The Hebrew University of Jerusalem, Jerusalem, Israel.
| | - Tomasz J Nowakowski
- Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA.
- Department of Neurological Surgery, University of California, San Francisco, San Francisco, CA, USA.
- Department of Anatomy, University of California, San Francisco, San Francisco, CA, USA.
- Department of Psychiatry and Behavioral Sciences, University of California, San Francisco, San Francisco, CA, USA.
- The Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California, San Francisco, San Francisco, CA, USA.
| |
|
14
|
Lan W, Ling T, Chen Q, Zheng R, Li M, Pan Y. scMoMtF: An interpretable multitask learning framework for single-cell multi-omics data analysis. PLoS Comput Biol 2024; 20:e1012679. [PMID: 39693287 DOI: 10.1371/journal.pcbi.1012679] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/13/2024] [Accepted: 11/26/2024] [Indexed: 12/20/2024] Open
Abstract
With the rapid development of biotechnology, it is now possible to obtain single-cell multi-omics data from the same cell. However, integrating and analyzing these single-cell multi-omics data remains a great challenge. Herein, we introduce an interpretable multitask framework (scMoMtF) for comprehensively analyzing single-cell multi-omics data. scMoMtF can simultaneously solve multiple key tasks of single-cell multi-omics data analysis, including dimension reduction, cell classification and data simulation. The experimental results show that scMoMtF outperforms current state-of-the-art algorithms on these tasks. In addition, scMoMtF is interpretable, allowing researchers to gain a reliable understanding of potential biological features and mechanisms in single-cell multi-omics data.
Affiliation(s)
- Wei Lan
- Guangxi Key Laboratory of Multimedia Communications and Network Technology, School of computer, electronic and information, Guangxi university, Nanning, Guangxi, China
| | - Tongsheng Ling
- Guangxi Key Laboratory of Multimedia Communications and Network Technology, School of computer, electronic and information, Guangxi university, Nanning, Guangxi, China
| | - Qingfeng Chen
- Guangxi Key Laboratory of Multimedia Communications and Network Technology, School of computer, electronic and information, Guangxi university, Nanning, Guangxi, China
| | - Ruiqing Zheng
- School of computer and engineering, Central South University, Changsha, Hunan, China
| | - Min Li
- School of computer and engineering, Central South University, Changsha, Hunan, China
| | - Yi Pan
- School of Computer Science and Control Engineering, Shenzhen University of Advanced Technology, Shenzhen, Guangdong, China
| |
|
15
|
Liu T, Long W, Cao Z, Wang Y, He CH, Zhang L, Strittmatter SM, Zhao H. CosGeneGate selects multi-functional and credible biomarkers for single-cell analysis. Brief Bioinform 2024; 26:bbae626. [PMID: 39592241 PMCID: PMC11596696 DOI: 10.1093/bib/bbae626] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2024] [Revised: 10/07/2024] [Accepted: 11/14/2024] [Indexed: 11/28/2024] Open
Abstract
MOTIVATION Selecting representative genes or marker genes to distinguish cell types is an important task in single-cell sequencing analysis. Although many methods have been proposed to select marker genes, the selected genes may be redundant and/or may not show the cell-type-specific expression patterns needed to distinguish cell types. RESULTS Here, we present a novel model, named CosGeneGate, for more effective marker gene selection. CosGeneGate combines the advantages of selecting marker genes based on cell-type classification accuracy and on marker-gene-specific expression patterns. We demonstrate that the marker genes selected by CosGeneGate perform better than those selected by existing methods in various downstream analyses, on both public datasets and newly sequenced datasets. The non-redundant marker genes identified by CosGeneGate for major cell types and tissues in humans can be found at the following website: https://github.com/VivLon/CosGeneGate/blob/main/marker gene list.xlsx.
Affiliation(s)
- Tianyu Liu
- Department of Biostatistics, Yale University, New Haven, CT, 06520, United States
- Interdepartmental Program in Computational Biology & Bioinformatics, Yale University, New Haven, CT, 06520, United States
| | - Wenxin Long
- Department of Biostatistics, Yale University, New Haven, CT, 06520, United States
- Department of Statistics, The Pennsylvania State University, University Park, PA, 16820, United States
| | - Zhiyuan Cao
- Department of Biostatistics, Yale University, New Haven, CT, 06520, United States
- Interdepartmental Program in Computational Biology & Bioinformatics, Yale University, New Haven, CT, 06520, United States
- Program of Health Informatics, Yale University, New Haven, CT, 06520, United States
| | - Yuge Wang
- Department of Biostatistics, Yale University, New Haven, CT, 06520, United States
| | - Chuan Hua He
- Department of Neurology, Yale University School of Medicine, New Haven, CT, 06520, United States
| | - Le Zhang
- Department of Neurology, Yale University School of Medicine, New Haven, CT, 06520, United States
- Department of Neuroscience, Yale University School of Medicine, New Haven, CT, 06520, United States
| | - Stephen M Strittmatter
- Department of Neurology, Yale University School of Medicine, New Haven, CT, 06520, United States
- Department of Neuroscience, Yale University School of Medicine, New Haven, CT, 06520, United States
- Cellular Neuroscience, Neurodegeneration and Repair Program, Yale University School of Medicine, New Haven, CT, 06520, United States
| | - Hongyu Zhao
- Department of Biostatistics, Yale University, New Haven, CT, 06520, United States
- Interdepartmental Program in Computational Biology & Bioinformatics, Yale University, New Haven, CT, 06520, United States
| |
|
16
|
Zhu Q, Li A, Zhang Z, Zheng C, Zhao J, Liu JX, Zhang D, Shao W. Discriminative Domain Adaption Network for Simultaneously Removing Batch Effects and Annotating Cell Types in Single-Cell RNA-Seq. IEEE/ACM Trans Comput Biol Bioinform 2024; 21:2543-2555. [PMID: 39471116 DOI: 10.1109/tcbb.2024.3487574] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/01/2024]
Abstract
Machine learning techniques have become increasingly important in analyzing single-cell RNA sequencing (scRNA-seq) data and identifying cell types, providing valuable insights into cellular development and disease mechanisms. However, batch effects pose major challenges in scRNA-seq analysis because the data distribution varies across batches. Although several batch effect mitigation algorithms have been proposed, most of them focus only on the correlation of local structure embeddings, ignoring global distribution matching and discriminative feature representation in batch correction. In this paper, we propose the discriminative domain adaption network (D2AN) for joint batch effect correction and cell type annotation with single-cell RNA-seq. Specifically, we first capture the global low-dimensional embeddings of samples from the source and target domains using an adversarial domain adaption strategy. Second, a contrastive loss is developed to preliminarily align the source domain samples. Moreover, the semantic alignment of class centroids in the source and target domains is achieved for further local alignment. Finally, a self-paced learning mechanism based on inter-domain loss is adopted to gradually select samples with high similarity to the target domain for training, which improves the robustness of the model. Experimental results on multiple real datasets demonstrate that the proposed method outperforms several state-of-the-art methods.
|
17
|
Kumari P, Kaur M, Dindhoria K, Ashford B, Amarasinghe SL, Thind AS. Advances in long-read single-cell transcriptomics. Hum Genet 2024; 143:1005-1020. [PMID: 38787419 PMCID: PMC11485027 DOI: 10.1007/s00439-024-02678-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2023] [Accepted: 05/07/2024] [Indexed: 05/25/2024]
Abstract
Long-read single-cell transcriptomics (scRNA-Seq) is revolutionizing the way we profile heterogeneity in disease. Traditional short-read scRNA-Seq methods are limited in their ability to provide complete transcript coverage, resolve isoforms, and identify novel transcripts. The scRNA-Seq protocols developed for long-read sequencing platforms overcome these limitations by enabling the characterization of full-length transcripts. Long-read scRNA-Seq techniques initially suffered from poor accuracy compared to short-read scRNA-Seq. However, with improvements in accuracy, accessibility, and cost efficiency, long reads are gaining popularity in the field of scRNA-Seq. This review details the advances in long-read scRNA-Seq, with an emphasis on library preparation protocols and downstream bioinformatics analysis tools.
Affiliation(s)
- Pallawi Kumari
- Institute of Microbial Technology, Council of Scientific and Industrial Research, Chandigarh, India
| | - Manmeet Kaur
- Institute of Microbial Technology, Council of Scientific and Industrial Research, Chandigarh, India
| | - Kiran Dindhoria
- Institute of Microbial Technology, Council of Scientific and Industrial Research, Chandigarh, India
| | - Bruce Ashford
- Illawarra Shoalhaven Local Health District (ISLHD), NSW Health, Wollongong, NSW, Australia
| | - Shanika L Amarasinghe
- Monash Biomedical Discovery Institute, Monash University, Clayton, VIC, 3800, Australia
- Walter and Eliza Hall Institute of Medical Research, 1G, Royal Parade, Parkville, VIC, 3025, Australia
| | - Amarinder Singh Thind
- Illawarra Shoalhaven Local Health District (ISLHD), NSW Health, Wollongong, NSW, Australia.
- The School of Chemistry and Molecular Bioscience (SCMB), University of Wollongong, Loftus St, Wollongong, NSW, 2500, Australia.
| |
|
18
|
Bugno J, Wang L, Yu X, Cao X, Wang J, Huang X, Yang K, Piffko A, Chen K, Luo SY, Naccasha E, Hou Y, Fu S, He C, Fu YX, Liang HL, Weichselbaum RR. Targeting the Dendritic Cell-Secreted Immunoregulatory Cytokine CCL22 Alleviates Radioresistance. Clin Cancer Res 2024; 30:4450-4463. [PMID: 38691100 PMCID: PMC11444901 DOI: 10.1158/1078-0432.ccr-23-3616] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2023] [Revised: 03/20/2024] [Accepted: 04/29/2024] [Indexed: 05/03/2024]
Abstract
PURPOSE Radiation-mediated immune suppression limits efficacy and is a barrier in cancer therapy. Radiation induces negative regulators of tumor immunity including regulatory T cells (Treg). Mechanisms underlying Treg infiltration after radiotherapy (RT) are poorly defined. Given that conventional dendritic cells (cDC) maintain Treg, we sought to identify and target cDC signaling to block Treg infiltration after radiation. EXPERIMENTAL DESIGN Transcriptomics and high dimensional flow cytometry revealed changes in murine tumor cDC that not only mediate Treg infiltration after RT but also associate with worse survival in human cancer datasets. Antibodies perturbing a cDC-CCL22-Treg axis were tested in syngeneic murine tumors. A prototype interferon-anti-epidermal growth factor receptor fusion protein (αEGFR-IFNα) was examined to block Treg infiltration and promote a CD8+ T cell response after RT. RESULTS Radiation expands a population of mature cDC1 enriched in immunoregulatory markers that mediates Treg infiltration via the Treg-recruiting chemokine CCL22. Blocking CCL22 or Treg depletion both enhanced RT efficacy. αEGFR-IFNα blocked cDC1 CCL22 production while simultaneously inducing an antitumor CD8+ T cell response to enhance RT efficacy in multiple EGFR-expressing murine tumor models, including following systemic administration. CONCLUSIONS We identify a previously unappreciated cDC mechanism mediating Treg tumor infiltration after RT. Our findings suggest blocking the cDC1-CCL22-Treg axis augments RT efficacy. αEGFR-IFNα added to RT provided robust antitumor responses better than systemic free interferon administration and may overcome clinical limitations to interferon therapy. Our findings highlight the complex behavior of cDC after RT and provide novel therapeutic strategies for overcoming RT-driven immunosuppression to improve RT efficacy. See related commentary by Kalinski et al., p. 4260.
Affiliation(s)
- Jason Bugno
- Department of Radiation and Cellular Oncology, University of Chicago; Chicago, USA
- Ludwig Center for Metastasis Research, University of Chicago; Chicago, USA
- Committee on Clinical Pharmacology and Pharmacogenomics, University of Chicago; Chicago, USA
| | - Liangliang Wang
- Department of Radiation and Cellular Oncology, University of Chicago; Chicago, USA
- Ludwig Center for Metastasis Research, University of Chicago; Chicago, USA
| | - Xianbin Yu
- Department of Chemistry, Department of Biochemistry and Molecular Biology, and Institute for Biophysical Dynamics, University of Chicago; Chicago, USA
- Howard Hughes Medical Institute, University of Chicago; Chicago, USA
| | - Xuezhi Cao
- Guangzhou National Laboratory, Bio-Island; Guangzhou, China
| | - Jiaai Wang
- Department of Radiation and Cellular Oncology, University of Chicago; Chicago, USA
- Ludwig Center for Metastasis Research, University of Chicago; Chicago, USA
| | - Xiaona Huang
- Department of Radiation and Cellular Oncology, University of Chicago; Chicago, USA
- Ludwig Center for Metastasis Research, University of Chicago; Chicago, USA
| | - Kaiting Yang
- Department of Radiation and Cellular Oncology, University of Chicago; Chicago, USA
- Ludwig Center for Metastasis Research, University of Chicago; Chicago, USA
| | - Andras Piffko
- Department of Radiation and Cellular Oncology, University of Chicago; Chicago, USA
- Ludwig Center for Metastasis Research, University of Chicago; Chicago, USA
- Department of Neurosurgery, University Medical Center Hamburg-Eppendorf; Hamburg, Germany
| | - Katherine Chen
- Department of Radiation and Cellular Oncology, University of Chicago; Chicago, USA
- Ludwig Center for Metastasis Research, University of Chicago; Chicago, USA
| | - Stephen Y. Luo
- Department of Radiation and Cellular Oncology, University of Chicago; Chicago, USA
- Ludwig Center for Metastasis Research, University of Chicago; Chicago, USA
| | - Emile Naccasha
- Department of Radiation and Cellular Oncology, University of Chicago; Chicago, USA
- Ludwig Center for Metastasis Research, University of Chicago; Chicago, USA
| | - Yuzhu Hou
- Department of Pathogenic Microbiology and Immunology, School of Basic Medical Sciences, Xi’an Jiaotong University; Xi’an, China
| | - Sherry Fu
- UT Southwestern Medical School, University of Texas Southwestern Medical Center; Dallas, USA
| | - Chuan He
- Department of Chemistry, Department of Biochemistry and Molecular Biology, and Institute for Biophysical Dynamics, University of Chicago; Chicago, USA
- Howard Hughes Medical Institute, University of Chicago; Chicago, USA
| | - Yang-xin Fu
- Department of Basic Medical Science, Tsinghua University; Beijing, China
| | - Hua Laura Liang
- Department of Radiation and Cellular Oncology, University of Chicago; Chicago, USA
- Ludwig Center for Metastasis Research, University of Chicago; Chicago, USA
| | - Ralph R. Weichselbaum
- Department of Radiation and Cellular Oncology, University of Chicago; Chicago, USA
- Ludwig Center for Metastasis Research, University of Chicago; Chicago, USA
| |
|
19
|
Xie Y, Yang J, Ouyang JF, Petretto E. scPanel: a tool for automatic identification of sparse gene panels for generalizable patient classification using scRNA-seq datasets. Brief Bioinform 2024; 25:bbae482. [PMID: 39350339 PMCID: PMC11442147 DOI: 10.1093/bib/bbae482] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2024] [Revised: 08/30/2024] [Accepted: 09/12/2024] [Indexed: 10/04/2024] Open
Abstract
Single-cell RNA sequencing (scRNA-seq) technologies can generate transcriptomic profiles at a single-cell resolution in large patient cohorts, facilitating discovery of gene and cellular biomarkers for disease. Yet, when the number of biomarker genes is large, the translation to clinical applications is challenging due to prohibitive sequencing costs. Here, we introduce scPanel, a computational framework designed to bridge the gap between biomarker discovery and clinical application by identifying a sparse gene panel for patient classification from the cell population(s) most responsive to perturbations (e.g. diseases/drugs). scPanel incorporates a data-driven way to automatically determine a minimal number of informative biomarker genes. Patient-level classification is achieved by aggregating the prediction probabilities of cells associated with a patient using the area under the curve score. Application of scPanel to scleroderma, colorectal cancer, and COVID-19 datasets resulted in high patient classification accuracy using only a small number of genes (<20), automatically selected from the entire transcriptome. In the COVID-19 case study, we demonstrated cross-dataset generalizability in predicting disease state in an external patient cohort. scPanel outperforms other state-of-the-art gene selection methods for patient classification and can be used to identify parsimonious sets of reliable biomarker candidates for clinical translation.
Affiliation(s)
- Yi Xie
- Programme in Cardiovascular and Metabolic Disorders, Centre for Computational Biology, Duke-NUS Medical School, 8 College Road, Singapore 169857, Singapore
| | - Jianfei Yang
- The School of Mechanical and Aerospace Engineering and the School of Electrical and Electronic Engineering, Nanyang Technological University, 50 Nanyang Ave, Singapore 639798, Singapore
| | - John F Ouyang
- Programme in Cardiovascular and Metabolic Disorders, Centre for Computational Biology, Duke-NUS Medical School, 8 College Road, Singapore 169857, Singapore
| | - Enrico Petretto
- Programme in Cardiovascular and Metabolic Disorders, Centre for Computational Biology, Duke-NUS Medical School, 8 College Road, Singapore 169857, Singapore
| |
|
20
|
Andrade AF, Annett A, Karimi E, Topouza DG, Rezanejad M, Liu Y, McNicholas M, Gonzalez Santiago EG, Llivichuzhca-Loja D, Gehlhaar A, Jessa S, De Cola A, Chandarana B, Russo C, Faury D, Danieau G, Puligandla E, Wei Y, Zeinieh M, Wu Q, Hebert S, Juretic N, Nakada EM, Krug B, Larouche V, Weil AG, Dudley RWR, Karamchandani J, Agnihotri S, Quail DF, Ellezam B, Konnikova L, Walsh LA, Pathania M, Kleinman CL, Jabado N. Immune landscape of oncohistone-mutant gliomas reveals diverse myeloid populations and tumor-promoting function. Nat Commun 2024; 15:7769. [PMID: 39237515 PMCID: PMC11377583 DOI: 10.1038/s41467-024-52096-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2023] [Accepted: 08/27/2024] [Indexed: 09/07/2024] Open
Abstract
Histone H3-mutant gliomas are deadly brain tumors characterized by a dysregulated epigenome and stalled differentiation. In contrast to the extensive datasets available on tumor cells, limited information exists on their tumor microenvironment (TME), particularly the immune infiltrate. Here, we characterize the immune TME of H3.3K27M and G34R/V-mutant gliomas, and multiple H3.3K27M mouse models, using transcriptomic, proteomic and spatial single-cell approaches. Resolution of immune lineages indicates high infiltration of H3-mutant gliomas with diverse myeloid populations, high-level expression of immune checkpoint markers, and scarce lymphoid cells, findings uniformly reproduced in all H3.3K27M mouse models tested. We show these myeloid populations communicate with H3-mutant cells, mediating immunosuppression and sustaining tumor formation and maintenance. Dual inhibition of myeloid cells and immune checkpoint pathways shows significant therapeutic benefits in pre-clinical syngeneic mouse models. Our findings provide a valuable characterization of the TME of oncohistone-mutant gliomas, and insight into the means for modulating the myeloid infiltrate for the benefit of patients.
Affiliation(s)
- Augusto Faria Andrade
- Department of Human Genetics, McGill University, Montreal, QC, H3A 0C7, Canada
- The Research Institute of the McGill University Health Centre, Montreal, QC, H4A 3J1, Canada
| | - Alva Annett
- Department of Human Genetics, McGill University, Montreal, QC, H3A 0C7, Canada
| | - Elham Karimi
- Rosalind and Morris Goodman Cancer Institute, McGill University, Montreal, QC, H3A 1A3, Canada
| | | | - Morteza Rezanejad
- Departments of Psychology and Computer Science, University of Toronto, Toronto, ON, M5S 3G3, M5S 2E4, Canada
| | - Yitong Liu
- Rosalind and Morris Goodman Cancer Institute, McGill University, Montreal, QC, H3A 1A3, Canada
| | - Michael McNicholas
- Department of Oncology and The Milner Institute, Jeffrey Cheah Biomedical Centre, University of Cambridge, Cambridge, CB2 0AW, UK
- CRUK Children's Brain Tumour Centre of Excellence, University of Cambridge, Cambridge, E20 1JQ, UK
| | | | | | - Arne Gehlhaar
- Life and Medical Sciences Institute, University of Bonn, Bonn, 53115, Germany
| | - Selin Jessa
- Quantitative Life Sciences, McGill University, Montreal, QC, Canada
- Lady Davis Research Institute, Jewish General Hospital, Montreal, QC, H3T 1E2, Canada
| | - Antonella De Cola
- Department of Oncology and The Milner Institute, Jeffrey Cheah Biomedical Centre, University of Cambridge, Cambridge, CB2 0AW, UK
- CRUK Children's Brain Tumour Centre of Excellence, University of Cambridge, Cambridge, E20 1JQ, UK
| | - Bhavyaa Chandarana
- Department of Human Genetics, McGill University, Montreal, QC, H3A 0C7, Canada
| | - Caterina Russo
- The Research Institute of the McGill University Health Centre, Montreal, QC, H4A 3J1, Canada
- Department of Pediatrics, McGill University, Montreal, QC, H4A 3J1, Canada
| | - Damien Faury
- The Research Institute of the McGill University Health Centre, Montreal, QC, H4A 3J1, Canada
- Department of Pediatrics, McGill University, Montreal, QC, H4A 3J1, Canada
| | - Geoffroy Danieau
- Cancer Research Program, The Research Institute of the McGill University Health Centre, Montreal, QC, H4A 3J1, Canada
- Division of Orthopedic Surgery, McGill University Health Centre, Montreal, QC, H4A 3J1, Canada
| | - Evan Puligandla
- Department of Human Genetics, McGill University, Montreal, QC, H3A 0C7, Canada
| | - Yuhong Wei
- Rosalind and Morris Goodman Cancer Institute, McGill University, Montreal, QC, H3A 1A3, Canada
| | - Michele Zeinieh
- Department of Human Genetics, McGill University, Montreal, QC, H3A 0C7, Canada
| | - Qing Wu
- The Research Institute of the McGill University Health Centre, Montreal, QC, H4A 3J1, Canada
- Department of Pediatrics, McGill University, Montreal, QC, H4A 3J1, Canada
| | - Steven Hebert
- Department of Human Genetics, McGill University, Montreal, QC, H3A 0C7, Canada
- Lady Davis Research Institute, Jewish General Hospital, Montreal, QC, H3T 1E2, Canada
| | - Nikoleta Juretic
- The Research Institute of the McGill University Health Centre, Montreal, QC, H4A 3J1, Canada
- Department of Pediatrics, McGill University, Montreal, QC, H4A 3J1, Canada
| | - Emily M Nakada
- The Research Institute of the McGill University Health Centre, Montreal, QC, H4A 3J1, Canada
- Department of Pediatrics, McGill University, Montreal, QC, H4A 3J1, Canada
| | - Brian Krug
- Department of Human Genetics, McGill University, Montreal, QC, H3A 0C7, Canada
| | - Valerie Larouche
- Department of Pediatrics, Centre mère-enfant Soleil du CHU de Québec-Université Laval, Quebec City, QC, G1V 4G2, Canada
| | - Alexander G Weil
- Brain and Development Research Axis, Sainte-Justine Research Centre, Montreal, QC, H3T 1C5, Canada
- Division of Neurosurgery, Department of Surgery, Centre Hospitalier Universitaire Sainte-Justine, Université de Montréal, Montreal, QC, H3T 1C5, Canada
- Department of Neuroscience, University of Montreal, Montreal, QC, H2X 0A9, Canada
| | - Roy W R Dudley
- Department of Pediatric Surgery, Division of Neurosurgery, Montreal Children's Hospital, McGill University, Montreal, QC, H4A 3J1, Canada
| | - Jason Karamchandani
- Department of Pathology, Montreal Neurological Institute, McGill University, Montreal, QC, H3A 2B4, Canada
| | - Sameer Agnihotri
- Department of Neurological Surgery, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15213, USA
| | - Daniela F Quail
- Rosalind and Morris Goodman Cancer Institute, McGill University, Montreal, QC, H3A 1A3, Canada
- Department of Physiology, Faculty of Medicine, McGill University, Montreal, QC, H3G 1Y6, Canada
- Division of Experimental Medicine, Department of Medicine, McGill University, Montreal, QC, H4A 3J1, Canada
| | - Benjamin Ellezam
- Division of Pathology, Department of Pathology and Cell Biology, CHU Sainte-Justine, Université de Montréal, Montreal, QC, H3T 1C5, Canada
| | - Liza Konnikova
- Department of Pediatrics, Yale School of Medicine, New Haven, CT, 06510, USA.
- Department of Obstetrics, Gynecology and Reproductive Sciences, Yale School of Medicine, New Haven, CT, 06510, USA.
- Human and Translational Immunology Program, Yale School of Medicine, New Haven, CT, 06510, USA.
| | - Logan A Walsh
- Department of Human Genetics, McGill University, Montreal, QC, H3A 0C7, Canada
- Rosalind and Morris Goodman Cancer Institute, McGill University, Montreal, QC, H3A 1A3, Canada
| | - Manav Pathania
- Department of Oncology and The Milner Institute, Jeffrey Cheah Biomedical Centre, University of Cambridge, Cambridge, CB2 0AW, UK.
- CRUK Children's Brain Tumour Centre of Excellence, University of Cambridge, Cambridge, E20 1JQ, UK.
| | - Claudia L Kleinman
- Department of Human Genetics, McGill University, Montreal, QC, H3A 0C7, Canada.
- Lady Davis Research Institute, Jewish General Hospital, Montreal, QC, H3T 1E2, Canada.
| | - Nada Jabado
- Department of Human Genetics, McGill University, Montreal, QC, H3A 0C7, Canada.
- The Research Institute of the McGill University Health Centre, Montreal, QC, H4A 3J1, Canada.
- Department of Pediatrics, McGill University, Montreal, QC, H4A 3J1, Canada.
- Division of Experimental Medicine, Department of Medicine, McGill University, Montreal, QC, H4A 3J1, Canada.
| |
|
21
|
Lim SY, Lin Y, Lee JH, Pedersen B, Stewart A, Scolyer RA, Long GV, Yang JYH, Rizos H. Single-cell RNA sequencing reveals melanoma cell state-dependent heterogeneity of response to MAPK inhibitors. EBioMedicine 2024; 107:105308. [PMID: 39216232 PMCID: PMC11402938 DOI: 10.1016/j.ebiom.2024.105308] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/23/2024] [Revised: 08/11/2024] [Accepted: 08/12/2024] [Indexed: 09/04/2024] Open
Abstract
BACKGROUND Melanoma is a heterogeneous cancer influenced by the plasticity of melanoma cells and their dynamic adaptations to microenvironmental cues. Melanoma cells transition between well-defined transcriptional cell states that impact treatment response and resistance. METHODS In this study, we applied single-cell RNA sequencing to interrogate the molecular features of immunotherapy-naive and immunotherapy-resistant melanoma tumours in response to ex vivo BRAF/MEK inhibitor treatment. FINDINGS We confirm the presence of four distinct melanoma cell states (melanocytic, transitory, neural crest-like and undifferentiated) and identify enrichment of neural crest-like and undifferentiated melanoma cells in immunotherapy-resistant tumours. Furthermore, we introduce an integrated computational approach to identify subsets of responding and nonresponding melanoma cells within the transcriptional cell states. INTERPRETATION Nonresponding melanoma cells are identified in all transcriptional cell states and are predisposed to BRAF/MEK inhibitor resistance due to pro-inflammatory IL6 and TNFα signalling. Our study provides a framework to study treatment response within distinct melanoma cell states and indicates that tumour-intrinsic pro-inflammatory signalling contributes to BRAF/MEK inhibitor resistance. FUNDING This work was supported by Macquarie University, Melanoma Institute Australia, and the National Health and Medical Research Council of Australia (NHMRC; grants 2012860, 2028055).
Affiliation(s)
- Su Yin Lim
- Macquarie Medical School, Faculty of Medicine, Health and Human Sciences, Macquarie University, Australia; Melanoma Institute Australia, Australia.
| | - Yingxin Lin
- School of Mathematics and Statistics, The University of Sydney, Australia; Charles Perkins Centre, The University of Sydney, Australia
| | - Jenny H Lee
- Macquarie Medical School, Faculty of Medicine, Health and Human Sciences, Macquarie University, Australia; Melanoma Institute Australia, Australia; Department of Neurosurgery, Chris O'Brien Lifehouse, Sydney, NSW, Australia
| | - Bernadette Pedersen
- Macquarie Medical School, Faculty of Medicine, Health and Human Sciences, Macquarie University, Australia; Melanoma Institute Australia, Australia
| | - Ashleigh Stewart
- Macquarie Medical School, Faculty of Medicine, Health and Human Sciences, Macquarie University, Australia; Melanoma Institute Australia, Australia
| | - Richard A Scolyer
- Melanoma Institute Australia, Australia; Charles Perkins Centre, The University of Sydney, Australia; Tissue Pathology and Diagnostic Oncology, Royal Prince Alfred Hospital and NSW Health Pathology, Sydney, Australia; Faculty of Medicine and Health, The University of Sydney, Australia
| | - Georgina V Long
- Melanoma Institute Australia, Australia; Charles Perkins Centre, The University of Sydney, Australia; Royal North Shore and Mater Hospitals, Sydney, Australia; Faculty of Medicine and Health, The University of Sydney, Australia
| | - Jean Y H Yang
- School of Mathematics and Statistics, The University of Sydney, Australia; Charles Perkins Centre, The University of Sydney, Australia
| | - Helen Rizos
- Macquarie Medical School, Faculty of Medicine, Health and Human Sciences, Macquarie University, Australia; Melanoma Institute Australia, Australia
| |
|
22
|
Zhao M, Li J, Liu X, Ma K, Tang J, Guo F. A gene regulatory network-aware graph learning method for cell identity annotation in single-cell RNA-seq data. Genome Res 2024; 34:1036-1051. [PMID: 39134412 PMCID: PMC11368180 DOI: 10.1101/gr.278439.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2023] [Accepted: 07/23/2024] [Indexed: 08/22/2024]
Abstract
Cell identity annotation for single-cell transcriptome data is a crucial process for constructing cell atlases, unraveling pathogenesis, and inspiring therapeutic approaches. Currently, the efficacy of existing methodologies is contingent upon specific data sets. Nevertheless, such data are often sourced from various batches, sequencing technologies, tissues, and even species. Notably, the gene regulatory relationship remains unaffected by the aforementioned factors, highlighting the extensive gene interactions within organisms. Therefore, we propose scHGR, an automated annotation tool designed to leverage gene regulatory relationships in constructing gene-mediated cell communication graphs for single-cell transcriptome data. This strategy helps reduce noise from diverse data sources while establishing distant cellular connections, yielding valuable biological insights. Experiments involving 22 scenarios demonstrate that scHGR precisely and consistently annotates cell identities, benchmarked against state-of-the-art methods. Crucially, scHGR uncovers novel subtypes within peripheral blood mononuclear cells, specifically from CD4+ T cells and cytotoxic T cells. Furthermore, by characterizing a cell atlas comprising 56 cell types for COVID-19 patients, scHGR identifies vital factors like IL1 and calcium ions, offering insights for targeted therapeutic interventions.
Affiliation(s)
- Mengyuan Zhao
- College of Computer Science and Control Engineering, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- University of Chinese Academy of Sciences, Beijing 100190, China
| | - Jiawei Li
- College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
| | - Xiaoyi Liu
- Computer Science and Engineering, University of South Carolina, Columbia, South Carolina 29208, USA
| | - Ke Ma
- College of Engineering, Southern University of Science and Technology, Shenzhen 518055, China
| | - Jijun Tang
- College of Computer Science and Control Engineering, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
| | - Fei Guo
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
| |
|
23
|
Cao X, Huang YA, You ZH, Shang X, Hu L, Hu PW, Huang ZA. scPriorGraph: constructing biosemantic cell-cell graphs with prior gene set selection for cell type identification from scRNA-seq data. Genome Biol 2024; 25:207. [PMID: 39103856 DOI: 10.1186/s13059-024-03357-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2023] [Accepted: 07/29/2024] [Indexed: 08/07/2024] Open
Abstract
Cell type identification is an indispensable analytical step in single-cell data analyses. To address the high noise stemming from gene expression data, existing computational methods often overlook the biologically meaningful relationships between genes, opting to reduce all genes to a unified data space. We assume that such relationships can aid in characterizing cell type features and improving cell type recognition accuracy. To this end, we introduce scPriorGraph, a dual-channel graph neural network that integrates multi-level gene biosemantics. Experimental results demonstrate that scPriorGraph effectively aggregates feature values of similar cells using high-quality graphs, achieving state-of-the-art performance in cell type identification.
Affiliation(s)
- Xiyue Cao
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China
| | - Yu-An Huang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China.
| | - Zhu-Hong You
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China.
| | - Xuequn Shang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China
| | - Lun Hu
- Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
| | - Peng-Wei Hu
- Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
| | - Zhi-An Huang
- Research Office, City University of Hong Kong (Dongguan), Dongguan, 523000, China
| |
|
24
|
Biswas B, Kumar N, Sugimoto M, Hoque MA. scHD4E: Novel ensemble learning-based differential expression analysis method for single-cell RNA-sequencing data. Comput Biol Med 2024; 178:108769. [PMID: 38897145 DOI: 10.1016/j.compbiomed.2024.108769] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2024] [Revised: 05/14/2024] [Accepted: 06/15/2024] [Indexed: 06/21/2024]
Abstract
Differential expression (DE) analysis between cell types in scRNA-seq data is crucial, and it must capture the complicated features of the data. Recently, different methods have been developed for scRNA-seq data analysis, based on different modeling frameworks, assumptions, strategies and test statistics that account for various data features. scDEA is a recently developed ensemble learning-based DE analysis method that uses Lancaster's combination to combine the p-values generated by 12 individual DE analysis methods, producing more accurate and stable results than the individual methods. The objective of our study is to propose a new ensemble learning-based DE analysis method, scHD4E, that uses only the 4 top-performing individual methods. These 4 methods were selected through an evaluation process using six real scRNA-seq data sets. We conducted comprehensive experiments on five experimental data sets to evaluate our proposed method in terms of sample size effects, batch effects, type I error control, gene ontology enrichment analysis, runtime, identified matched DE genes, and semantic similarity measurement between methods. We also performed similar analyses (except the last 3 terms) and computed performance measures such as accuracy, F1 score and the Matthews correlation coefficient for a simulated data set. The results show that scHD4E performs better than all the individual methods and scDEA from all the above perspectives. We expect that scHD4E will serve modern data scientists in detecting DEGs in scRNA-seq data analysis. To implement our proposed method, a GitHub R package, scHD4E, and its Shiny application have been developed and are available at the following links: https://github.com/bbiswas1989/scHD4E and https://github.com/bbiswas1989/scHD4E-Shiny.
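The p-value combination step named in this abstract can be illustrated with a minimal Python sketch. It is not the scDEA or scHD4E implementation (both are R packages); it shows Fisher's method, which is the special case of Lancaster's combination in which every method receives a weight of 2 degrees of freedom. The function name `fisher_combine` and the example p-values are hypothetical, and the method assumes independent p-values, which DE methods run on the same data will only approximately satisfy.

```python
import math


def fisher_combine(p_values):
    """Combine independent p-values with Fisher's method.

    Fisher's statistic X = -2 * sum(ln p_i) follows a chi-square
    distribution with 2k degrees of freedom under the null hypothesis.
    This equals Lancaster's combination with all weights set to 2.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")

    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # X / 2

    # For even degrees of freedom 2k the chi-square survival function is
    # exp(-X/2) * sum_{j=0}^{k-1} (X/2)^j / j!, so no SciPy is required.
    term = 1.0
    total = 1.0
    for j in range(1, k):
        term *= half_stat / j
        total += term
    return math.exp(-half_stat) * total


# Hypothetical p-values for one gene from four individual DE methods.
combined = fisher_combine([0.01, 0.04, 0.20, 0.03])
print(f"combined p-value: {combined:.6f}")
```

Lancaster's general form replaces the fixed weight of 2 with a per-method weight, which requires the inverse chi-square distribution function and is therefore usually computed with a statistics library.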
Affiliation(s)
- Biplab Biswas
- Department of Statistics, Faculty of Science, Bangabandhu Sheikh Mujibur Rahman Science & Technology University, Gopalganj, 8100, Bangladesh; Department of Statistics, Faculty of Science, University of Rajshahi, Rajshahi, 6205, Bangladesh.
| | - Nishith Kumar
- Department of Statistics, Faculty of Science, Bangabandhu Sheikh Mujibur Rahman Science & Technology University, Gopalganj, 8100, Bangladesh.
| | - Masahiro Sugimoto
- Institute for Advanced Biosciences, Keio University 246-2 Mizukami, Kakuganji, Tsuruoka, Yamagata, 997-0052, Japan.
| | - Md Aminul Hoque
- Department of Statistics, Faculty of Science, University of Rajshahi, Rajshahi, 6205, Bangladesh.
| |
|
25
|
Theunissen L, Mortier T, Saeys Y, Waegeman W. Uncertainty-aware single-cell annotation with a hierarchical reject option. Bioinformatics 2024; 40:btae128. [PMID: 38441258 PMCID: PMC10957513 DOI: 10.1093/bioinformatics/btae128] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2023] [Revised: 02/23/2024] [Accepted: 03/01/2024] [Indexed: 03/23/2024] Open
Abstract
MOTIVATION Automatic cell type annotation methods assign cell type labels to new datasets by extracting relationships from a reference RNA-seq dataset. However, due to the limited resolution of gene expression features, there is always uncertainty present in the label assignment. To enhance the reliability and robustness of annotation, most machine learning methods address this uncertainty by providing a full reject option, i.e. when the predicted confidence score of a cell type label falls below a user-defined threshold, no label is assigned and no prediction is made. As a better alternative, some methods deploy hierarchical models and consider a so-called partial rejection by returning internal nodes of the hierarchy as label assignment. However, because a detailed experimental analysis of various rejection approaches is missing in the literature, there is currently no consensus on best practices. RESULTS We evaluate three annotation approaches (i) full rejection, (ii) partial rejection, and (iii) no rejection for both flat and hierarchical probabilistic classifiers. Our findings indicate that hierarchical classifiers are superior when rejection is applied, with partial rejection being the preferred rejection approach, as it preserves a significant amount of label information. For optimal rejection implementation, the rejection threshold should be determined through careful examination of a method's rejection behavior. Without rejection, flat and hierarchical annotation perform equally well, as long as the cell type hierarchy accurately captures transcriptomic relationships. AVAILABILITY AND IMPLEMENTATION Code is freely available at https://github.com/Latheuni/Hierarchical_reject and https://doi.org/10.5281/zenodo.10697468.
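The difference between full and partial rejection described in this abstract can be sketched in a few lines of Python. This is an illustrative toy and not the authors' code (which is linked in the abstract): the hierarchy, the cell type names, the probabilities and the function names are all hypothetical, and the probability of an internal node is assumed to be the sum of its leaf probabilities.

```python
# Hypothetical toy hierarchy: parent -> children; nodes without an entry are leaves.
HIERARCHY = {
    "root": ["lymphoid", "myeloid"],
    "lymphoid": ["T cell", "B cell"],
    "myeloid": ["monocyte", "dendritic cell"],
}


def node_probability(node, leaf_probs, hierarchy):
    """Probability of a node: its own value if a leaf, else the sum over its children."""
    children = hierarchy.get(node)
    if children is None:
        return leaf_probs.get(node, 0.0)
    return sum(node_probability(c, leaf_probs, hierarchy) for c in children)


def full_rejection(leaf_probs, threshold):
    """Return the most probable leaf, or None when its confidence is below the threshold."""
    label, prob = max(leaf_probs.items(), key=lambda kv: kv[1])
    return label if prob >= threshold else None


def partial_rejection(leaf_probs, hierarchy, threshold, root="root"):
    """Descend from the root while the best child stays at or above the threshold.

    Returns the deepest node reached; returning the root means that no
    label information could be assigned with enough confidence.
    """
    node = root
    while node in hierarchy:
        best = max(
            hierarchy[node],
            key=lambda c: node_probability(c, leaf_probs, hierarchy),
        )
        if node_probability(best, leaf_probs, hierarchy) < threshold:
            break
        node = best
    return node


# An ambiguous cell: clearly lymphoid, but unclear whether T cell or B cell.
probs = {"T cell": 0.45, "B cell": 0.40, "monocyte": 0.10, "dendritic cell": 0.05}
print(full_rejection(probs, threshold=0.7))                # None
print(partial_rejection(probs, HIERARCHY, threshold=0.7))  # lymphoid
```

In this toy case full rejection discards the cell entirely, whereas partial rejection still reports the internal node, which is the sense in which partial rejection preserves label information.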
Affiliation(s)
- Lauren Theunissen
- Department of Data Analysis and Mathematical Modelling, Ghent University, Ghent, Belgium
- Data Mining and Modelling for Biomedicine, VIB Center for Inflammation Research, Ghent, Belgium
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium
| | - Thomas Mortier
- Department of Data Analysis and Mathematical Modelling, Ghent University, Ghent, Belgium
| | - Yvan Saeys
- Data Mining and Modelling for Biomedicine, VIB Center for Inflammation Research, Ghent, Belgium
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium
| | - Willem Waegeman
- Department of Data Analysis and Mathematical Modelling, Ghent University, Ghent, Belgium
| |
|
26
|
Choi JM, Park C, Chae H. moSCminer: a cell subtype classification framework based on the attention neural network integrating the single-cell multi-omics dataset on the cloud. PeerJ 2024; 12:e17006. [PMID: 38426141 PMCID: PMC10903350 DOI: 10.7717/peerj.17006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2023] [Accepted: 02/05/2024] [Indexed: 03/02/2024] Open
Abstract
Single-cell omics sequencing has rapidly advanced, enabling the quantification of diverse omics profiles at a single-cell resolution. To facilitate comprehensive biological insights, such as cellular differentiation trajectories, precise annotation of cell subtypes is essential. Conventional methods involve clustering cells and manually assigning subtypes based on canonical markers, a labor-intensive and expert-dependent process. Hence, an automated computational prediction framework is crucial. While several classification frameworks for predicting cell subtypes from single-cell RNA sequencing datasets exist, these methods rely solely on single-omics data, offering insights at a single molecular level. They often miss inter-omic correlations and a holistic understanding of cellular processes. To address this, the integration of multi-omics datasets from individual cells is essential for accurate subtype annotation. This article introduces moSCminer, a novel framework for classifying cell subtypes that harnesses the power of single-cell multi-omics sequencing datasets through an attention-based neural network operating at the omics level. By integrating three distinct omics datasets (gene expression, DNA methylation, and DNA accessibility) while accounting for their biological relationships, moSCminer excels at learning the relative significance of each omics feature. It then transforms this knowledge into a novel representation for cell subtype classification. Comparative evaluations against standard machine learning-based classifiers demonstrate moSCminer's superior performance, consistently achieving the highest average performance on real datasets. The efficacy of multi-omics integration is further corroborated through an in-depth analysis of the omics-level attention module, which identifies potential markers for cell subtype annotation. To enhance accessibility and scalability, moSCminer is provided as a user-friendly web-based platform seamlessly connected to a cloud system, publicly accessible at http://203.252.206.118:5568. Notably, this study marks the pioneering integration of three single-cell multi-omics datasets for cell subtype identification.
Affiliation(s)
- Joung Min Choi
- Department of Computer Science, Virginia Polytechnic Institute and State University (Virginia Tech), Blacksburg, Virginia, United States
| | - Chaelin Park
- Division of Computer Science, Sookmyung Women’s University, Seoul, South Korea
| | - Heejoon Chae
- Division of Computer Science, Sookmyung Women’s University, Seoul, South Korea
| |
|
27
|
Ali M, Yang T, He H, Zhang Y. Plant biotechnology research with single-cell transcriptome: recent advancements and prospects. Plant Cell Rep 2024; 43:75. [PMID: 38381195 DOI: 10.1007/s00299-024-03168-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2023] [Accepted: 02/05/2024] [Indexed: 02/22/2024]
Abstract
KEY MESSAGE Single-cell transcriptomic techniques have emerged as powerful tools in plant biology, offering high-resolution insights into gene expression at the individual cell level. This review highlights the rapid expansion of single-cell technologies in plants, their potential in understanding plant development, and their role in advancing plant biotechnology research. Single-cell techniques have emerged as powerful tools to enhance our understanding of biological systems, providing high-resolution transcriptomic analysis at the single-cell level. In plant biology, the adoption of single-cell transcriptomics has seen rapid expansion of available technologies and applications. This review article focuses on the latest advancements in the field of single-cell transcriptomics in plants and discusses the potential role of these approaches in plant development and in expediting plant biotechnology research in the near future. Furthermore, the inherent challenges and limitations of single-cell technology are critically examined, with a view to overcoming them and enhancing our knowledge and understanding.
Affiliation(s)
- Muhammad Ali
- School of Agriculture, Sun Yat-Sen University, Shenzhen, 518107, China
- Peking University-Institute of Advanced Agricultural Sciences, Weifang, China
| | - Tianxia Yang
- School of Agriculture, Sun Yat-Sen University, Shenzhen, 518107, China
- State Key Laboratory of Maize Bio-breeding, National Maize Improvement Center, Frontiers Science Center for Molecular Design Breeding (MOE), China Agricultural University, Beijing, China
| | - Hai He
- School of Agriculture, Sun Yat-Sen University, Shenzhen, 518107, China
| | - Yu Zhang
- School of Agriculture, Sun Yat-Sen University, Shenzhen, 518107, China.
| |
|
28
|
Lin Y, Wu TY, Chen X, Wan S, Chao B, Xin J, Yang JYH, Wong WH, Wang YXR. Data integration and inference of gene regulation using single-cell temporal multimodal data with scTIE. Genome Res 2024; 34:119-133. [PMID: 38190633 PMCID: PMC10903952 DOI: 10.1101/gr.277960.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/06/2023] [Accepted: 12/13/2023] [Indexed: 01/10/2024]
Abstract
Single-cell technologies offer unprecedented opportunities to dissect gene regulatory mechanisms in context-specific ways. Although there are computational methods for extracting gene regulatory relationships from scRNA-seq and scATAC-seq data, the data integration problem, essential for accurate cell type identification, has been mostly treated as a standalone challenge. Here we present scTIE, a unified method that integrates temporal multimodal data and infers regulatory relationships predictive of cellular state changes. scTIE uses an autoencoder to embed cells from all time points into a common space by using iterative optimal transport, followed by extracting interpretable information to predict cell trajectories. Using a variety of synthetic and real temporal multimodal data sets, we show scTIE achieves effective data integration while preserving more biological signals than existing methods, particularly in the presence of batch effects and noise. Furthermore, on the exemplar multiome data set we generated from differentiating mouse embryonic stem cells over time, we show scTIE captures regulatory elements highly predictive of cell transition probabilities, providing new potentials to understand the regulatory landscape driving developmental processes.
Affiliation(s)
- Yingxin Lin
- School of Mathematics and Statistics, The University of Sydney, NSW 2006, Australia
- Charles Perkins Centre, The University of Sydney, NSW 2006, Australia
- Laboratory of Data Discovery for Health Limited (D24H), Science Park, Hong Kong SAR 999077, China
| | - Tung-Yu Wu
- Department of Statistics, Stanford University, Stanford, California 94305-4020, USA
| | - Xi Chen
- Department of Statistics, Stanford University, Stanford, California 94305-4020, USA
| | - Sheng Wan
- Institute of Electronics, National Yang Ming Chiao Tung University, Hsinchu 30010, Taiwan
| | - Brian Chao
- Department of Electrical Engineering, Stanford University, Stanford, California 94305-9505, USA
| | - Jingxue Xin
- Department of Statistics, Stanford University, Stanford, California 94305-4020, USA
| | - Jean Y H Yang
- School of Mathematics and Statistics, The University of Sydney, NSW 2006, Australia
- Charles Perkins Centre, The University of Sydney, NSW 2006, Australia
- Laboratory of Data Discovery for Health Limited (D24H), Science Park, Hong Kong SAR 999077, China
| | - Wing H Wong
- Department of Statistics, Stanford University, Stanford, California 94305-4020, USA
- Department of Biomedical Data Science, Stanford University, Stanford, California 94305-5464, USA
- Bio-X Program, Stanford University, Stanford, California 94305, USA
| | - Y X Rachel Wang
- School of Mathematics and Statistics, The University of Sydney, NSW 2006, Australia
| |
|
29
|
Wang X, Chai Z, Li S, Liu Y, Li C, Jiang Y, Liu Q. CTISL: a dynamic stacking multi-class classification approach for identifying cell types from single-cell RNA-seq data. Bioinformatics 2024; 40:btae063. [PMID: 38317054 PMCID: PMC10873586 DOI: 10.1093/bioinformatics/btae063] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2024] [Revised: 02/15/2024] [Accepted: 02/15/2024] [Indexed: 02/07/2024] Open
Abstract
MOTIVATION Effective identification of cell types is of critical importance in single-cell RNA-sequencing (scRNA-seq) data analysis. To date, many supervised machine learning-based predictors have been implemented to identify cell types from scRNA-seq datasets. Despite the technical advances of these state-of-the-art tools, most existing predictors are single classifiers, whose performance can still be significantly improved. It is therefore highly desirable to employ the ensemble learning strategy to develop more accurate computational models for robust and comprehensive identification of cell types on scRNA-seq datasets. RESULTS We propose a two-layer stacking model, termed CTISL (Cell Type Identification by Stacking ensemble Learning), which integrates multiple classifiers to identify cell types. In the first layer, given a reference scRNA-seq dataset with known cell types, CTISL dynamically combines multiple cell-type-specific classifiers (i.e. support-vector machine and logistic regression) as the base learners, whose outputs serve as the input of a meta-classifier in the second layer. We conducted a total of 24 benchmarking experiments on 17 human and mouse scRNA-seq datasets to evaluate and compare the prediction performance of CTISL and other state-of-the-art predictors. The experimental results demonstrate that CTISL achieves superior or competitive performance compared to these state-of-the-art approaches. We anticipate that CTISL can serve as a useful and reliable tool for cost-effective identification of cell types from scRNA-seq datasets. AVAILABILITY AND IMPLEMENTATION The webserver and source code are freely available at http://bigdata.biocie.cn/CTISLweb/home and https://zenodo.org/records/10568906, respectively.
Affiliation(s)
- Xiao Wang
- Department of Software Engineering, College of Information Engineering, Northwest A&F University, Yangling 712100, China
| | - Ziyi Chai
- Department of Software Engineering, College of Information Engineering, Northwest A&F University, Yangling 712100, China
| | - Shaohua Li
- Department of Software Engineering, College of Information Engineering, Northwest A&F University, Yangling 712100, China
| | - Yan Liu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
| | - Chen Li
- Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
| | - Yu Jiang
- Department of Animal Genetics, Breeding and Reproduction, College of Animal Science and Technology, Northwest A&F University, Yangling 712100, China
| | - Quanzhong Liu
- Department of Software Engineering, College of Information Engineering, Northwest A&F University, Yangling 712100, China
- Shaanxi Engineering Research Center of Agricultural Information Intelligent Perception and Analysis, Northwest A&F University, Yangling 712100, China
| |
|
30
|
Fu X, Lin Y, Lin DM, Mechtersheimer D, Wang C, Ameen F, Ghazanfar S, Patrick E, Kim J, Yang JYH. BIDCell: Biologically-informed self-supervised learning for segmentation of subcellular spatial transcriptomics data. Nat Commun 2024; 15:509. [PMID: 38218939 PMCID: PMC10787788 DOI: 10.1038/s41467-023-44560-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2023] [Accepted: 12/13/2023] [Indexed: 01/15/2024] Open
Abstract
Recent advances in subcellular imaging transcriptomics platforms have enabled high-resolution spatial mapping of gene expression, while also introducing significant analytical challenges in accurately identifying cells and assigning transcripts. Existing methods grapple with cell segmentation, frequently leading to fragmented cells or oversized cells that capture contaminated expression. To this end, we present BIDCell, a self-supervised deep learning-based framework with biologically-informed loss functions that learn relationships between spatially resolved gene expression and cell morphology. BIDCell incorporates cell-type data, including single-cell transcriptomics data from public repositories, with cell morphology information. Using a comprehensive evaluation framework consisting of metrics in five complementary categories for cell segmentation performance, we demonstrate that BIDCell outperforms other state-of-the-art methods according to many metrics across a variety of tissue types and technology platforms. Our findings underscore the potential of BIDCell to significantly enhance single-cell spatial expression analyses, enabling great potential in biological discovery.
Affiliation(s)
- Xiaohang Fu
- School of Mathematics and Statistics, The University of Sydney, Sydney, NSW, 2006, Australia
- School of Computer Science, The University of Sydney, Sydney, NSW, 2006, Australia
- Sydney Precision Data Science Centre, University of Sydney, Sydney, NSW, 2006, Australia
- Charles Perkins Centre, The University of Sydney, Sydney, NSW, 2006, Australia
- Laboratory of Data Discovery for Health Limited (D24H), Science Park, Hong Kong SAR, China
| | - Yingxin Lin
- School of Mathematics and Statistics, The University of Sydney, Sydney, NSW, 2006, Australia
- Sydney Precision Data Science Centre, University of Sydney, Sydney, NSW, 2006, Australia
- Charles Perkins Centre, The University of Sydney, Sydney, NSW, 2006, Australia
- Laboratory of Data Discovery for Health Limited (D24H), Science Park, Hong Kong SAR, China
| | - David M Lin
- Department of Biomedical Sciences, Cornell University, Ithaca, NY, 14850, USA
| | - Daniel Mechtersheimer
- School of Mathematics and Statistics, The University of Sydney, Sydney, NSW, 2006, Australia
- Sydney Precision Data Science Centre, University of Sydney, Sydney, NSW, 2006, Australia
- Charles Perkins Centre, The University of Sydney, Sydney, NSW, 2006, Australia
| | - Chuhan Wang
- School of Computer Science, The University of Sydney, Sydney, NSW, 2006, Australia
- Sydney Precision Data Science Centre, University of Sydney, Sydney, NSW, 2006, Australia
- Laboratory of Data Discovery for Health Limited (D24H), Science Park, Hong Kong SAR, China
| | - Farhan Ameen
- School of Mathematics and Statistics, The University of Sydney, Sydney, NSW, 2006, Australia
- Sydney Precision Data Science Centre, University of Sydney, Sydney, NSW, 2006, Australia
- Charles Perkins Centre, The University of Sydney, Sydney, NSW, 2006, Australia
| | - Shila Ghazanfar
- School of Mathematics and Statistics, The University of Sydney, Sydney, NSW, 2006, Australia
- Sydney Precision Data Science Centre, University of Sydney, Sydney, NSW, 2006, Australia
- Charles Perkins Centre, The University of Sydney, Sydney, NSW, 2006, Australia
| | - Ellis Patrick
- School of Mathematics and Statistics, The University of Sydney, Sydney, NSW, 2006, Australia
- Sydney Precision Data Science Centre, University of Sydney, Sydney, NSW, 2006, Australia
- Charles Perkins Centre, The University of Sydney, Sydney, NSW, 2006, Australia
- Laboratory of Data Discovery for Health Limited (D24H), Science Park, Hong Kong SAR, China
- The Westmead Institute for Medical Research, Sydney, NSW, 2145, Australia
| | - Jinman Kim
- School of Computer Science, The University of Sydney, Sydney, NSW, 2006, Australia
- Sydney Precision Data Science Centre, University of Sydney, Sydney, NSW, 2006, Australia
- Laboratory of Data Discovery for Health Limited (D24H), Science Park, Hong Kong SAR, China
| | - Jean Y H Yang
- School of Mathematics and Statistics, The University of Sydney, Sydney, NSW, 2006, Australia.
- Sydney Precision Data Science Centre, University of Sydney, Sydney, NSW, 2006, Australia.
- Charles Perkins Centre, The University of Sydney, Sydney, NSW, 2006, Australia.
- Laboratory of Data Discovery for Health Limited (D24H), Science Park, Hong Kong SAR, China.
| |
|
31
|
Cao Y, Tran A, Kim H, Robertson N, Lin Y, Torkel M, Yang P, Patrick E, Ghazanfar S, Yang J. Thinking process templates for constructing data stories with SCDNEY. F1000Res 2023; 12:261. [PMID: 38434622 PMCID: PMC10905113 DOI: 10.12688/f1000research.130623.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 12/08/2023] [Indexed: 03/05/2024] Open
Abstract
Background Globally, scientists now have the ability to generate a vast amount of high throughput biomedical data that carry critical information for important clinical and public health applications. This data revolution in biology is now creating a plethora of new single-cell datasets. Concurrently, there have been significant methodological advances in single-cell research. Integrating these two resources, creating tailor-made, efficient, and purpose-specific data analysis approaches can assist in accelerating scientific discovery. Methods We developed a series of living workshops for building data stories, using Single-cell data integrative analysis (scdney). scdney is a wrapper package with a collection of single-cell analysis R packages incorporating data integration, cell type annotation, higher order testing and more. Results Here, we illustrate two specific workshops. The first workshop examines how to characterise the identity and/or state of cells and the relationship between them, known as phenotyping. The second workshop focuses on extracting higher-order features from cells to predict disease progression. Conclusions Through these workshops, we not only showcase current solutions, but also highlight critical thinking points. In particular, we highlight the Thinking Process Template that provides a structured framework for the decision-making process behind such single-cell analyses. Furthermore, our workshop will incorporate dynamic contributions from the community in a collaborative learning approach, thus the term 'living'.
Affiliation(s)
- Yue Cao
- Laboratory of Data Discovery for Health Limited (D24H), Science Park, Hong Kong SAR, China
- Sydney Precision Data Science Centre, The University of Sydney, Sydney, NSW, 2006, Australia
- Charles Perkins Centre, The University of Sydney, Sydney, NSW, 2006, Australia
- School of Mathematics and Statistics, The University of Sydney, Sydney, NSW, 2006, Australia
| | - Andy Tran
- Laboratory of Data Discovery for Health Limited (D24H), Science Park, Hong Kong SAR, China
- Sydney Precision Data Science Centre, The University of Sydney, Sydney, NSW, 2006, Australia
- Charles Perkins Centre, The University of Sydney, Sydney, NSW, 2006, Australia
- School of Mathematics and Statistics, The University of Sydney, Sydney, NSW, 2006, Australia
| | - Hani Kim
- Sydney Precision Data Science Centre, The University of Sydney, Sydney, NSW, 2006, Australia
- School of Mathematics and Statistics, The University of Sydney, Sydney, NSW, 2006, Australia
- Children's Medical Research Institute, The University of Sydney, Westmead, NSW, 2145, Australia
| | - Nick Robertson
- Laboratory of Data Discovery for Health Limited (D24H), Science Park, Hong Kong SAR, China
- Sydney Precision Data Science Centre, The University of Sydney, Sydney, NSW, 2006, Australia
- Charles Perkins Centre, The University of Sydney, Sydney, NSW, 2006, Australia
- School of Mathematics and Statistics, The University of Sydney, Sydney, NSW, 2006, Australia
| | - Yingxin Lin
- Laboratory of Data Discovery for Health Limited (D24H), Science Park, Hong Kong SAR, China
- Sydney Precision Data Science Centre, The University of Sydney, Sydney, NSW, 2006, Australia
- Charles Perkins Centre, The University of Sydney, Sydney, NSW, 2006, Australia
- School of Mathematics and Statistics, The University of Sydney, Sydney, NSW, 2006, Australia
| | - Marni Torkel
- Laboratory of Data Discovery for Health Limited (D24H), Science Park, Hong Kong SAR, China
- Sydney Precision Data Science Centre, The University of Sydney, Sydney, NSW, 2006, Australia
- Charles Perkins Centre, The University of Sydney, Sydney, NSW, 2006, Australia
- School of Mathematics and Statistics, The University of Sydney, Sydney, NSW, 2006, Australia
| | - Pengyi Yang
- Laboratory of Data Discovery for Health Limited (D24H), Science Park, Hong Kong SAR, China
- Sydney Precision Data Science Centre, The University of Sydney, Sydney, NSW, 2006, Australia
- School of Mathematics and Statistics, The University of Sydney, Sydney, NSW, 2006, Australia
- Children's Medical Research Institute, The University of Sydney, Westmead, NSW, 2145, Australia
| | - Ellis Patrick
- Laboratory of Data Discovery for Health Limited (D24H), Science Park, Hong Kong SAR, China
- Sydney Precision Data Science Centre, The University of Sydney, Sydney, NSW, 2006, Australia
- Charles Perkins Centre, The University of Sydney, Sydney, NSW, 2006, Australia
- School of Mathematics and Statistics, The University of Sydney, Sydney, NSW, 2006, Australia
| | - Shila Ghazanfar
- Sydney Precision Data Science Centre, The University of Sydney, Sydney, NSW, 2006, Australia
- Charles Perkins Centre, The University of Sydney, Sydney, NSW, 2006, Australia
- School of Mathematics and Statistics, The University of Sydney, Sydney, NSW, 2006, Australia
| | - Jean Yang
- Laboratory of Data Discovery for Health Limited (D24H), Science Park, Hong Kong SAR, China
- Sydney Precision Data Science Centre, The University of Sydney, Sydney, NSW, 2006, Australia
- Charles Perkins Centre, The University of Sydney, Sydney, NSW, 2006, Australia
- School of Mathematics and Statistics, The University of Sydney, Sydney, NSW, 2006, Australia
|
32
|
Wang L, Si W, Yu X, Piffko A, Dou X, Ding X, Bugno J, Yang K, Wen C, Zhang L, Chen D, Huang X, Wang J, Arina A, Pitroda S, Chmura SJ, He C, Liang HL, Weichselbaum R. Epitranscriptional regulation of TGF-β pseudoreceptor BAMBI by m6A/YTHDF2 drives extrinsic radioresistance. J Clin Invest 2023; 133:e172919. [PMID: 38099498 PMCID: PMC10721150 DOI: 10.1172/jci172919] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/07/2023] [Accepted: 09/28/2023] [Indexed: 12/18/2023] Open
Abstract
Activation of TGF-β signaling serves as an extrinsic resistance mechanism that limits the potential for radiotherapy. Bone morphogenetic protein and activin membrane-bound inhibitor (BAMBI) antagonizes TGF-β signaling and is implicated in cancer progression. However, the molecular mechanisms of BAMBI regulation in immune cells and its impact on antitumor immunity after radiation have not been established. Here, we show that ionizing radiation (IR) specifically reduces BAMBI expression in immunosuppressive myeloid-derived suppressor cells (MDSCs) in both murine models and humans. Mechanistically, YTH N6-methyladenosine RNA-binding protein F2 (YTHDF2) directly binds and degrades Bambi transcripts in an N6-methyladenosine-dependent (m6A-dependent) manner, and this relies on NF-κB signaling. BAMBI suppresses the tumor-infiltrating capacity and suppression function of MDSCs via inhibiting TGF-β signaling. Adeno-associated viral delivery of Bambi (AAV-Bambi) to the tumor microenvironment boosts the antitumor effects of radiotherapy and radioimmunotherapy combinations. Intriguingly, combination of AAV-Bambi and IR not only improves local tumor control, but also suppresses distant metastasis, further supporting its clinical translation potential. Our findings uncover a surprising role of BAMBI in myeloid cells, unveiling a potential therapeutic strategy for overcoming extrinsic radioresistance.
Affiliation(s)
- Liangliang Wang
- Department of Radiation and Cellular Oncology and
- Ludwig Center for Metastasis Research, University of Chicago, Chicago, Illinois, USA
| | - Wei Si
- State Key Laboratory of Animal Nutrition, Institute of Animal Sciences of Chinese Academy of Agricultural Sciences, Beijing, China
| | - Xianbin Yu
- Department of Chemistry, Department of Biochemistry and Molecular Biology, and Institute for Biophysical Dynamics, University of Chicago, Chicago, Illinois, USA
| | - Andras Piffko
- Department of Radiation and Cellular Oncology and
- Ludwig Center for Metastasis Research, University of Chicago, Chicago, Illinois, USA
- Department of Neurosurgery, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
| | - Xiaoyang Dou
- Department of Chemistry, Department of Biochemistry and Molecular Biology, and Institute for Biophysical Dynamics, University of Chicago, Chicago, Illinois, USA
| | - Xingchen Ding
- Shandong Cancer Hospital and Institute, Shandong First Medical University and Shandong Academy of Medical Sciences, Jinan, China
| | - Jason Bugno
- Department of Radiation and Cellular Oncology and
- Ludwig Center for Metastasis Research, University of Chicago, Chicago, Illinois, USA
- The Committee on Clinical Pharmacology and Pharmacogenomics and
| | - Kaiting Yang
- Department of Radiation and Cellular Oncology and
- Ludwig Center for Metastasis Research, University of Chicago, Chicago, Illinois, USA
| | - Chuangyu Wen
- Department of Radiation and Cellular Oncology and
- Ludwig Center for Metastasis Research, University of Chicago, Chicago, Illinois, USA
| | - Linda Zhang
- Department of Chemistry, Department of Biochemistry and Molecular Biology, and Institute for Biophysical Dynamics, University of Chicago, Chicago, Illinois, USA
| | - Dapeng Chen
- Department of Radiation and Cellular Oncology and
- Ludwig Center for Metastasis Research, University of Chicago, Chicago, Illinois, USA
| | - Xiaona Huang
- Department of Radiation and Cellular Oncology and
- Ludwig Center for Metastasis Research, University of Chicago, Chicago, Illinois, USA
| | - Jiaai Wang
- Department of Radiation and Cellular Oncology and
- Ludwig Center for Metastasis Research, University of Chicago, Chicago, Illinois, USA
| | - Ainhoa Arina
- Department of Radiation and Cellular Oncology and
- Ludwig Center for Metastasis Research, University of Chicago, Chicago, Illinois, USA
| | - Sean Pitroda
- Department of Radiation and Cellular Oncology and
- Ludwig Center for Metastasis Research, University of Chicago, Chicago, Illinois, USA
| | | | - Chuan He
- Department of Chemistry, Department of Biochemistry and Molecular Biology, and Institute for Biophysical Dynamics, University of Chicago, Chicago, Illinois, USA
- Howard Hughes Medical Institute, University of Chicago, Chicago, Illinois, USA
| | - Hua Laura Liang
- Department of Radiation and Cellular Oncology and
- Ludwig Center for Metastasis Research, University of Chicago, Chicago, Illinois, USA
| | - Ralph Weichselbaum
- Department of Radiation and Cellular Oncology and
- Ludwig Center for Metastasis Research, University of Chicago, Chicago, Illinois, USA
|
33
|
Ghaddar B, De S. Hierarchical and automated cell-type annotation and inference of cancer cell of origin with Census. Bioinformatics 2023; 39:btad714. [PMID: 38011649 PMCID: PMC10713118 DOI: 10.1093/bioinformatics/btad714] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2023] [Revised: 10/26/2023] [Accepted: 11/25/2023] [Indexed: 11/29/2023] Open
Abstract
MOTIVATION Cell-type annotation is a time-consuming yet critical first step in the analysis of single-cell RNA-seq data, especially when multiple similar cell subtypes with overlapping marker genes are present. Existing automated annotation methods have a number of limitations, including requiring large reference datasets, high computation time, shallow annotation resolution, and difficulty in identifying cancer cells or their most likely cell of origin. RESULTS We developed Census, a biologically intuitive and fully automated cell-type identification method for single-cell RNA-seq data that can deeply annotate normal cells in mammalian tissues and identify malignant cells and their likely cell of origin. Motivated by the inherently stratified developmental programs of cellular differentiation, Census infers hierarchical cell-type relationships and uses gradient-boosted decision trees that capitalize on nodal cell-type relationships to achieve high prediction speed and accuracy. When benchmarked on 44 atlas-scale normal and cancer, human and mouse tissues, Census significantly outperforms state-of-the-art methods across multiple metrics and naturally predicts the cell-of-origin of different cancers. Census is pretrained on the Tabula Sapiens to classify 175 cell-types from 24 organs; however, users can seamlessly train their own models for customized applications. AVAILABILITY AND IMPLEMENTATION Census is available at Zenodo https://zenodo.org/records/7017103 and on our GitHub https://github.com/sjdlabgroup/Census.
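The hierarchical idea in this abstract (one classifier per node of a cell-type tree, applied from the root down) can be illustrated with a minimal sketch. This is not the Census implementation: a nearest-centroid rule stands in for the gradient-boosted decision trees, and the hierarchy, the labels, and the four-feature expression vectors are all hypothetical toy values.

```python
import math

# Hypothetical cell-type hierarchy: each internal node lists its children.
HIERARCHY = {
    "root": ["immune", "epithelial"],
    "immune": ["T cell", "B cell"],
}


def leaves(node):
    """Return all leaf labels underneath a node (the node itself if it is a leaf)."""
    children = HIERARCHY.get(node)
    if not children:
        return [node]
    return [leaf for child in children for leaf in leaves(child)]


def centroid(rows):
    """Mean expression vector of a list of equally long vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def fit(train):
    """Fit one nearest-centroid model per internal node.

    train maps each leaf label to a list of expression vectors.
    Nearest-centroid is a stand-in for a real per-node classifier.
    """
    models = {}
    for node, children in HIERARCHY.items():
        models[node] = {
            child: centroid([v for leaf in leaves(child) for v in train[leaf]])
            for child in children
        }
    return models


def predict(models, x):
    """Walk from the root to a leaf, choosing the closest child at each node."""
    node, path = "root", []
    while node in models:
        node = min(models[node], key=lambda c: math.dist(x, models[node][c]))
        path.append(node)
    return path


# Toy, made-up training data: four features per cell.
train = {
    "T cell": [[5, 4, 0, 0], [6, 5, 0, 0]],
    "B cell": [[5, 0, 4, 0], [6, 0, 5, 0]],
    "epithelial": [[0, 0, 0, 5], [0, 0, 0, 6]],
}
models = fit(train)
t_path = predict(models, [5, 4, 1, 0])
e_path = predict(models, [0, 0, 0, 5])
```

Each prediction is a path through the tree, so a cell receives a coarse label first and a finer one only where the hierarchy continues.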
Affiliation(s)
- Bassel Ghaddar
- Center for Systems and Computational Biology, Rutgers Cancer Institute of New Jersey, Rutgers University, New Brunswick, NJ 08901, United States
| | - Subhajyoti De
- Center for Systems and Computational Biology, Rutgers Cancer Institute of New Jersey, Rutgers University, New Brunswick, NJ 08901, United States
|
34
|
Sadria M, Layton A, Bader GD. Adversarial training improves model interpretability in single-cell RNA-seq analysis. BIOINFORMATICS ADVANCES 2023; 3:vbad166. [PMID: 38099262 PMCID: PMC10719216 DOI: 10.1093/bioadv/vbad166] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2023] [Revised: 09/28/2023] [Accepted: 11/22/2023] [Indexed: 12/17/2023]
Abstract
Motivation Predictive computational models must be accurate, robust, and interpretable to be considered reliable in important areas such as biology and medicine. A sufficiently robust model should not have its output affected significantly by a slight change in the input. Also, these models should be able to explain how a decision is made to support user trust in the results. Efforts have been made to improve the robustness and interpretability of predictive computational models independently; however, the interaction of robustness and interpretability is poorly understood. Results As an example task, we explore the computational prediction of cell type based on single-cell RNA-seq data and show that it can be made more robust by adversarially training a deep learning model. Surprisingly, we find this also leads to improved model interpretability, as measured by identifying genes important for classification using a range of standard interpretability methods. Our results suggest that adversarial training may be generally useful to improve deep learning robustness and interpretability and that it should be evaluated on a range of tasks. Availability and implementation Our Python implementation of all analysis in this publication can be found at: https://github.com/MehrshadSD/robustness-interpretability. The analysis was conducted using numPy 0.2.5, pandas 2.0.3, scanpy 1.9.3, tensorflow 2.10.0, matplotlib 3.7.1, seaborn 0.12.2, sklearn 1.1.1, shap 0.42.0, lime 0.2.0.1, matplotlib_venn 0.11.9.
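As a rough illustration of adversarial training, the sketch below trains a logistic-regression classifier on both clean inputs and perturbed copies of them. The abstract does not say how the perturbations were generated or which architecture was used; the fast gradient sign method (FGSM), the logistic model, the hyperparameters, and the toy two-feature data are all assumptions made for this example, and it does not reproduce the authors' TensorFlow code.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, b, x):
    """Probability of class 1 under a logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def fgsm(w, b, x, y, eps):
    """Perturb x by eps in the direction that increases the loss.

    For logistic regression, the gradient of the cross-entropy loss
    with respect to the input is (p - y) * w.
    """
    p = predict_proba(w, b, x)
    grad = [(p - y) * wi for wi in w]
    sign = [(g > 0) - (g < 0) for g in grad]
    return [xi + eps * s for xi, s in zip(x, sign)]


def train_step(w, b, x, y, lr):
    """One stochastic gradient descent step on a single sample."""
    p = predict_proba(w, b, x)
    w = [wi - lr * (p - y) * xi for wi, xi in zip(w, x)]
    return w, b - lr * (p - y)


def adversarial_train(data, n_features, eps=0.3, lr=0.1, epochs=200):
    """Train on each clean sample and on its FGSM-perturbed copy."""
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        for x, y in data:
            x_adv = fgsm(w, b, x, y, eps)
            for sample in (x, x_adv):
                w, b = train_step(w, b, sample, y, lr)
    return w, b


# Effect of a single perturbation on a fixed, hand-picked model.
w0, b0 = [2.0, -1.0], 0.0
x0, y0 = [1.0, 0.5], 1
x0_adv = fgsm(w0, b0, x0, y0, eps=0.3)

# Toy, made-up two-feature "expression" data with binary labels.
data = [([2.0, 0.0], 1), ([0.0, 2.0], 0), ([1.5, 0.5], 1), ([0.5, 1.5], 0)]
w, b = adversarial_train(data, n_features=2)
```

The perturbation budget `eps` controls how far each adversarial copy may move from the original input; a robust model keeps its prediction stable within that budget.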
Affiliation(s)
- Mehrshad Sadria
- Department of Applied Mathematics, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
| | - Anita Layton
- Department of Applied Mathematics, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
- Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
- Department of Biology, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
- School of Pharmacy, University of Waterloo, Waterloo, Ontario N2G 1C5, Canada
| | - Gary D Bader
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 1A8, Canada
- The Donnelly Centre, University of Toronto, Toronto, Ontario M5S 3E1, Canada
- Department of Computer Science, University of Toronto, Toronto, Ontario M5S 2E4, Canada
- The Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, Ontario M5G 1X5, Canada
- Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario M5G 2M9, Canada
|
35
|
Quan F, Liang X, Cheng M, Yang H, Liu K, He S, Sun S, Deng M, He Y, Liu W, Wang S, Zhao S, Deng L, Hou X, Zhang X, Xiao Y. Annotation of cell types (ACT): a convenient web server for cell type annotation. Genome Med 2023; 15:91. [PMID: 37924118 PMCID: PMC10623726 DOI: 10.1186/s13073-023-01249-5] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 03/14/2023] [Accepted: 10/18/2023] [Indexed: 11/06/2023] Open
Abstract
BACKGROUND The advancement of single-cell sequencing has progressed our ability to solve biological questions. Cell type annotation is of vital importance to this process, allowing for the analysis and interpretation of enormous single-cell datasets. At present, however, manual cell annotation, which is the predominant approach, remains limited by both speed and the requirement of expert knowledge. METHODS To address these challenges, we constructed a hierarchically organized marker map through manually curating over 26,000 cell marker entries from about 7000 publications. We then developed WISE, a weighted and integrated gene set enrichment method, to integrate the prevalence of canonical markers and ordered differentially expressed genes of specific cell types in the marker map. Benchmarking analysis suggested that our method outperformed state-of-the-art methods. RESULTS By integrating the marker map and WISE, we developed a user-friendly and convenient web server, ACT ( http://xteam.xbio.top/ACT/ or http://biocc.hrbmu.edu.cn/ACT/ ), which only takes a simple list of upregulated genes as input and provides interactive hierarchy maps, together with well-designed charts and statistical information, to accelerate the assignment of cell identities and make the results comparable to expert manual annotation. In addition, a pan-tissue marker map was constructed to assist in cell assignments in less-studied tissues. Applying ACT to three case studies showed that all cell clusters were quickly and accurately annotated, and multi-level and more refined cell types were identified. CONCLUSIONS We developed a knowledge-based resource and a corresponding method, together with an intuitive graphical web interface, for cell type annotation. We believe that ACT, emerging as a powerful tool for cell type annotation, would be widely used in single-cell research and considerably accelerate the process of cell type identification.
Affiliation(s)
- Fei Quan
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150086, China
| | - Xin Liang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150086, China
| | - Mingjiang Cheng
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150086, China
| | - Huan Yang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150086, China
| | - Kun Liu
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150086, China
| | - Shengyuan He
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150086, China
| | - Shangqin Sun
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150086, China
| | - Menglan Deng
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150086, China
| | - Yanzhen He
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150086, China
| | - Wei Liu
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150086, China
| | - Shuai Wang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150086, China
| | - Shuxiang Zhao
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150086, China
| | - Lantian Deng
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150086, China
| | - Xiaobo Hou
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150086, China
| | - Xinxin Zhang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150086, China.
| | - Yun Xiao
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150086, China.
|
36
|
Baldwin M, Buckley CD, Guilak F, Hulley P, Cribbs AP, Snelling S. A roadmap for delivering a human musculoskeletal cell atlas. Nat Rev Rheumatol 2023; 19:738-752. [PMID: 37798481 DOI: 10.1038/s41584-023-01031-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 08/31/2023] [Indexed: 10/07/2023]
Abstract
Advances in single-cell technologies have transformed the ability to identify the individual cell types present within tissues and organs. The musculoskeletal bionetwork, part of the wider Human Cell Atlas project, aims to create a detailed map of the healthy musculoskeletal system at a single-cell resolution throughout tissue development and across the human lifespan, with complementary generation of data from diseased tissues. Given the prevalence of musculoskeletal disorders, this detailed reference dataset will be critical to understanding normal musculoskeletal function in growth, homeostasis and ageing. The endeavour will also help to identify the cellular basis for disease and lay the foundations for novel therapeutic approaches to treating diseases of the joints, soft tissues and bone. Here, we present a Roadmap delineating the critical steps required to construct the first draft of a human musculoskeletal cell atlas. We describe the key challenges involved in mapping the extracellular matrix-rich, but cell-poor, tissues of the musculoskeletal system, outline early milestones that have been achieved and describe the vision and directions for a comprehensive musculoskeletal cell atlas. By embracing cutting-edge technologies, integrating diverse datasets and fostering international collaborations, this endeavour has the potential to drive transformative changes in musculoskeletal medicine.
Affiliation(s)
- Mathew Baldwin
- The Botnar Institute for Musculoskeletal Sciences, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Science, University of Oxford, Oxford, UK
| | - Christopher D Buckley
- The Kennedy Institute of Rheumatology, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Science, University of Oxford, Oxford, UK
| | - Farshid Guilak
- Department of Orthopaedic Surgery, Washington University in St. Louis, St. Louis, MO, USA
- Shriners Hospitals for Children, St. Louis, MO, USA
| | - Philippa Hulley
- The Botnar Institute for Musculoskeletal Sciences, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Science, University of Oxford, Oxford, UK
| | - Adam P Cribbs
- The Botnar Institute for Musculoskeletal Sciences, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Science, University of Oxford, Oxford, UK
| | - Sarah Snelling
- The Botnar Institute for Musculoskeletal Sciences, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Science, University of Oxford, Oxford, UK.
|
37
|
Ng GYL, Tan SC, Ong CS. On the use of QDE-SVM for gene feature selection and cell type classification from scRNA-seq data. PLoS One 2023; 18:e0292961. [PMID: 37856458 PMCID: PMC10586655 DOI: 10.1371/journal.pone.0292961] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/10/2023] [Accepted: 10/03/2023] [Indexed: 10/21/2023] Open
Abstract
Cell type identification is one of the fundamental tasks in single-cell RNA sequencing (scRNA-seq) studies. It is a key step to facilitate downstream interpretations such as differential expression and trajectory inference. scRNA-seq data contain technical variations that could affect the interpretation of the cell types. Therefore, gene selection, also known as feature selection in data science, plays an important role in selecting informative genes for scRNA-seq cell type identification. Generally speaking, feature selection methods are categorized into filter-, wrapper-, and embedded-based approaches. From the existing literature, methods from filter- and embedded-based approaches are widely applied in scRNA-seq gene selection tasks. The wrapper-based method, which gives promising results in other fields, has yet to be extensively utilized for selecting gene features from scRNA-seq data; in addition, most of the existing wrapper methods used in this field are clustering-based instead of classification-based. With a large number of annotated data available today, this study applied a classification-based approach as an alternative to the clustering-based wrapper method. In our work, a quantum-inspired differential evolution (QDE) wrapped with a classification method was introduced to select a subset of genes from twelve well-known scRNA-seq transcriptomic datasets to identify cell types. In particular, the QDE was combined with different machine-learning (ML) classifiers, namely logistic regression, decision tree, support vector machine (SVM) with linear and radial basis function kernels, as well as extreme learning machine. The linear SVM wrapped with QDE, namely QDE-SVM, was chosen by referring to the feature selection results from the experiment. QDE-SVM showed a superior cell type classification performance among QDE wrapping with other ML classifiers as well as the recent wrapper methods (i.e., FSCAM, SSD-LAHC, MA-HS, and BSF). QDE-SVM achieved an average accuracy of 0.9559, while the other wrapper methods achieved average accuracies in the range of 0.8292 to 0.8872.
Affiliation(s)
- Grace Yee Lin Ng
- Faculty of Information Science and Technology, Multimedia University, Bukit Beruang, Melaka, Malaysia
| | - Shing Chiang Tan
- Faculty of Information Science and Technology, Multimedia University, Bukit Beruang, Melaka, Malaysia
| | - Chia Sui Ong
- Faculty of Information Science and Technology, Multimedia University, Bukit Beruang, Melaka, Malaysia
|
38
|
Lazaros K, Vlamos P, Vrahatis AG. Methods for cell-type annotation on scRNA-seq data: A recent overview. J Bioinform Comput Biol 2023; 21:2340002. [PMID: 37743364 DOI: 10.1142/s0219720023400024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/26/2023]
Abstract
The evolution of single-cell technology is ongoing, continually generating massive amounts of data that reveal many mysteries surrounding intricate diseases. However, their drawbacks continue to constrain us. Among these, annotating cell types in single-cell gene expression data poses a substantial challenge, despite the myriad of tools at our disposal. The rapid growth in data, resources, and tools has consequently brought about significant alterations in this area over the years. In our study, we spotlight all noteworthy cell type annotation techniques developed over the past four years. We provide an overview of the latest trends in this field, showcasing the most advanced methods in taxonomy. Our research underscores the demand for additional tools that incorporate a biological context and also predicts that the rising trend of graph neural network approaches will likely lead this research field in the coming years.
Affiliation(s)
- Konstantinos Lazaros
- Bioinformatics and Human Electrophysiology Laboratory, Department of Informatics, Ionian University, 49100 Corfu, Greece
| | - Panagiotis Vlamos
- Bioinformatics and Human Electrophysiology Laboratory, Department of Informatics, Ionian University, 49100 Corfu, Greece
| | - Aristidis G Vrahatis
- Bioinformatics and Human Electrophysiology Laboratory, Department of Informatics, Ionian University, 49100 Corfu, Greece
|
39
|
Fiannaca A, La Rosa M, La Paglia L, Gaglio S, Urso A. GOWDL: gene ontology-driven wide and deep learning model for cell typing of scRNA-seq data. Brief Bioinform 2023; 24:bbad332. [PMID: 37756593 PMCID: PMC10530315 DOI: 10.1093/bib/bbad332] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2023] [Revised: 08/17/2023] [Accepted: 09/04/2023] [Indexed: 09/29/2023] Open
Abstract
Single-cell RNA-sequencing (scRNA-seq) allows for obtaining genomic and transcriptomic profiles of individual cells. These data make it possible to characterize tissues at the cell level. In this context, one of the main analyses exploiting scRNA-seq data is identifying the cell types within tissue to estimate the quantitative composition of cell populations. Due to the massive amount of available scRNA-seq data, automatic classification approaches for cell typing, based on the most recent deep learning technology, are needed. Here, we present the gene ontology-driven wide and deep learning (GOWDL) model for classifying cell types in several tissues. GOWDL implements a hybrid architecture that considers the functional annotations found in Gene Ontology and the marker genes typical of specific cell types. We performed cross-validation and independent external testing, comparing our algorithm with 12 other state-of-the-art predictors. Classification scores demonstrated that GOWDL reached the best results over five different tissues, except for recall, where we obtained about 92% versus 97% for the best tool. Finally, we presented a case study on classifying immune cell populations in breast cancer using a hierarchical approach based on GOWDL.
Affiliation(s)
- Antonino Fiannaca
- ICAR-CNR, National Research Council of Italy, Via Ugo La Malfa 153, 90146, Palermo, Italy
| | - Massimo La Rosa
- ICAR-CNR, National Research Council of Italy, Via Ugo La Malfa 153, 90146, Palermo, Italy
| | - Laura La Paglia
- ICAR-CNR, National Research Council of Italy, Via Ugo La Malfa 153, 90146, Palermo, Italy
| | - Salvatore Gaglio
- ICAR-CNR, National Research Council of Italy, Via Ugo La Malfa 153, 90146, Palermo, Italy
- Dipartimento di Ingegneria, Università degli studi di Palermo, Viale Delle Scienze, ed. 6, 90128, Palermo, Italy
| | - Alfonso Urso
- ICAR-CNR, National Research Council of Italy, Via Ugo La Malfa 153, 90146, Palermo, Italy
|
40
|
Yang T, Yan C, Yang L, Tan J, Jiang S, Hu J, Gao W, Wang Q, Li Y. Identification and validation of core genes for type 2 diabetes mellitus by integrated analysis of single-cell and bulk RNA-sequencing. Eur J Med Res 2023; 28:340. [PMID: 37700362 PMCID: PMC10498638 DOI: 10.1186/s40001-023-01321-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/08/2023] [Accepted: 08/27/2023] [Indexed: 09/14/2023] Open
Abstract
BACKGROUND The exact mechanisms of type 2 diabetes mellitus (T2DM) remain largely unknown. We intended to authenticate critical genes linked to T2DM progression by tandem single-cell sequencing and general transcriptome sequencing data. METHODS T2DM single-cell RNA-sequencing data were obtained from the Gene Expression Omnibus (GEO) database and ArrayExpress (EBI), from which gene expression matrices were retrieved. The common cell clusters and representative marker genes were ascertained by principal component analysis (PCA), t-distributed stochastic neighbor embedding (t-SNE), CellMarker, and FindMarkers in two datasets (GSE86469 and GSE81608). T2DM-related differentially expressed marker genes were defined by intersection analysis of marker genes and GSE86468-differentially expressed genes. Receiver operating characteristic (ROC) curves were utilized to assign representative marker genes with diagnostic values by GSE86468, GSE29226 and external validation GSE29221, and their prospective target compounds were forecasted by PubChem. Besides, the R package clusterProfiler-based functional annotation was designed to unveil the intrinsic mechanisms of the target genes. At last, western blot was used to validate the alteration of CDKN1C and DLK1 expression in primary pancreatic islet cells cultured with or without 30 mM glucose. RESULTS Three common cell clusters were authenticated in two independent T2DM single-cell sequencing data, covering neurons, epithelial cells, and smooth muscle cells. Functional ensemble analysis disclosed an intimate association of these cell clusters with peptide/insulin secretion and pancreatic development. Pseudo-temporal trajectory analysis indicated that almost all epithelial and smooth muscle cells were of neuron origin. We characterized CDKN1C and DLK1, which were notably upregulated in T2DM samples, with satisfactory availability in recognizing three representative marker genes in non-diabetic and T2DM samples, and they were also robustly interlinked with the clinical characteristics of patients. Western blot also demonstrated that, compared with the control group, the expression of CDKN1C and DLK1 was increased in primary pancreatic islet cells cultured with 30 mM glucose for 48 h. Additionally, PubChem projected 11 and 21 potential compounds for CDKN1C and DLK1, respectively. CONCLUSION It is desirable that the emergence of the 2 critical genes indicated (CDKN1C and DLK1) could be catalysts for the investigation of the mechanisms of T2DM progression and the exploitation of innovative therapies.
Affiliation(s)
- Tingting Yang
- Department of Anesthesiology & Center for Brain Science, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710061, Shaanxi, China
| | - Chaoying Yan
- Department of Anesthesiology & Center for Brain Science, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710061, Shaanxi, China
| | - Lan Yang
- Department of Anesthesiology & Center for Brain Science, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710061, Shaanxi, China
| | - Jialu Tan
- Department of Anesthesiology & Center for Brain Science, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710061, Shaanxi, China
| | - Shiqiu Jiang
- Department of Anesthesiology & Center for Brain Science, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710061, Shaanxi, China
| | - Juan Hu
- Department of Anesthesiology & Center for Brain Science, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710061, Shaanxi, China
- Shaanxi University of Chinese Medicine, Xianyang, Shaanxi, China
| | - Wei Gao
- Department of Anesthesiology & Center for Brain Science, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710061, Shaanxi, China
| | - Qiang Wang
- Department of Anesthesiology & Center for Brain Science, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710061, Shaanxi, China.
| | - Yansong Li
- Department of Anesthesiology & Center for Brain Science, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710061, Shaanxi, China.
|
41
|
Lyu P, Zhai Y, Li T, Qian J. CellAnn: a comprehensive, super-fast, and user-friendly single-cell annotation web server. Bioinformatics 2023; 39:btad521. [PMID: 37610325 PMCID: PMC10477937 DOI: 10.1093/bioinformatics/btad521] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2023] [Revised: 07/17/2023] [Accepted: 08/22/2023] [Indexed: 08/24/2023] Open
Abstract
MOTIVATION Single-cell sequencing technology has become a routine in studying many biological problems. A core step of analyzing single-cell data is the assignment of cell clusters to specific cell types. Reference-based methods are proposed for predicting cell types for single-cell clusters. However, the scalability and lack of preprocessed reference datasets prevent them from being practical and easy to use. RESULTS Here, we introduce a reference-based cell annotation web server, CellAnn, which is super-fast and easy to use. CellAnn contains a comprehensive reference database with 204 human and 191 mouse single-cell datasets. These reference datasets cover 32 organs. Furthermore, we developed a cluster-to-cluster alignment method to transfer cell labels from the reference to the query datasets, which is superior to the existing methods with higher accuracy and higher scalability. Finally, CellAnn is an online tool that integrates all the procedures in cell annotation, including reference searching, transferring cell labels, visualizing results, and harmonizing cell annotation labels. Through the user-friendly interface, users can identify the best annotation by cross-validating with multiple reference datasets. We believe that CellAnn can greatly facilitate single-cell sequencing data analysis. AVAILABILITY AND IMPLEMENTATION The web server is available at www.cellann.io, and the source code is available at https://github.com/Pinlyu3/CellAnn_shinyapp.
Affiliation(s)
- Pin Lyu
- Department of Ophthalmology, Johns Hopkins University School of Medicine, Baltimore, MD 21287, United States
| | - Yijie Zhai
- Department of Ophthalmology, Johns Hopkins University School of Medicine, Baltimore, MD 21287, United States
| | - Taibo Li
- Department of Biomedical Engineering, Johns Hopkins University School of Medicine, Baltimore, MD 21218, United States
| | - Jiang Qian
- Department of Ophthalmology, Johns Hopkins University School of Medicine, Baltimore, MD 21287, United States
|
42
|
Cheng C, Chen W, Jin H, Chen X. A Review of Single-Cell RNA-Seq Annotation, Integration, and Cell-Cell Communication. Cells 2023; 12:1970. [PMID: 37566049 PMCID: PMC10417635 DOI: 10.3390/cells12151970] [Citation(s) in RCA: 37] [Impact Index Per Article: 18.5] [Received: 06/13/2023] [Revised: 07/10/2023] [Accepted: 07/21/2023] [Indexed: 08/12/2023] Open
Abstract
Single-cell RNA sequencing (scRNA-seq) has emerged as a powerful tool for investigating cellular biology at an unprecedented resolution, enabling the characterization of cellular heterogeneity, identification of rare but significant cell types, and exploration of cell-cell communications and interactions. Its broad applications span both basic and clinical research domains. In this comprehensive review, we survey the current landscape of scRNA-seq analysis methods and tools, focusing on count modeling, cell-type annotation, data integration, including spatial transcriptomics, and the inference of cell-cell communication. We review the challenges encountered in scRNA-seq analysis, including issues of sparsity or low expression, reliability of cell annotation, and assumptions in data integration, and discuss the potential impact of suboptimal clustering and differential expression analysis tools on downstream analyses, particularly in identifying cell subpopulations. Finally, we discuss recent advancements and future directions for enhancing scRNA-seq analysis. Specifically, we highlight the development of novel tools for annotating single-cell data, integrating and interpreting multimodal datasets covering transcriptomics, epigenomics, and proteomics, and inferring cellular communication networks. By elucidating the latest progress and innovation, we provide a comprehensive overview of the rapidly advancing field of scRNA-seq analysis.
Affiliation(s)
- Changde Cheng
- Department of Computational Biology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA;
| | - Wenan Chen
- Center for Applied Bioinformatics, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA; (W.C.); (H.J.)
| | - Hongjian Jin
- Center for Applied Bioinformatics, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA; (W.C.); (H.J.)
| | - Xiang Chen
- Department of Computational Biology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA;
|
43
|
Lin Y, Cao Y, Willie E, Patrick E, Yang JYH. Atlas-scale single-cell multi-sample multi-condition data integration using scMerge2. Nat Commun 2023; 14:4272. [PMID: 37460600 DOI: 10.1038/s41467-023-39923-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 11/09/2022] [Accepted: 07/04/2023] [Indexed: 07/20/2023] Open
Abstract
The recent emergence of multi-sample multi-condition single-cell multi-cohort studies allows researchers to investigate different cell states. The effective integration of multiple large-cohort studies promises biological insights into cells under different conditions that individual studies cannot provide. Here, we present scMerge2, a scalable algorithm that allows data integration of atlas-scale multi-sample multi-condition single-cell studies. We have generalized scMerge2 to enable the merging of millions of cells from single-cell studies generated by various single-cell technologies. Using a large COVID-19 data collection with over five million cells from 1000+ individuals, we demonstrate that scMerge2 enables multi-sample multi-condition scRNA-seq data integration from multiple cohorts and reveals signatures derived from cell-type expression that are more accurate in discriminating disease progression. Further, we demonstrate that scMerge2 can remove dataset variability in CyTOF, imaging mass cytometry and CITE-seq experiments, demonstrating its applicability to a broad spectrum of single-cell profiling technologies.
Affiliation(s)
- Yingxin Lin
- Sydney Precision Data Science Centre, The University of Sydney, Sydney, NSW, Australia
- Charles Perkins Centre, The University of Sydney, Sydney, NSW, Australia
- School of Mathematics and Statistics, The University of Sydney, Sydney, NSW, Australia
- Laboratory of Data Discovery for Health Limited (D24H), Science Park, Hong Kong SAR, China
| | - Yue Cao
- Sydney Precision Data Science Centre, The University of Sydney, Sydney, NSW, Australia
- Charles Perkins Centre, The University of Sydney, Sydney, NSW, Australia
- School of Mathematics and Statistics, The University of Sydney, Sydney, NSW, Australia
- Laboratory of Data Discovery for Health Limited (D24H), Science Park, Hong Kong SAR, China
| | - Elijah Willie
- Sydney Precision Data Science Centre, The University of Sydney, Sydney, NSW, Australia
| | - Ellis Patrick
- Sydney Precision Data Science Centre, The University of Sydney, Sydney, NSW, Australia
- School of Mathematics and Statistics, The University of Sydney, Sydney, NSW, Australia
- Laboratory of Data Discovery for Health Limited (D24H), Science Park, Hong Kong SAR, China
- The Westmead Institute for Medical Research, The University of Sydney, Sydney, NSW, 2006, Australia
| | - Jean Y H Yang
- Sydney Precision Data Science Centre, The University of Sydney, Sydney, NSW, Australia.
- Charles Perkins Centre, The University of Sydney, Sydney, NSW, Australia.
- School of Mathematics and Statistics, The University of Sydney, Sydney, NSW, Australia.
- Laboratory of Data Discovery for Health Limited (D24H), Science Park, Hong Kong SAR, China.
|
44
|
Wang L, Dou X, Chen S, Yu X, Huang X, Zhang L, Chen Y, Wang J, Yang K, Bugno J, Pitroda S, Ding X, Piffko A, Si W, Chen C, Jiang H, Zhou B, Chmura SJ, Luo C, Liang HL, He C, Weichselbaum RR. YTHDF2 inhibition potentiates radiotherapy antitumor efficacy. Cancer Cell 2023; 41:1294-1308.e8. [PMID: 37236197 PMCID: PMC10524856 DOI: 10.1016/j.ccell.2023.04.019] [Citation(s) in RCA: 98] [Impact Index Per Article: 49.0] [Received: 08/30/2022] [Revised: 12/23/2022] [Accepted: 04/28/2023] [Indexed: 05/28/2023]
Abstract
RNA N6-methyladenosine (m6A) modification is implicated in cancer progression. However, the impact of m6A on the antitumor effects of radiotherapy and the related mechanisms are unknown. Here we show that ionizing radiation (IR) induces immunosuppressive myeloid-derived suppressor cell (MDSC) expansion and YTHDF2 expression in both murine models and humans. Following IR, loss of Ythdf2 in myeloid cells augments antitumor immunity and overcomes tumor radioresistance by altering MDSC differentiation and inhibiting MDSC infiltration and suppressive function. The remodeling of the landscape of MDSC populations by local IR is reversed by Ythdf2 deficiency. IR-induced YTHDF2 expression relies on NF-κB signaling; YTHDF2 in turn leads to NF-κB activation by directly binding and degrading transcripts encoding negative regulators of NF-κB signaling, resulting in an IR-YTHDF2-NF-κB circuit. Pharmacological inhibition of YTHDF2 overcomes MDSC-induced immunosuppression and improves combined IR and/or anti-PD-L1 treatment. Thus, YTHDF2 is a promising target to improve radiotherapy (RT) and RT/immunotherapy combinations.
Affiliation(s)
- Liangliang Wang
- Department of Radiation and Cellular Oncology, University of Chicago, Chicago, IL 60637, USA; Ludwig Center for Metastasis Research, University of Chicago, Chicago, IL 60637, USA
| | - Xiaoyang Dou
- Department of Chemistry, Department of Biochemistry and Molecular Biology, and Institute for Biophysical Dynamics, The University of Chicago, Chicago, IL 60637, USA; Howard Hughes Medical Institute, University of Chicago, Chicago, IL 60637, USA
| | - Shijie Chen
- State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
| | - Xianbin Yu
- Department of Chemistry, Department of Biochemistry and Molecular Biology, and Institute for Biophysical Dynamics, The University of Chicago, Chicago, IL 60637, USA; Howard Hughes Medical Institute, University of Chicago, Chicago, IL 60637, USA
| | - Xiaona Huang
- Department of Radiation and Cellular Oncology, University of Chicago, Chicago, IL 60637, USA; Ludwig Center for Metastasis Research, University of Chicago, Chicago, IL 60637, USA
| | - Linda Zhang
- Department of Chemistry, Department of Biochemistry and Molecular Biology, and Institute for Biophysical Dynamics, The University of Chicago, Chicago, IL 60637, USA; Howard Hughes Medical Institute, University of Chicago, Chicago, IL 60637, USA
| | - Yantao Chen
- State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
| | - Jiaai Wang
- Department of Radiation and Cellular Oncology, University of Chicago, Chicago, IL 60637, USA; Ludwig Center for Metastasis Research, University of Chicago, Chicago, IL 60637, USA
| | - Kaiting Yang
- Department of Radiation and Cellular Oncology, University of Chicago, Chicago, IL 60637, USA; Ludwig Center for Metastasis Research, University of Chicago, Chicago, IL 60637, USA
| | - Jason Bugno
- Department of Radiation and Cellular Oncology, University of Chicago, Chicago, IL 60637, USA; Ludwig Center for Metastasis Research, University of Chicago, Chicago, IL 60637, USA; The Committee on Clinical Pharmacology and Pharmacogenomics, University of Chicago, Chicago, IL 60637, USA
| | - Sean Pitroda
- Department of Radiation and Cellular Oncology, University of Chicago, Chicago, IL 60637, USA; Ludwig Center for Metastasis Research, University of Chicago, Chicago, IL 60637, USA
| | - Xingchen Ding
- Shandong Cancer Hospital and Institute, Shandong First Medical University and Shandong Academy of Medical Sciences, Jinan 250117, China
| | - Andras Piffko
- Department of Radiation and Cellular Oncology, University of Chicago, Chicago, IL 60637, USA; Ludwig Center for Metastasis Research, University of Chicago, Chicago, IL 60637, USA; Department of Neurosurgery, University Medical Center Hamburg-Eppendorf, Hamburg 20246, Germany
| | - Wei Si
- State Key Laboratory of Animal Nutrition, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, China
| | - Chao Chen
- State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
| | - Hualiang Jiang
- State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
| | - Bing Zhou
- State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
| | - Steven J Chmura
- Department of Radiation and Cellular Oncology, University of Chicago, Chicago, IL 60637, USA
| | - Cheng Luo
- State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China; Zhongshan Institute for Drug Discovery, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Zhongshan 528437, China.
| | - Hua Laura Liang
- Department of Radiation and Cellular Oncology, University of Chicago, Chicago, IL 60637, USA; Ludwig Center for Metastasis Research, University of Chicago, Chicago, IL 60637, USA.
| | - Chuan He
- Department of Chemistry, Department of Biochemistry and Molecular Biology, and Institute for Biophysical Dynamics, The University of Chicago, Chicago, IL 60637, USA; Howard Hughes Medical Institute, University of Chicago, Chicago, IL 60637, USA; Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, IL 60637, USA.
| | - Ralph R Weichselbaum
- Department of Radiation and Cellular Oncology, University of Chicago, Chicago, IL 60637, USA; Ludwig Center for Metastasis Research, University of Chicago, Chicago, IL 60637, USA.
|
45
|
Xiong YX, Wang MG, Chen L, Zhang XF. Cell-type annotation with accurate unseen cell-type identification using multiple references. PLoS Comput Biol 2023; 19:e1011261. [PMID: 37379341 PMCID: PMC10335708 DOI: 10.1371/journal.pcbi.1011261] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/20/2023] [Revised: 07/11/2023] [Accepted: 06/11/2023] [Indexed: 06/30/2023] Open
Abstract
The recent advances in single-cell RNA sequencing (scRNA-seq) techniques have stimulated efforts to identify and characterize the cellular composition of complex tissues. With the advent of various sequencing techniques, automated cell-type annotation using a well-annotated scRNA-seq reference becomes popular. But it relies on the diversity of cell types in the reference, which may not capture all the cell types present in the query data of interest. There are generally unseen cell types in the query data of interest because most data atlases are obtained for different purposes and techniques. Identifying previously unseen cell types is essential for improving annotation accuracy and uncovering novel biological discoveries. To address this challenge, we propose mtANN (multiple-reference-based scRNA-seq data annotation), a new method to automatically annotate query data while accurately identifying unseen cell types with the aid of multiple references. Key innovations of mtANN include the integration of deep learning and ensemble learning to improve prediction accuracy, and the introduction of a new metric that considers three complementary aspects to distinguish between unseen cell types and shared cell types. Additionally, we provide a data-driven method to adaptively select a threshold for identifying previously unseen cell types. We demonstrate the advantages of mtANN over state-of-the-art methods for unseen cell-type identification and cell-type annotation on two benchmark dataset collections, as well as its predictive power on a collection of COVID-19 datasets. The source code and tutorial are available at https://github.com/Zhangxf-ccnu/mtANN.
Affiliation(s)
- Yi-Xuan Xiong
- School of Mathematics and Statistics, Central China Normal University, Wuhan, China
- Key Laboratory of Nonlinear Analysis & Applications (Ministry of Education), Central China Normal University, Wuhan, China
| | - Meng-Guo Wang
- School of Mathematics and Statistics, Central China Normal University, Wuhan, China
- Key Laboratory of Nonlinear Analysis & Applications (Ministry of Education), Central China Normal University, Wuhan, China
| | - Luonan Chen
- State Key Laboratory of Cell Biology, Shanghai Institute of Biochemistry and Cell Biology, Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, Shanghai, China
- School of Life Science and Technology, ShanghaiTech University, Shanghai, China
- Key Laboratory of Systems Health Science of Zhejiang Province, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Hangzhou, China
- Guangdong Institute of Intelligence Science and Technology, Hengqin, Zhuhai, Guangdong, China
| | - Xiao-Fei Zhang
- School of Mathematics and Statistics, Central China Normal University, Wuhan, China
- Key Laboratory of Nonlinear Analysis & Applications (Ministry of Education), Central China Normal University, Wuhan, China
|
46
|
Yu L, Liu C, Yang JYH, Yang P. Ensemble deep learning of embeddings for clustering multimodal single-cell omics data. Bioinformatics 2023; 39:btad382. [PMID: 37314966 PMCID: PMC10287920 DOI: 10.1093/bioinformatics/btad382] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/13/2023] [Revised: 04/16/2023] [Accepted: 06/12/2023] [Indexed: 06/16/2023] Open
Abstract
MOTIVATION Recent advances in multimodal single-cell omics technologies enable multiple modalities of molecular attributes, such as gene expression, chromatin accessibility, and protein abundance, to be profiled simultaneously at a global level in individual cells. While the increasing availability of multiple data modalities is expected to provide a more accurate clustering and characterization of cells, the development of computational methods that are capable of extracting information embedded across data modalities is still in its infancy. RESULTS We propose SnapCCESS for clustering cells by integrating data modalities in multimodal single-cell omics data using an unsupervised ensemble deep learning framework. By creating snapshots of embeddings of multimodality using variational autoencoders, SnapCCESS can be coupled with various clustering algorithms for generating consensus clustering of cells. We applied SnapCCESS with several clustering algorithms to various datasets generated from popular multimodal single-cell omics technologies. Our results demonstrate that SnapCCESS is effective and more efficient than conventional ensemble deep learning-based clustering methods and outperforms other state-of-the-art multimodal embedding generation methods in integrating data modalities for clustering cells. The improved clustering of cells from SnapCCESS will pave the way for more accurate characterization of cell identity and types, an essential step for various downstream analyses of multimodal single-cell omics data. AVAILABILITY AND IMPLEMENTATION SnapCCESS is implemented as a Python package and is freely available from https://github.com/PYangLab/SnapCCESS under the open-source license of GPL-3. The data used in this study are publicly available (see section 'Data availability').
Affiliation(s)
- Lijia Yu
- Computational Systems Biology Group, Children’s Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW 2145, Australia
- School of Mathematics and Statistics, Faculty of Science, University of Sydney, NSW 2006, Australia
- Sydney Precision Data Science Centre, University of Sydney, NSW 2006, Australia
| | - Chunlei Liu
- Computational Systems Biology Group, Children’s Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW 2145, Australia
- Sydney Precision Data Science Centre, University of Sydney, NSW 2006, Australia
| | - Jean Yee Hwa Yang
- School of Mathematics and Statistics, Faculty of Science, University of Sydney, NSW 2006, Australia
- Sydney Precision Data Science Centre, University of Sydney, NSW 2006, Australia
- Charles Perkins Centre, The University of Sydney, Sydney, NSW 2006, Australia
- Laboratory of Data Discovery for Health Limited (D24H), Hong Kong Science Park, Hong Kong SAR, China
| | - Pengyi Yang
- Computational Systems Biology Group, Children’s Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW 2145, Australia
- School of Mathematics and Statistics, Faculty of Science, University of Sydney, NSW 2006, Australia
- Sydney Precision Data Science Centre, University of Sydney, NSW 2006, Australia
- Charles Perkins Centre, The University of Sydney, Sydney, NSW 2006, Australia
- Laboratory of Data Discovery for Health Limited (D24H), Hong Kong Science Park, Hong Kong SAR, China
|
47
|
Davalos OA, Heydari AA, Fertig EJ, Sindi SS, Hoyer KK. Boosting Single-Cell RNA Sequencing Analysis with Simple Neural Attention. bioRxiv 2023:2023.05.29.542760. [PMID: 37398136 PMCID: PMC10312486 DOI: 10.1101/2023.05.29.542760] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/04/2023]
Abstract
A limitation of current deep learning (DL) approaches for single-cell RNA sequencing (scRNAseq) analysis is the lack of interpretability. Moreover, existing pipelines are designed and trained for specific tasks used disjointly for different stages of analysis. We present scANNA, a novel interpretable DL model for scRNAseq studies that leverages neural attention to learn gene associations. After training, the learned gene importance (interpretability) is used to perform downstream analyses (e.g., global marker selection and cell-type classification) without retraining. ScANNA's performance is comparable to or better than state-of-the-art methods designed and trained for specific standard scRNAseq analyses even though scANNA was not trained for these tasks explicitly. ScANNA enables researchers to discover meaningful results without extensive prior knowledge or training separate task-specific models, saving time and enhancing scRNAseq analyses.
Affiliation(s)
- Oscar A. Davalos
- Quantitative and Systems Biology Graduate Program, University of California, Merced, CA, USA
| | - A. Ali Heydari
- Department of Applied Mathematics, University of California, Merced, CA, USA
- Health Sciences Research Institute, University of California, Merced, CA, USA
| | - Elana J. Fertig
- Department of Oncology, Division of Biostatistics and Bioinformatics, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins School of Medicine, Baltimore, MD, USA
| | - Suzanne S. Sindi
- Department of Applied Mathematics, University of California, Merced, CA, USA
- Health Sciences Research Institute, University of California, Merced, CA, USA
| | - Katrina K. Hoyer
- Health Sciences Research Institute, University of California, Merced, CA, USA
- Department of Molecular and Cell Biology, School of Natural Sciences, University of California, Merced, CA, USA
|
48
|
Lin Y, Wu TY, Chen X, Wan S, Chao B, Xin J, Yang JY, Wong WH, Wang YXR. scTIE: data integration and inference of gene regulation using single-cell temporal multimodal data. bioRxiv 2023:2023.05.18.541381. [PMID: 37292801 PMCID: PMC10245711 DOI: 10.1101/2023.05.18.541381] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 06/10/2023]
Abstract
Single-cell technologies offer unprecedented opportunities to dissect gene regulatory mechanisms in context-specific ways. Although there are computational methods for extracting gene regulatory relationships from scRNA-seq and scATAC-seq data, the data integration problem, essential for accurate cell type identification, has been mostly treated as a standalone challenge. Here we present scTIE, a unified method that integrates temporal multimodal data and infers regulatory relationships predictive of cellular state changes. scTIE uses an autoencoder to embed cells from all time points into a common space using iterative optimal transport, followed by extracting interpretable information to predict cell trajectories. Using a variety of synthetic and real temporal multimodal datasets, we demonstrate scTIE achieves effective data integration while preserving more biological signals than existing methods, particularly in the presence of batch effects and noise. Furthermore, on the exemplar multiome dataset we generated from differentiating mouse embryonic stem cells over time, we demonstrate scTIE captures regulatory elements highly predictive of cell transition probabilities, providing new potentials to understand the regulatory landscape driving developmental processes.
Affiliation(s)
- Yingxin Lin
- School of Mathematics and Statistics, The University of Sydney, NSW, Australia
- Charles Perkins Centre, The University of Sydney, NSW, Australia
- Laboratory of Data Discovery for Health Limited (D24H), Science Park, Hong Kong SAR, China
| | - Tung-Yu Wu
- Department of Statistics, Stanford University, CA, USA
| | - Xi Chen
- Department of Statistics, Stanford University, CA, USA
| | - Sheng Wan
- Institute of Electronics, National Yang Ming Chiao Tung University, Hsinchu, Taiwan
| | - Brian Chao
- Department of Electrical Engineering, Stanford University, CA, USA
| | - Jingxue Xin
- Department of Statistics, Stanford University, CA, USA
| | - Jean Y.H. Yang
- School of Mathematics and Statistics, The University of Sydney, NSW, Australia
- Charles Perkins Centre, The University of Sydney, NSW, Australia
- Laboratory of Data Discovery for Health Limited (D24H), Science Park, Hong Kong SAR, China
| | - Wing H. Wong
- Department of Statistics, Stanford University, CA, USA
- Department of Biomedical Data Science, Stanford University, CA, USA
- Bio-X Program, Stanford University, CA, USA
| | - Y. X. Rachel Wang
- School of Mathematics and Statistics, The University of Sydney, NSW, Australia
|
49
|
Cheng Y, Fan X, Zhang J, Li Y. A scalable sparse neural network framework for rare cell type annotation of single-cell transcriptome data. Commun Biol 2023; 6:545. [PMID: 37210444 DOI: 10.1038/s42003-023-04928-6] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 04/13/2023] [Accepted: 05/11/2023] [Indexed: 05/22/2023] Open
Abstract
Automatic cell type annotation methods are increasingly used in single-cell RNA sequencing (scRNA-seq) analysis due to their fast and precise advantages. However, current methods often fail to account for the imbalance of scRNA-seq datasets and ignore information from smaller populations, leading to significant biological analysis errors. Here, we introduce scBalance, an integrated sparse neural network framework that incorporates adaptive weight sampling and dropout techniques for auto-annotation tasks. Using 20 scRNA-seq datasets with varying scales and degrees of imbalance, we demonstrate that scBalance outperforms current methods in both intra- and inter-dataset annotation tasks. Additionally, scBalance displays impressive scalability in identifying rare cell types in million-level datasets, as shown in the bronchoalveolar cell landscape. scBalance is also significantly faster than commonly used tools and comes in a user-friendly format, making it a superior tool for scRNA-seq analysis on the Python-based platform.
Affiliation(s)
- Yuqi Cheng
- Department of Computer Science and Engineering (CSE), The Chinese University of Hong Kong (CUHK), Hong Kong SAR, China
- School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA, USA
- Xingyu Fan
- School of Information and Software Engineering, University of Electronic Science and Technology of China, 610054, Chengdu, China
- Jianing Zhang
- Department of Computer Science and Engineering (CSE), The Chinese University of Hong Kong (CUHK), Hong Kong SAR, China
- Yu Li
- Department of Computer Science and Engineering (CSE), The Chinese University of Hong Kong (CUHK), Hong Kong SAR, China.
- The CUHK Shenzhen Research Institute, Hi-Tech Park, Nanshan, 518057, Shenzhen, China.
50
|
Liu H, Li H, Sharma A, Huang W, Pan D, Gu Y, Lin L, Sun X, Liu H. scAnno: a deconvolution strategy-based automatic cell type annotation tool for single-cell RNA-sequencing data sets. Brief Bioinform 2023; 24:bbad179. [PMID: 37183449 DOI: 10.1093/bib/bbad179] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 02/21/2023] [Revised: 03/29/2023] [Accepted: 04/19/2023] [Indexed: 05/16/2023] Open
Abstract
Undoubtedly, single-cell RNA sequencing (scRNA-seq) has changed the research landscape by providing insights into heterogeneous, complex and rare cell populations. Given that more such data sets will become available in the near future, their accurate assessment with compatible and robust models for cell type annotation is a prerequisite. Considering this, herein, we developed scAnno (scRNA-seq data annotation), an automated annotation tool for scRNA-seq data sets primarily based on the single-cell cluster levels, using a joint deconvolution strategy and logistic regression. We explicitly constructed a reference profile for human (30 cell types and 50 human tissues) and a reference profile for mouse (26 cell types and 50 mouse tissues) to support this novel methodology (scAnno). scAnno offers a possibility to obtain genes with high expression and specificity in a given cell type as cell type-specific genes (marker genes) by combining co-expression genes with seed genes as a core. Of importance, scAnno can accurately identify cell type-specific genes based on cell type reference expression profiles without any prior information. Particularly, in the peripheral blood mononuclear cell data set, the marker genes identified by scAnno showed cell type-specific expression, and the majority of marker genes matched exactly with those included in the CellMarker database. Besides validating the flexibility and interpretability of scAnno in identifying marker genes, we also proved its superiority in cell type annotation over other cell type annotation tools (SingleR, scPred, CHETAH and scmap-cluster) through internal validation of data sets (average annotation accuracy: 99.05%) and cross-platform data sets (average annotation accuracy: 95.56%). Taken together, we established the first novel methodology that utilizes a deconvolution strategy for automated cell typing and is capable of being a significant application in broader scRNA-seq analysis. 
scAnno is available at https://github.com/liuhong-jia/scAnno.
Affiliation(s)
- Hongjia Liu
- State Key Laboratory of Digital Medical Engineering, School of Biological Science & Medical Engineering, Southeast University, Nanjing, 210096, China
- Huamei Li
- Department of General Surgery, Nanjing Drum Tower Hospital, the Affiliated Hospital of Nanjing University Medical School, Nanjing, 210008, PR China
- Amit Sharma
- Department of Neurosurgery, University Hospital Bonn, Bonn, Germany
- Duo Pan
- State Key Laboratory of Digital Medical Engineering, School of Biological Science & Medical Engineering, Southeast University, Nanjing, 210096, China
- Yu Gu
- State Key Laboratory of Digital Medical Engineering, School of Biological Science & Medical Engineering, Southeast University, Nanjing, 210096, China
- Lu Lin
- State Key Laboratory of Digital Medical Engineering, School of Biological Science & Medical Engineering, Southeast University, Nanjing, 210096, China
- Xiao Sun
- State Key Laboratory of Digital Medical Engineering, School of Biological Science & Medical Engineering, Southeast University, Nanjing, 210096, China
- Hongde Liu
- State Key Laboratory of Digital Medical Engineering, School of Biological Science & Medical Engineering, Southeast University, Nanjing, 210096, China