1
Zeng B, Zhang C, Liang Y, Huang J, Li D, Liu Z, Liao H, Yang T, Liu M, Zou C, Liu D, Qin B. Single-cell RNA sequencing highlights a significant retinal Müller glial population in dry age-related macular degeneration. iScience 2025; 28:112464. [PMID: 40343286 PMCID: PMC12059717 DOI: 10.1016/j.isci.2025.112464] [Received: 08/05/2024] [Revised: 12/21/2024] [Accepted: 04/14/2025] [Indexed: 05/11/2025] Open
Abstract
The main challenge in dissecting the cells and pathways involved in the pathogenesis of age-related macular degeneration (AMD) is the highly heterogeneous and dynamic nature of the retinal microenvironment. This study aimed to describe the comprehensive landscape of the dry AMD (dAMD) model and identify the key cell cluster contributing to dAMD. We identified a subset of Müller cells that express high levels of Sox2, which play crucial roles in homeostasis and neuroprotection in both mouse models of AMD and patients with dAMD. Additionally, the number of Sox2+ Müller cells decreased significantly during the progression of AMD, indicating these cells were damaged and underwent cell death. Interestingly, ferroptosis and apoptosis were identified as contributors to the damage of Sox2+ Müller cells. Our findings are potentially valuable not only for advancing the current understanding of dAMD progression but also for the development of treatment strategies through the protection of Müller cells.
Affiliation(s)
- Bing Zeng
- Shenzhen Aier Eye Hospital, Aier Eye Hospital, Jinan University, Shenzhen, China
- Shenzhen Aier Ophthalmic Technology Institute, Shenzhen, China
- Department of Ophthalmology, Xiangya Hospital, Central South University, Changsha 410008, Hunan, China
- Chuanhe Zhang
- Shenzhen Aier Eye Hospital, Aier Eye Hospital, Jinan University, Shenzhen, China
- Shenzhen Aier Ophthalmic Technology Institute, Shenzhen, China
- Yifan Liang
- Shenzhen Aier Eye Hospital, Aier Eye Hospital, Jinan University, Shenzhen, China
- Shenzhen Aier Ophthalmic Technology Institute, Shenzhen, China
- Jianguo Huang
- Shenzhen Aier Eye Hospital, Aier Eye Hospital, Jinan University, Shenzhen, China
- Shenzhen Aier Ophthalmic Technology Institute, Shenzhen, China
- Deshuang Li
- Shenzhen Aier Eye Hospital, Aier Eye Hospital, Jinan University, Shenzhen, China
- Shenzhen Aier Ophthalmic Technology Institute, Shenzhen, China
- Ziling Liu
- Shenzhen Aier Eye Hospital, Aier Eye Hospital, Jinan University, Shenzhen, China
- Shenzhen Aier Ophthalmic Technology Institute, Shenzhen, China
- Hongxia Liao
- Shenzhen Aier Eye Hospital, Aier Eye Hospital, Jinan University, Shenzhen, China
- Shenzhen Aier Ophthalmic Technology Institute, Shenzhen, China
- Tedu Yang
- Shenzhen Shendong Aier Eye Hospital, Shenzhen, China
- Muyun Liu
- National Engineering Research Center of Foundational Technologies for CGT Industry, Shenzhen Kenuo Medical Laboratory, Shenzhen, Guangdong, China
- Chang Zou
- Department of Clinical Medical Research Center, The Second Clinical Medical College of Jinan University, the First Affiliated Hospital of Southern University of Science and Technology, Shenzhen People’s Hospital, Shenzhen, Guangdong, P.R. China
- Dongcheng Liu
- Shenzhen Aier Eye Hospital, Aier Eye Hospital, Jinan University, Shenzhen, China
- Shenzhen Aier Ophthalmic Technology Institute, Shenzhen, China
- Aier School of Ophthalmology, Central South University, Changsha, China
- Bo Qin
- Shenzhen Aier Eye Hospital, Aier Eye Hospital, Jinan University, Shenzhen, China
- Shenzhen Aier Ophthalmic Technology Institute, Shenzhen, China
- Aier School of Ophthalmology, Central South University, Changsha, China
2
Ramirez A, Orcutt-Jahns BT, Pascoe S, Abraham A, Remigio B, Thomas N, Meyer AS. Integrative, high-resolution analysis of single-cell gene expression across experimental conditions with PARAFAC2-RISE. Cell Syst 2025:101294. [PMID: 40378843 DOI: 10.1016/j.cels.2025.101294] [Received: 08/26/2024] [Revised: 02/20/2025] [Accepted: 04/22/2025] [Indexed: 05/19/2025]
Abstract
Effective exploration and analysis tools are vital for the extraction of insights from single-cell data. However, current techniques for modeling single-cell studies performed across experimental conditions (e.g., samples) require restrictive assumptions or do not adequately deconvolute condition-to-condition variation from cell-to-cell variation. Here, we report that reduction and insight in single-cell exploration (RISE), an adaptation of the tensor decomposition method PARAFAC2, enables the dimensionality reduction and analysis of single-cell data across conditions. We demonstrate the benefits of RISE across distinct examples of single-cell RNA-sequencing experiments of peripheral immune cells: pharmacologic drug perturbations and systemic lupus erythematosus patient samples. RISE enables associations of gene variation patterns with patients or perturbations while connecting each coordinated change to single cells without requiring cell-type annotations. The theoretical grounding of RISE suggests a unified framework for many single-cell data modeling tasks while providing an intuitive dimensionality reduction approach for multi-sample single-cell studies across biological contexts. A record of this paper's transparent peer review process is included in the supplemental information.
Affiliation(s)
- Andrew Ramirez
- Department of Bioengineering, University of California, Los Angeles (UCLA), Los Angeles, CA 90095, USA
- Brian T Orcutt-Jahns
- Department of Bioengineering, University of California, Los Angeles (UCLA), Los Angeles, CA 90095, USA
- Sean Pascoe
- Department of Bioengineering, University of California, Los Angeles (UCLA), Los Angeles, CA 90095, USA; Department of Molecular Biosciences, Northwestern University, Evanston, IL 60208, USA
- Armaan Abraham
- Department of Bioengineering, University of California, Los Angeles (UCLA), Los Angeles, CA 90095, USA
- Breanna Remigio
- Computational and Systems Biology, UCLA, Los Angeles, CA 90095, USA
- Nathaniel Thomas
- Department of Computer Science, UCLA, Los Angeles, CA 90095, USA
- Aaron S Meyer
- Department of Bioengineering, University of California, Los Angeles (UCLA), Los Angeles, CA 90095, USA; Jonsson Comprehensive Cancer Center, UCLA, Los Angeles, CA 90095, USA; Eli and Edythe Broad Center of Regenerative Medicine and Stem Cell Research, UCLA, Los Angeles, CA 90095, USA.
3
Pan Q, Ding L, Hladyshau S, Yao X, Zhou J, Yan L, Dhungana Y, Shi H, Qian C, Dong X, Burdyshaw C, Veloso JP, Khatamian A, Xie Z, Risch I, Yang X, Yang J, Huang X, Fang J, Jain A, Jain A, Rusch M, Brewer M, Peng J, Yan KK, Chi H, Yu J. scMINER: a mutual information-based framework for clustering and hidden driver inference from single-cell transcriptomics data. Nat Commun 2025; 16:4305. [PMID: 40341143 PMCID: PMC12062461 DOI: 10.1038/s41467-025-59620-6] [Received: 01/17/2025] [Accepted: 04/28/2025] [Indexed: 05/10/2025] Open
Abstract
Single-cell transcriptomics data present challenges due to their inherent stochasticity and sparsity, complicating both cell clustering and cell type-specific network inference. To address these challenges, we introduce scMINER (single-cell Mutual Information-based Network Engineering Ranger), an integrative framework for unsupervised cell clustering, transcription factor and signaling protein network inference, and identification of hidden drivers from single-cell transcriptomic data. scMINER demonstrates superior accuracy in cell clustering, outperforming five state-of-the-art algorithms and excelling in distinguishing closely related cell populations. For network inference, scMINER outperforms three established methods, as validated by ATAC-seq and CROP-seq. In particular, it surpasses SCENIC in revealing key transcription factor drivers involved in T cell exhaustion and Treg tissue specification. Moreover, scMINER enables the inference of signaling protein networks and drivers with high accuracy, which presents an advantage in multimodal single cell data analysis. In addition, we establish scMINER Portal, an interactive visualization tool to facilitate exploration of scMINER results.
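As a rough illustration of the quantity that a mutual information-based framework builds on, the sketch below computes a plug-in mutual information estimate between two discretized expression vectors. It is not scMINER's implementation: the function names, the detected/not-detected discretization, and the toy count vectors are all assumptions made for this example.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Plug-in mutual information (in bits) between two discrete sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    joint = Counter(zip(x, y))  # joint counts of (x_i, y_i)
    px = Counter(x)             # marginal counts of x
    py = Counter(y)             # marginal counts of y
    mi = 0.0
    for (a, b), c in joint.items():
        p_ab = c / n
        # p(a, b) / (p(a) * p(b)) written in terms of raw counts
        mi += p_ab * math.log2(p_ab * n * n / (px[a] * py[b]))
    return mi


def binarize(counts):
    """Hypothetical discretization: 1 if the gene is detected in a cell, else 0."""
    return [1 if c > 0 else 0 for c in counts]


# Toy counts for three genes across eight cells (illustrative values only).
gene_a = [0, 0, 3, 5, 0, 2, 0, 7]
gene_b = [0, 0, 1, 4, 0, 6, 0, 2]  # detected in exactly the same cells as gene_a
gene_c = [1, 0, 2, 0, 3, 0, 0, 4]  # detection pattern independent of gene_a

mi_ab = mutual_information(binarize(gene_a), binarize(gene_b))
mi_ac = mutual_information(binarize(gene_a), binarize(gene_c))
print(mi_ab, mi_ac)
```

Mutual information captures non-linear dependence as well as linear correlation, which is one common motivation for using it on sparse count data. A plug-in estimate like this one is biased for small samples, so it should be read as a teaching aid only.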
Affiliation(s)
- Qingfei Pan
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Liang Ding
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Siarhei Hladyshau
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Xiangyu Yao
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Jiayu Zhou
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Lei Yan
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Yogesh Dhungana
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Graduate School of Biomedical Sciences, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Hao Shi
- Department of Immunology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Chenxi Qian
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Xinran Dong
- Center for Molecular Medicine, Children's Hospital of Fudan University, Shanghai, 201102, P.R. China
- Chad Burdyshaw
- Department of Information Services, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Joao Pedro Veloso
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Alireza Khatamian
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Zhen Xie
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Department of Physiology, University of Tennessee Health Science Center, Memphis, TN, 38163, USA
- Isabel Risch
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Department of Immunology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Xu Yang
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Jiyuan Yang
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Xin Huang
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Precision Research Center for Refractory Diseases, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, 201620, China
- Jason Fang
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Anuj Jain
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Arihant Jain
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Michael Rusch
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Michael Brewer
- Department of Information Services, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Junmin Peng
- Department of Structural Biology and Developmental Neurobiology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Koon-Kiu Yan
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Hongbo Chi
- Graduate School of Biomedical Sciences, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Department of Immunology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Jiyang Yu
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA.
- Graduate School of Biomedical Sciences, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA.
4
Huynh KLA, Tyc KM, Matuck BF, Easter QT, Pratapa A, Kumar NV, Pérez P, Kulchar RJ, Pranzatelli TJF, de Souza D, Weaver TM, Qu X, Soares Junior LAV, Dolhnokoff M, Kleiner DE, Hewitt SM, da Silva LFF, Rocha VG, Warner BM, Byrd KM, Liu J. Deconvolution of cell types and states in spatial multiomics utilizing TACIT. Nat Commun 2025; 16:3747. [PMID: 40258827 PMCID: PMC12012066 DOI: 10.1038/s41467-025-58874-4] [Received: 06/23/2024] [Accepted: 04/02/2025] [Indexed: 04/23/2025] Open
Abstract
Identifying cell types and states remains a time-consuming, error-prone challenge for spatial biology. While deep learning increasingly plays a role, it is difficult to generalize due to variability at the level of cells, neighborhoods, and niches in health and disease. To address this, we develop TACIT, an unsupervised algorithm for cell annotation using predefined signatures that operates without training data. TACIT uses unbiased thresholding to distinguish positive cells from background, focusing on relevant markers to identify ambiguous cells in multiomic assays. Using five datasets (5,000,000 cells; 51 cell types) from three niches (brain, intestine, gland), TACIT outperforms existing unsupervised methods in accuracy and scalability. Integrating TACIT-identified cell types reveals new phenotypes in two inflammatory gland diseases. Finally, using combined spatial transcriptomics and proteomics, we discover under- and overrepresented immune cell types and states in regions of interest, suggesting multimodality is essential for translating spatial biology to clinical applications.
Affiliation(s)
- Khoa L A Huynh
- Department of Biostatistics, Virginia Commonwealth University, Richmond, VA, USA
- Katarzyna M Tyc
- Department of Biostatistics, Virginia Commonwealth University, Richmond, VA, USA
- Massey Cancer Center, Richmond, VA, USA
- Bruno F Matuck
- Department of Oral and Craniofacial Molecular Biology, Philips Institute for Oral Health Research, Virginia Commonwealth University, Richmond, VA, USA
- Quinn T Easter
- Department of Oral and Craniofacial Molecular Biology, Philips Institute for Oral Health Research, Virginia Commonwealth University, Richmond, VA, USA
- Aditya Pratapa
- Department of Cell Biology, Duke University, Durham, NC, USA
- Nikhil V Kumar
- Adams School of Dentistry, University of North Carolina, Chapel Hill, USA
- Paola Pérez
- Salivary Disorders Unit, National Institute of Dental and Craniofacial Research, National Institutes of Health, Bethesda, MD, USA
- Rachel J Kulchar
- Salivary Disorders Unit, National Institute of Dental and Craniofacial Research, National Institutes of Health, Bethesda, MD, USA
- Thomas J F Pranzatelli
- Adeno-Associated Virus Biology Section, National Institute of Dental and Craniofacial Research, National Institutes of Health, Bethesda, MD, USA
- Deiziane de Souza
- Department of Pathology, Medicine School of University of Sao Paulo, SP, BR, Sao Paulo, Brazil
- Theresa M Weaver
- Department of Oral and Craniofacial Molecular Biology, Philips Institute for Oral Health Research, Virginia Commonwealth University, Richmond, VA, USA
- Xufeng Qu
- Massey Cancer Center, Richmond, VA, USA
- Marisa Dolhnokoff
- Department of Pathology, Medicine School of University of Sao Paulo, SP, BR, Sao Paulo, Brazil
- David E Kleiner
- Laboratory of Pathology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Stephen M Hewitt
- Laboratory of Pathology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Vanderson Geraldo Rocha
- Department of Hematology, Transfusion and Cell Therapy Service, University of Sao Paulo, Sao Paulo, Brazil
- Blake M Warner
- Salivary Disorders Unit, National Institute of Dental and Craniofacial Research, National Institutes of Health, Bethesda, MD, USA
- Kevin M Byrd
- Department of Oral and Craniofacial Molecular Biology, Philips Institute for Oral Health Research, Virginia Commonwealth University, Richmond, VA, USA.
- Salivary Disorders Unit, National Institute of Dental and Craniofacial Research, National Institutes of Health, Bethesda, MD, USA.
- Jinze Liu
- Department of Biostatistics, Virginia Commonwealth University, Richmond, VA, USA.
- Massey Cancer Center, Richmond, VA, USA.
5
Yuan L, Xu Z, Meng B, Ye L. scAMZI: attention-based deep autoencoder with zero-inflated layer for clustering scRNA-seq data. BMC Genomics 2025; 26:350. [PMID: 40197174 PMCID: PMC11974017 DOI: 10.1186/s12864-025-11511-2] [Received: 09/03/2024] [Accepted: 03/20/2025] [Indexed: 04/10/2025] Open
Abstract
BACKGROUND: Clustering scRNA-seq data plays a vital role in scRNA-seq data analysis and downstream analyses. Many computational methods have been proposed and have achieved remarkable results. However, these methods have several limitations. First, they do not fully exploit cellular features. Second, they are developed based on gene expression information and lack flexibility in integrating intercellular relationships. Finally, their performance is affected by dropout events. RESULTS: We propose a novel deep learning (DL) model based on an attention autoencoder and a zero-inflated (ZI) layer, namely scAMZI, to cluster scRNA-seq data. scAMZI is mainly composed of SimAM (a Simple, parameter-free Attention Module), an autoencoder, a ZINB (Zero-Inflated Negative Binomial) model, and a ZI layer. Building on the ZINB model, we introduce the autoencoder and SimAM to reduce the dimensionality of the data and to learn feature representations of cells and relationships between cells. Meanwhile, the ZI layer is used to handle zero values in the data. We compare the performance of scAMZI with nine methods (three shallow learning algorithms and six state-of-the-art DL-based methods) on fourteen benchmark scRNA-seq datasets of various sizes (from hundreds to tens of thousands of cells) with known cell types. Experimental results demonstrate that scAMZI outperforms competing methods. CONCLUSIONS: scAMZI outperforms competing methods and can facilitate downstream analyses such as cell annotation, marker gene discovery, and cell trajectory inference. The scAMZI package is freely available at https://doi.org/10.5281/zenodo.13131559.
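To make the zero-inflated negative binomial (ZINB) model named in the abstract concrete, here is a minimal sketch of its probability mass function and negative log-likelihood. This is the textbook mean/dispersion parameterization, not code from the scAMZI package; the function names and the parameter values (`mu`, `theta`, `pi`) are assumptions chosen for illustration.

```python
import math


def nb_log_pmf(x, mu, theta):
    """Log-probability of count x under a negative binomial with mean mu > 0
    and dispersion theta > 0 (variance = mu + mu**2 / theta)."""
    return (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )


def zinb_pmf(x, mu, theta, pi):
    """ZINB probability of count x; pi is the extra (dropout) zero probability."""
    nb = math.exp(nb_log_pmf(x, mu, theta))
    if x == 0:
        # A zero is either a structural zero or an ordinary NB zero.
        return pi + (1.0 - pi) * nb
    return (1.0 - pi) * nb


def zinb_neg_log_likelihood(counts, mu, theta, pi):
    """Negative log-likelihood of observed counts under a single ZINB."""
    return -sum(math.log(zinb_pmf(x, mu, theta, pi)) for x in counts)


# Illustrative parameter values only.
total = sum(zinb_pmf(x, mu=2.0, theta=1.5, pi=0.3) for x in range(500))
p_zero_nb = zinb_pmf(0, mu=2.0, theta=1.5, pi=0.0)    # plain NB zero probability
p_zero_zinb = zinb_pmf(0, mu=2.0, theta=1.5, pi=0.3)  # inflated zero probability
print(total, p_zero_nb, p_zero_zinb)
```

In autoencoder-based methods of this kind, a network typically outputs per-gene values for the mean, dispersion, and dropout probability, and a loss of this form is minimized. Whether scAMZI follows exactly that layout is not stated in the abstract.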
Affiliation(s)
- Lin Yuan
- Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center, Qilu University of Technology (Shandong Academy of Sciences), 3501 Daxue Road, Jinan, 250353, China
- Shandong Engineering Research Center of Big Data Applied Technology, Faculty of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), 3501 Daxue Road, Jinan, 250353, China
- Shandong Provincial Key Laboratory of Industrial Network and Information System Security, Shandong Fundamental Research Center for Computer Science, 3501 Daxue Road, Jinan, 250353, China
- Zhijie Xu
- Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center, Qilu University of Technology (Shandong Academy of Sciences), 3501 Daxue Road, Jinan, 250353, China
- Shandong Engineering Research Center of Big Data Applied Technology, Faculty of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), 3501 Daxue Road, Jinan, 250353, China
- Shandong Provincial Key Laboratory of Industrial Network and Information System Security, Shandong Fundamental Research Center for Computer Science, 3501 Daxue Road, Jinan, 250353, China
- Boyuan Meng
- Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center, Qilu University of Technology (Shandong Academy of Sciences), 3501 Daxue Road, Jinan, 250353, China
- Shandong Engineering Research Center of Big Data Applied Technology, Faculty of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), 3501 Daxue Road, Jinan, 250353, China
- Shandong Provincial Key Laboratory of Industrial Network and Information System Security, Shandong Fundamental Research Center for Computer Science, 3501 Daxue Road, Jinan, 250353, China
- Lan Ye
- Cancer Center, The Second Hospital of Shandong University, 247 Beiyuan Street, Jinan, 250033, China.
6
DenAdel A, Ramseier ML, Navia AW, Shalek AK, Raghavan S, Winter PS, Amini AP, Crawford L. Artificial variables help to avoid over-clustering in single-cell RNA sequencing. Am J Hum Genet 2025; 112:940-951. [PMID: 40081375 DOI: 10.1016/j.ajhg.2025.02.014] [Received: 02/07/2025] [Revised: 02/11/2025] [Accepted: 02/12/2025] [Indexed: 03/16/2025] Open
Abstract
Standard single-cell RNA sequencing (scRNA-seq) pipelines nearly always include unsupervised clustering as a key step in identifying biologically distinct cell types. A follow-up step in these pipelines is to test for differential expression between the identified clusters. When algorithms over-cluster, downstream analyses can produce misleading results. In this work, we present "recall" (calibrated clustering with artificial variables), a method for protecting against over-clustering by controlling for the impact of reusing the same data twice when performing differential expression analysis, commonly known as "double dipping." Importantly, our approach can be applied to a wide range of clustering algorithms. Using real and simulated data, we show that recall provides state-of-the-art clustering performance and can rapidly analyze large-scale scRNA-seq studies, even on a personal laptop.
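The "double dipping" problem described above can be seen in a small simulation. The sketch below does not implement recall or its artificial variables; it only shows, under assumed toy settings (one gene, 200 cells of pure Gaussian noise, a median split standing in for a clustering algorithm), why testing for differential expression on the same data used to form the clusters inflates the test statistic.

```python
import math
import random
import statistics


def welch_t(a, b):
    """Welch's t-statistic for the difference in means of two samples."""
    va = statistics.variance(a)
    vb = statistics.variance(b)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va / len(a) + vb / len(b))


random.seed(0)
# One "gene" measured in 200 cells drawn from a single population: no real clusters.
expression = [random.gauss(0.0, 1.0) for _ in range(200)]

# Double dipping: define "clusters" from the data, then test on the same data.
cut = statistics.median(expression)
cluster_hi = [v for v in expression if v > cut]
cluster_lo = [v for v in expression if v <= cut]
t_double_dip = welch_t(cluster_hi, cluster_lo)

# Reference: groups assigned without looking at the expression values.
labels = [i % 2 for i in range(200)]
group_0 = [v for v, g in zip(expression, labels) if g == 0]
group_1 = [v for v, g in zip(expression, labels) if g == 1]
t_independent = welch_t(group_0, group_1)

print(f"t (clusters from same data): {t_double_dip:.2f}")
print(f"t (independent grouping):    {t_independent:.2f}")
```

The first statistic is large even though every cell comes from the same distribution. This is the kind of spurious signal that a calibrated clustering procedure is meant to guard against.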
Affiliation(s)
- Alan DenAdel
- Center for Computational Molecular Biology, Brown University, Providence, RI 02912, USA
- Michelle L Ramseier
- Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Ragon Institute of MGH, MIT, and Harvard, Cambridge, MA 02139, USA
- Andrew W Navia
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Alex K Shalek
- Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Ragon Institute of MGH, MIT, and Harvard, Cambridge, MA 02139, USA
- Srivatsan Raghavan
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Harvard Medical School, Boston, MA 02115, USA; Department of Medicine, Brigham and Women's Hospital, Boston, MA 02115, USA
- Peter S Winter
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Ava P Amini
- Microsoft Research, Cambridge, MA 02142, USA
7
Traversa D, Chiara M. Mapping Cell Identity from scRNA-seq: A primer on computational methods. Comput Struct Biotechnol J 2025; 27:1559-1569. [PMID: 40270709 PMCID: PMC12017876 DOI: 10.1016/j.csbj.2025.03.051] [Received: 11/15/2024] [Revised: 03/29/2025] [Accepted: 03/31/2025] [Indexed: 04/25/2025] Open
Abstract
Single cell (sc) technologies mark a conceptual and methodological breakthrough in the way we study cells, the base units of life. Thanks to these technological developments, large-scale initiatives are currently under way to map all the cell types in the human body, with the ambitious aim of gaining a cell-level resolution of physiological development and disease. Owing to its broad applicability and ease of interpretation, scRNA-seq is probably the most common sc-based application. This assay uses high-throughput RNA sequencing to capture gene expression profiles at the sc level. Subsequently, under the assumption that differences in transcriptional programs correspond to distinct cellular identities, ad hoc computational methods are used to infer cell types from gene expression patterns. A wide array of computational methods has been developed for this task. However, depending on the underlying algorithmic approach and associated computational requirements, each method might have a specific range of application, with implications that are not always clear to the end user. Here we provide a concise overview of state-of-the-art computational methods for cell identity annotation in scRNA-seq, tailored for new users and non-computational scientists. To this end, we classify existing tools into five main categories and discuss their key strengths, limitations, and range of application.
Affiliation(s)
- Daniele Traversa
- Department of Biosciences, Università degli Studi di Milano, via Celoria 26, Milan 20133, Italy
- Matteo Chiara
- Department of Biosciences, Università degli Studi di Milano, via Celoria 26, Milan 20133, Italy
8
Rafi FR, Heya NR, Hafiz MS, Jim JR, Kabir MM, Mridha MF. A systematic review of single-cell RNA sequencing applications and innovations. Comput Biol Chem 2025; 115:108362. [PMID: 39919386 DOI: 10.1016/j.compbiolchem.2025.108362] [Received: 10/07/2024] [Revised: 12/26/2024] [Accepted: 01/21/2025] [Indexed: 02/09/2025]
Abstract
Bulk RNA sequencing is one type of RNA sequencing technique, alongside targeted RNA sequencing and whole-transcriptome sequencing. These techniques provide valuable insights into gene expression in specific cell populations or regions. However, they often miss the diversity of cells within complex tissues. Single-cell RNA sequencing (scRNA-seq) overcomes this restriction by recording gene expression at the single-cell level, offering a detailed picture of cellular diversity. It is essential for studying glucose homeostasis and offers thorough explanations of cellular variation, networks, and governance dynamics. This study reviews the use of scRNA-seq in islet cells, along with sample preparation, sequencing, and computational analysis. It highlights advances in understanding cell types, gene activity, and cell interactions. Along with the challenges and limitations of scRNA-seq, this review highlights the importance of scRNA-seq in understanding complex biological processes and diseases. It is an essential resource for future research and method development in this field, which will help to build personalized treatment.
Affiliation(s)
- Fahamidur Rahaman Rafi
- Department of Computer Science and Engineering, Daffodil International University, Dhaka 1340, Bangladesh.
- Nafeya Rahman Heya
- Department of Computer Science and Engineering, Daffodil International University, Dhaka 1340, Bangladesh.
- Md Sadman Hafiz
- Institute of Information and Communication Technology, Shahjalal University of Science and Technology, Sylhet 3114, Bangladesh.
- Jamin Rahman Jim
- Department of Computer Science, American International University-Bangladesh, Dhaka 1229, Bangladesh.
- Md Mohsin Kabir
- Department of Computer Science & Engineering, Bangladesh University of Business & Technology, Dhaka 1216, Bangladesh.
- M F Mridha
- Department of Computer Science, American International University-Bangladesh, Dhaka 1229, Bangladesh.
9
Llera-Oyola J, Pérez-Moraga R, Parras M, Rosón B. How to view the female reproductive tract through single-cell looking glasses. Am J Obstet Gynecol 2025; 232:S21-S43. [PMID: 40253081 DOI: 10.1016/j.ajog.2024.08.040] [Received: 08/29/2023] [Revised: 07/04/2024] [Accepted: 08/24/2024] [Indexed: 04/21/2025]
Abstract
Single-cell technologies have emerged as an unprecedented tool for biologists and clinicians, allowing them to assess organs and tissues at the level of individual cells. In the field of women's reproductive biology, single-cell studies have provided insights into the cellular and molecular processes that regulate reproductive and obstetrical functions in health and disease. The knowledge that these studies generate is helping clinicians to improve the understanding and diagnosis of infertility-related issues or pregnancy complications and to find new avenues for their treatment. However, navigating the expansive landscape of this type of transcriptomic data analysis represents a pivotal challenge in current research. Single-cell RNA sequencing involves isolating cells into droplets and reverse transcribing RNA to generate complementary DNA, with the contents of each droplet uniquely labeled by a barcode. Upon sequencing the complementary DNAs, the barcodes enable the reassignment of sequencing reads to individual droplets, facilitating the reconstruction of the cellular landscape of the sample obtained from a tissue or organ and beyond. Researchers, equipped with the metaphorical 'single-cell glasses,' must choose adequately from a plethora of strategies to dissect and interpret cellular information. The sophistication of the algorithms and of the decision-making process is often underestimated, resulting in artefactual results or results that are cumbersome to interpret. Computational biologists apply and innovate computational tools designed to process, model, and interpret expansive datasets. The ramifications of their work extend far beyond the realm of data processing; they give shape to the outcome of analyses, playing a pivotal role in drawing meaningful conclusions from the wealth of information garnered.
In this review, we describe the wide variety of approaches and analytical steps available with enough detail to gain a concise picture of what a complete examination of a single-cell dataset would be. We commence with a discussion on key points in experimental design, highlighting crucial questions one should consider. Following this, we delve into the various preprocessing and quality control steps essential for any single-cell dataset. The subsequent section offers a detailed guide on constructing a single-cell atlas, exploring nuances such as differential characteristics in visualization and clustering techniques, as well as strategies for assigning identity to cell populations through gene marker annotations. Moving beyond the creation of an atlas, we explore methods for investigating pathological conditions. This involves conducting cell population comparison tests between conditions and analyzing specific cell-to-cell communications and cellular differentiation trajectories in both health and disease scenarios. This work aims to furnish a newcomer researcher and/or clinician with essential guidelines to embark on a single-cell adventure without succumbing to common pitfalls. By bridging the gap between theory and practice, it facilitates the translation of single-cell technologies into clinically relevant applications. Throughout the manuscript, practical examples of its usage in women's reproductive health studies are provided. Various sections delve into specific clinical scenarios, demonstrating how these guidelines can be instrumental in unraveling the molecular landscapes of diseases and physiological processes related to women's reproduction.
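The barcode step mentioned in the abstract, in which sequencing reads are reassigned to individual droplets, can be pictured with a toy example. The sketch assumes each read is a string whose first few bases are the droplet barcode. Real protocols usually store barcodes and molecular identifiers in a separate read and correct sequencing errors, so the `demultiplex` function and the read strings here are purely hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, barcode_length):
    """Group reads by their leading barcode; returns {barcode: [cDNA fragments]}."""
    cells = defaultdict(list)
    for read in reads:
        barcode = read[:barcode_length]  # droplet (cell) label
        cdna = read[barcode_length:]     # transcript-derived sequence
        cells[barcode].append(cdna)
    return dict(cells)


# Toy reads: a 4-base barcode followed by a short cDNA fragment.
reads = ["AAACGGT", "AAACTTA", "CCGTGGA"]
cells = demultiplex(reads, barcode_length=4)
print(cells)
```

Counting the fragments assigned to each barcode per gene is what eventually yields the cell-by-gene expression matrix that the downstream steps in the review operate on.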
Affiliation(s)
- Jaime Llera-Oyola
- Carlos Simon Foundation, INCLIVA Health Research Institute, Valencia, Spain
| | - Raúl Pérez-Moraga
- Carlos Simon Foundation, INCLIVA Health Research Institute, Valencia, Spain; R&D Department, Igenomix, Valencia, Spain
| | - Marcos Parras
- Carlos Simon Foundation, INCLIVA Health Research Institute, Valencia, Spain
| | - Beatriz Rosón
- Carlos Simon Foundation, INCLIVA Health Research Institute, Valencia, Spain.
| |
|
10
|
Pavel A, Grønberg MG, Clemmensen LH. The impact of dropouts in scRNAseq dense neighborhood analysis. Comput Struct Biotechnol J 2025; 27:1278-1285. [PMID: 40225837 PMCID: PMC11992407 DOI: 10.1016/j.csbj.2025.03.033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2024] [Revised: 03/19/2025] [Accepted: 03/20/2025] [Indexed: 04/15/2025] Open
Abstract
Single-cell RNA sequencing (scRNAseq) makes it possible to investigate transcriptomic profiles at the single-cell level. However, the data pose unique challenges in comparison to bulk transcriptomic data, one being high dropout rates, which yield highly sparse data. Many classical analysis and preprocessing pipelines are based on the assumption that poor data can be counteracted by quantity and that similar cells (samples) are close to each other in space. Clustering is commonly used to detect clusters (dense local cell neighborhoods) under the assumption that similar cells are close to each other in space (where "close" depends on the distance metric used). The most commonly used clustering methodologies to detect dense local neighborhoods are based on graph clustering on a nearest neighbor graph. However, high dropout rates may break this assumption and make it difficult to reliably detect such dense local neighborhoods. We assess the cluster homogeneity and stability under increasing degrees of dropouts in one of the most popular clustering pipelines (dimensionality reduction + graph-based clustering), as provided by the scRNAseq analysis packages Seurat and Scanpy. Our study shows that while the default pipeline performs well in terms of cluster homogeneity (i.e., cells in a cluster are of the same type), even with increasing dropout rates, the stability of clusters (i.e., cell pairs consistently being in the same cluster) decreases. This implies that sub-populations within cell types are increasingly difficult to identify under increasing dropout rates because observations are not consistently close. Our results challenge the current practice of using default clustering pipelines and the general assumption of identifiable local neighborhoods on high dropout data. Hence, these results suggest that careful consideration needs to be given to interpretation and downstream analysis when relying on local neighborhoods and clusters on scRNAseq data. In addition, these results call for extensive benchmarking, to identify and provide methods robust in their local neighborhood relationships on data containing low to high dropout rates.
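The effect of dropouts on local neighborhoods can be illustrated with a small simulation. The sketch below is not the authors' benchmark: it uses toy Gaussian expression profiles, sets values to zero at random to mimic dropout, and reports the mean Jaccard overlap between each cell's k nearest neighbors before and after dropout as a simple proxy for neighborhood stability. All function names and parameter values are illustrative assumptions.

```python
import math
import random


def simulate_cells(n_cells, n_genes, rng):
    """Two toy cell types that differ in which half of the genes is highly expressed."""
    cells = []
    for i in range(n_cells):
        high_first_half = i % 2 == 0
        profile = []
        for g in range(n_genes):
            mean = 8.0 if (g < n_genes // 2) == high_first_half else 2.0
            profile.append(max(0.0, rng.gauss(mean, 1.0)))
        cells.append(profile)
    return cells


def apply_dropout(cells, rate, rng):
    """Set each value to zero independently with probability `rate`."""
    return [[0.0 if rng.random() < rate else x for x in cell] for cell in cells]


def knn(cells, k):
    """Indices of the k nearest neighbors (Euclidean distance) of every cell."""
    neighbors = []
    for i, a in enumerate(cells):
        dists = sorted((math.dist(a, b), j) for j, b in enumerate(cells) if j != i)
        neighbors.append({j for _, j in dists[:k]})
    return neighbors


def mean_jaccard(nn_a, nn_b):
    """Average Jaccard overlap between two lists of neighbor sets."""
    return sum(len(a & b) / len(a | b) for a, b in zip(nn_a, nn_b)) / len(nn_a)


def neighborhood_overlap(rate, n_cells=40, n_genes=30, k=5, seed=0):
    rng = random.Random(seed)
    cells = simulate_cells(n_cells, n_genes, rng)
    noisy = apply_dropout(cells, rate, rng)
    return mean_jaccard(knn(cells, k), knn(noisy, k))


for rate in (0.0, 0.3, 0.6, 0.9):
    print(f"dropout={rate:.1f}  mean kNN Jaccard overlap={neighborhood_overlap(rate):.2f}")
```

An overlap of 1.0 means every cell keeps exactly the same neighbors. Lower values indicate that the nearest-neighbor graph, and therefore any graph-based clustering built on it, changes under dropout.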
Affiliation(s)
- Alisa Pavel
- Department of Applied Mathematics and Computer Science, Technical University of Denmark, 2800, Kongens Lyngby, Denmark
| | - Manja Gersholm Grønberg
- Department of Applied Mathematics and Computer Science, Technical University of Denmark, 2800, Kongens Lyngby, Denmark
| | - Line H. Clemmensen
- Department of Applied Mathematics and Computer Science, Technical University of Denmark, 2800, Kongens Lyngby, Denmark
- Department of Mathematical Sciences, University of Copenhagen, 2100, Copenhagen, Denmark
| |
|
11
|
Ramirez A, Orcutt-Jahns BT, Pascoe S, Abraham A, Remigio B, Thomas N, Meyer AS. Integrative, high-resolution analysis of single cell gene expression across experimental conditions with PARAFAC2-RISE. bioRxiv 2025:2024.07.29.605698. [PMID: 39131377 PMCID: PMC11312543 DOI: 10.1101/2024.07.29.605698] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/13/2024]
Abstract
Effective and scalable exploration and analysis tools are vital for the extraction of insights from large-scale single-cell data. However, current techniques for modeling single-cell studies performed across experimental conditions (e.g., samples, perturbations, or patients) require restrictive assumptions, lack flexibility, or do not adequately deconvolute condition-to-condition variation from cell-to-cell variation. Here, we report that Reduction and Insight in Single-cell Exploration (RISE), an adaptation of the tensor decomposition method PARAFAC2, enables the dimensionality reduction and analysis of single-cell data across conditions. We demonstrate the benefits of RISE across two distinct examples of single-cell RNA-sequencing experiments of peripheral immune cells: pharmacologic drug perturbations and systemic lupus erythematosus (SLE) patient samples. RISE enables straightforward associations of gene variation patterns with specific patients or perturbations, while connecting each coordinated change to single cells without requiring cell type annotations. The theoretical grounding of RISE suggests a unified framework for many single-cell data modeling tasks. Thus, RISE provides an intuitive universal dimensionality reduction approach for multi-sample single-cell studies across diverse biological contexts.
Affiliation(s)
- Andrew Ramirez
- Department of Bioengineering, University of California, Los Angeles (UCLA), CA, USA
| | | | - Sean Pascoe
- Department of Bioengineering, University of California, Los Angeles (UCLA), CA, USA
- Department of Molecular Biosciences, Northwestern University, Evanston, IL, USA
| | - Armaan Abraham
- Department of Bioengineering, University of California, Los Angeles (UCLA), CA, USA
| | | | | | - Aaron S. Meyer
- Department of Bioengineering, University of California, Los Angeles (UCLA), CA, USA
- Jonsson Comprehensive Cancer Center, UCLA, CA, USA
- Eli and Edythe Broad Center of Regenerative Medicine and Stem Cell Research, UCLA, CA, USA
| |
|
12
|
Islam MZ, Zimmerman S, Lindahl A, Weidanz J, Ordovas-Montanes J, Kostic A, Luber J, Robben M. Single-cell RNA-seq reveals disease-specific CD8+ T cell clonal expansion and a high frequency of transcriptionally distinct double-negative T cells in diabetic NOD mice. PLoS One 2025; 20:e0317987. [PMID: 40106422 PMCID: PMC11922263 DOI: 10.1371/journal.pone.0317987] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/03/2024] [Accepted: 01/08/2025] [Indexed: 03/22/2025] Open
Abstract
T cells primarily drive the autoimmune destruction of pancreatic beta cells in Type 1 diabetes (T1D). However, the profound yet uncharacterized diversity of the T cell populations in vivo has hindered obtaining a clear picture of the T cell changes that occur longitudinally during T1D onset. This study aimed to identify T cell clonal expansion and distinct transcriptomic signatures associated with T1D progression in Non-Obese Diabetic (NOD) mice. Here we profiled the transcriptome and T cell receptor (TCR) repertoire of T cells at single-cell resolution from longitudinally collected peripheral blood and pancreatic islets of NOD mice using single-cell RNA sequencing technology. We detected disease-dependent development of infiltrating CD8+ T cells with altered cytotoxic and inflammatory effector states. In addition, we discovered a high frequency of transcriptionally distinct double-negative (DN) T cells that fluctuate throughout T1D pathogenesis. This study identifies potential disease-relevant TCR sequences and potential disease biomarkers that can be further characterized through future research.
Affiliation(s)
- Md Zohorul Islam
- Section on Pathophysiology and Molecular Pharmacology, Joslin Diabetes Center, Boston, Massachusetts, United States of America
- Department of Microbiology, Harvard Medical School, Boston, Massachusetts, United States of America
- Section of Experimental Animal Models, Department of Veterinary and Animal Sciences, University of Copenhagen, Copenhagen, Denmark
- CSIRO Health & Biosecurity, Australian Centre for Disease Preparedness, Geelong, Victoria, Australia
| | - Sam Zimmerman
- Section on Pathophysiology and Molecular Pharmacology, Joslin Diabetes Center, Boston, Massachusetts, United States of America
- Department of Microbiology, Harvard Medical School, Boston, Massachusetts, United States of America
| | - Alexis Lindahl
- Department of Animal Science, University of Illinois, Urbana-Champaign, Illinois, United States of America
| | - Jon Weidanz
- Department of Kinesiology, The University of Texas at Arlington, Texas, United States of America
- Department of Bioengineering, The University of Texas at Arlington, Texas, United States of America
| | - Jose Ordovas-Montanes
- Division of Gastroenterology, Boston Children’s Hospital, Boston, Massachusetts, United States of America
- Harvard Stem Cell Institute, Harvard University, Boston, Massachusetts, United States of America
| | - Aleksandar Kostic
- Section on Pathophysiology and Molecular Pharmacology, Joslin Diabetes Center, Boston, Massachusetts, United States of America
- Department of Microbiology, Harvard Medical School, Boston, Massachusetts, United States of America
| | - Jacob Luber
- Department of Computer Science and Engineering, The University of Texas at Arlington, United States of America
| | - Michael Robben
- Department of Animal Science, University of Illinois, Urbana-Champaign, Illinois, United States of America
- Department of Computer Science and Engineering, The University of Texas at Arlington, United States of America
| |
|
13
|
Andrade AX, Nguyen S, Montillo A. scMEDAL for the interpretable analysis of single-cell transcriptomics data with batch effect visualization using a deep mixed effects autoencoder. Research Square 2025:rs.3.rs-6081478. [PMID: 40166015 PMCID: PMC11957221 DOI: 10.21203/rs.3.rs-6081478/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/02/2025]
Abstract
scRNA-seq data has the potential to provide new insights into cellular heterogeneity and data acquisition; however, a major challenge is unraveling confounding from technical and biological batch effects. Existing batch correction algorithms suppress and discard these effects, rather than quantifying and modeling them. Here, we present scMEDAL, a framework for single-cell Mixed Effects Deep Autoencoder Learning, which separately models batch-invariant and batch-specific effects using two complementary autoencoder networks. One network is trained through adversarial learning to capture a batch-invariant representation, while a Bayesian autoencoder learns a batch-specific representation. Comprehensive evaluations spanning conditions (e.g., autism, leukemia, and cardiovascular), cell types, and technical and biological effects demonstrate that scMEDAL suppresses batch effects while modeling batch-specific variation, enhancing accuracy and interpretability. Unlike prior approaches, the framework's fixed- and random-effects autoencoders enable retrospective analyses, including predicting a cell's expression as if it had been acquired in a different batch via genomap projections at the cellular level, revealing the impact of biological (e.g., diagnosis) and technical (e.g., acquisition) effects. By combining scMEDAL's batch-agnostic and batch-specific latent spaces, it enables more accurate predictions of disease status, donor group, and cell type, making scMEDAL a valuable framework for gaining deeper insight into data acquisition and cellular heterogeneity.
Affiliation(s)
- Aixa X. Andrade
- Lyda Hill Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
| | - Son Nguyen
- Lyda Hill Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
| | - Albert Montillo
- Lyda Hill Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
| |
|
14
|
Andrade AX, Nguyen S, Montillo A. scMEDAL for the interpretable analysis of single-cell transcriptomics data with batch effect visualization using a deep mixed effects autoencoder. arXiv 2025:arXiv:2411.06635v3. [PMID: 39606715 PMCID: PMC11601787] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2024]
Abstract
scRNA-seq data has the potential to provide new insights into cellular heterogeneity and data acquisition; however, a major challenge is unraveling confounding from technical and biological batch effects. Existing batch correction algorithms suppress and discard these effects, rather than quantifying and modeling them. Here, we present scMEDAL, a framework for single-cell Mixed Effects Deep Autoencoder Learning, which separately models batch-invariant and batch-specific effects using two complementary autoencoder networks. One network is trained through adversarial learning to capture a batch-invariant representation, while a Bayesian autoencoder learns a batch-specific representation. Comprehensive evaluations spanning conditions (e.g., autism, leukemia, and cardiovascular), cell types, and technical and biological effects demonstrate that scMEDAL suppresses batch effects while modeling batch-specific variation, enhancing accuracy and interpretability. Unlike prior approaches, the framework's fixed- and random-effects autoencoders enable retrospective analyses, including predicting a cell's expression as if it had been acquired in a different batch via genomap projections at the cellular level, revealing the impact of biological (e.g., diagnosis) and technical (e.g., acquisition) effects. By combining scMEDAL's batch-agnostic and batch-specific latent spaces, it enables more accurate predictions of disease status, donor group, and cell type, making scMEDAL a valuable framework for gaining deeper insight into data acquisition and cellular heterogeneity.
|
15
|
Chen Y, Li F. K-Volume Clustering Algorithms for scRNA-Seq Data Analysis. Biology 2025; 14:283. [PMID: 40136539 PMCID: PMC11940832 DOI: 10.3390/biology14030283] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2025] [Revised: 02/27/2025] [Accepted: 03/06/2025] [Indexed: 03/27/2025]
Abstract
Clustering high-dimensional and structural data remains a key challenge in computational biology, especially for complex single-cell and multi-omics datasets. In this study, we present K-volume clustering, a novel algorithm that uses the total convex volume defined by points within a cluster as a biologically relevant and geometrically interpretable criterion. This method simultaneously optimizes both the hierarchical structure and the number of clusters at each level through nonlinear optimization. Validation on real datasets shows that K-volume clustering outperforms traditional methods across a range of biological applications. With its theoretical foundation and broad applicability, K-volume clustering holds great promise as a core tool for diverse data analysis tasks.
Affiliation(s)
- Yong Chen
- Department of Biological and Biomedical Sciences, Rowan University, Glassboro, NJ 08028, USA
| | - Fei Li
- Department of Computer Science, George Mason University, Fairfax, VA 22030, USA
| |
|
16
|
Prater KE, Lin KZ. All the single cells: Single-cell transcriptomics/epigenomics experimental design and analysis considerations for glial biologists. Glia 2025; 73:451-473. [PMID: 39558887 PMCID: PMC11809281 DOI: 10.1002/glia.24633] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2024] [Revised: 09/18/2024] [Accepted: 10/10/2024] [Indexed: 11/20/2024]
Abstract
Single-cell transcriptomics, epigenomics, and other 'omics applied at single-cell resolution can significantly advance hypotheses and understanding of glial biology. Omics technologies are revealing a large and growing number of new glial cell subtypes, defined by their gene expression profile. These subtypes have significant implications for understanding glial cell function, cell-cell communications, and glia-specific changes between homeostasis and conditions such as neurological disease. For many, the training in how to analyze, interpret, and understand these large datasets has been through reading and understanding literature from other fields like biostatistics. Here, we provide a primer for glial biologists on experimental design and analysis of single-cell RNA-seq datasets. Our goal is to further the understanding of why decisions are made about datasets and to enhance biologists' ability to interpret and critique their work and the work of others. We review the steps involved in single-cell analysis with a focus on decision points and particular notes for glia. The goal of this primer is to ensure that single-cell 'omics experiments continue to advance glial biology in a rigorous and replicable way.
Affiliation(s)
- Katherine E. Prater
- Department of Neurology, University of Washington School of Medicine, Seattle 98195
| | - Kevin Z. Lin
- Department of Biostatistics, University of Washington, Seattle 98195
| |
|
17
|
Arbatsky M, Vasilyeva E, Sysoeva V, Semina E, Saveliev V, Rubina K. Seurat function argument values in scRNA-seq data analysis: potential pitfalls and refinements for biological interpretation. Frontiers in Bioinformatics 2025; 5:1519468. [PMID: 40013100 PMCID: PMC11861183 DOI: 10.3389/fbinf.2025.1519468] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2024] [Accepted: 01/20/2025] [Indexed: 02/28/2025] Open
Abstract
Processing biological data is a challenge of paramount importance as the amount of accumulated data has been increasing annually along with the emergence of new methods for studying biological objects. Blind application of mathematical methods in biology may lead to erroneous hypotheses and conclusions. Here we narrow our focus down to a small set of mathematical methods applied in standard processing of scRNA-seq data: preprocessing, dimensionality reduction, integration, and clustering (using machine learning methods for clustering). Normalization and scaling are standard manipulations for the pre-processing with LogNormalize (natural-log transformation), CLR (centered log ratio transformation), and RC (relative counts) being employed as methods for data transformation. The justification for applying these methods in biology is not discussed in methodological articles. The essential aspect of dimensionality reduction is to identify the stable patterns which are deliberately removed upon mathematical data processing as being redundant, albeit containing important minor details for biological interpretation. There are no established rules for integration of datasets obtained at different sampling times or conditions. Clustering calls for reconsidering its application specifically for biological data processing. The novelty of the present study lies in an integrated approach of biology and bioinformatics to draw biological insights from data processing.
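The three transformations named in this abstract can be compared on a toy count vector. The sketch below follows the formulas as they are commonly described for Seurat's `NormalizeData` methods, which should be treated as an assumption and checked against the Seurat documentation: LogNormalize as log1p of scaled relative counts, RC as scaled relative counts without the log, and CLR as log1p of each count divided by a geometric-mean-like term. The scale factor of 10,000 and the Python function names are illustrative choices, and whether CLR runs across features or across cells depends on a margin setting that is not modeled here.

```python
import math


def relative_counts(counts, scale_factor=10_000):
    """RC: each count divided by the vector total, times a scale factor."""
    total = sum(counts)
    return [c / total * scale_factor for c in counts]


def log_normalize(counts, scale_factor=10_000):
    """LogNormalize: natural log of (1 + scaled relative count)."""
    return [math.log1p(x) for x in relative_counts(counts, scale_factor)]


def clr(counts):
    """CLR variant: log1p(x / exp(sum(log1p(x[x > 0])) / len(x)))."""
    log_sum = sum(math.log1p(c) for c in counts if c > 0)
    denom = math.exp(log_sum / len(counts))
    return [math.log1p(c / denom) for c in counts]


toy_counts = [0, 1, 3, 6]
print("RC:          ", relative_counts(toy_counts))
print("LogNormalize:", [round(x, 3) for x in log_normalize(toy_counts)])
print("CLR:         ", [round(x, 3) for x in clr(toy_counts)])
```

All three keep zeros at zero, but they compress large counts differently, which is one reason the choice of method can change downstream distances and clusters.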
Affiliation(s)
- Mikhail Arbatsky
- Faculty of Medicine, Lomonosov Moscow State University, Moscow, Russia
| | - Ekaterina Vasilyeva
- Institute of Higher Technologies, Immanuel Kant Baltic Federal University, Kaliningrad, Russia
| | - Veronika Sysoeva
- Faculty of Medicine, Lomonosov Moscow State University, Moscow, Russia
| | - Ekaterina Semina
- Faculty of Medicine, Lomonosov Moscow State University, Moscow, Russia
- Institute of Medicine and Life Science, Immanuel Kant Baltic Federal University, Kaliningrad, Russia
| | - Valeri Saveliev
- Institute of Higher Technologies, Immanuel Kant Baltic Federal University, Kaliningrad, Russia
| | - Kseniya Rubina
- Faculty of Medicine, Lomonosov Moscow State University, Moscow, Russia
| |
|
18
|
Stanley JS, Yang J, Li R, Lindenbaum O, Kobak D, Landa B, Kluger Y. Principled PCA separates signal from noise in omics count data. bioRxiv 2025:2025.02.03.636129. [PMID: 39975320 PMCID: PMC11838471 DOI: 10.1101/2025.02.03.636129] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/21/2025]
Abstract
Principal component analysis (PCA) is indispensable for processing high-throughput omics datasets, as it can extract meaningful biological variability while minimizing the influence of noise. However, the suitability of PCA is contingent on appropriate normalization and transformation of count data, and accurate selection of the number of principal components; improper choices can result in the loss of biological information or corruption of the signal due to excessive noise. Typical approaches to these challenges rely on heuristics that lack theoretical foundations. In this work, we present Biwhitened PCA (BiPCA), a theoretically grounded framework for rank estimation and data denoising across a wide range of omics modalities. BiPCA overcomes a fundamental difficulty with handling count noise in omics data by adaptively rescaling the rows and columns - a rigorous procedure that standardizes the noise variances across both dimensions. Through simulations and analysis of over 100 datasets spanning seven omics modalities, we demonstrate that BiPCA reliably recovers the data rank and enhances the biological interpretability of count data. In particular, BiPCA enhances marker gene expression, preserves cell neighborhoods, and mitigates batch effects. Our results establish BiPCA as a robust and versatile framework for high-throughput count data analysis.
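The idea of rescaling rows and columns so that noise variances are standardized in both dimensions can be sketched with a Sinkhorn-type iteration. This is not the BiPCA implementation. It assumes Poisson-like noise, so that each count serves as its own variance estimate, and it looks for row factors `u` and column factors `v` such that the scaled variance matrix has an average of one in every row and every column. The data are then scaled by the square roots of those factors. Function names and the iteration count are illustrative.

```python
import math


def sinkhorn_scalings(variances, n_iter=200):
    """Find u, v so that u[i] * variances[i][j] * v[j] has row sums n and column sums m.

    Assumes a strictly positive m x n matrix (a simplification for this sketch).
    """
    m, n = len(variances), len(variances[0])
    u = [1.0] * m
    v = [1.0] * n
    for _ in range(n_iter):
        u = [n / sum(variances[i][j] * v[j] for j in range(n)) for i in range(m)]
        v = [m / sum(u[i] * variances[i][j] for i in range(m)) for j in range(n)]
    return u, v


def biwhiten(counts):
    """Scale counts by sqrt(u[i]) * sqrt(v[j]), using the counts as variance estimates."""
    u, v = sinkhorn_scalings(counts)
    scaled = [
        [math.sqrt(u[i]) * counts[i][j] * math.sqrt(v[j]) for j in range(len(v))]
        for i in range(len(u))
    ]
    return scaled, u, v


toy_counts = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
scaled, u, v = biwhiten(toy_counts)
# Scaled variance estimates: each row and each column should average to about 1.
scaled_var = [[u[i] * toy_counts[i][j] * v[j] for j in range(3)] for i in range(2)]
print([round(sum(row) / 3, 4) for row in scaled_var])
print([round(sum(scaled_var[i][j] for i in range(2)) / 2, 4) for j in range(3)])
```

Once the noise variance is uniform across rows and columns, results from random matrix theory about the spectrum of pure noise can be used to decide which principal components rise above it, which is the kind of rank estimation the abstract refers to.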
Affiliation(s)
- Jay S. Stanley
- Program in Applied Mathematics, Yale University, New Haven, CT, USA
| | - Junchen Yang
- Interdepartmental Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA
| | - Ruiqi Li
- Interdepartmental Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA
| | - Ofir Lindenbaum
- Faculty of Engineering, Bar Ilan University, Ramat-Gan, Israel
| | - Dmitry Kobak
- Hertie Institute for AI in Brain Health, University of Tübingen, Germany
| | - Boris Landa
- Program in Applied Mathematics, Yale University, New Haven, CT, USA
- Department of Electrical and Computer Engineering, Yale University, New Haven, CT, USA
| | - Yuval Kluger
- Program in Applied Mathematics, Yale University, New Haven, CT, USA
- Interdepartmental Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA
- Department of Pathology, Yale University, New Haven, CT, USA
| |
|
19
|
Sakai SA, Nomura R, Nagasawa S, Chi S, Suzuki A, Suzuki Y, Imai M, Nakamura Y, Yoshino T, Ishikawa S, Tsuchihara K, Kageyama SI, Yamashita R. SpatialKNifeY (SKNY): Extending from spatial domain to surrounding area to identify microenvironment features with single-cell spatial omics data. PLoS Comput Biol 2025; 21:e1012854. [PMID: 39965034 PMCID: PMC11849985 DOI: 10.1371/journal.pcbi.1012854] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2024] [Revised: 02/24/2025] [Accepted: 02/03/2025] [Indexed: 02/20/2025] Open
Abstract
Single-cell spatial omics analysis requires consideration of biological functions and mechanisms in a microenvironment. However, microenvironment analysis using bioinformatic methods is limited by the need to detect histological morphology and extend it to the surrounding area. In this study, we developed SpatialKNifeY (SKNY), an image-processing-based toolkit that detects spatial domains that potentially reflect histology and extends these domains to the microenvironment. Using spatial transcriptomic data from breast cancer, we applied the SKNY algorithm to identify tumor spatial domains, followed by clustering of the domains, trajectory estimation, and spatial extension to the tumor microenvironment (TME). The results of the trajectory estimation were consistent with the known mechanisms of cancer progression. We observed tumor vascularization and immunodeficiency at mid- and late-stage progression in the TME. Furthermore, we applied SKNY to integrate and cluster the spatial domains of 14 patients with metastatic colorectal cancer, and the clusters were divided based on the TME characteristics. In conclusion, SKNY facilitates the determination of the functions and mechanisms in the microenvironment and the cataloguing of its features.
Affiliation(s)
- Shunsuke A. Sakai
- Division of Translational Informatics, Exploratory Oncology Research & Clinical Trial Center, National Cancer Center, Kashiwa, Chiba, Japan
- Department of Integrated Biosciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba, Japan
- Department of Radiation Oncology, National Cancer Center Hospital East, Kashiwa, Chiba, Japan
| | - Ryosuke Nomura
- Division of Translational Informatics, Exploratory Oncology Research & Clinical Trial Center, National Cancer Center, Kashiwa, Chiba, Japan
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba, Japan
| | - Satoi Nagasawa
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba, Japan
- Department of Breast Surgery, National Cancer Center Hospital East, Kashiwa, Chiba, Japan
| | - SungGi Chi
- Division of Translational Informatics, Exploratory Oncology Research & Clinical Trial Center, National Cancer Center, Kashiwa, Chiba, Japan
| | - Ayako Suzuki
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba, Japan
| | - Yutaka Suzuki
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba, Japan
| | - Mitsuho Imai
- Translational Research Support Office, National Cancer Center Hospital East, Chiba, Japan
- Department of Genetic Medicine and Services, National Cancer Center Hospital East, Chiba, Japan
| | - Yoshiaki Nakamura
- Translational Research Support Office, National Cancer Center Hospital East, Chiba, Japan
- Department of Gastroenterology and Gastrointestinal Oncology, National Cancer Center Hospital East, Chiba, Japan
| | - Takayuki Yoshino
- Translational Research Support Office, National Cancer Center Hospital East, Chiba, Japan
- Department of Gastroenterology and Gastrointestinal Oncology, National Cancer Center Hospital East, Chiba, Japan
| | - Shumpei Ishikawa
- Department of Preventive Medicine, Graduate School of Medicine, The University of Tokyo, Bunkyo-ku, Tokyo, Japan
- Division of Pathology, National Cancer Center Exploratory Oncology Research & Clinical Trial Center, Kashiwa, Chiba, Japan
| | - Katsuya Tsuchihara
- Division of Translational Informatics, Exploratory Oncology Research & Clinical Trial Center, National Cancer Center, Kashiwa, Chiba, Japan
- Department of Integrated Biosciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba, Japan
| | - Shun-Ichiro Kageyama
- Department of Radiation Oncology, National Cancer Center Hospital East, Kashiwa, Chiba, Japan
- Division of Radiation Oncology and Particle Therapy, Exploratory Oncology Research & Clinical Trial Center, National Cancer Center, Kashiwa, Chiba, Japan
| | - Riu Yamashita
- Division of Translational Informatics, Exploratory Oncology Research & Clinical Trial Center, National Cancer Center, Kashiwa, Chiba, Japan
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba, Japan
| |
|
20
|
Goss K, Horwitz EM. Single-cell multiomics to advance cell therapy. Cytotherapy 2025; 27:137-145. [PMID: 39530970 DOI: 10.1016/j.jcyt.2024.10.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2024] [Revised: 10/21/2024] [Accepted: 10/21/2024] [Indexed: 11/16/2024]
Abstract
Single-cell RNA-sequencing (scRNAseq) was first introduced in 2009 and has evolved with many technological advancements over the last decade. Not only are there several scRNAseq platforms differing in many aspects, but there are also a large number of computational pipelines available for downstream analyses which are being developed at an exponential rate. Such computational data appear in many scientific publications in virtually every field of study; thus, investigators should be able to understand and interpret data in this rapidly evolving field. Here, we discuss key differences in scRNAseq platforms, crucial steps in scRNAseq experiments, standard downstream analyses and introduce newly developed multimodal approaches. We then discuss how single-cell omics has been applied to advance the field of cell therapy.
Affiliation(s)
- Kyndal Goss
- Marcus Center for Advanced Cellular Therapy, Children's Healthcare of Atlanta, Atlanta, Georgia, USA; Aflac Cancer & Blood Disorders Center, Children's Healthcare of Atlanta, Atlanta, Georgia, USA; Graduate Division of Biology and Biomedical Sciences, Emory University Laney Graduate School, Atlanta, Georgia, USA
| | - Edwin M Horwitz
- Marcus Center for Advanced Cellular Therapy, Children's Healthcare of Atlanta, Atlanta, Georgia, USA; Aflac Cancer & Blood Disorders Center, Children's Healthcare of Atlanta, Atlanta, Georgia, USA; Department of Pediatrics, Emory University School of Medicine, Atlanta, Georgia, USA; Graduate Division of Biology and Biomedical Sciences, Emory University Laney Graduate School, Atlanta, Georgia, USA.
| |
|
21
|
Liu X, Chapple RH, Bennett D, Wright WC, Sanjali A, Culp E, Zhang Y, Pan M, Geeleher P. CSI-GEP: A GPU-based unsupervised machine learning approach for recovering gene expression programs in atlas-scale single-cell RNA-seq data. Cell Genomics 2025; 5:100739. [PMID: 39788105 PMCID: PMC11770216 DOI: 10.1016/j.xgen.2024.100739] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/13/2024] [Revised: 11/06/2024] [Accepted: 12/13/2024] [Indexed: 01/12/2025]
Abstract
Exploratory analysis of single-cell RNA sequencing (scRNA-seq) typically relies on hard clustering over two-dimensional projections like uniform manifold approximation and projection (UMAP). However, such methods can severely distort the data and have many arbitrary parameter choices. Methods that can model scRNA-seq data as non-discrete "gene expression programs" (GEPs) can better preserve the data's structure, but currently, they are often not scalable, not consistent across repeated runs, and lack an established method for choosing key parameters. Here, we developed a GPU-based unsupervised learning approach, "consensus and scalable inference of gene expression programs" (CSI-GEP). We show that CSI-GEP can recover ground truth GEPs in real and simulated atlas-scale scRNA-seq datasets, significantly outperforming cutting-edge methods, including GPT-based neural networks. We applied CSI-GEP to a whole mouse brain atlas of 2.2 million cells, disentangling endothelial cell types missed by other methods, and to an integrated scRNA-seq atlas of human tumors and cell lines, discovering mesenchymal-like GEPs unique to cancer cells growing in culture.
Affiliation(s)
- Xueying Liu
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN 38105, USA
| | - Richard H Chapple
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN 38105, USA
| | - Declan Bennett
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN 38105, USA
| | - William C Wright
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN 38105, USA
| | - Ankita Sanjali
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN 38105, USA
| | - Erielle Culp
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN 38105, USA; Department of Genetics, Genomics, and Informatics, University of Tennessee Health Science Center, Memphis, TN 38163, USA
| | - Yinwen Zhang
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN 38105, USA
| | - Min Pan
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN 38105, USA
| | - Paul Geeleher
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN 38105, USA.
| |
|
22
|
Tan CL, Lindner K, Boschert T, Meng Z, Rodriguez Ehrenfried A, De Roia A, Haltenhof G, Faenza A, Imperatore F, Bunse L, Lindner JM, Harbottle RP, Ratliff M, Offringa R, Poschke I, Platten M, Green EW. Prediction of tumor-reactive T cell receptors from scRNA-seq data for personalized T cell therapy. Nat Biotechnol 2025; 43:134-142. [PMID: 38454173 PMCID: PMC11738991 DOI: 10.1038/s41587-024-02161-y] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Received: 06/13/2023] [Accepted: 02/01/2024] [Indexed: 03/09/2024]
Abstract
The identification of patient-derived, tumor-reactive T cell receptors (TCRs) as a basis for personalized transgenic T cell therapies remains a time- and cost-intensive endeavor. Current approaches to identify tumor-reactive TCRs analyze tumor mutations to predict T cell activating (neo)antigens and use these to either enrich tumor infiltrating lymphocyte (TIL) cultures or validate individual TCRs for transgenic autologous therapies. Here we combined high-throughput TCR cloning and reactivity validation to train predicTCR, a machine learning classifier that identifies individual tumor-reactive TILs in an antigen-agnostic manner based on single-TIL RNA sequencing. PredicTCR identifies tumor-reactive TCRs in TILs from diverse cancers better than previous gene set enrichment-based approaches, increasing specificity and sensitivity (geometric mean) from 0.38 to 0.74. By predicting tumor-reactive TCRs in a matter of days, TCR clonotypes can be prioritized to accelerate the manufacture of personalized T cell therapies.
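The summary metric quoted in this abstract, the geometric mean of specificity and sensitivity, can be computed from confusion-matrix counts. The following is a minimal sketch only: the function name and the example counts are hypothetical and are not taken from the paper or from predicTCR itself.

```python
import math


def geometric_mean_sens_spec(tp: int, fn: int, tn: int, fp: int) -> float:
    """Geometric mean of sensitivity and specificity from confusion counts."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative example")
    sensitivity = tp / (tp + fn)  # fraction of truly reactive TCRs recovered
    specificity = tn / (tn + fp)  # fraction of non-reactive TCRs rejected
    return math.sqrt(sensitivity * specificity)


# Hypothetical counts, for illustration only (not data from the study).
score = geometric_mean_sens_spec(tp=80, fn=20, tn=60, fp=40)
print(round(score, 3))
```

Because the geometric mean collapses to zero when either rate is zero, it penalizes a classifier that gains sensitivity by giving up specificity, or the reverse.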
Affiliation(s)
- C L Tan
- CCU Neuroimmunology and Brain Tumor Immunology, German Cancer Research Center, Heidelberg, Germany
- German Cancer Consortium, Core Center Heidelberg, Heidelberg, Germany
- Department of Neurology, Medical Faculty Mannheim, Mannheim Center for Translational Neuroscience, Heidelberg University, Mannheim, Germany
- Faculty of Biosciences, Heidelberg University, Heidelberg, Germany
| | - K Lindner
- CCU Neuroimmunology and Brain Tumor Immunology, German Cancer Research Center, Heidelberg, Germany
- German Cancer Consortium, Core Center Heidelberg, Heidelberg, Germany
- Department of Neurology, Medical Faculty Mannheim, Mannheim Center for Translational Neuroscience, Heidelberg University, Mannheim, Germany
- Immune Monitoring Unit, National Center for Tumor Diseases, Heidelberg, Germany
| | - T Boschert
- CCU Neuroimmunology and Brain Tumor Immunology, German Cancer Research Center, Heidelberg, Germany
- German Cancer Consortium, Core Center Heidelberg, Heidelberg, Germany
- Department of Neurology, Medical Faculty Mannheim, Mannheim Center for Translational Neuroscience, Heidelberg University, Mannheim, Germany
- Faculty of Biosciences, Heidelberg University, Heidelberg, Germany
- Helmholtz Institute for Translational Oncology, Mainz, Germany
| | - Z Meng
- Department of General, Visceral and Transplantation Surgery, University Hospital Heidelberg, Heidelberg, Germany
- Division of Molecular Oncology of Gastrointestinal Tumors, German Cancer Research Center, Heidelberg, Germany
- Sino-German Laboratory of Personalized Medicine for Pancreatic Cancer, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| | - A Rodriguez Ehrenfried
- Faculty of Biosciences, Heidelberg University, Heidelberg, Germany
- Helmholtz Institute for Translational Oncology, Mainz, Germany
- Division of Molecular Oncology of Gastrointestinal Tumors, German Cancer Research Center, Heidelberg, Germany
| | - A De Roia
- Faculty of Biosciences, Heidelberg University, Heidelberg, Germany
- DNA Vector Laboratory, German Cancer Research Center, Heidelberg, Germany
| | - G Haltenhof
- CCU Neuroimmunology and Brain Tumor Immunology, German Cancer Research Center, Heidelberg, Germany
- Department of Neurology, Medical Faculty Mannheim, Mannheim Center for Translational Neuroscience, Heidelberg University, Mannheim, Germany
| | | | | | - L Bunse
- CCU Neuroimmunology and Brain Tumor Immunology, German Cancer Research Center, Heidelberg, Germany
- German Cancer Consortium, Core Center Heidelberg, Heidelberg, Germany
- Department of Neurology, Medical Faculty Mannheim, Mannheim Center for Translational Neuroscience, Heidelberg University, Mannheim, Germany
| | | | - R P Harbottle
- DNA Vector Laboratory, German Cancer Research Center, Heidelberg, Germany
| | - M Ratliff
- Department of Neurosurgery, University Hospital Mannheim, Mannheim, Germany
| | - R Offringa
- Department of General, Visceral and Transplantation Surgery, University Hospital Heidelberg, Heidelberg, Germany
- Division of Molecular Oncology of Gastrointestinal Tumors, German Cancer Research Center, Heidelberg, Germany
- Sino-German Laboratory of Personalized Medicine for Pancreatic Cancer, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| | - I Poschke
- CCU Neuroimmunology and Brain Tumor Immunology, German Cancer Research Center, Heidelberg, Germany
- German Cancer Consortium, Core Center Heidelberg, Heidelberg, Germany
- Immune Monitoring Unit, National Center for Tumor Diseases, Heidelberg, Germany
| | - M Platten
- CCU Neuroimmunology and Brain Tumor Immunology, German Cancer Research Center, Heidelberg, Germany.
- German Cancer Consortium, Core Center Heidelberg, Heidelberg, Germany.
- Department of Neurology, Medical Faculty Mannheim, Mannheim Center for Translational Neuroscience, Heidelberg University, Mannheim, Germany.
- Immune Monitoring Unit, National Center for Tumor Diseases, Heidelberg, Germany.
- Helmholtz Institute for Translational Oncology, Mainz, Germany.
- German Cancer Research Center-Hector Cancer Institute at the Medical Faculty Mannheim, University of Heidelberg, Mannheim, Germany.
| | - E W Green
- CCU Neuroimmunology and Brain Tumor Immunology, German Cancer Research Center, Heidelberg, Germany.
- German Cancer Consortium, Core Center Heidelberg, Heidelberg, Germany.
- Department of Neurology, Medical Faculty Mannheim, Mannheim Center for Translational Neuroscience, Heidelberg University, Mannheim, Germany.
| |
|
23
|
Sun F, Li H, Sun D, Fu S, Gu L, Shao X, Wang Q, Dong X, Duan B, Xing F, Wu J, Xiao M, Zhao F, Han JDJ, Liu Q, Fan X, Li C, Wang C, Shi T. Single-cell omics: experimental workflow, data analyses and applications. Science China Life Sciences 2025; 68:5-102. [PMID: 39060615 DOI: 10.1007/s11427-023-2561-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2023] [Accepted: 04/18/2024] [Indexed: 07/28/2024]
Abstract
Cells are the fundamental units of biological systems and exhibit unique development trajectories and molecular features. Our exploration of how the genomes orchestrate the formation and maintenance of each cell, and control the cellular phenotypes of various organisms, is both captivating and intricate. Since the inception of the first single-cell RNA technology, technologies related to single-cell sequencing have experienced rapid advancements in recent years. These technologies have expanded horizontally to include single-cell genome, epigenome, proteome, and metabolome, while vertically, they have progressed to integrate multiple omics data and incorporate additional information such as spatial scRNA-seq and CRISPR screening. Single-cell omics represent a groundbreaking advancement in the biomedical field, offering profound insights into the understanding of complex diseases, including cancers. Here, we comprehensively summarize recent advances in single-cell omics technologies, with a specific focus on the methodology section. This overview aims to guide researchers in selecting appropriate methods for single-cell sequencing and related data analysis.
Affiliation(s)
- Fengying Sun
- Department of Clinical Laboratory, the Affiliated Wuhu Hospital of East China Normal University (The Second People's Hospital of Wuhu City), Wuhu, 241000, China
| | - Haoyan Li
- Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
| | - Dongqing Sun
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China
| | - Shaliu Fu
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Research Institute of Intelligent Computing, Zhejiang Lab, Hangzhou, 311121, China
- Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai, 201210, China
| | - Lei Gu
- Center for Single-cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China
| | - Xin Shao
- Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
- National Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing, 314103, China
| | - Qinqin Wang
- Center for Single-cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China
| | - Xin Dong
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China
| | - Bin Duan
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Research Institute of Intelligent Computing, Zhejiang Lab, Hangzhou, 311121, China
- Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai, 201210, China
| | - Feiyang Xing
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China
| | - Jun Wu
- Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, 200241, China
| | - Minmin Xiao
- Department of Clinical Laboratory, the Affiliated Wuhu Hospital of East China Normal University (The Second People's Hospital of Wuhu City), Wuhu, 241000, China.
| | - Fangqing Zhao
- Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing, 100101, China.
| | - Jing-Dong J Han
- Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Center for Quantitative Biology (CQB), Peking University, Beijing, 100871, China.
| | - Qi Liu
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China.
- Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China.
- Research Institute of Intelligent Computing, Zhejiang Lab, Hangzhou, 311121, China.
- Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai, 201210, China.
| | - Xiaohui Fan
- Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China.
- National Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing, 314103, China.
- Zhejiang Key Laboratory of Precision Diagnosis and Therapy for Major Gynecological Diseases, Women's Hospital, Zhejiang University School of Medicine, Hangzhou, 310006, China.
| | - Chen Li
- Center for Single-cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China.
| | - Chenfei Wang
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China.
- Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China.
| | - Tieliu Shi
- Department of Clinical Laboratory, the Affiliated Wuhu Hospital of East China Normal University (The Second People's Hospital of Wuhu City), Wuhu, 241000, China.
- Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, 200241, China.
- Key Laboratory of Advanced Theory and Application in Statistics and Data Science-MOE, School of Statistics, East China Normal University, Shanghai, 200062, China.
| |
|
24
|
Lodi MK, Clark L, Roy S, Ghosh P. CORTADO: Hill Climbing Optimization for Cell-Type Specific Marker Gene Discovery. bioRxiv 2024:2024.12.23.630040. [PMID: 39763976 PMCID: PMC11703242 DOI: 10.1101/2024.12.23.630040] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/15/2025]
Abstract
The advent of single-cell RNA sequencing (scRNA-seq) has greatly enhanced our ability to explore cellular heterogeneity with high resolution. Identifying subpopulations of cells and their associated molecular markers is crucial in understanding their distinct roles in tissues. To address the challenges in marker gene selection, we introduce CORTADO, a computational framework based on hill-climbing optimization for the efficient discovery of cell-type-specific markers. CORTADO optimizes three critical properties: differential expression in the clusters of interest, distinctiveness in gene expression profiles to minimize redundancy, and sparseness to ensure a concise and biologically meaningful marker set. Unlike traditional methods that rely on ranking genes by p-values, CORTADO incorporates both differential expression metrics and penalties for overlapping expression profiles, ensuring that each selected marker uniquely represents its cluster while maintaining biological relevance. Its flexibility supports both constrained and unconstrained marker selection, allowing users to specify the number of markers to identify, making it adaptable to diverse analytical needs and scalable to datasets with varying complexities. To validate its performance, we apply CORTADO to several datasets, including the DLPFC 151507 dataset, the Zeisel mouse brain dataset, and a peripheral blood mononuclear cell dataset. Through enrichment analysis and examination of spatial localization-based expression, we demonstrate the robustness of CORTADO in identifying biologically relevant and non-redundant markers in complex datasets. CORTADO provides an efficient and scalable solution for cell-type marker discovery, offering improved sensitivity and specificity compared to existing methods.
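The three properties named in this abstract (differential expression, distinctiveness, sparseness) can be combined into a single score and maximized by hill climbing. The sketch below is a toy illustration under stated assumptions, not CORTADO's implementation: the objective form, the cosine-similarity redundancy penalty, the weights, and all function names are hypothetical.

```python
import numpy as np


def objective(mask, de_score, similarity, redundancy_weight, sparsity_weight):
    """Assumed score: reward DE, penalize pairwise similarity and set size."""
    idx = np.flatnonzero(mask)
    if idx.size == 0:
        return 0.0
    sub = similarity[np.ix_(idx, idx)]
    redundancy = (sub.sum() - np.trace(sub)) / 2.0  # each unordered pair once
    return (de_score[idx].sum()
            - redundancy_weight * redundancy
            - sparsity_weight * idx.size)


def hill_climb_markers(de_score, profiles, redundancy_weight=1.0,
                       sparsity_weight=0.5, max_iter=1000):
    """Steepest-ascent hill climbing over single add/remove moves."""
    de_score = np.asarray(de_score, dtype=float)
    profiles = np.asarray(profiles, dtype=float)
    norms = np.linalg.norm(profiles, axis=1, keepdims=True)
    unit = profiles / np.where(norms == 0, 1.0, norms)
    similarity = unit @ unit.T  # cosine similarity between gene profiles
    mask = np.zeros(de_score.size, dtype=bool)  # start from the empty set
    best = objective(mask, de_score, similarity,
                     redundancy_weight, sparsity_weight)
    for _ in range(max_iter):
        best_gene, best_value = None, best
        for g in range(mask.size):
            mask[g] = not mask[g]  # try flipping gene g in or out
            value = objective(mask, de_score, similarity,
                              redundancy_weight, sparsity_weight)
            mask[g] = not mask[g]  # undo the trial flip
            if value > best_value + 1e-12:
                best_gene, best_value = g, value
        if best_gene is None:  # no single flip improves: local optimum
            break
        mask[best_gene] = not mask[best_gene]
        best = best_value
    return np.flatnonzero(mask), best


# Toy input: genes 0 and 1 have identical profiles, so they are redundant.
de = [3.0, 2.9, 1.0, 0.2]
prof = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
selected, value = hill_climb_markers(de, prof, redundancy_weight=3.0)
print(selected.tolist(), value)
```

In this toy run the search keeps gene 0, skips its duplicate gene 1 because of the redundancy penalty, and drops gene 3 because its score does not cover the sparsity cost. Hill climbing only guarantees a local optimum, so restarts from different initial sets are a common addition.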
Affiliation(s)
- Musaddiq K Lodi
- Integrative Life Sciences, Virginia Commonwealth University, Richmond, VA, United States of America
| | - Leiliani Clark
- Center for Biological Data Science, Virginia Commonwealth University, Richmond, VA, United States of America
| | - Satyaki Roy
- Department of Mathematical Sciences, University of Alabama in Huntsville, Huntsville, AL, United States of America
| | - Preetam Ghosh
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, United States of America
| |
|
25
|
Prompsy P, Saichi M, Raimundo F, Vallot C. IDclust: Iterative clustering for unsupervised identification of cell types with single cell transcriptomics and epigenomics. NAR Genom Bioinform 2024; 6:lqae174. [PMID: 39703425 PMCID: PMC11655290 DOI: 10.1093/nargab/lqae174] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2024] [Revised: 11/08/2024] [Accepted: 11/19/2024] [Indexed: 12/21/2024] Open
Abstract
The increasing diversity of single-cell datasets requires systematic cell type characterization. Clustering is a critical step in single-cell analysis, heavily influencing downstream analyses. However, current unsupervised clustering algorithms rely on biologically irrelevant parameters that require manual optimization and fail to capture hierarchical relationships between clusters. We developed IDclust, a framework that identifies clusters with significant biological features at multiple resolutions using biologically meaningful thresholds like fold change, adjusted P-value and fraction of expressing cells. By iteratively processing and clustering subsets of the dataset, IDclust guarantees that all clusters found have significantly different features and stops only when no more interpretable cluster is found. It also creates a hierarchy of clusters, enabling visualization of the hierarchical relationships between different clusters. Analyzing multiple single-cell transcriptomic reference datasets, IDclust achieves superior clustering accuracy compared to state-of-the-art algorithms. We showcase its utility by identifying previously unannotated clusters and identifying branching patterns in scATAC datasets. Using its unsupervised nature and ability to analyze different -omics, we compare the resolution of different histone marks in a multi-omic Paired-Tag dataset. Overall, IDclust automates single-cell exploration, facilitates cell type annotation and provides a biologically interpretable basis for clustering.
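The acceptance rule described in this abstract, keeping a cluster only when it has features that pass fold-change, adjusted P-value and expressing-fraction thresholds, might be sketched as follows. This is an illustration only: the function names, the default threshold values, and the minimum feature count are assumptions, not IDclust's actual parameters or API.

```python
import numpy as np


def count_significant_features(log2_fc, adj_pval, frac_expressing,
                               min_log2_fc=1.0, max_adj_pval=0.01,
                               min_frac=0.2):
    """Count features passing all three (hypothetical) thresholds."""
    log2_fc = np.asarray(log2_fc, dtype=float)
    adj_pval = np.asarray(adj_pval, dtype=float)
    frac_expressing = np.asarray(frac_expressing, dtype=float)
    passing = ((log2_fc >= min_log2_fc)
               & (adj_pval <= max_adj_pval)
               & (frac_expressing >= min_frac))
    return int(passing.sum())


def keep_cluster(log2_fc, adj_pval, frac_expressing, min_features=1, **thresholds):
    """Accept a candidate cluster only if enough features are significant."""
    n = count_significant_features(log2_fc, adj_pval, frac_expressing,
                                   **thresholds)
    return n >= min_features


# Toy per-gene statistics for one candidate cluster versus the remaining cells.
fc = [2.0, 1.5, 0.3, 1.2]
padj = [0.001, 0.2, 0.0001, 0.005]
frac = [0.6, 0.5, 0.9, 0.1]
print(count_significant_features(fc, padj, frac))
print(keep_cluster(fc, padj, frac, min_features=2))
```

An iterative scheme would apply such a check to each proposed sub-cluster and stop subdividing a branch once no candidate passes, which yields the cluster hierarchy the abstract describes.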
Affiliation(s)
- Pacôme Prompsy
- CNRS UMR3244, Institut Curie, PSL Research University, 26 rue d’Ulm, 75005 Paris, France
- Department of Translational Research, Institut Curie, PSL Research University, 26 rue d’Ulm, 75005 Paris, France
- Department of Dermatology, Lausanne University Hospital (CHUV), Avenue de Beaumont 29, 1011 Lausanne, Switzerland
- Faculty of Biology and Medicine, University of Lausanne, Rue du Bugnon 46, 1005 Lausanne, Switzerland
| | - Mélissa Saichi
- CNRS UMR3244, Institut Curie, PSL Research University, 26 rue d’Ulm, 75005 Paris, France
- Department of Translational Research, Institut Curie, PSL Research University, 26 rue d’Ulm, 75005 Paris, France
| | - Félix Raimundo
- CNRS UMR3244, Institut Curie, PSL Research University, 26 rue d’Ulm, 75005 Paris, France
- Department of Translational Research, Institut Curie, PSL Research University, 26 rue d’Ulm, 75005 Paris, France
- Department of Genomics and Computational Biology, University of Massachusetts Chan Medical School, 55 N Lake Ave, Worcester, MA 01605, USA
| | - Céline Vallot
- CNRS UMR3244, Institut Curie, PSL Research University, 26 rue d’Ulm, 75005 Paris, France
- Department of Translational Research, Institut Curie, PSL Research University, 26 rue d’Ulm, 75005 Paris, France
| |
|
26
|
Bonev B, Castelo-Branco G, Chen F, Codeluppi S, Corces MR, Fan J, Heiman M, Harris K, Inoue F, Kellis M, Levine A, Lotfollahi M, Luo C, Maynard KR, Nitzan M, Ramani V, Satijia R, Schirmer L, Shen Y, Sun N, Green GS, Theis F, Wang X, Welch JD, Gokce O, Konopka G, Liddelow S, Macosko E, Ali Bayraktar O, Habib N, Nowakowski TJ. Opportunities and challenges of single-cell and spatially resolved genomics methods for neuroscience discovery. Nat Neurosci 2024; 27:2292-2309. [PMID: 39627587 PMCID: PMC11999325 DOI: 10.1038/s41593-024-01806-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2023] [Accepted: 09/23/2024] [Indexed: 12/13/2024]
Abstract
Over the past decade, single-cell genomics technologies have allowed scalable profiling of cell-type-specific features, which has substantially increased our ability to study cellular diversity and transcriptional programs in heterogeneous tissues. Yet our understanding of mechanisms of gene regulation or the rules that govern interactions between cell types is still limited. The advent of new computational pipelines and technologies, such as single-cell epigenomics and spatially resolved transcriptomics, has created opportunities to explore two new axes of biological variation: cell-intrinsic regulation of cell states and expression programs and interactions between cells. Here, we summarize the most promising and robust technologies in these areas, discuss their strengths and limitations and discuss key computational approaches for analysis of these complex datasets. We highlight how data sharing and integration, documentation, visualization and benchmarking of results contribute to transparency, reproducibility, collaboration and democratization in neuroscience, and discuss needs and opportunities for future technology development and analysis.
Affiliation(s)
- Boyan Bonev
- Helmholtz Pioneer Campus, Helmholtz Zentrum München, Neuherberg, Germany
- Physiological Genomics, Biomedical Center, Ludwig-Maximilians-Universität München, Munich, Germany
| | - Gonçalo Castelo-Branco
- Laboratory of Molecular Neurobiology, Department of Medical Biochemistry and Biophysics, Karolinska Institutet, Stockholm, Sweden
| | - Fei Chen
- The Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | | | - M Ryan Corces
- Gladstone Institute of Neurological Disease, San Francisco, CA, USA
- Gladstone Institute of Data Science and Biotechnology, San Francisco, CA, USA
- Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
| | - Jean Fan
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Myriam Heiman
- Department of Brain and Cognitive Sciences, MIT, Cambridge, MA, USA
- The Picower Institute for Learning and Memory, MIT, Cambridge, MA, USA
| | - Kenneth Harris
- UCL Queen Square Institute of Neurology, University College London, London, UK
| | - Fumitaka Inoue
- Institute for the Advanced Study of Human Biology (WPI-ASHBi), Kyoto University, Kyoto, Japan
| | - Manolis Kellis
- The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Ariel Levine
- Spinal Circuits and Plasticity Unit, National Institute of Neurological Disorders and Stroke, Bethesda, MD, USA
| | - Mo Lotfollahi
- Institute of Computational Biology, Helmholtz Center Munich - German Research Center for Environmental Health, Neuherberg, Germany
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
| | - Chongyuan Luo
- Department of Human Genetics, University of California, Los Angeles, Los Angeles, CA, USA
| | - Kristen R Maynard
- Lieber Institute for Brain Development, Baltimore, MD, USA
- Department of Psychiatry, Johns Hopkins School of Medicine, Baltimore, MD, USA
- Department of Neuroscience, Johns Hopkins School of Medicine, Baltimore, MD, USA
| | - Mor Nitzan
- School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel
- Racah Institute of Physics, The Hebrew University of Jerusalem, Jerusalem, Israel
- Faculty of Medicine, The Hebrew University of Jerusalem, Jerusalem, Israel
| | - Vijay Ramani
- Gladstone Institute of Data Science and Biotechnology, San Francisco, CA, USA
- Department of Biochemistry and Biophysics, University of California, San Francisco, San Francisco, CA, USA
- Bakar Computational Health Sciences Institute, San Francisco, CA, USA
| | - Rahul Satijia
- New York Genome Center, New York, NY, USA
- Center for Genomics and Systems Biology, New York University, New York, NY, USA
| | - Lucas Schirmer
- Department of Neurology, Mannheim Center for Translational Neuroscience, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
| | - Yin Shen
- Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
- Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
- Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA
| | - Na Sun
- The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Gilad S Green
- The Edmond and Lily Safra Center for Brain Sciences, The Hebrew University of Jerusalem, Jerusalem, Israel
| | - Fabian Theis
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
- Department of Human Genetics, University of California, Los Angeles, Los Angeles, CA, USA
| | - Xiao Wang
- The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Joshua D Welch
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
| | - Ozgun Gokce
- German Center for Neurodegenerative Diseases (DZNE), Bonn, Germany.
- Department of Neurodegenerative Diseases and Geriatric Psychiatry, University Hospital Bonn, Bonn, Germany.
| | - Genevieve Konopka
- Department of Neuroscience, UT Southwestern Medical Center, Dallas, TX, USA.
- Peter O'Donnell Jr. Brain Institute, UT Southwestern Medical Center, Dallas, TX, USA.
| | - Shane Liddelow
- Neuroscience Institute, NYU Grossman School of Medicine, New York, NY, USA.
- Department of Neuroscience & Physiology, NYU Grossman School of Medicine, New York, NY, USA.
- Parekh Center for Interdisciplinary Neurology, NYU Grossman School of Medicine, New York, NY, USA.
- Department of Ophthalmology, NYU Grossman School of Medicine, New York, NY, USA.
| | - Evan Macosko
- The Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Department of Neurobiology, Harvard Medical School, Boston, MA, USA.
- Department of Psychiatry, Massachusetts General Hospital, Boston, MA, USA.
| | | | - Naomi Habib
- The Edmond and Lily Safra Center for Brain Sciences, The Hebrew University of Jerusalem, Jerusalem, Israel.
| | - Tomasz J Nowakowski
- Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA.
- Department of Neurological Surgery, University of California, San Francisco, San Francisco, CA, USA.
- Department of Anatomy, University of California, San Francisco, San Francisco, CA, USA.
- Department of Psychiatry and Behavioral Sciences, University of California, San Francisco, San Francisco, CA, USA.
- The Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California, San Francisco, San Francisco, CA, USA.
| |
|
27
|
Chi Y, Marini S, Wang GZ. BrainCellR: A precise cell type nomenclature pipeline for comparative analysis across brain single-cell datasets. Comput Struct Biotechnol J 2024; 23:4306-4314. [PMID: 39687760 PMCID: PMC11648093 DOI: 10.1016/j.csbj.2024.11.038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/02/2024] [Revised: 11/24/2024] [Accepted: 11/25/2024] [Indexed: 12/18/2024] Open
Abstract
Single-cell studies in neuroscience require precise cell type classification and consistent nomenclature that allows for meaningful comparisons across diverse datasets. Current approaches often lack the ability to identify fine-grained cell types and establish standardized annotations at the cluster level, hindering comprehensive understanding of the brain's cellular composition. To facilitate data integration across multiple models and datasets, we designed BrainCellR. This pipeline provides researchers with a powerful and user-friendly tool for efficient cell type classification and nomination from single-cell transcriptomic data. While initially focused on brain studies, BrainCellR is applicable to other tissues with complex cellular compositions. BrainCellR goes beyond conventional classification approaches by incorporating a standardized nomenclature system for cell types at the cluster level. This feature enables consistent and comparable annotations across different studies, promoting data integration and providing deeper insights into the complex cellular landscape of the brain. All documents for BrainCellR, including source code, user manual and tutorials, are freely available at https://github.com/WangLab-SINH/BrainCellR.
Affiliation(s)
- Yuhao Chi
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
| | - Simone Marini
- Department of Epidemiology, University of Florida, Gainesville, FL, USA
| | - Guang-Zhong Wang
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
| |
28
Liu P, Pan Y, Chang HC, Wang W, Fang Y, Xue X, Zou J, Toothaker JM, Olaloye O, Santiago EG, McCourt B, Mitsialis V, Presicce P, Kallapur SG, Snapper SB, Liu JJ, Tseng GC, Konnikova L, Liu S. Comprehensive evaluation and practical guideline of gating methods for high-dimensional cytometry data: manual gating, unsupervised clustering, and auto-gating. Brief Bioinform 2024; 26:bbae633. [PMID: 39656848 PMCID: PMC11630031 DOI: 10.1093/bib/bbae633] [Received: 08/13/2024] [Revised: 11/13/2024] [Accepted: 11/25/2024] [Indexed: 12/17/2024]
Abstract
Cytometry is an advanced technique for simultaneously identifying and quantifying many cell surface and intracellular proteins at a single-cell resolution. Analyzing high-dimensional cytometry data involves identifying and quantifying cell populations based on their marker expressions. This study provided a quantitative review and comparison of various ways to phenotype cellular populations within the cytometry data, including manual gating, unsupervised clustering, and supervised auto-gating. Six datasets from diverse species and sample types were included in the study, and manual gating with two hierarchical layers was used as the truth for evaluation. For manual gating, results from five researchers were compared to illustrate the gating consistency among different raters. For unsupervised clustering, 23 tools were quantitatively compared in terms of accuracy with the truth and computing cost. While no method outperformed all others, several tools, including PAC-MAN, CCAST, FlowSOM, flowClust, and DEPECHE, generally demonstrated strong performance. For supervised auto-gating methods, four algorithms were evaluated, where DeepCyTOF and CyTOF Linear Classifier performed the best. We further provided practical recommendations on prioritizing gating methods based on different application scenarios. This study offers comprehensive insights for biologists to understand diverse gating methods and choose the best-suited ones for their applications.
Affiliation(s)
- Peng Liu
  - Department of Biostatistics, School of Public Health, University of Pittsburgh, 130 De Soto St., Pittsburgh, PA 15261, US
- Yuchen Pan
  - Department of Bioinformatics and Computational Biology, University of Texas MD Anderson Cancer Center, 1400 Pressler St., Houston, TX 77030, US
- Hung-Ching Chang
  - Department of Biostatistics, School of Public Health, University of Pittsburgh, 130 De Soto St., Pittsburgh, PA 15261, US
- Wenjia Wang
  - Department of Biostatistics, School of Public Health, University of Pittsburgh, 130 De Soto St., Pittsburgh, PA 15261, US
- Yusi Fang
  - Department of Biostatistics, School of Public Health, University of Pittsburgh, 130 De Soto St., Pittsburgh, PA 15261, US
- Xiangning Xue
  - Department of Biostatistics, School of Public Health, University of Pittsburgh, 130 De Soto St., Pittsburgh, PA 15261, US
- Jian Zou
  - Department of Biostatistics, School of Public Health, University of Pittsburgh, 130 De Soto St., Pittsburgh, PA 15261, US
- Jessica M Toothaker
  - Department of Immunology, University of Pittsburgh, 5051 Centre Avenue, Pittsburgh, PA 15213, US
  - Department of Pediatrics, Yale University, 15 York Street New Haven, CT 06510, US
- Oluwabunmi Olaloye
  - Department of Pediatrics, Yale University, 15 York Street New Haven, CT 06510, US
- Black McCourt
  - Department of Pediatrics, Yale University, 15 York Street New Haven, CT 06510, US
- Vanessa Mitsialis
  - Department of Pediatrics, Division of Gastroenterology, Hepatology, and Nutrition, Boston Children’s Hospital and Department of Pediatrics, Harvard Medical School, 300 Longwood Ave., Boston, MA 02115, US
  - Department of Medicine, Division of Gastroenterology, Hepatology, and Endoscopy, Brigham & Women’s Hospital and Department of Medicine, Harvard Medical School, 300 Longwood Ave., Boston, MA 02115, US
- Pietro Presicce
  - Division of Neonatology and Developmental Biology, David Geffen School of Medicine at the University of California Los Angeles, 757 Westwood Plaza, Los Angeles, CA 90095, US
- Suhas G Kallapur
  - Division of Neonatology and Developmental Biology, David Geffen School of Medicine at the University of California Los Angeles, 757 Westwood Plaza, Los Angeles, CA 90095, US
- Scott B Snapper
  - Department of Pediatrics, Division of Gastroenterology, Hepatology, and Nutrition, Boston Children’s Hospital and Department of Pediatrics, Harvard Medical School, 300 Longwood Ave., Boston, MA 02115, US
  - Department of Medicine, Division of Gastroenterology, Hepatology, and Endoscopy, Brigham & Women’s Hospital and Department of Medicine, Harvard Medical School, 300 Longwood Ave., Boston, MA 02115, US
- Jia-Jun Liu
  - Drug Discovery Institute, School of Medicine, University of Pittsburgh, 700 Technology Dr, Pittsburgh, PA 15219, US
  - Pittsburgh Liver Research Center, School of Medicine, University of Pittsburgh, 200 Lothrop Street, Pittsburgh, PA 15261, US
- George C Tseng
  - Department of Biostatistics, School of Public Health, University of Pittsburgh, 130 De Soto St., Pittsburgh, PA 15261, US
  - Computational and Systems Biology, School of Medicine, University of Pittsburgh, 3420 Forbes Avenue, Pittsburgh, PA 15213, US
- Liza Konnikova
  - Department of Pediatrics, Yale University, 15 York Street New Haven, CT 06510, US
  - Division of Neonatology and Developmental Biology, David Geffen School of Medicine at the University of California Los Angeles, 757 Westwood Plaza, Los Angeles, CA 90095, US
  - Department of Obstetrics, Gynecology and Reproductive Sciences, Yale University, 333 Cedar Street, New Haven, CT 06510, US
  - Department of Immunobiology, Yale University, 300 Cedar Street, New Haven, CT 06520, US
  - Program in Human and Translational Immunology, Yale University, 300 Cedar Street, New Haven, CT 06520, US
  - Program in Translational Biomedicine, Yale University, 300 Cedar Street, New Haven, CT 06520, US
  - Center for Systems and Engineering Immunology, Yale University, 100 College St., New Haven, CT 06510, US
- Silvia Liu
  - Drug Discovery Institute, School of Medicine, University of Pittsburgh, 700 Technology Dr, Pittsburgh, PA 15219, US
  - Pittsburgh Liver Research Center, School of Medicine, University of Pittsburgh, 200 Lothrop Street, Pittsburgh, PA 15261, US
  - Computational and Systems Biology, School of Medicine, University of Pittsburgh, 3420 Forbes Avenue, Pittsburgh, PA 15213, US
  - Department of Pharmacology and Chemical Biology, School of Medicine, University of Pittsburgh, 200 Lothrop St., Pittsburgh, PA 15261, US
  - Hillman Cancer Center, University of Pittsburgh, 5150 Centre Ave., Pittsburgh, PA 15232, US
29
Wang T, Tian L, Wei B, Li J, Zhang C, Long R, Zhu X, Zhang Y, Wang B, Tang G, Yang J, Guo Y. Effect of fibroblast heterogeneity on prognosis and drug resistance in high-grade serous ovarian cancer. Sci Rep 2024; 14:26617. [PMID: 39496775 PMCID: PMC11535537 DOI: 10.1038/s41598-024-77630-0] [Received: 08/07/2024] [Accepted: 10/23/2024] [Indexed: 11/06/2024]
Abstract
Tumor heterogeneity is associated with poor prognosis and drug resistance, leading to therapeutic failure. Here, we used tumor evolution analysis to determine the intra- and intertumoral heterogeneity of high-grade serous ovarian cancer (HGSOC) and analyze the correlation between tumor heterogeneity and prognosis, as well as chemotherapy response, through single-cell and spatial transcriptomic analysis. We collected and curated 28 HGSOC patients' single-cell transcriptomic data from five datasets. Then, we developed a novel text-mining-based machine-learning approach to deconstruct the evolutionary patterns of tumor cell functions. We then identified key tumor-related genes within different evolutionary branches, characterized the microenvironmental cell compositions that various functional tumor cells depend on, and analyzed the intra- and intertumoral heterogeneity as well as the tumor microenvironments. These analyses were conducted in relation to the prognosis and chemotherapy response in HGSOC patients. We validated our findings in two spatial and seven bulk transcriptomic datasets (total: 1,030 patients). Using transcriptomic clusters as proxies for functional clonality, we identified a significant increase in tumor cell state heterogeneity that was strongly correlated with patient prognosis and treatment response. Furthermore, increased intra- and intertumoral functional clonality was associated with the characteristics of cancer-associated fibroblasts (CAFs). The spatial proximity between CXCL12-positive CAFs and tumor cells, mediated through the CXCL12/CXCR4 interaction, was highly positively correlated with poor prognosis and chemotherapy resistance in HGSOC. Finally, we constructed a panel of 24 genes through statistical modeling that correlate with CXCL12-positive fibroblasts and can predict both prognosis and the response to chemotherapy in HGSOC patients. 
Our study offers insights into the collective behavior of tumor cell communities in HGSOC, as well as potential drivers of tumor evolution in response to therapy. There was a strong association between CXCL12-positive fibroblasts and tumor progression, as well as treatment outcomes.
Affiliation(s)
- Tingjie Wang
  - Department of Molecular Pathology, The Affiliated Cancer Hospital of Zhengzhou University, Henan Cancer Hospital, Zhengzhou, Henan, People's Republic of China
  - Henan Key Laboratory of Molecular Pathology, Zhengzhou, People's Republic of China
- Lingxi Tian
  - MOE Key Laboratory of Intelligent Biomanufacturing, School of Bioengineering, Dalian University of Technology, Dalian, 116024, People's Republic of China
- Bing Wei
  - Department of Molecular Pathology, The Affiliated Cancer Hospital of Zhengzhou University, Henan Cancer Hospital, Zhengzhou, Henan, People's Republic of China
  - Henan Key Laboratory of Molecular Pathology, Zhengzhou, People's Republic of China
- Jun Li
  - Department of Molecular Pathology, The Affiliated Cancer Hospital of Zhengzhou University, Henan Cancer Hospital, Zhengzhou, Henan, People's Republic of China
  - Henan Key Laboratory of Molecular Pathology, Zhengzhou, People's Republic of China
- Cuiyun Zhang
  - Department of Molecular Pathology, The Affiliated Cancer Hospital of Zhengzhou University, Henan Cancer Hospital, Zhengzhou, Henan, People's Republic of China
  - Henan Key Laboratory of Molecular Pathology, Zhengzhou, People's Republic of China
- Ruitao Long
  - School of Pharmacy, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, People's Republic of China
- Xiaofei Zhu
  - Department of Clinical Laboratory, The Third Affiliated Hospital of Xinxiang Medical University, Xinxiang, People's Republic of China
  - Henan Key Laboratory of Immunology and Targeted Drugs, Xinxiang Key Laboratory of Tumor Microenvironment and Immunotherapy, School of Medical Technology, Xinxiang Medical University, Xinxiang, People's Republic of China
- Yougai Zhang
  - Department of Molecular Pathology, The Affiliated Cancer Hospital of Zhengzhou University, Henan Cancer Hospital, Zhengzhou, Henan, People's Republic of China
  - Henan Key Laboratory of Molecular Pathology, Zhengzhou, People's Republic of China
- Bo Wang
  - Department of Molecular Pathology, The Affiliated Cancer Hospital of Zhengzhou University, Henan Cancer Hospital, Zhengzhou, Henan, People's Republic of China
  - Henan Key Laboratory of Molecular Pathology, Zhengzhou, People's Republic of China
- Guangbo Tang
  - Key Laboratory of Biomedical Information Engineering of Ministry of Education, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, People's Republic of China
- Jun Yang
  - MOE Key Laboratory of Intelligent Biomanufacturing, School of Bioengineering, Dalian University of Technology, Dalian, 116024, People's Republic of China
- Yongjun Guo
  - Department of Molecular Pathology, The Affiliated Cancer Hospital of Zhengzhou University, Henan Cancer Hospital, Zhengzhou, Henan, People's Republic of China
  - Henan Key Laboratory of Molecular Pathology, Zhengzhou, People's Republic of China
30
Liu Q, Zhang D, Wang D, Wang G, Wang Y. Automatically Detecting Anchor Cells and Clustering for scRNA-Seq Data Using scTSNN. IEEE J Biomed Health Inform 2024; 28:7015-7027. [PMID: 39283774 DOI: 10.1109/jbhi.2024.3460761] [Indexed: 09/20/2024]
Abstract
Advancing in single-cell RNA sequencing techniques enhances the resolution of cell heterogeneity study. Density-based unsupervised clustering has the potential to detect the representative anchor points and the number of clusters automatically. Meanwhile, discovering the true cell type of scRNA-seq data in the unsupervised scenario is still challenging. To this end, we proposed a tensor shared nearest neighbor anchor clustering for scRNA-seq data, named scTSNN, which first makes use of the tensor affinity learning module to mine the local-global balanced topological structures among cells, next designs density-based shared nearest neighbor measurement method to automatically detect anchor cells, finally partitions the non-anchor cells to obtain the clustering results. Validated on synthetic datasets and scRNA-seq datasets, scTSNN not only exactly detects the complicated structures but also has better performance in accuracy and robustness compared with the state-of-the-art methods. Moreover, case studies on mammalian cells and cervical cancer tumor cells demonstrate the selected anchor cells of scTSNN benefit the cell pseudotime inference and rare cell identification, which show good application and research value of scTSNN.
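The shared-nearest-neighbor idea mentioned in this abstract can be illustrated with a minimal sketch. This is not the scTSNN implementation: it omits the tensor affinity learning module entirely and uses plain Euclidean k-nearest neighbors. The function names (`knn`, `snn_similarity`, `snn_density`), the value of `k`, and the density definition are all assumptions made for illustration.

```python
import math


def knn(points, k):
    """Return, for each point, the set of indices of its k nearest neighbors."""
    neighbors = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        neighbors.append(set(others[:k]))
    return neighbors


def snn_similarity(neighbors, i, j):
    """Shared-nearest-neighbor similarity: size of the neighbor-set overlap."""
    return len(neighbors[i] & neighbors[j])


def snn_density(neighbors, i):
    """Assumed density score: total SNN similarity between i and its neighbors."""
    return sum(snn_similarity(neighbors, i, j) for j in neighbors[i])


# Toy data: two well-separated groups of three "cells" in 2-D.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
nb = knn(points, k=2)
densities = [snn_density(nb, i) for i in range(len(points))]
```

In a density-based scheme of this kind, cells with locally high SNN density would be candidate anchors and the remaining cells would be assigned to the nearest anchor; how scTSNN actually does this is described in the paper, not here.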
31
Li HS, Tan YT, Zhang XF. Enhancing spatial domain detection in spatial transcriptomics with EnSDD. Commun Biol 2024; 7:1358. [PMID: 39433947 PMCID: PMC11494180 DOI: 10.1038/s42003-024-07001-y] [Received: 05/30/2024] [Accepted: 10/01/2024] [Indexed: 10/23/2024]
Abstract
Advancements in spatial transcriptomics have transformed our understanding of organ function and tissue microenvironment. However, accurately identifying spatial domains to depict genome heterogeneity and cellular interactions remains a challenge. In this study, we propose EnSDD (Ensemble-learning for Spatial Domain Detection), a method that ingeniously integrates eight state-of-the-art spatial domain detection methods to automatically identify spatial domains. A key innovation of EnSDD is its dynamic weighting mechanism within the ensemble learning process, which optimizes the contribution of each base model and provides a performance evaluation metric without the need for ground truth data. By leveraging the spatial domains identified through EnSDD, we incorporate the detection of domain-specific spatially variable genes and the spatial distribution of cell types, thereby providing deeper insights into tissue heterogeneity. We validate EnSDD across diverse spatial transcriptomics datasets from various tissue organizational structures. Our results demonstrate that EnSDD significantly enhances spatial domain identification accuracy, identifies genes with spatial expression patterns, and reveals domain-specific cell type enrichment patterns, offering invaluable insights into tissue spatial heterogeneity and regionalization.
Affiliation(s)
- Hui-Sheng Li
  - School of Mathematical Sciences, Zhejiang University of Technology, Hangzhou, 310023, China
- Yu-Ting Tan
  - School of Mathematics and Statistics, and Hubei Key Lab-Math. Sci., Central China Normal University, Wuhan, 430079, China
- Xiao-Fei Zhang
  - School of Mathematics and Statistics, and Hubei Key Lab-Math. Sci., Central China Normal University, Wuhan, 430079, China
  - Key Laboratory of Nonlinear Analysis & Applications (Ministry of Education), Central China Normal University, Wuhan, 430079, China
32
Kitanovski S, Cao Y, Ttoouli D, Farahpour F, Wang J, Hoffmann D. scBubbletree: computational approach for visualization of single cell RNA-seq data. BMC Bioinformatics 2024; 25:302. [PMID: 39271980 PMCID: PMC11401305 DOI: 10.1186/s12859-024-05927-y] [Received: 06/26/2024] [Accepted: 09/09/2024] [Indexed: 09/15/2024]
Abstract
BACKGROUND Visualization approaches transform high-dimensional data from single cell RNA sequencing (scRNA-seq) experiments into two-dimensional plots that are used for analysis of cell relationships, and as a means of reporting biological insights. Yet, many standard approaches generate visuals that suffer from overplotting, lack of quantitative information, and distort global and local properties of biological patterns relative to the original high-dimensional space. RESULTS We present scBubbletree, a new, scalable method for visualization of scRNA-seq data. The method identifies clusters of cells of similar transcriptomes and visualizes such clusters as "bubbles" at the tips of dendrograms (bubble trees), corresponding to quantitative summaries of cluster properties and relationships. scBubbletree stacks bubble trees with further cluster-associated information in a visually easily accessible way, thus facilitating quantitative assessment and biological interpretation of scRNA-seq data. We demonstrate this with large scRNA-seq data sets, including one with over 1.2 million cells. CONCLUSIONS To facilitate coherent quantification and visualization of scRNA-seq data we developed the R-package scBubbletree, which is freely available as part of the Bioconductor repository at: https://bioconductor.org/packages/scBubbletree/.
Affiliation(s)
- Simo Kitanovski
  - Bioinformatics and Computational Biophysics, Faculty of Biology and Centre for Medical Biotechnology (ZMB), University of Duisburg-Essen, 45141, Essen, Germany
- Yingying Cao
  - Bioinformatics and Computational Biophysics, Faculty of Biology and Centre for Medical Biotechnology (ZMB), University of Duisburg-Essen, 45141, Essen, Germany
- Dimitris Ttoouli
  - Bioinformatics and Computational Biophysics, Faculty of Biology and Centre for Medical Biotechnology (ZMB), University of Duisburg-Essen, 45141, Essen, Germany
- Farnoush Farahpour
  - Bioinformatics and Computational Biophysics, Faculty of Biology and Centre for Medical Biotechnology (ZMB), University of Duisburg-Essen, 45141, Essen, Germany
  - Institute of Cell Biology (Cancer Research), University Hospital Essen, University of Duisburg-Essen, 45147, Essen, Germany
- Jun Wang
  - Bioinformatics and Computational Biophysics, Faculty of Biology and Centre for Medical Biotechnology (ZMB), University of Duisburg-Essen, 45141, Essen, Germany
  - National Clinical Research Centre for Infectious Diseases, The Third People's Hospital of Shenzhen and The Second Affiliated Hospital of Southern University of Science and Technology, Shenzhen, 518112, Guangdong Province, China
- Daniel Hoffmann
  - Bioinformatics and Computational Biophysics, Faculty of Biology and Centre for Medical Biotechnology (ZMB), University of Duisburg-Essen, 45141, Essen, Germany
33
Zhao M, Li J, Liu X, Ma K, Tang J, Guo F. A gene regulatory network-aware graph learning method for cell identity annotation in single-cell RNA-seq data. Genome Res 2024; 34:1036-1051. [PMID: 39134412 PMCID: PMC11368180 DOI: 10.1101/gr.278439.123] [Received: 08/28/2023] [Accepted: 07/23/2024] [Indexed: 08/22/2024]
Abstract
Cell identity annotation for single-cell transcriptome data is a crucial process for constructing cell atlases, unraveling pathogenesis, and inspiring therapeutic approaches. Currently, the efficacy of existing methodologies is contingent upon specific data sets. Nevertheless, such data are often sourced from various batches, sequencing technologies, tissues, and even species. Notably, the gene regulatory relationship remains unaffected by the aforementioned factors, highlighting the extensive gene interactions within organisms. Therefore, we propose scHGR, an automated annotation tool designed to leverage gene regulatory relationships in constructing gene-mediated cell communication graphs for single-cell transcriptome data. This strategy helps reduce noise from diverse data sources while establishing distant cellular connections, yielding valuable biological insights. Experiments involving 22 scenarios demonstrate that scHGR precisely and consistently annotates cell identities, benchmarked against state-of-the-art methods. Crucially, scHGR uncovers novel subtypes within peripheral blood mononuclear cells, specifically from CD4+ T cells and cytotoxic T cells. Furthermore, by characterizing a cell atlas comprising 56 cell types for COVID-19 patients, scHGR identifies vital factors like IL1 and calcium ions, offering insights for targeted therapeutic interventions.
Affiliation(s)
- Mengyuan Zhao
  - College of Computer Science and Control Engineering, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
  - University of Chinese Academy of Sciences, Beijing 100190, China
- Jiawei Li
  - College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
- Xiaoyi Liu
  - Computer Science and Engineering, University of South Carolina, Columbia, South Carolina 29208, USA
- Ke Ma
  - College of Engineering, Southern University of Science and Technology, Shenzhen 518055, China
- Jijun Tang
  - College of Computer Science and Control Engineering, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Fei Guo
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China
34
Yan Y, Zhu S, Jia M, Chen X, Qi W, Gu F, Valencak TG, Liu JX, Sun HZ. Advances in single-cell transcriptomics in animal research. J Anim Sci Biotechnol 2024; 15:102. [PMID: 39090689 PMCID: PMC11295521 DOI: 10.1186/s40104-024-01063-y] [Received: 03/30/2024] [Accepted: 06/12/2024] [Indexed: 08/04/2024]
Abstract
Understanding biological mechanisms is fundamental for improving animal production and health to meet the growing demand for high-quality protein. As an emerging biotechnology, single-cell transcriptomics has been gradually applied in diverse aspects of animal research, offering an effective method to study the gene expression of high-throughput single cells of different tissues/organs in animals. In an unprecedented manner, researchers have identified cell types/subtypes and their marker genes, inferred cellular fate trajectories, and revealed cell‒cell interactions in animals using single-cell transcriptomics. In this paper, we introduce the development of single-cell technology and review the processes, advancements, and applications of single-cell transcriptomics in animal research. We summarize recent efforts using single-cell transcriptomics to obtain a more profound understanding of animal nutrition and health, reproductive performance, genetics, and disease models in different livestock species. Moreover, the practical experience accumulated based on a large number of cases is highlighted to provide a reference for determining key factors (e.g., sample size, cell clustering, and cell type annotation) in single-cell transcriptomics analysis. We also discuss the limitations and outlook of single-cell transcriptomics in the current stage. This paper describes the comprehensive progress of single-cell transcriptomics in animal research, offering novel insights and sustainable advancements in agricultural productivity and animal health.
Affiliation(s)
- Yunan Yan
  - Institute of Dairy Science, Ministry of Education Key Laboratory of Molecular Animal Nutrition, College of Animal Sciences, Zhejiang University, Hangzhou, 310058, China
- Senlin Zhu
  - Institute of Dairy Science, Ministry of Education Key Laboratory of Molecular Animal Nutrition, College of Animal Sciences, Zhejiang University, Hangzhou, 310058, China
- Minghui Jia
  - Institute of Dairy Science, Ministry of Education Key Laboratory of Molecular Animal Nutrition, College of Animal Sciences, Zhejiang University, Hangzhou, 310058, China
- Xinyi Chen
  - Institute of Dairy Science, Ministry of Education Key Laboratory of Molecular Animal Nutrition, College of Animal Sciences, Zhejiang University, Hangzhou, 310058, China
- Wenlingli Qi
  - Institute of Dairy Science, Ministry of Education Key Laboratory of Molecular Animal Nutrition, College of Animal Sciences, Zhejiang University, Hangzhou, 310058, China
- Fengfei Gu
  - Institute of Dairy Science, Ministry of Education Key Laboratory of Molecular Animal Nutrition, College of Animal Sciences, Zhejiang University, Hangzhou, 310058, China
  - Key Laboratory of Dairy Cow Genetic Improvement and Milk Quality Research of Zhejiang Province, Zhejiang University, Hangzhou, 310058, China
- Teresa G Valencak
  - Institute of Dairy Science, Ministry of Education Key Laboratory of Molecular Animal Nutrition, College of Animal Sciences, Zhejiang University, Hangzhou, 310058, China
  - Agency for Health and Food Safety Austria, 1220, Vienna, Austria
- Jian-Xin Liu
  - Institute of Dairy Science, Ministry of Education Key Laboratory of Molecular Animal Nutrition, College of Animal Sciences, Zhejiang University, Hangzhou, 310058, China
- Hui-Zeng Sun
  - Institute of Dairy Science, Ministry of Education Key Laboratory of Molecular Animal Nutrition, College of Animal Sciences, Zhejiang University, Hangzhou, 310058, China
  - Key Laboratory of Dairy Cow Genetic Improvement and Milk Quality Research of Zhejiang Province, Zhejiang University, Hangzhou, 310058, China
35
Li J, Shyr Y, Liu Q. aKNNO: single-cell and spatial transcriptomics clustering with an optimized adaptive k-nearest neighbor graph. Genome Biol 2024; 25:203. [PMID: 39090647 PMCID: PMC11293182 DOI: 10.1186/s13059-024-03339-y] [Received: 06/16/2023] [Accepted: 07/16/2024] [Indexed: 08/04/2024]
Abstract
Typical clustering methods for single-cell and spatial transcriptomics struggle to identify rare cell types, while approaches tailored to detect rare cell types gain this ability at the cost of poorer performance for grouping abundant ones. Here, we develop aKNNO to simultaneously identify abundant and rare cell types based on an adaptive k-nearest neighbor graph with optimization. Benchmarking on 38 simulated and 20 single-cell and spatial transcriptomics datasets demonstrates that aKNNO identifies both abundant and rare cell types more accurately than general and specialized methods. Using only gene expression aKNNO maps abundant and rare cells more precisely compared to integrative approaches.
Affiliation(s)
- Jia Li
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, 37203, USA
  - Center for Quantitative Sciences, Vanderbilt University Medical Center, Nashville, TN, 37203, USA
- Yu Shyr
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, 37203, USA
  - Center for Quantitative Sciences, Vanderbilt University Medical Center, Nashville, TN, 37203, USA
- Qi Liu
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, 37203, USA
  - Center for Quantitative Sciences, Vanderbilt University Medical Center, Nashville, TN, 37203, USA
36
Lodi MK, Lodi M, Osei K, Ranganathan V, Hwang P, Ghosh P. CHAI: consensus clustering through similarity matrix integration for cell-type identification. Brief Bioinform 2024; 25:bbae411. [PMID: 39207729 PMCID: PMC11359802 DOI: 10.1093/bib/bbae411] [Received: 04/09/2024] [Revised: 06/29/2024] [Accepted: 08/02/2024] [Indexed: 09/04/2024]
Abstract
Several methods have been developed to computationally predict cell-types for single cell RNA sequencing (scRNAseq) data. As methods are developed, a common problem for investigators has been identifying the best method they should apply to their specific use-case. To address this challenge, we present CHAI (consensus Clustering tHrough similArIty matrix integratIon for single cell-type identification), a wisdom of crowds approach for scRNAseq clustering. CHAI presents two competing methods which aggregate the clustering results from seven state-of-the-art clustering methods: CHAI-AvgSim and CHAI-SNF. CHAI-AvgSim and CHAI-SNF demonstrate superior performance across several benchmarking datasets. Furthermore, both CHAI methods outperform the most recent consensus clustering method, SAME-clustering. We demonstrate CHAI's practical use case by identifying a leader tumor cell cluster enriched with CDH3. CHAI provides a platform for multiomic integration, and we demonstrate CHAI-SNF to have improved performance when including spatial transcriptomics data. CHAI overcomes previous limitations by incorporating the most recent and top performing scRNAseq clustering algorithms into the aggregation framework. It is also an intuitive and easily customizable R package where users may add their own clustering methods to the pipeline, or down-select just the ones they want to use for the clustering aggregation. This ensures that as more advanced clustering algorithms are developed, CHAI will remain useful to the community as a generalized framework. CHAI is available as an open source R package on GitHub: https://github.com/lodimk2/chai.
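The general idea of aggregating several clusterings through an averaged similarity matrix can be sketched as follows. This is an illustrative toy, not the CHAI package or its API: each input clustering is turned into a binary co-association matrix (1 if two cells share a label), the matrices are averaged, and cells are then grouped. The final grouping step used here (connected components over pairs whose average similarity exceeds a threshold) is a stand-in chosen for brevity and is an assumption, as are all function names and the threshold of 0.5.

```python
from itertools import combinations


def co_association(labels):
    """Binary matrix: 1.0 where two cells share a cluster label, else 0.0."""
    n = len(labels)
    return [[1.0 if labels[i] == labels[j] else 0.0 for j in range(n)]
            for i in range(n)]


def average_similarity(labelings):
    """Element-wise mean of the co-association matrices of all clusterings."""
    mats = [co_association(labels) for labels in labelings]
    n = len(labelings[0])
    return [[sum(m[i][j] for m in mats) / len(mats) for j in range(n)]
            for i in range(n)]


def consensus_clusters(avg, threshold=0.5):
    """Group cells whose average similarity exceeds the threshold (union-find)."""
    n = len(avg)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if avg[i][j] > threshold:
            parent[find(i)] = find(j)

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]


# Three hypothetical clusterings of the same five cells.
labelings = [
    [0, 0, 1, 1, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 1, 1],
]
avg = average_similarity(labelings)
consensus = consensus_clusters(avg)
```

Label values need not agree across the input clusterings, since only co-membership is used; that is what makes a similarity-matrix representation convenient for combining methods.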
Collapse
Affiliation(s)
- Musaddiq K Lodi
- Integrative Life Sciences, Virginia Commonwealth University, Richmond, VA 23284, United States
| | - Muzammil Lodi
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, United States
| | - Kezie Osei
- Center for Biological Data Science, Virginia Commonwealth University, Richmond, VA 23284, United States
| | - Vaishnavi Ranganathan
- School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, United States
| | - Priscilla Hwang
- Department of Biomedical Engineering, Virginia Commonwealth University, Richmond, VA 23284, United States
| | - Preetam Ghosh
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, United States
|
37
|
de Winter N, Ji J, Sintou A, Forte E, Lee M, Noseda M, Li A, Koenig AL, Lavine KJ, Hayat S, Rosenthal N, Emanueli C, Srivastava PK, Sattler S. Persistent transcriptional changes in cardiac adaptive immune cells following myocardial infarction: New evidence from the re-analysis of publicly available single cell and nuclei RNA-sequencing data sets. J Mol Cell Cardiol 2024; 192:48-64. [PMID: 38734060 DOI: 10.1016/j.yjmcc.2024.04.016] [Received: 07/14/2023] [Revised: 03/17/2024] [Accepted: 04/29/2024] [Indexed: 05/13/2024]
Abstract
INTRODUCTION Chronic immunopathology contributes to the development of heart failure after a myocardial infarction. Both T and B cells of the adaptive immune system are present in the myocardium and have been suggested to be involved in post-MI immunopathology. METHODS We analyzed the B and T cell populations isolated from previously published single cell RNA-sequencing data sets of the mouse and human heart (PMID: 32130914, PMID: 35948637, PMID: 32971526 and PMID: 35926050), using differential expression analysis, functional enrichment analysis, gene regulatory inferences, and integration with autoimmune and cardiovascular GWAS. RESULTS Already at baseline, mature effector B and T cells are present in the human and mouse heart, having increased activity in transcription factors maintaining tolerance (e.g. DEAF1, JDP2, SPI-B). Following MI, T cells upregulate pro-inflammatory transcript levels (e.g. Cd11, Gzmk, Prf1), while B cells upregulate activation markers (e.g. Il6, Il1rn, Ccl6) and collagen (e.g. Col5a2, Col4a1, Col1a2). Importantly, pro-inflammatory and fibrotic transcription factors (e.g. NFKB1, CREM, REL) remain active in T cells, while B cells maintain elevated activity in transcription factors related to immunoglobulin production (e.g. ERG, REL) in both mouse and human post-MI hearts. Notably, genes differentially expressed in post-MI T and B cells are associated with cardiovascular and autoimmune disease. CONCLUSION These findings highlight the varied and time-dependent roles of post-MI T and B cells. They appear ready to go and are activated immediately after MI, thus participating in the acute wound healing response. However, they subsequently remain in a state of pro-inflammatory activation, contributing to persistent immunopathology.
Affiliation(s)
- Natasha de Winter
- National Heart and Lung Institute, Faculty of Medicine, Imperial College London, United Kingdom
| | - Jiahui Ji
- National Heart and Lung Institute, Faculty of Medicine, Imperial College London, United Kingdom
| | - Amalia Sintou
- National Heart and Lung Institute, Faculty of Medicine, Imperial College London, United Kingdom
| | - Elvira Forte
- The Jackson Laboratory, Bar Harbor, United States
| | - Michael Lee
- National Heart and Lung Institute, Faculty of Medicine, Imperial College London, United Kingdom
| | - Michela Noseda
- National Heart and Lung Institute, Faculty of Medicine, Imperial College London, United Kingdom; British Heart Foundation Centre For Research Excellence, Imperial College London, United Kingdom
| | - Aoxue Li
- National Heart and Lung Institute, Faculty of Medicine, Imperial College London, United Kingdom; Department of Medicine Solna, Division of Cardiovascular Medicine, Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
| | - Andrew L Koenig
- Center for Cardiovascular Research, Department of Medicine, Cardiovascular Division, Washington University School of Medicine, St. Louis, MO, United States
| | - Kory J Lavine
- Center for Cardiovascular Research, Department of Medicine, Cardiovascular Division, Washington University School of Medicine, St. Louis, MO, United States
| | | | - Nadia Rosenthal
- National Heart and Lung Institute, Faculty of Medicine, Imperial College London, United Kingdom; The Jackson Laboratory, Bar Harbor, United States
| | - Costanza Emanueli
- National Heart and Lung Institute, Faculty of Medicine, Imperial College London, United Kingdom; British Heart Foundation Centre For Research Excellence, Imperial College London, United Kingdom
| | - Prashant K Srivastava
- National Heart and Lung Institute, Faculty of Medicine, Imperial College London, United Kingdom
| | - Susanne Sattler
- National Heart and Lung Institute, Faculty of Medicine, Imperial College London, United Kingdom; Department of Cardiology, Medical University of Graz, Austria; Division of Pharmacology, Otto Loewi Research Center, Medical University of Graz, Austria.
|
38
|
Huynh KLA, Tyc KM, Matuck BF, Easter QT, Pratapa A, Kumar NV, Pérez P, Kulchar RJ, Pranzatelli TJ, de Souza D, Weaver TM, Qu X, Soares Junior LAV, Dolhnokoff M, Kleiner DE, Hewitt SM, Ferraz da Silva LF, Rocha VG, Warner BM, Byrd KM, Liu J. Spatial Deconvolution of Cell Types and Cell States at Scale Utilizing TACIT. RESEARCH SQUARE 2024:rs.3.rs-4536158. [PMID: 38978567 PMCID: PMC11230516 DOI: 10.21203/rs.3.rs-4536158/v1] [Indexed: 07/10/2024]
Abstract
Identifying cell types and states remains a time-consuming, error-prone challenge for spatial biology. While deep learning is increasingly used, it is difficult to generalize due to variability at the level of cells, neighborhoods, and niches in health and disease. To address this, we developed TACIT, an unsupervised algorithm for cell annotation using predefined signatures that operates without training data. TACIT uses unbiased thresholding to distinguish positive cells from background, focusing on relevant markers to identify ambiguous cells in multiomic assays. Using five datasets (5,000,000 cells; 51 cell types) from three niches (brain, intestine, gland), TACIT outperformed existing unsupervised methods in accuracy and scalability. Integrating TACIT-identified cell types with a novel Shiny app revealed new phenotypes in two inflammatory gland diseases. Finally, using combined spatial transcriptomics and proteomics, we discovered under- and overrepresented immune cell types and states in regions of interest, suggesting multimodality is essential for translating spatial biology to clinical applications.
Affiliation(s)
- Khoa L. A. Huynh
- Department of Biostatistics, Virginia Commonwealth University, Richmond, VA, USA
| | - Katarzyna M. Tyc
- Department of Biostatistics, Virginia Commonwealth University, Richmond, VA, USA
- Massey Cancer Center, Richmond VA, USA
| | - Bruno F. Matuck
- Lab of Oral & Craniofacial Innovation (LOCI), Department of Innovation & Technology Research, ADA Science & Research Institute, Gaithersburg, MD, USA
| | - Quinn T. Easter
- Lab of Oral & Craniofacial Innovation (LOCI), Department of Innovation & Technology Research, ADA Science & Research Institute, Gaithersburg, MD, USA
| | - Aditya Pratapa
- Department of Cell Biology, Duke University, Durham, NC, USA
| | - Nikhil V. Kumar
- Lab of Oral & Craniofacial Innovation (LOCI), Department of Innovation & Technology Research, ADA Science & Research Institute, Gaithersburg, MD, USA
| | - Paola Pérez
- Salivary Disorders Unit, National Institute of Dental and Craniofacial Research, National Institutes of Health, Bethesda, MD, USA
| | - Rachel J. Kulchar
- Salivary Disorders Unit, National Institute of Dental and Craniofacial Research, National Institutes of Health, Bethesda, MD, USA
| | - Thomas J.F. Pranzatelli
- Adeno-Associated Virus Biology Section, National Institute of Dental and Craniofacial Research, National Institutes of Health, Bethesda, MD, USA
| | - Deiziane de Souza
- Department of Pathology, Medicine School of University of Sao Paulo, SP, BR
| | - Theresa M. Weaver
- Lab of Oral & Craniofacial Innovation (LOCI), Department of Innovation & Technology Research, ADA Science & Research Institute, Gaithersburg, MD, USA
| | - Xufeng Qu
- Massey Cancer Center, Richmond VA, USA
| | | | - Marisa Dolhnokoff
- Department of Pathology, Medicine School of University of Sao Paulo, SP, BR
| | - David E. Kleiner
- Laboratory of Pathology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
| | - Stephen M. Hewitt
- Laboratory of Pathology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
| | | | - Vanderson Geraldo Rocha
- Department of Hematology, Transfusion and Cell Therapy Service, University of Sao Paulo, Sao Paulo, Brazil
| | - Blake M. Warner
- Salivary Disorders Unit, National Institute of Dental and Craniofacial Research, National Institutes of Health, Bethesda, MD, USA
| | - Kevin M. Byrd
- Lab of Oral & Craniofacial Innovation (LOCI), Department of Innovation & Technology Research, ADA Science & Research Institute, Gaithersburg, MD, USA
- Salivary Disorders Unit, National Institute of Dental and Craniofacial Research, National Institutes of Health, Bethesda, MD, USA
- Division of Oral and Craniofacial Health Sciences, Adams School of Dentistry, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Jinze Liu
- Department of Biostatistics, Virginia Commonwealth University, Richmond, VA, USA
- Massey Cancer Center, Richmond VA, USA
|
39
|
Jia Y, Ma P, Yao Q. CellMarkerPipe: cell marker identification and evaluation pipeline in single cell transcriptomes. Sci Rep 2024; 14:13151. [PMID: 38849445 PMCID: PMC11161599 DOI: 10.1038/s41598-024-63492-z] [Received: 02/13/2024] [Accepted: 05/29/2024] [Indexed: 06/09/2024]
Abstract
Assessing marker genes from all cell clusters can be time-consuming and often lacks a systematic strategy. Streamlining this process through a unified computational platform that automates identification and benchmarking will greatly enhance efficiency and ensure a fair evaluation. We therefore developed a novel computational platform, cellMarkerPipe ( https://github.com/yao-laboratory/cellMarkerPipe ), for automated cell-type-specific marker gene identification from scRNA-seq data, coupled with a comprehensive evaluation schema. CellMarkerPipe adaptively wraps around a collection of commonly used and state-of-the-art tools, including Seurat, COSG, SC3, SCMarker, COMET, and scGeneFit. From rigorous testing across diverse samples, we ascertain SCMarker's overall reliable performance in single marker gene selection, with COSG showing commendable speed and comparable efficacy. Furthermore, we demonstrate the pivotal role of our approach in real-world medical datasets. This general and open-source pipeline stands as a significant advancement in streamlining cell marker gene identification and evaluation, fitting broad applications in the field of cellular biology and medical research.
Affiliation(s)
- Yinglu Jia
- School of Computing, University of Nebraska Lincoln, 256 Avery Hall, Lincoln, NE, 68588, USA
- Department of Chemistry, University of Nebraska Lincoln, Hamilton Hall, Lincoln, NE, 68588, USA
| | - Pengchong Ma
- School of Computing, University of Nebraska Lincoln, 256 Avery Hall, Lincoln, NE, 68588, USA
| | - Qiuming Yao
- School of Computing, University of Nebraska Lincoln, 256 Avery Hall, Lincoln, NE, 68588, USA.
- Nebraska Center for the Prevention of Obesity Diseases, 316C Leverton Hall, Lincoln, NE, 68583, USA.
- Nebraska Center for Virology, University of Nebraska, 4240 Fair St., Lincoln, NE, 68583, USA.
|
40
|
Duo H, Li Y, Lan Y, Tao J, Yang Q, Xiao Y, Sun J, Li L, Nie X, Zhang X, Liang G, Liu M, Hao Y, Li B. Systematic evaluation with practical guidelines for single-cell and spatially resolved transcriptomics data simulation under multiple scenarios. Genome Biol 2024; 25:145. [PMID: 38831386 PMCID: PMC11149245 DOI: 10.1186/s13059-024-03290-y] [Received: 10/27/2023] [Accepted: 05/28/2024] [Indexed: 06/05/2024]
Abstract
BACKGROUND Single-cell RNA sequencing (scRNA-seq) and spatially resolved transcriptomics (SRT) have led to groundbreaking advancements in life sciences. To develop bioinformatics tools for scRNA-seq and SRT data and perform unbiased benchmarks, data simulation has been widely adopted by providing explicit ground truth and generating customized datasets. However, the performance of simulation methods under multiple scenarios has not been comprehensively assessed, making it challenging to choose suitable methods without practical guidelines. RESULTS We systematically evaluated 49 simulation methods developed for scRNA-seq and/or SRT data in terms of accuracy, functionality, scalability, and usability using 152 reference datasets derived from 24 platforms. SRTsim, scDesign3, ZINB-WaVE, and scDesign2 have the best accuracy performance across various platforms. Unexpectedly, some methods tailored to scRNA-seq data have potential compatibility for simulating SRT data. Lun, SPARSim, and scDesign3-tree outperform other methods under corresponding simulation scenarios. Phenopath, Lun, Simple, and MFA yield high scalability scores but they cannot generate realistic simulated data. Users should consider the trade-offs between method accuracy and scalability (or functionality) when making decisions. Additionally, execution errors are mainly caused by failed parameter estimations and appearance of missing or infinite values in calculations. We provide practical guidelines for method selection, a standard pipeline Simpipe ( https://github.com/duohongrui/simpipe ; https://doi.org/10.5281/zenodo.11178409 ), and an online tool Simsite ( https://www.ciblab.net/software/simshiny/ ) for data simulation. CONCLUSIONS No method performs best on all criteria, thus a good-yet-not-the-best method is recommended if it solves problems effectively and reasonably. Our comprehensive work provides crucial insights for developers on modeling gene expression data and fosters the simulation process for users.
Affiliation(s)
- Hongrui Duo
- College of Life Sciences, Chongqing Normal University, Chongqing, 401331, People's Republic of China
| | - Yinghong Li
- Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, 400065, People's Republic of China
| | - Yang Lan
- Institute of Pathology and Southwest Cancer Center, Southwest Hospital, Army Medical University, Chongqing, 400038, People's Republic of China
| | - Jingxin Tao
- College of Life Sciences, Chongqing Normal University, Chongqing, 401331, People's Republic of China
| | - Qingxia Yang
- Zhejiang Provincial Key Laboratory of Precision Diagnosis and Therapy for Major Gynecological Diseases, Women's Hospital, Zhejiang University School of Medicine, Hangzhou, 310058, People's Republic of China
| | - Yingxue Xiao
- College of Life Sciences, Chongqing Normal University, Chongqing, 401331, People's Republic of China
| | - Jing Sun
- College of Life Sciences, Chongqing Normal University, Chongqing, 401331, People's Republic of China
| | - Lei Li
- College of Life Sciences, Chongqing Normal University, Chongqing, 401331, People's Republic of China
| | - Xiner Nie
- Key Laboratory of Biorheological Science and Technology, Ministry of Education, Bioengineering College, Chongqing University, Chongqing, 400044, People's Republic of China
| | - Xiaoxi Zhang
- College of Life Sciences, Chongqing Normal University, Chongqing, 401331, People's Republic of China
| | - Guizhao Liang
- Key Laboratory of Biorheological Science and Technology, Ministry of Education, Bioengineering College, Chongqing University, Chongqing, 400044, People's Republic of China
| | - Mingwei Liu
- Key Laboratory of Clinical Laboratory Diagnostics, College of Laboratory Medicine, Chongqing Medical University, Chongqing, 400016, People's Republic of China
| | - Youjin Hao
- College of Life Sciences, Chongqing Normal University, Chongqing, 401331, People's Republic of China.
| | - Bo Li
- College of Life Sciences, Chongqing Normal University, Chongqing, 401331, People's Republic of China.
|
41
|
Huynh KLA, Tyc KM, Matuck BF, Easter QT, Pratapa A, Kumar NV, Pérez P, Kulchar R, Pranzatelli T, de Souza D, Weaver TM, Qu X, Valente Soares LA, Dolhnokoff M, Kleiner DE, Hewitt SM, da Silva LFF, Rocha VG, Warner BM, Byrd KM, Liu J. Spatial Deconvolution of Cell Types and Cell States at Scale Utilizing TACIT. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.05.31.596861. [PMID: 38895230 PMCID: PMC11185514 DOI: 10.1101/2024.05.31.596861] [Indexed: 06/21/2024]
Abstract
Identifying cell types and states remains a time-consuming and error-prone challenge for spatial biology. While deep learning is increasingly used, it is difficult to generalize due to variability at the level of cells, neighborhoods, and niches in health and disease. To address this, we developed TACIT, an unsupervised algorithm for cell annotation using predefined signatures that operates without training data. TACIT uses unbiased thresholding to distinguish positive cells from background, focusing on relevant markers to identify ambiguous cells in multiomic assays. Using five datasets (5,000,000 cells; 51 cell types) from three niches (brain, intestine, gland), TACIT outperformed existing unsupervised methods in accuracy and scalability. Integration of TACIT-identified cell types with a novel Shiny app revealed new phenotypes in two inflammatory gland diseases. Finally, using combined spatial transcriptomics and proteomics, we discovered under- and overrepresented immune cell types and states in regions of interest, suggesting multimodality is essential for translating spatial biology to clinical applications.
Affiliation(s)
- Khoa L. A. Huynh
- Department of Biostatistics, Virginia Commonwealth University, Richmond, VA, USA
| | - Katarzyna M. Tyc
- Department of Biostatistics, Virginia Commonwealth University, Richmond, VA, USA
- Massey Cancer Center, Richmond VA, USA
| | - Bruno F. Matuck
- Lab of Oral & Craniofacial Innovation (LOCI), Department of Innovation & Technology Research, ADA Science & Research Institute, Gaithersburg, MD, USA
| | - Quinn T. Easter
- Lab of Oral & Craniofacial Innovation (LOCI), Department of Innovation & Technology Research, ADA Science & Research Institute, Gaithersburg, MD, USA
| | - Aditya Pratapa
- Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA
| | - Nikhil V. Kumar
- Lab of Oral & Craniofacial Innovation (LOCI), Department of Innovation & Technology Research, ADA Science & Research Institute, Gaithersburg, MD, USA
| | - Paola Pérez
- Salivary Disorders Unit, National Institute of Dental and Craniofacial Research, National Institutes of Health, Bethesda, MD, USA
| | - Rachel Kulchar
- Salivary Disorders Unit, National Institute of Dental and Craniofacial Research, National Institutes of Health, Bethesda, MD, USA
| | - Thomas Pranzatelli
- Adeno-Associated Virus Biology Section, National Institute of Dental and Craniofacial Research, National Institutes of Health, Bethesda, MD, USA
| | - Deiziane de Souza
- Department of Pathology, Medicine School of University of Sao Paulo, SP, BR
| | - Theresa M. Weaver
- Lab of Oral & Craniofacial Innovation (LOCI), Department of Innovation & Technology Research, ADA Science & Research Institute, Gaithersburg, MD, USA
| | - Xufeng Qu
- Massey Cancer Center, Richmond VA, USA
| | | | - Marisa Dolhnokoff
- Department of Pathology, Medicine School of University of Sao Paulo, SP, BR
| | - David E. Kleiner
- Laboratory of Pathology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
| | - Stephen M. Hewitt
- Laboratory of Pathology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
| | | | - Vanderson Geraldo Rocha
- Department of Hematology, Transfusion and Cell Therapy Service, University of Sao Paulo, Sao Paulo, Brazil
| | - Blake M. Warner
- Salivary Disorders Unit, National Institute of Dental and Craniofacial Research, National Institutes of Health, Bethesda, MD, USA
| | - Kevin M. Byrd
- Lab of Oral & Craniofacial Innovation (LOCI), Department of Innovation & Technology Research, ADA Science & Research Institute, Gaithersburg, MD, USA
- Salivary Disorders Unit, National Institute of Dental and Craniofacial Research, National Institutes of Health, Bethesda, MD, USA
- Division of Oral and Craniofacial Health Sciences, Adams School of Dentistry, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Jinze Liu
- Department of Biostatistics, Virginia Commonwealth University, Richmond, VA, USA
- Massey Cancer Center, Richmond VA, USA
|
42
|
Cai L, Anastassiou D. CASCC: a co-expression-assisted single-cell RNA-seq data clustering method. Bioinformatics 2024; 40:btae283. [PMID: 38662553 PMCID: PMC11091742 DOI: 10.1093/bioinformatics/btae283] [Received: 10/11/2023] [Revised: 03/28/2024] [Accepted: 04/23/2024] [Indexed: 05/15/2024]
Abstract
SUMMARY Existing clustering methods for characterizing cell populations from single-cell RNA sequencing are constrained by several limitations stemming from the fact that clusters often cannot be homogeneous, particularly for transitioning populations. On the other hand, dominant cell populations within samples can be identified independently by their strong gene co-expression signatures using methods unrelated to partitioning. Here, we introduce a clustering method, CASCC (co-expression-assisted single-cell clustering), designed to improve biological accuracy using gene co-expression features identified using an unsupervised adaptive attractor algorithm. CASCC outperformed other methods as evidenced by multiple evaluation metrics, and our results suggest that CASCC can improve the analysis of single-cell transcriptomics, enabling potential new discoveries related to underlying biological mechanisms. AVAILABILITY AND IMPLEMENTATION The CASCC R package is publicly available at https://github.com/LingyiC/CASCC and https://zenodo.org/doi/10.5281/zenodo.10648327.
Affiliation(s)
- Lingyi Cai
- Department of Systems Biology, Columbia University, New York, NY 10032, United States
- Department of Electrical Engineering, Columbia University, New York, NY 10027, United States
| | - Dimitris Anastassiou
- Department of Systems Biology, Columbia University, New York, NY 10032, United States
- Department of Electrical Engineering, Columbia University, New York, NY 10027, United States
- Irving Comprehensive Cancer Center, Columbia University, New York, NY 10032, United States
|
43
|
Tadi AA, Alhadidi D, Rueda L. PPPCT: Privacy-Preserving framework for Parallel Clustering Transcriptomics data. Comput Biol Med 2024; 173:108351. [PMID: 38520921 DOI: 10.1016/j.compbiomed.2024.108351] [Received: 08/28/2023] [Revised: 03/18/2024] [Accepted: 03/18/2024] [Indexed: 03/25/2024]
Abstract
Single-cell transcriptomics data provides crucial insights into patients' health, yet poses significant privacy concerns. Genomic data privacy attacks can have deep implications, compromising not only the patients' health information but also that of their families. Moreover, the permanence of leaked data exacerbates the challenges, making retraction impossible. While extensive efforts have been directed towards clustering single-cell transcriptomics data, addressing critical challenges, especially in the realm of privacy, remains pivotal. This paper introduces an efficient, fast, privacy-preserving approach for clustering single-cell RNA-sequencing (scRNA-seq) datasets. The key contributions include ensuring data privacy, achieving high-quality clustering, accommodating the high dimensionality inherent in the datasets, and maintaining reasonable computation time for large-scale datasets. Our proposed approach utilizes the map-reduce scheme to parallelize clustering, addressing intensive calculation challenges. Intel Software Guard Extensions (SGX) processors are used to ensure the security of sensitive code and data during processing. Additionally, the approach incorporates a logarithm transformation as a preprocessing step, employs non-negative matrix factorization for dimensionality reduction, and utilizes parallel k-means for clustering. The approach fully leverages the computing capabilities of all processing resources within a secure private cloud environment. Experimental results demonstrate the efficacy of our approach in preserving patient privacy while surpassing state-of-the-art methods in both clustering quality and computation time. Our method consistently achieves a minimum of 7% higher Adjusted Rand Index (ARI) than existing approaches, contingent on dataset size. Additionally, due to parallel computations and dimensionality reduction, our approach exhibits efficiency, converging to very good results in less than 10 seconds for a scRNA-seq dataset with 5000 genes and 6000 cells when prioritizing privacy and under two seconds without privacy considerations. Availability and implementation: code and datasets are available at https://github.com/University-of-Windsor/PPPCT.
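The Adjusted Rand Index (ARI) cited in this abstract compares two partitions of the same cells and corrects the Rand index for chance agreement. With contingency counts n_ij, row sums a_i, column sums b_j, and n cells, it is (Σ C(n_ij,2) − E) / (½[Σ C(a_i,2) + Σ C(b_j,2)] − E), where E = Σ C(a_i,2) · Σ C(b_j,2) / C(n,2). The sketch below implements that standard formula in plain Python; the function name and the toy label vectors are illustrative assumptions and are not taken from the PPPCT code base.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(truth, pred):
    """Adjusted Rand Index between two labelings of the same items."""
    if len(truth) != len(pred):
        raise ValueError("labelings must have the same length")
    n = len(truth)
    if n < 2:
        return 1.0  # no pairs to compare; treat as perfect agreement

    # Pair counts within contingency cells, rows (truth), and columns (pred).
    sum_cells = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(truth).values())
    sum_cols = sum(comb(c, 2) for c in Counter(pred).values())

    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings are a single cluster
    return (sum_cells - expected) / (max_index - expected)


# Same partition with renamed labels: perfect agreement.
ari_same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
# Each predicted cluster splits both true clusters: worse than chance.
ari_crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
print(ari_same, ari_crossed)
```

Because the index depends only on which cells are grouped together, relabeling clusters does not change the score, which is why it suits comparisons between clustering methods that number their clusters arbitrarily.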
Affiliation(s)
- Ali Abbasi Tadi
- University of Windsor, 401 Sunset Ave, Windsor, N9B 3P4, Ontario, Canada.
| | - Dima Alhadidi
- University of Windsor, 401 Sunset Ave, Windsor, N9B 3P4, Ontario, Canada
| | - Luis Rueda
- University of Windsor, 401 Sunset Ave, Windsor, N9B 3P4, Ontario, Canada
|
44
|
An S, Shi J, Liu R, Chen Y, Wang J, Hu S, Xia X, Dong G, Bo X, He Z, Ying X. scDAC: deep adaptive clustering of single-cell transcriptomic data with coupled autoencoder and Dirichlet process mixture model. Bioinformatics 2024; 40:btae198. [PMID: 38603616 PMCID: PMC11256937 DOI: 10.1093/bioinformatics/btae198] [Received: 05/04/2023] [Revised: 03/20/2024] [Accepted: 04/10/2024] [Indexed: 04/13/2024]
Abstract
MOTIVATION Clustering analysis for single-cell RNA sequencing (scRNA-seq) data is an important step in revealing cellular heterogeneity. Many clustering methods have been proposed to discover heterogeneous cell types from scRNA-seq data. However, adaptive clustering with an accurate cluster number that reflects the intrinsic biological nature of large-scale scRNA-seq data remains quite challenging. RESULTS Here, we propose a single-cell Deep Adaptive Clustering (scDAC) model by coupling the Autoencoder (AE) and the Dirichlet Process Mixture Model (DPMM). By jointly optimizing the model parameters of AE and DPMM, scDAC achieves adaptive clustering with accurate cluster numbers on scRNA-seq data. We verify the performance of scDAC on five subsampled datasets with different numbers of cell types and compare it with 15 widely used clustering methods across nine scRNA-seq datasets. Our results demonstrate that scDAC can adaptively find accurate numbers of cell types or subtypes and outperforms other methods. Moreover, the performance of scDAC is robust to hyperparameter changes. AVAILABILITY AND IMPLEMENTATION scDAC is implemented in Python. The source code is available at https://github.com/labomics/scDAC.
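A Dirichlet process mixture does not fix the number of clusters in advance. One standard way to see why is the Chinese restaurant process, the prior over partitions that a Dirichlet process induces: each new cell joins an existing cluster with probability proportional to its current size, or opens a new cluster with probability proportional to a concentration parameter alpha. The sketch below samples from that prior only. It is not the scDAC model, which couples the DPMM with an autoencoder and fits both to expression data; the function name, the alpha value, and the seed are assumptions made for illustration.

```python
import random


def chinese_restaurant_process(n_cells, alpha, seed=0):
    """Sample a partition of n_cells items from a CRP with concentration alpha.

    Returns (assignments, counts): the cluster index of each item and the
    size of each cluster. alpha must be positive.
    """
    rng = random.Random(seed)
    counts = []       # counts[k] = number of items currently in cluster k
    assignments = []
    for _ in range(n_cells):
        # Existing cluster k has weight counts[k]; a new cluster has weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)   # open a new cluster
        else:
            counts[k] += 1     # join an existing cluster
        assignments.append(k)
    return assignments, counts


assignments, counts = chinese_restaurant_process(n_cells=200, alpha=1.0)
print(len(counts), "clusters for", len(assignments), "cells")
```

The number of occupied clusters is an outcome of the sampling rather than an input, which is the property that lets a DPMM-based method adapt its cluster count to the data.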
Affiliation(s)
- Sijing An
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing 100850, China
| | - Jinhui Shi
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing 100850, China
| | - Runyan Liu
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing 100850, China
| | - Yaowen Chen
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing 100850, China
| | - Jing Wang
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing 100850, China
| | - Shuofeng Hu
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing 100850, China
| | - Xinyu Xia
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing 100850, China
| | - Guohua Dong
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing 100850, China
| | - Xiaochen Bo
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
| | - Zhen He
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing 100850, China
| | - Xiaomin Ying
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing 100850, China
|
45
|
Malagoli G, Valle F, Barillot E, Caselle M, Martignetti L. Identification of Interpretable Clusters and Associated Signatures in Breast Cancer Single-Cell Data: A Topic Modeling Approach. Cancers (Basel) 2024; 16:1350. [PMID: 38611028 PMCID: PMC11011054 DOI: 10.3390/cancers16071350] [Received: 02/22/2024] [Revised: 03/25/2024] [Accepted: 03/28/2024] [Indexed: 04/14/2024]
Abstract
Topic modeling is a popular technique in machine learning and natural language processing, where a corpus of text documents is classified into themes or topics using word frequency analysis. This approach has proven successful in various biological data analysis applications, such as predicting cancer subtypes with high accuracy and identifying genes, enhancers, and stable cell types simultaneously from sparse single-cell epigenomics data. The advantage of using a topic model is that it not only serves as a clustering algorithm, but it can also explain clustering results by providing word probability distributions over topics. Our study proposes a novel topic modeling approach for clustering single cells and detecting topics (gene signatures) in single-cell datasets that measure multiple omics simultaneously. We applied this approach to examine the transcriptional heterogeneity of luminal and triple-negative breast cancer cells using patient-derived xenograft models with acquired resistance to chemotherapy and targeted therapy. Through this approach, we identified protein-coding genes and long non-coding RNAs (lncRNAs) that group thousands of cells into biologically similar clusters, accurately distinguishing drug-sensitive and -resistant breast cancer types. In comparison to standard state-of-the-art clustering analyses, our approach offers an optimal partitioning of genes into topics and cells into clusters simultaneously, producing easily interpretable clustering outcomes. Additionally, we demonstrate that an integrative clustering approach, which combines the information from mRNAs and lncRNAs treated as disjoint omics layers, enhances the accuracy of cell classification.
Affiliation(s)
- Gabriele Malagoli
  - Institut Curie, Inserm U900, Mines ParisTech, PSL Research University, 75248 Paris, France
  - Physics Department, University of Turin and INFN, 10125 Turin, Italy
- Filippo Valle
  - Physics Department, University of Turin and INFN, 10125 Turin, Italy
- Emmanuel Barillot
  - Institut Curie, Inserm U900, Mines ParisTech, PSL Research University, 75248 Paris, France
- Michele Caselle
  - Physics Department, University of Turin and INFN, 10125 Turin, Italy
- Loredana Martignetti
  - Institut Curie, Inserm U900, Mines ParisTech, PSL Research University, 75248 Paris, France
46
Dong X, Leary JR, Yang C, Brusko MA, Brusko TM, Bacher R. Data-driven selection of analysis decisions in single-cell RNA-seq trajectory inference. Brief Bioinform 2024; 25:bbae216. [PMID: 38725155 PMCID: PMC11082074 DOI: 10.1093/bib/bbae216] [Received: 01/04/2024] [Revised: 03/01/2024] [Accepted: 04/25/2024] [Indexed: 05/13/2024] Open
Abstract
Single-cell RNA sequencing (scRNA-seq) experiments have become instrumental in developmental and differentiation studies, enabling the profiling of cells at a single or multiple time-points to uncover subtle variations in expression profiles reflecting underlying biological processes. Benchmarking studies have compared many of the computational methods used to reconstruct cellular dynamics; however, researchers still encounter challenges in their analysis due to uncertainty with respect to selecting the most appropriate methods and parameters. Even among universal data processing steps used by trajectory inference methods such as feature selection and dimension reduction, trajectory methods' performances are highly dataset-specific. To address these challenges, we developed Escort, a novel framework for evaluating a dataset's suitability for trajectory inference and quantifying trajectory properties influenced by analysis decisions. Escort evaluates the suitability of trajectory analysis and the combined effects of processing choices using trajectory-specific metrics. Escort navigates single-cell trajectory analysis through these data-driven assessments, reducing uncertainty and much of the decision burden inherent to trajectory inference analyses. Escort is implemented in an accessible R package and R/Shiny application, providing researchers with the necessary tools to make informed decisions during trajectory analysis and enabling new insights into dynamic biological processes at single-cell resolution.
Affiliation(s)
- Xiaoru Dong
  - Department of Biostatistics, College of Public Health and Health Professions, University of Florida, Gainesville, FL 32610, United States
- Jack R Leary
  - Department of Biostatistics, College of Public Health and Health Professions, University of Florida, Gainesville, FL 32610, United States
- Chuanhao Yang
  - Department of Biostatistics, College of Public Health and Health Professions, University of Florida, Gainesville, FL 32610, United States
- Maigan A Brusko
  - Diabetes Institute, University of Florida, Gainesville, FL 32610, United States
  - Department of Pathology, Immunology, and Laboratory Medicine, College of Medicine, University of Florida, Gainesville, FL 32610, United States
- Todd M Brusko
  - Diabetes Institute, University of Florida, Gainesville, FL 32610, United States
  - Department of Pathology, Immunology, and Laboratory Medicine, College of Medicine, University of Florida, Gainesville, FL 32610, United States
  - Department of Pediatrics, College of Medicine, University of Florida, Gainesville, FL 32610, United States
- Rhonda Bacher
  - Department of Biostatistics, College of Public Health and Health Professions, University of Florida, Gainesville, FL 32610, United States
  - Diabetes Institute, University of Florida, Gainesville, FL 32610, United States
47
Lodi MK, Lodi M, Osei K, Ranganathan V, Hwang P, Ghosh P. CHAI: Consensus Clustering Through Similarity Matrix Integration for Cell-Type Identification. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.03.19.585758. [PMID: 38562750 PMCID: PMC10983883 DOI: 10.1101/2024.03.19.585758] [Indexed: 04/04/2024]
Abstract
Several methods have been developed to computationally predict cell types from single-cell RNA sequencing (scRNAseq) data. As more methods are developed, a common problem for investigators has been identifying the best method to apply to their specific use case. To address this challenge, we present CHAI (consensus Clustering tHrough similArIty matrix integratIon for single cell type identification), a wisdom-of-crowds approach for scRNAseq clustering. CHAI presents two competing methods which aggregate the clustering results from seven state-of-the-art clustering methods: CHAI-AvgSim and CHAI-SNF. Both methods demonstrate improved performance on a diverse selection of benchmarking datasets and also outperform a previous consensus clustering method. We demonstrate CHAI's practical use case by identifying a leader tumor cell cluster enriched with CDH3. CHAI provides a platform for multiomic integration, and we demonstrate CHAI-SNF to have improved performance when including spatial transcriptomics data. CHAI is intuitive and easily customizable; it provides a way for users to add their own clustering methods to the pipeline, or down-select just the ones they want to use for the clustering aggregation. CHAI is available as an open-source R package on GitHub: https://github.com/lodimk2/chai.
Affiliation(s)
- Musaddiq K Lodi
  - Integrative Life Sciences, Virginia Commonwealth University, Richmond, VA 23284
- Muzammil Lodi
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284
- Kezie Osei
  - Center for Biological Data Science, Virginia Commonwealth University, Richmond, VA 23284
- Priscilla Hwang
  - Department of Biomedical Engineering, Virginia Commonwealth University, Richmond, VA 23284
- Preetam Ghosh
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284
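
The aggregation idea in the CHAI abstract above (combining several clusterings through cell-by-cell similarity matrices) can be illustrated with a small sketch. This is not the CHAI package or its API: the function names, the 0.5 threshold, and the connected-components step used to read off consensus clusters are all assumptions chosen for illustration, and the three input labelings are made-up toy data.

```python
import numpy as np

def coassociation(labels):
    """Binary cell-by-cell matrix: 1.0 where two cells share a cluster label."""
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(float)

def average_similarity(labelings):
    """Average the co-association matrices of several clusterings."""
    return np.mean([coassociation(l) for l in labelings], axis=0)

def consensus_labels(sim, threshold=0.5):
    """Assumed final step: connected components of the graph that links
    cells whose averaged similarity exceeds `threshold`."""
    n = sim.shape[0]
    labels = -np.ones(n, dtype=int)
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue  # cell already assigned to a component
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in np.flatnonzero(sim[i] > threshold):
                if labels[j] == -1:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

# Three hypothetical clusterings of six cells; label values are arbitrary per method.
labelings = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 2, 2],
]
sim = average_similarity(labelings)
consensus = consensus_labels(sim)
print(consensus)
```

Because co-association matrices compare cells pairwise, the arbitrary label names used by each input method do not need to be aligned before averaging. A real pipeline would more likely apply a dedicated clustering method to the averaged matrix than a fixed threshold.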
48
Nwizu C, Hughes M, Ramseier ML, Navia AW, Shalek AK, Fusi N, Raghavan S, Winter PS, Amini AP, Crawford L. Scalable nonparametric clustering with unified marker gene selection for single-cell RNA-seq data. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.02.11.579839. [PMID: 38405697 PMCID: PMC10888887 DOI: 10.1101/2024.02.11.579839] [Indexed: 02/27/2024]
Abstract
Clustering is commonly used in single-cell RNA-sequencing (scRNA-seq) pipelines to characterize cellular heterogeneity. However, current methods face two main limitations. First, they require user-specified heuristics which add time and complexity to bioinformatic workflows; second, they rely on post-selective differential expression analyses to identify marker genes driving cluster differences, which has been shown to be subject to inflated false discovery rates. We address these challenges by introducing nonparametric clustering of single-cell populations (NCLUSION): an infinite mixture model that leverages Bayesian sparse priors to identify marker genes while simultaneously performing clustering on single-cell expression data. NCLUSION uses a scalable variational inference algorithm to perform these analyses on datasets with up to millions of cells. By analyzing publicly available scRNA-seq studies, we demonstrate that NCLUSION (i) matches the performance of other state-of-the-art clustering techniques with significantly reduced runtime and (ii) provides statistically robust and biologically relevant transcriptomic signatures for each of the clusters it identifies. Overall, NCLUSION represents a reliable hypothesis-generating tool for understanding patterns of expression variation present in single-cell populations.
Affiliation(s)
- Chibuikem Nwizu
  - Center for Computational Molecular Biology, Brown University, Providence, RI, USA
  - Warren Alpert Medical School of Brown University, Providence, RI, USA
- Michelle L. Ramseier
  - Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Andrew W. Navia
  - Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Alex K. Shalek
  - Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Harvard Medical School, Boston, MA, USA
  - Ragon Institute of MGH, MIT, and Harvard, Cambridge, MA, USA
- Srivatsan Raghavan
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
  - Harvard Medical School, Boston, MA, USA
  - Department of Medicine, Brigham and Women’s Hospital, Boston, MA, USA
- Peter S. Winter
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Lorin Crawford
  - Center for Computational Molecular Biology, Brown University, Providence, RI, USA
  - Microsoft Research, Cambridge, MA, USA
  - Department of Biostatistics, Brown University, Providence, RI, USA
49
He D, Mount SM, Patro R. scCensus: Off-target scRNA-seq reads reveal meaningful biology. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.01.29.577807. [PMID: 38352549 PMCID: PMC10862729 DOI: 10.1101/2024.01.29.577807] [Indexed: 02/23/2024]
Abstract
Single-cell RNA-sequencing (scRNA-seq) provides unprecedented insights into cellular heterogeneity. Although scRNA-seq reads from most prevalent and popular tagged-end protocols are expected to arise from the 3' end of polyadenylated RNAs, recent studies have shown that "off-target" reads can constitute a substantial portion of the read population. In this work, we introduced scCensus, a comprehensive analysis workflow for systematically evaluating and categorizing off-target reads in scRNA-seq. We applied scCensus to seven scRNA-seq datasets. Our analysis of intergenic reads shows that these off-target reads contain information about chromatin structure and can be used to identify similar cells across modalities. Our analysis of antisense reads suggests that these reads can be used to improve gene detection and capture interesting transcriptional activities like antisense transcription. Furthermore, using splice-aware quantification, we find that spliced and unspliced reads provide distinct information about cell clusters and biomarkers, suggesting the utility of integrating signals from reads with different splicing statuses. Overall, our results suggest that off-target scRNA-seq reads contain underappreciated information about various transcriptional activities. These observations about yet-unexploited information in existing scRNA-seq data will help guide and motivate the community to improve current algorithms and analysis methods, and to develop novel approaches that utilize off-target reads to extend the reach and accuracy of single-cell data analysis pipelines.
Affiliation(s)
- Dongze He
  - Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742, USA
  - Program in Computational Biology, Bioinformatics and Genomics, University of Maryland, College Park, MD 20742, USA
- Stephen M. Mount
  - Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, MD 20742, USA
- Rob Patro
  - Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742, USA
  - Department of Computer Science, University of Maryland, College Park, MD 20742, USA
50
Dong X, Leary JR, Yang C, Brusko MA, Brusko TM, Bacher R. Data-driven selection of analysis decisions in single-cell RNA-seq trajectory inference. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.12.18.572214. [PMID: 38187768 PMCID: PMC10769271 DOI: 10.1101/2023.12.18.572214] [Indexed: 01/09/2024]
Abstract
Single-cell RNA sequencing (scRNA-seq) experiments have become instrumental in developmental and differentiation studies, enabling the profiling of cells at a single or multiple time-points to uncover subtle variations in expression profiles reflecting underlying biological processes. Benchmarking studies have compared many of the computational methods used to reconstruct cellular dynamics; however, researchers still encounter challenges in their analysis due to uncertainties in selecting the most appropriate methods and parameters. Even among universal data processing steps used by trajectory inference methods such as feature selection and dimension reduction, trajectory methods' performances are highly dataset-specific. To address these challenges, we developed Escort, a framework for evaluating a dataset's suitability for trajectory inference and quantifying trajectory properties influenced by analysis decisions. Escort navigates single-cell trajectory analysis through data-driven assessments, reducing uncertainty and much of the decision burden associated with trajectory inference. Escort is implemented in an accessible R package and R/Shiny application, providing researchers with the necessary tools to make informed decisions during trajectory analysis and enabling new insights into dynamic biological processes at single-cell resolution.
Affiliation(s)
- Xiaoru Dong
  - Department of Biostatistics, College of Public Health and Health Professions, University of Florida, Gainesville, FL 32610, USA
- Jack R. Leary
  - Department of Biostatistics, College of Public Health and Health Professions, University of Florida, Gainesville, FL 32610, USA
- Chuanhao Yang
  - Department of Biostatistics, College of Public Health and Health Professions, University of Florida, Gainesville, FL 32610, USA
- Maigan A. Brusko
  - Diabetes Institute, University of Florida, Gainesville, FL 32610, USA
  - Department of Pathology, Immunology, and Laboratory Medicine, College of Medicine, University of Florida, Gainesville, FL 32610, USA
- Todd M. Brusko
  - Diabetes Institute, University of Florida, Gainesville, FL 32610, USA
  - Department of Pathology, Immunology, and Laboratory Medicine, College of Medicine, University of Florida, Gainesville, FL 32610, USA
  - Department of Pediatrics, College of Medicine, University of Florida, Gainesville, FL 32610, USA
- Rhonda Bacher
  - Department of Biostatistics, College of Public Health and Health Professions, University of Florida, Gainesville, FL 32610, USA
  - Diabetes Institute, University of Florida, Gainesville, FL 32610, USA