1
Zhu B, Chen P, Aminu M, Li JR, Fujimoto J, Tian Y, Hong L, Chen H, Hu X, Li C, Vokes N, Moreira AL, Gibbons DL, Solis Soto LM, Parra Cuentas ER, Shi O, Diao S, Ye J, Rojas FR, Vilar E, Maitra A, Chen K, Navin N, Nilsson M, Huang B, Heeke S, Zhang J, Haymaker CL, Velcheti V, Sterman DH, Kochat V, Padron WI, Alexandrov LB, Wei Z, Le X, Wang L, Fukuoka J, Lee JJ, Wistuba II, Pass HI, Davis M, Hanash S, Cheng C, Dubinett S, Spira A, Rai K, Lippman SM, Futreal PA, Heymach JV, Reuben A, Wu J, Zhang J. Spatial and multiomics analysis of human and mouse lung adenocarcinoma precursors reveals TIM-3 as a putative target for precancer interception. Cancer Cell 2025:S1535-6108(25)00162-X. [PMID: 40345189 DOI: 10.1016/j.ccell.2025.04.003] [Received: 09/12/2024] [Revised: 12/31/2024] [Accepted: 04/08/2025] [Indexed: 05/11/2025]
Abstract
How tumor microenvironment shapes lung adenocarcinoma (LUAD) precancer evolution remains poorly understood. Spatial immune profiling of 114 human LUAD and LUAD precursors reveals a progressive increase of adaptive response and a relative decrease of innate immune response as LUAD precursors progress. The immune evasion features align the immune response patterns at various stages. TIM-3-high features are enriched in LUAD precancers, which decrease in later stages. Furthermore, single-cell RNA sequencing (scRNA-seq) and spatial immune and transcriptomics profiling of LUAD and LUAD precursor specimens from 5 mouse models validate high TIM-3 features in LUAD precancers. In vivo TIM-3 blockade at precancer stage, but not at advanced cancer stage, decreases tumor burden. Anti-TIM-3 treatment is associated with enhanced antigen presentation, T cell activation, and increased M1/M2 macrophage ratio. These results highlight the coordination of innate and adaptive immune response/evasion during LUAD precancer evolution and suggest TIM-3 as a potential target for LUAD precancer interception.
Affiliation(s)
- Bo Zhu
- Department of Thoracic/Head and Neck Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA; Department of Genomic Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Pingjun Chen
- Institute for Data Science in Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA; Department of Translational Molecular Pathology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Muhammad Aminu
- Department of Imaging Physics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Jian-Rong Li
- Department of Medicine, Baylor College of Medicine, Houston, TX, USA
- Junya Fujimoto
- Clinical Research Center in Hiroshima, Hiroshima University Hospital, Hiroshima, Japan
- Yanhua Tian
- Department of Thoracic/Head and Neck Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA; Department of Genomic Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Lingzhi Hong
- Department of Thoracic/Head and Neck Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Hong Chen
- Department of Thoracic/Head and Neck Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA; Department of Genomic Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Xin Hu
- Department of Genomic Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Chenyang Li
- Department of Genomic Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Natalie Vokes
- Department of Thoracic/Head and Neck Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Andre L Moreira
- Department of Pathology, NYU Langone Health, New York, NY, USA
- Don L Gibbons
- Department of Thoracic/Head and Neck Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Luisa M Solis Soto
- Department of Translational Molecular Pathology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Edwin Roger Parra Cuentas
- Department of Translational Molecular Pathology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Ou Shi
- Department of Translational Molecular Pathology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Songhui Diao
- Department of Imaging Physics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Jie Ye
- Department of Thoracic/Head and Neck Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Frank R Rojas
- Department of Translational Molecular Pathology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Eduardo Vilar
- Department of Clinical Cancer Prevention, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Anirban Maitra
- Department of Translational Molecular Pathology and Sheikh Ahmed Center for Pancreatic Cancer Research, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Ken Chen
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Nicolas Navin
- Department of Systems Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Monique Nilsson
- Department of Thoracic/Head and Neck Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Beibei Huang
- Department of Cancer Systems Imaging, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Simon Heeke
- Department of Thoracic/Head and Neck Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Jianhua Zhang
- Department of Genomic Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Cara L Haymaker
- Department of Translational Molecular Pathology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Vamsidhar Velcheti
- Department of Medicine, NYU Grossman School of Medicine, New York, NY, USA
- Daniel H Sterman
- Department of Medicine, NYU Grossman School of Medicine, New York, NY, USA; Cardiothoracic Surgery, NYU Grossman School of Medicine, New York, NY, USA
- Veena Kochat
- Department of Genomic Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- William I Padron
- Department of Genomic Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Ludmil B Alexandrov
- Department of Cellular and Molecular Medicine, UC San Diego, La Jolla, CA, USA
- Zhubo Wei
- Department of Thoracic/Head and Neck Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Xiuning Le
- Department of Thoracic/Head and Neck Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Linghua Wang
- Department of Genomic Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Junya Fukuoka
- Department of Pathology Informatics, Nagasaki University Graduate School of Biomedical Sciences, Nagasaki, Japan
- J Jack Lee
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Ignacio I Wistuba
- Department of Translational Molecular Pathology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Harvey I Pass
- Department of Cardiothoracic Surgery, NYU Langone Health, New York, NY, USA
- Mark Davis
- Institute of Immunity, Transplantation and Infection, Stanford University School of Medicine, Stanford, CA, USA
- Samir Hanash
- Department of Clinical Cancer Prevention, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Chao Cheng
- Department of Medicine, Baylor College of Medicine, Houston, TX, USA
- Steven Dubinett
- Pathology and Laboratory Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Avrum Spira
- Pathology & Laboratory Medicine, and Bioinformatics, Boston University, Boston, MA, USA
- Kunal Rai
- Department of Genomic Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- P Andrew Futreal
- Department of Genomic Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- John V Heymach
- Department of Thoracic/Head and Neck Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Alexandre Reuben
- Department of Thoracic/Head and Neck Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Jia Wu
- Department of Imaging Physics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Jianjun Zhang
- Department of Thoracic/Head and Neck Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA; Department of Genomic Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX, USA.
2
Harada A, Yasumizu Y, Harada T, Fumoto K, Sato A, Maehara N, Sada R, Matsumoto S, Nishina T, Takeda K, Morii E, Kayama H, Kikuchi A. Hypoxia-induced Wnt5a-secreting fibroblasts promote colon cancer progression. Nat Commun 2025; 16:3653. [PMID: 40246836 PMCID: PMC12006413 DOI: 10.1038/s41467-025-58748-9] [Received: 04/10/2024] [Accepted: 03/31/2025] [Indexed: 04/19/2025]
Abstract
Wnt5a, a representative Wnt ligand that activates the β-catenin-independent pathway, has been shown to promote tumorigenesis. However, it is unclear where Wnt5a is produced and how it affects colon cancer aggressiveness. In this study, we demonstrate that Wnt5a is expressed in fibroblasts near the luminal side of the tumor, and its depletion suppresses mouse colon cancer formation. To characterize the specific fibroblast subtype, a meta-analysis of human and mouse colon fibroblast single-cell RNA-seq data is performed. The results show that Wnt5a is expressed in hypoxia-induced inflammatory fibroblast (InfFib), accompanied by the activation of HIF2. Moreover, Wnt5a maintains InfFib through the suppression of angiogenesis mediated by soluble VEGF receptor1 (Flt1) secretion from endothelial cells, thereby inducing further hypoxia. InfFib also produces epiregulin, which promotes colon cancer growth. Here, we show that Wnt5a acts on endothelial cells, inducing a hypoxic environment that maintains InfFib, thereby contributing to colon cancer progression through InfFib.
Affiliation(s)
- Akikazu Harada
- Center for Infectious Disease Education and Research (CiDER), The University of Osaka, Suita, Osaka, Japan.
- Institute for Open and Transdisciplinary Research Initiatives (OTRI), The University of Osaka, Suita, Osaka, Japan.
- Department of Molecular Biology and Biochemistry, Graduate School of Medicine, The University of Osaka, Suita, Osaka, Japan.
- Yoshiaki Yasumizu
- Institute for Open and Transdisciplinary Research Initiatives (OTRI), The University of Osaka, Suita, Osaka, Japan
- Laboratory of Experimental Immunology, WPI Frontier Immunology Research Center, The University of Osaka, Suita, Osaka, Japan
- Takeshi Harada
- Department of Molecular Biology and Biochemistry, Graduate School of Medicine, The University of Osaka, Suita, Osaka, Japan
- Katsumi Fumoto
- Department of Molecular Biology and Biochemistry, Graduate School of Medicine, The University of Osaka, Suita, Osaka, Japan
- Akira Sato
- Department of Molecular Biology and Biochemistry, Graduate School of Medicine, The University of Osaka, Suita, Osaka, Japan
- Natsumi Maehara
- Department of Molecular Biology and Biochemistry, Graduate School of Medicine, The University of Osaka, Suita, Osaka, Japan
- Ryota Sada
- Center for Infectious Disease Education and Research (CiDER), The University of Osaka, Suita, Osaka, Japan
- Institute for Open and Transdisciplinary Research Initiatives (OTRI), The University of Osaka, Suita, Osaka, Japan
- Department of Molecular Biology and Biochemistry, Graduate School of Medicine, The University of Osaka, Suita, Osaka, Japan
- Shinji Matsumoto
- Institute for Open and Transdisciplinary Research Initiatives (OTRI), The University of Osaka, Suita, Osaka, Japan
- Department of Molecular Biology and Biochemistry, Graduate School of Medicine, The University of Osaka, Suita, Osaka, Japan
- Takashi Nishina
- Department of Biochemistry, Faculty of Medicine, Toho University, Ota-ku, Tokyo, Japan
- Kiyoshi Takeda
- Center for Infectious Disease Education and Research (CiDER), The University of Osaka, Suita, Osaka, Japan
- Institute for Open and Transdisciplinary Research Initiatives (OTRI), The University of Osaka, Suita, Osaka, Japan
- Laboratory of Mucosal Immunology, WPI Frontier Immunology Research Center, The University of Osaka, Suita, Osaka, Japan
- Department of Microbiology and Immunology, Graduate School of Medicine, The University of Osaka, Suita, Osaka, Japan
- Eiichi Morii
- Department of Pathology, Graduate School of Medicine, The University of Osaka, Suita, Osaka, Japan
- Hisako Kayama
- Laboratory of Mucosal Immunology, WPI Frontier Immunology Research Center, The University of Osaka, Suita, Osaka, Japan
- Department of Microbiology and Immunology, Graduate School of Medicine, The University of Osaka, Suita, Osaka, Japan
- Institute for Advanced Co-Creation Studies, The University of Osaka, Suita, Osaka, Japan
- Akira Kikuchi
- Center for Infectious Disease Education and Research (CiDER), The University of Osaka, Suita, Osaka, Japan.
- Institute for Open and Transdisciplinary Research Initiatives (OTRI), The University of Osaka, Suita, Osaka, Japan.
- Department of Molecular Biology and Biochemistry, Graduate School of Medicine, The University of Osaka, Suita, Osaka, Japan.
3
Pentimalli TM, Karaiskos N, Rajewsky N. Challenges and Opportunities in the Clinical Translation of High-Resolution Spatial Transcriptomics. Annu Rev Pathol 2025; 20:405-432. [PMID: 39476415 DOI: 10.1146/annurev-pathmechdis-111523-023417] [Indexed: 01/25/2025]
Abstract
Pathology has always been fueled by technological advances. Histology powered the study of tissue architecture at single-cell resolution and remains a cornerstone of clinical pathology today. In the last decade, next-generation sequencing has become informative for the targeted treatment of many diseases, demonstrating the importance of genome-scale molecular information for personalized medicine. Today, revolutionary developments in spatial transcriptomics technologies digitalize gene expression at subcellular resolution in intact tissue sections, enabling the computational analysis of cell types, cellular phenotypes, and cell-cell communication in routinely collected and archival clinical samples. Here we review how such molecular microscopes work, highlight their potential to identify disease mechanisms and guide personalized therapies, and provide guidance for clinical study design. Finally, we discuss remaining challenges to the swift translation of high-resolution spatial transcriptomics technologies and how integration of multimodal readouts and deep learning approaches is bringing us closer to a holistic understanding of tissue biology and pathology.
Affiliation(s)
- Tancredi Massimo Pentimalli
- Charité - Universitätsmedizin Berlin, Berlin, Germany
- Laboratory for Systems Biology of Regulatory Elements, Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Berlin, Germany
- Nikos Karaiskos
- Laboratory for Systems Biology of Regulatory Elements, Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Berlin, Germany
- Nikolaus Rajewsky
- Laboratory for Systems Biology of Regulatory Elements, Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Berlin, Germany
- German Center for Cardiovascular Research (DZHK), Berlin, Germany
- Charité - Universitätsmedizin Berlin, Berlin, Germany
- German Cancer Consortium (DKTK), Berlin, Germany
- National Center for Tumor Diseases, Berlin, Germany
- NeuroCure Cluster of Excellence, Berlin, Germany
4
Defard T, Desrentes A, Fouillade C, Mueller F. Homebuilt Imaging-Based Spatial Transcriptomics: Tertiary Lymphoid Structures as a Case Example. Methods Mol Biol 2025; 2864:77-105. [PMID: 39527218 DOI: 10.1007/978-1-0716-4184-2_5] [Indexed: 11/16/2024]
Abstract
Spatial transcriptomics methods provide insight into the cellular heterogeneity and spatial architecture of complex, multicellular systems. Combining molecular and spatial information provides important clues to study tissue architecture in development and disease. Here, we present a comprehensive do-it-yourself (DIY) guide to perform such experiments at reduced costs leveraging open-source approaches. This guide spans the entire life cycle of a project, from its initial definition to experimental choices, wet lab approaches, instrumentation, and analysis. As a concrete example, we focus on tertiary lymphoid structures (TLS), which we use to develop typical questions that can be addressed by these approaches.
Affiliation(s)
- Thomas Defard
- Institut Pasteur, Université Paris Cité, Photonic Bio-Imaging, Centre de Ressources et Recherches Technologiques (UTechS-PBI, C2RT), Paris, France
- Institut Pasteur, Université Paris Cité, Imaging and Modeling Unit, Paris, France
- Centre for Computational Biology (CBIO), Mines Paris, PSL University, Paris, France
- Institut Curie, PSL University, Paris, France
- INSERM, U900, Paris, France
- Auxence Desrentes
- UMRS1135 Sorbonne University, Paris, France
- INSERM U1135, Paris, France
- Team "Immune Microenvironment and Immunotherapy", Centre for Immunology and Microbial Infections (CIMI), Paris, France
- Charles Fouillade
- Institut Curie, Inserm U1021-CNRS UMR 3347, University Paris-Saclay, PSL Research University, Centre Universitaire, Orsay, France
- Florian Mueller
- Institut Pasteur, Université Paris Cité, Photonic Bio-Imaging, Centre de Ressources et Recherches Technologiques (UTechS-PBI, C2RT), Paris, France.
- Institut Pasteur, Université Paris Cité, Imaging and Modeling Unit, Paris, France.
5
Lodi MK, Clark L, Roy S, Ghosh P. CORTADO: Hill Climbing Optimization for Cell-Type Specific Marker Gene Discovery. bioRxiv 2024:2024.12.23.630040. [PMID: 39763976 PMCID: PMC11703242 DOI: 10.1101/2024.12.23.630040] [Indexed: 01/15/2025]
Abstract
The advent of single-cell RNA sequencing (scRNA-seq) has greatly enhanced our ability to explore cellular heterogeneity with high resolution. Identifying subpopulations of cells and their associated molecular markers is crucial in understanding their distinct roles in tissues. To address the challenges in marker gene selection, we introduce CORTADO, a computational framework based on hill-climbing optimization for the efficient discovery of cell-type-specific markers. CORTADO optimizes three critical properties: differential expression in the clusters of interest, distinctiveness in gene expression profiles to minimize redundancy, and sparseness to ensure a concise and biologically meaningful marker set. Unlike traditional methods that rely on ranking genes by p-values, CORTADO incorporates both differential expression metrics and penalties for overlapping expression profiles, ensuring that each selected marker uniquely represents its cluster while maintaining biological relevance. Its flexibility supports both constrained and unconstrained marker selection, allowing users to specify the number of markers to identify, making it adaptable to diverse analytical needs and scalable to datasets with varying complexities. To validate its performance, we apply CORTADO to several datasets, including the DLPFC 151507 dataset, the Zeisel mouse brain dataset, and a peripheral blood mononuclear cell dataset. Through enrichment analysis and examination of spatial localization-based expression, we demonstrate the robustness of CORTADO in identifying biologically relevant and non-redundant markers in complex datasets. CORTADO provides an efficient and scalable solution for cell-type marker discovery, offering improved sensitivity and specificity compared to existing methods.
Affiliation(s)
- Musaddiq K Lodi
- Integrative Life Sciences, Virginia Commonwealth University, Richmond, VA, United States of America
- Leiliani Clark
- Center for Biological Data Science, Virginia Commonwealth University, Richmond, VA, United States of America
- Satyaki Roy
- Department of Mathematical Sciences, University of Alabama in Huntsville, Huntsville, AL, United States of America
- Preetam Ghosh
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, United States of America
6
Kamboj S, Carlson EL, Ander BP, Hanson KL, Murray KD, Fudge JL, Bauman MD, Schumann CM, Fox AS. Translational Insights From Cell Type Variation Across Amygdala Subnuclei in Rhesus Monkeys and Humans. Am J Psychiatry 2024; 181:1086-1102. [PMID: 39473267 DOI: 10.1176/appi.ajp.20230602] [Indexed: 12/02/2024]
Abstract
OBJECTIVE Theories of amygdala function are central to our understanding of psychiatric and neurodevelopmental disorders. However, limited knowledge of the molecular and cellular composition of the amygdala impedes translational research aimed at developing new treatments and interventions. The aim of this study was to characterize and compare the composition of amygdala cells to help bridge the gap between preclinical models and human psychiatric and neurodevelopmental disorders.
METHODS Tissue was dissected from multiple amygdala subnuclei in both humans (N=3, male) and rhesus macaques (N=3, male). Single-nucleus RNA sequencing was performed to characterize the transcriptomes of individual nuclei.
RESULTS The results reveal substantial heterogeneity between regions, even when restricted to inhibitory or excitatory neurons. Consistent with previous work, the data highlight the complexities of individual marker genes for uniquely targeting specific cell types. Cross-species analyses suggest that the rhesus monkey model is well-suited to understanding the human amygdala, but also identify limitations. For example, a cell cluster in the ventral lateral nucleus of the amygdala (vLa) is enriched in humans relative to rhesus macaques. Additionally, the data describe specific cell clusters with relative enrichment of disorder-related genes. These analyses point to the human-enriched vLa cell cluster as relevant to autism spectrum disorder, potentially highlighting a vulnerability to neurodevelopmental disorders that has emerged in recent primate evolution. Further, a cluster of cells expressing markers for intercalated cells is enriched for genes reported in human genome-wide association studies of neuroticism, anxiety disorders, and depressive disorders.
CONCLUSIONS Together, these findings shed light on the composition of the amygdala and identify specific cell types that can be prioritized in basic science research to better understand human psychopathology and guide the development of potential treatments.
Affiliation(s)
- Shawn Kamboj
- Department of Psychology (Kamboj, Fox), California National Primate Research Center (Kamboj, Bauman, Fox), and MIND Institute (Carlson, Ander, Hanson, Bauman, Schumann), University of California, Davis; Department of Psychiatry and Behavioral Sciences (Carlson, Hanson, Schumann), Department of Neurology (Ander), and Department of Physiology and Membrane Biology (Murray, Bauman), School of Medicine, University of California, Davis; Department of Neuroscience and Department of Psychiatry, School of Medicine and Dentistry, University of Rochester, Rochester, NY (Fudge)
- Erin L Carlson
- Department of Psychology (Kamboj, Fox), California National Primate Research Center (Kamboj, Bauman, Fox), and MIND Institute (Carlson, Ander, Hanson, Bauman, Schumann), University of California, Davis; Department of Psychiatry and Behavioral Sciences (Carlson, Hanson, Schumann), Department of Neurology (Ander), and Department of Physiology and Membrane Biology (Murray, Bauman), School of Medicine, University of California, Davis; Department of Neuroscience and Department of Psychiatry, School of Medicine and Dentistry, University of Rochester, Rochester, NY (Fudge)
- Bradley P Ander
- Department of Psychology (Kamboj, Fox), California National Primate Research Center (Kamboj, Bauman, Fox), and MIND Institute (Carlson, Ander, Hanson, Bauman, Schumann), University of California, Davis; Department of Psychiatry and Behavioral Sciences (Carlson, Hanson, Schumann), Department of Neurology (Ander), and Department of Physiology and Membrane Biology (Murray, Bauman), School of Medicine, University of California, Davis; Department of Neuroscience and Department of Psychiatry, School of Medicine and Dentistry, University of Rochester, Rochester, NY (Fudge)
- Kari L Hanson
- Department of Psychology (Kamboj, Fox), California National Primate Research Center (Kamboj, Bauman, Fox), and MIND Institute (Carlson, Ander, Hanson, Bauman, Schumann), University of California, Davis; Department of Psychiatry and Behavioral Sciences (Carlson, Hanson, Schumann), Department of Neurology (Ander), and Department of Physiology and Membrane Biology (Murray, Bauman), School of Medicine, University of California, Davis; Department of Neuroscience and Department of Psychiatry, School of Medicine and Dentistry, University of Rochester, Rochester, NY (Fudge)
- Karl D Murray
- Department of Psychology (Kamboj, Fox), California National Primate Research Center (Kamboj, Bauman, Fox), and MIND Institute (Carlson, Ander, Hanson, Bauman, Schumann), University of California, Davis; Department of Psychiatry and Behavioral Sciences (Carlson, Hanson, Schumann), Department of Neurology (Ander), and Department of Physiology and Membrane Biology (Murray, Bauman), School of Medicine, University of California, Davis; Department of Neuroscience and Department of Psychiatry, School of Medicine and Dentistry, University of Rochester, Rochester, NY (Fudge)
- Julie L Fudge
- Department of Psychology (Kamboj, Fox), California National Primate Research Center (Kamboj, Bauman, Fox), and MIND Institute (Carlson, Ander, Hanson, Bauman, Schumann), University of California, Davis; Department of Psychiatry and Behavioral Sciences (Carlson, Hanson, Schumann), Department of Neurology (Ander), and Department of Physiology and Membrane Biology (Murray, Bauman), School of Medicine, University of California, Davis; Department of Neuroscience and Department of Psychiatry, School of Medicine and Dentistry, University of Rochester, Rochester, NY (Fudge)
- Melissa D Bauman
- Department of Psychology (Kamboj, Fox), California National Primate Research Center (Kamboj, Bauman, Fox), and MIND Institute (Carlson, Ander, Hanson, Bauman, Schumann), University of California, Davis; Department of Psychiatry and Behavioral Sciences (Carlson, Hanson, Schumann), Department of Neurology (Ander), and Department of Physiology and Membrane Biology (Murray, Bauman), School of Medicine, University of California, Davis; Department of Neuroscience and Department of Psychiatry, School of Medicine and Dentistry, University of Rochester, Rochester, NY (Fudge)
- Cynthia M Schumann
- Department of Psychology (Kamboj, Fox), California National Primate Research Center (Kamboj, Bauman, Fox), and MIND Institute (Carlson, Ander, Hanson, Bauman, Schumann), University of California, Davis; Department of Psychiatry and Behavioral Sciences (Carlson, Hanson, Schumann), Department of Neurology (Ander), and Department of Physiology and Membrane Biology (Murray, Bauman), School of Medicine, University of California, Davis; Department of Neuroscience and Department of Psychiatry, School of Medicine and Dentistry, University of Rochester, Rochester, NY (Fudge)
- Andrew S Fox
- Department of Psychology (Kamboj, Fox), California National Primate Research Center (Kamboj, Bauman, Fox), and MIND Institute (Carlson, Ander, Hanson, Bauman, Schumann), University of California, Davis; Department of Psychiatry and Behavioral Sciences (Carlson, Hanson, Schumann), Department of Neurology (Ander), and Department of Physiology and Membrane Biology (Murray, Bauman), School of Medicine, University of California, Davis; Department of Neuroscience and Department of Psychiatry, School of Medicine and Dentistry, University of Rochester, Rochester, NY (Fudge)
7
Kuemmerle LB, Luecken MD, Firsova AB, Barros de Andrade E Sousa L, Straßer L, Mekki II, Campi F, Heumos L, Shulman M, Beliaeva V, Hediyeh-Zadeh S, Schaar AC, Mahbubani KT, Sountoulidis A, Balassa T, Kovacs F, Horvath P, Piraud M, Ertürk A, Samakovlis C, Theis FJ. Probe set selection for targeted spatial transcriptomics. Nat Methods 2024; 21:2260-2270. [PMID: 39558096 PMCID: PMC11621025 DOI: 10.1038/s41592-024-02496-z] [Received: 07/19/2022] [Accepted: 09/30/2024] [Indexed: 11/20/2024]
Abstract
Targeted spatial transcriptomic methods capture the topology of cell types and states in tissues at single-cell and subcellular resolution by measuring the expression of a predefined set of genes. The selection of an optimal set of probed genes is crucial for capturing the spatial signals present in a tissue. This requires selecting the most informative, yet minimal, set of genes to profile (gene set selection) for which it is possible to build probes (probe design). However, current selections often rely on marker genes, precluding them from detecting continuous spatial signals or new states. We present Spapros, an end-to-end probe set selection pipeline that optimizes both gene set specificity for cell type identification and within-cell type expression variation to resolve spatially distinct populations while considering prior knowledge as well as probe design and expression constraints. We evaluated Spapros and show that it outperforms other selection approaches in both cell type recovery and recovering expression variation beyond cell types. Furthermore, we used Spapros to design a single-cell resolution in situ hybridization on tissues (SCRINSHOT) experiment of adult lung tissue to demonstrate how probes selected with Spapros identify cell types of interest and detect spatial variation even within cell types.
Affiliation(s)
- Louis B Kuemmerle
  - Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg, Germany
  - Institute for Tissue Engineering and Regenerative Medicine, Helmholtz Zentrum München, Neuherberg, Germany
  - School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany
- Malte D Luecken
  - Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg, Germany
  - Institute of Lung Health & Immunity, Helmholtz Munich, Member of the German Center for Lung Research (DZL), Munich, Germany
  - German Center for Lung Research (DZL), Gießen, Germany
- Alexandra B Firsova
  - SciLifeLab and Department of Molecular Biosciences, Stockholm University, Stockholm, Sweden
- Lena Straßer
  - Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg, Germany
- Francesco Campi
  - Helmholtz AI, Helmholtz Zentrum München, Neuherberg, Germany
- Lukas Heumos
  - Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg, Germany
  - School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany
  - Institute of Lung Biology and Disease and Comprehensive Pneumology Center, Helmholtz Zentrum München, German Center for Lung Research (DZL), Munich, Germany
- Maiia Shulman
  - Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg, Germany
- Valentina Beliaeva
  - Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg, Germany
- Soroor Hediyeh-Zadeh
  - Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg, Germany
  - School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany
- Anna C Schaar
  - Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg, Germany
  - TUM School of Computation, Information and Technology, Technical University of Munich, Munich, Germany
  - Munich Center for Machine Learning, Technical University of Munich, Munich, Germany
- Krishnaa T Mahbubani
  - Department of Surgery, University of Cambridge and Cambridge NIHR Biomedical Research Centre, Cambridge, UK
- Tamás Balassa
  - Synthetic and Systems Biology Unit, Biological Research Centre, Eötvös Loránd Research Network, Szeged, Hungary
- Peter Horvath
  - Synthetic and Systems Biology Unit, Biological Research Centre, Eötvös Loránd Research Network, Szeged, Hungary
  - Institute of AI for Health, Helmholtz Zentrum München, Neuherberg, Germany
  - Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- Marie Piraud
  - Helmholtz AI, Helmholtz Zentrum München, Neuherberg, Germany
- Ali Ertürk
  - Institute for Tissue Engineering and Regenerative Medicine, Helmholtz Zentrum München, Neuherberg, Germany
  - Institute for Stroke and Dementia Research, Klinikum der Universität München, Ludwig-Maximilians University Munich, Munich, Germany
  - Munich Cluster for Systems Neurology (SyNergy), Munich, Germany
  - School of Medicine, Koç University, İstanbul, Turkey
- Christos Samakovlis
  - SciLifeLab and Department of Molecular Biosciences, Stockholm University, Stockholm, Sweden
  - Cardiopulmonary Institute, Justus Liebig University, Giessen, Germany
- Fabian J Theis
  - Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg, Germany.
  - School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany.
  - School of Computation, Information and Technology, Technical University of Munich, Munich, Germany.
|
8
|
Sun Y, Qiu P. Hierarchical marker genes selection in scRNA-seq analysis. PLoS Comput Biol 2024; 20:e1012643. [PMID: 39666603 PMCID: PMC11637363 DOI: 10.1371/journal.pcbi.1012643] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2023] [Accepted: 11/16/2024] [Indexed: 12/14/2024] Open
Abstract
When analyzing scRNA-seq data containing heterogeneous cell populations, an important task is to select informative marker genes to distinguish various cell clusters and annotate the clusters with biologically meaningful cell types. In existing analysis methods and pipelines, marker genes are typically identified using a one-vs-all strategy, examining differential expression between one cell cluster versus the combination of all other cell clusters. However, this strategy applied to cell clusters belonging to closely related cell types often generates overlapping marker genes, which capture the common signature of closely related cell clusters but provide limited information for distinguishing them. To address the limitations of the one-vs-all strategy, we propose a hierarchical marker gene selection strategy that groups similar cell clusters and selects marker genes in a hierarchical manner. This strategy is able to improve the accuracy and interpretability of cell type identification in single-cell RNA-seq data.
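The contrast between one-vs-all and hierarchical selection can be illustrated with a minimal sketch. It is not the authors' method: the grouping of clusters is fixed by hand, a plain difference of means stands in for a differential expression test, and the function name `top_marker`, the cluster labels, and the expression values are all hypothetical toy choices.

```python
def mean(values):
    return sum(values) / len(values)


def top_marker(expr, labels, target, background):
    """Return the gene with the largest mean-expression gap (target minus background)."""
    best_gene, best_gap = None, float("-inf")
    for gene in sorted(expr):  # sorted so that ties resolve alphabetically
        values = expr[gene]
        inside = [v for v, lab in zip(values, labels) if lab in target]
        outside = [v for v, lab in zip(values, labels) if lab in background]
        gap = mean(inside) - mean(outside)
        if gap > best_gap:
            best_gene, best_gap = gene, gap
    return best_gene


# Toy data (made up): six cells, three clusters, two of them closely related.
labels = ["CD4_T", "CD4_T", "CD8_T", "CD8_T", "B", "B"]
expr = {
    "CD3E": [9, 9, 9, 9, 0, 0],   # shared by both T cell clusters
    "CD4": [4, 4, 0, 0, 0, 0],
    "CD8A": [0, 0, 4, 4, 0, 0],
    "MS4A1": [0, 0, 0, 0, 8, 8],
}

# One-vs-all: each T cell cluster is compared against every other cell.
flat_cd4 = top_marker(expr, labels, {"CD4_T"}, {"CD8_T", "B"})
flat_cd8 = top_marker(expr, labels, {"CD8_T"}, {"CD4_T", "B"})

# Hierarchical: first the T cell group against the rest, then within the group.
group = top_marker(expr, labels, {"CD4_T", "CD8_T"}, {"B"})
hier_cd4 = top_marker(expr, labels, {"CD4_T"}, {"CD8_T"})
hier_cd8 = top_marker(expr, labels, {"CD8_T"}, {"CD4_T"})

print(flat_cd4, flat_cd8)          # both one-vs-all calls pick the shared gene
print(group, hier_cd4, hier_cd8)   # the hierarchy separates group and subtype markers
```

In this toy setting the one-vs-all comparison returns the same shared gene for both related clusters, while the two-level comparison assigns the shared gene to the group and a distinct gene to each subtype.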
Affiliation(s)
- Yutong Sun
  - School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, Georgia, United States of America
- Peng Qiu
  - Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, Georgia, United States of America
|
9
|
Liu T, Long W, Cao Z, Wang Y, He CH, Zhang L, Strittmatter SM, Zhao H. CosGeneGate selects multi-functional and credible biomarkers for single-cell analysis. Brief Bioinform 2024; 26:bbae626. [PMID: 39592241 PMCID: PMC11596696 DOI: 10.1093/bib/bbae626] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2024] [Revised: 10/07/2024] [Accepted: 11/14/2024] [Indexed: 11/28/2024] Open
Abstract
MOTIVATION: Selecting representative genes or marker genes to distinguish cell types is an important task in single-cell sequencing analysis. Although many methods have been proposed to select marker genes, the genes selected may have redundancy and/or do not show cell-type-specific expression patterns to distinguish cell types. RESULTS: Here, we present a novel model, named CosGeneGate, to select marker genes for more effective marker selections. CosGeneGate is inspired by combining the advantages of selecting marker genes based on both cell-type classification accuracy and marker gene specific expression patterns. We demonstrate the better performance of the marker genes selected by CosGeneGate for various downstream analyses than the existing methods with both public datasets and newly sequenced datasets. The non-redundant marker genes identified by CosGeneGate for major cell types and tissues in human can be found at the website as follows: https://github.com/VivLon/CosGeneGate/blob/main/marker gene list.xlsx.
Affiliation(s)
- Tianyu Liu
  - Department of Biostatistics, Yale University, New Haven, CT, 06520, United States
  - Interdepartmental Program in Computational Biology & Bioinformatics, Yale University, New Haven, CT, 06520, United States
- Wenxin Long
  - Department of Biostatistics, Yale University, New Haven, CT, 06520, United States
  - Department of Statistics, The Pennsylvania State University, University Park, PA, 16820, United States
- Zhiyuan Cao
  - Department of Biostatistics, Yale University, New Haven, CT, 06520, United States
  - Interdepartmental Program in Computational Biology & Bioinformatics, Yale University, New Haven, CT, 06520, United States
  - Program of Health Informatics, Yale University, New Haven, CT, 06520, United States
- Yuge Wang
  - Department of Biostatistics, Yale University, New Haven, CT, 06520, United States
- Chuan Hua He
  - Department of Neurology, Yale University School of Medicine, New Haven, CT, 06520, United States
- Le Zhang
  - Department of Neurology, Yale University School of Medicine, New Haven, CT, 06520, United States
  - Department of Neuroscience, Yale University School of Medicine, New Haven, CT, 06520, United States
- Stephen M Strittmatter
  - Department of Neurology, Yale University School of Medicine, New Haven, CT, 06520, United States
  - Department of Neuroscience, Yale University School of Medicine, New Haven, CT, 06520, United States
  - Cellular Neuroscience, Neurodegeneration and Repair Program, Yale University School of Medicine, New Haven, CT, 06520, United States
- Hongyu Zhao
  - Department of Biostatistics, Yale University, New Haven, CT, 06520, United States
  - Interdepartmental Program in Computational Biology & Bioinformatics, Yale University, New Haven, CT, 06520, United States
|
10
|
Chen L, Guo Z, Deng T, Wu H. scCTS: identifying the cell type-specific marker genes from population-level single-cell RNA-seq. Genome Biol 2024; 25:269. [PMID: 39402623 PMCID: PMC11472465 DOI: 10.1186/s13059-024-03410-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2023] [Accepted: 09/30/2024] [Indexed: 10/19/2024] Open
Abstract
Single-cell RNA-sequencing (scRNA-seq) provides gene expression profiles of individual cells from complex samples, facilitating the detection of cell type-specific marker genes. In scRNA-seq experiments with multiple donors, the population level variation brings an extra layer of complexity in cell type-specific gene detection, for example, they may not appear in all donors. Motivated by this observation, we develop a statistical model named scCTS to identify cell type-specific genes from population-level scRNA-seq data. Extensive data analyses demonstrate that the proposed method identifies more biologically meaningful cell type-specific genes compared to traditional methods.
Affiliation(s)
- Luxiao Chen
  - Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA, 30322, USA
- Zhenxing Guo
  - School of Data Science, The Chinese University of Hong Kong, Shenzhen (CUHK-SZ), Shenzhen, 518172, Guangdong, China
- Tao Deng
  - School of Data Science, The Chinese University of Hong Kong, Shenzhen (CUHK-SZ), Shenzhen, 518172, Guangdong, China
  - Shenzhen Research Institute of Big Data, Shenzhen, 518172, China
- Hao Wu
  - Faculty of Computer Science and Control Engineering, Shenzhen University of Advanced Technology, Shenzhen, 518055, Guangdong, China.
  - Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, Guangdong, China.
|
11
|
Kong T, Yu T, Zhao J, Hu Z, Xiong N, Wan J, Dong X, Pan Y, Zheng H, Zhang L. scGAA: a general gated axial-attention model for accurate cell-type annotation of single-cell RNA-seq data. Sci Rep 2024; 14:22308. [PMID: 39333739 PMCID: PMC11436728 DOI: 10.1038/s41598-024-73356-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2024] [Accepted: 09/17/2024] [Indexed: 09/30/2024] Open
Abstract
Single-cell RNA sequencing (scRNA-seq) is a key technology for investigating cell development and analysing cell diversity across various diseases. However, the high dimensionality and extreme sparsity of scRNA-seq data pose great challenges for accurate cell type annotation. To address this, we developed a new cell-type annotation model called scGAA (general gated axial-attention model for accurate cell-type annotation of scRNA-seq). Based on the transformer framework, the model decomposes the traditional self-attention mechanism into horizontal and vertical attention, considerably improving computational efficiency. This axial attention mechanism can process high-dimensional data more efficiently while maintaining reasonable model complexity. Additionally, the gated unit was integrated into the model to enhance the capture of relationships between genes, which is crucial for achieving an accurate cell type annotation. The results revealed that our improved transformer model is a promising tool for practical applications. This theoretical innovation increased the model performance and provided new insights into analytical tools for scRNA-seq data.
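The efficiency argument for splitting self-attention into horizontal and vertical passes can be checked with simple counting. The sketch below assumes the input tokens are arranged on a `height` x `width` grid; how scGAA maps a gene expression profile onto such a grid is not stated in the abstract, and the function name `attention_pair_counts` is hypothetical.

```python
def attention_pair_counts(height, width):
    """Count query-key pairs scored by full and by axial attention on a grid."""
    tokens = height * width
    # Full self-attention: every token attends to every token.
    full = tokens * tokens
    # Axial attention: each token attends along its row, then along its column.
    axial = tokens * width + tokens * height
    return full, axial


full, axial = attention_pair_counts(64, 64)
print(full, axial, full // axial)  # 16777216 524288 32
```

For a square grid with side length n, the full cost grows as n^4 while the axial cost grows as 2n^3, which is where the saving comes from.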
Affiliation(s)
- Tianci Kong
  - College of Biological and Chemical Engineering, Zhejiang University of Science and Technology, Hangzhou, 310023, China
- Tiancheng Yu
  - School of Sciences, Zhejiang University of Science and Technology, Hangzhou, 310023, China
- Jiaxin Zhao
  - Department of Hepatobiliary and Pancreatic Surgery, Department of Surgery, Fourth Affiliated Hospital, School of Medicine, Zhejiang University, Yiwu, 322000, China
- Zhenhua Hu
  - Department of Hepatobiliary and Pancreatic Surgery, Department of Surgery, Fourth Affiliated Hospital, School of Medicine, Zhejiang University, Yiwu, 322000, China
- Neal Xiong
  - Department of Computer Science and Mathematics, Sul Ross State University, Alpine, USA
- Jian Wan
  - College of Information and Electronic Engineering, Zhejiang University of Science and Technology, Hangzhou, 310023, China
- Xiaoliang Dong
  - College of Information Science and Engineering, Shandong Agricultural University, Taian, 271018, China
- Yi Pan
  - Faculty of Computer Science and Control Engineering Shenzhen University of Advanced Technology, Shenzhen, 518118, China
- Huilin Zheng
  - College of Biological and Chemical Engineering, Zhejiang University of Science and Technology, Hangzhou, 310023, China.
- Lei Zhang
  - College of Biological and Chemical Engineering, Zhejiang University of Science and Technology, Hangzhou, 310023, China.
  - College of Information and Electronic Engineering, Zhejiang University of Science and Technology, Hangzhou, 310023, China.
|
12
|
Chari T, Gorin G, Pachter L. Biophysically interpretable inference of cell types from multimodal sequencing data. Nat Comput Sci 2024; 4:677-689. [PMID: 39317762 DOI: 10.1038/s43588-024-00689-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/13/2023] [Accepted: 08/08/2024] [Indexed: 09/26/2024]
Abstract
Multimodal, single-cell genomics technologies enable simultaneous measurement of multiple facets of DNA and RNA processing in the cell. This creates opportunities for transcriptome-wide, mechanistic studies of cellular processing in heterogeneous cell populations, such as regulation of cell fate by transcriptional stochasticity or tumor proliferation through aberrant splicing dynamics. However, current methods for determining cell types or 'clusters' in multimodal data often rely on ad hoc approaches to balance or integrate measurements, and assumptions ignoring inherent properties of the data. To enable interpretable and consistent cell cluster determination, we present meK-means (mechanistic K-means) which integrates modalities through a unifying model of transcription to learn underlying, shared biophysical states. With meK-means we can cluster cells with nascent and mature mRNA measurements, utilizing the causal, physical relationships between these modalities. This identifies shared transcription dynamics across cells, which induce the observed molecule counts, and provides an alternative definition for 'clusters' through the governing parameters of cellular processes.
Affiliation(s)
- Tara Chari
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
- Lior Pachter
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA.
  - Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA, USA.
|
13
|
Ji L, Wang A, Sonthalia S, Naiman DQ, Younes L, Colantuoni C, Geman D. CellCover Captures Neural Stem Cell Progression in Mammalian Neocortical Development. bioRxiv 2024:2023.04.06.535943. [PMID: 37383947 PMCID: PMC10299349 DOI: 10.1101/2023.04.06.535943] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/30/2023]
Abstract
Definition of cell classes across the tissues of living organisms is central in the analysis of growing atlases of single-cell RNA sequencing (scRNA-seq) data across biomedicine. Marker genes for cell classes are most often defined by differential expression (DE) methods that serially assess individual genes across landscapes of diverse cells. This serial approach has been extremely useful, but is limited because it ignores possible redundancy or complementarity across genes that can only be captured by analyzing multiple genes simultaneously. We aim to identify discriminating panels of genes. To efficiently explore the vast space of possible marker panels, leverage the large number of cells often sequenced, and overcome zero-inflation in scRNA-seq data, we propose viewing gene panel selection as a variation of the "minimal set-covering problem" in combinatorial optimization. We show that this new method, CellCover, captures cell-class-specific signals in the developing mouse neocortex that are distinct from those defined by DE methods. Transfer learning experiments across mouse, primate, and human data demonstrate that CellCover identifies markers of conserved cell classes in neurogenesis, as well as temporal progression in both progenitors and neurons. Exploring markers of human outer radial glia (oRG, or basal RG) across mammals, we show that transcriptomic elements of this key cell type in the expansion of the human cortex appeared in gliogenic precursors of the rodent before the full program emerged in the primate lineage. We have assembled the public datasets we use in this report at NeMO analytics where the expression of individual genes {NeMO Individual Genes} and marker gene panels can be freely explored {NeMO: Telley 3 Sets Covering Panels}, {NeMO: Telley 12 Sets Covering Panels}, and {NeMO: Sorted Brain Cell Covering Panels}. CellCover is available in {CellCover R} and {CellCover Python}.
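The set-covering view of panel selection can be sketched with the standard greedy heuristic for classical set cover. This is not the CellCover algorithm itself, whose exact formulation is not given in the abstract; the function name `greedy_cover_panel`, the `depth` parameter, and the toy gene labels are hypothetical. Each cell is represented only by the set of genes detected in it, so zero counts simply mean the gene does not cover that cell.

```python
def greedy_cover_panel(cells, depth=1):
    """Pick genes until every cell has at least `depth` panel genes detected.

    cells: list of sets, one per cell of the class, holding the detected genes.
    Returns the panel as a list in selection order; stops early if no
    remaining gene can cover another under-covered cell.
    """
    coverage = [0] * len(cells)
    panel = []
    candidates = set().union(*cells)
    while any(c < depth for c in coverage):
        best_gene, best_gain = None, 0
        for gene in sorted(candidates - set(panel)):  # sorted for reproducible ties
            gain = sum(
                1
                for i, detected in enumerate(cells)
                if coverage[i] < depth and gene in detected
            )
            if gain > best_gain:
                best_gene, best_gain = gene, gain
        if best_gene is None:
            break  # remaining cells cannot be covered further
        panel.append(best_gene)
        for i, detected in enumerate(cells):
            if best_gene in detected:
                coverage[i] += 1
    return panel


# Toy class of four cells (made up): no single gene is detected in all of them.
cells = [{"A", "B"}, {"A"}, {"C"}, {"B", "C"}]
print(greedy_cover_panel(cells))  # ['A', 'C']
```

No single gene marks every cell here, yet two genes jointly cover the class, which is the complementarity that gene-by-gene differential expression cannot express.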
|
14
|
Jia Y, Ma P, Yao Q. CellMarkerPipe: cell marker identification and evaluation pipeline in single cell transcriptomes. Sci Rep 2024; 14:13151. [PMID: 38849445 PMCID: PMC11161599 DOI: 10.1038/s41598-024-63492-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2024] [Accepted: 05/29/2024] [Indexed: 06/09/2024] Open
Abstract
Assessing marker genes from all cell clusters can be time-consuming and lack systematic strategy. Streamlining this process through a unified computational platform that automates identification and benchmarking will greatly enhance efficiency and ensure a fair evaluation. We therefore developed a novel computational platform, cellMarkerPipe (https://github.com/yao-laboratory/cellMarkerPipe), for automated cell-type specific marker gene identification from scRNA-seq data, coupled with comprehensive evaluation schema. CellMarkerPipe adaptively wraps around a collection of commonly used and state-of-the-art tools, including Seurat, COSG, SC3, SCMarker, COMET, and scGeneFit. From rigorously testing across diverse samples, we ascertain SCMarker's overall reliable performance in single marker gene selection, with COSG showing commendable speed and comparable efficacy. Furthermore, we demonstrate the pivotal role of our approach in real-world medical datasets. This general and opensource pipeline stands as a significant advancement in streamlining cell marker gene identification and evaluation, fitting broad applications in the field of cellular biology and medical research.
Affiliation(s)
- Yinglu Jia
  - School of Computing, University of Nebraska Lincoln, 256 Avery Hall, Lincoln, NE, 68588, USA
  - Department of Chemistry, University of Nebraska Lincoln, Hamilton Hall, Lincoln, NE, 68588, USA
- Pengchong Ma
  - School of Computing, University of Nebraska Lincoln, 256 Avery Hall, Lincoln, NE, 68588, USA
- Qiuming Yao
  - School of Computing, University of Nebraska Lincoln, 256 Avery Hall, Lincoln, NE, 68588, USA.
  - Nebraska Center for the Prevention of Obesity Diseases, 316C Leverton Hall, Lincoln, NE, 68583, USA.
  - Nebraska Center for Virology, University of Nebraska, 4240 Fair St., Lincoln, NE, 68583, USA.
|
15
|
Wang H, Liu Z, Ma X. Learning Consistency and Specificity of Cells From Single-Cell Multi-Omic Data. IEEE J Biomed Health Inform 2024; 28:3134-3145. [PMID: 38709615 DOI: 10.1109/jbhi.2024.3370868] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Indexed: 05/08/2024]
Abstract
Advancements in single-cell technologies concomitantly develop the epigenomic and transcriptomic profiles at the cell levels, providing opportunities to explore the potential biological mechanisms. Even though significant efforts have been dedicated to them, it remains challenging for the integration analysis of multi-omic data of single-cell because of the heterogeneity, complicated coupling and interpretability of data. To handle these issues, we propose a novel self-representation Learning-based Multi-omics data Integrative Clustering algorithm (sLMIC) for the integration of single-cell epigenomic profiles (DNA methylation or scATAC-seq) and transcriptomic (scRNA-seq), which the consistent and specific features of cells are explicitly extracted facilitating the cell clustering. Specifically, sLMIC constructs a graph for each type of single-cell data, thereby transforming omics data into multi-layer networks, which effectively removes heterogeneity of omic data. Then, sLMIC employs the low-rank and exclusivity constraints to separate the self-representation of cells into two parts, i.e., the shared and specific features, which explicitly characterize the consistency and diversity of omic data, providing an effective strategy to model the structure of cell types. Feature extraction and cell clustering are jointly formulated as an overall objective function, where latent features of data are obtained under the guidance of cell clustering. The extensive experimental results on 13 multi-omics datasets of single-cell from diverse organisms and tissues indicate that sLMIC observably exceeds the advanced algorithms regarding various measurements.
|
16
|
Pullin JM, McCarthy DJ. A comparison of marker gene selection methods for single-cell RNA sequencing data. Genome Biol 2024; 25:56. [PMID: 38409056 PMCID: PMC10895860 DOI: 10.1186/s13059-024-03183-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2022] [Accepted: 02/07/2024] [Indexed: 02/28/2024] Open
Abstract
BACKGROUND: The development of single-cell RNA sequencing (scRNA-seq) has enabled scientists to catalog and probe the transcriptional heterogeneity of individual cells in unprecedented detail. A common step in the analysis of scRNA-seq data is the selection of so-called marker genes, most commonly to enable annotation of the biological cell types present in the sample. In this paper, we benchmark 59 computational methods for selecting marker genes in scRNA-seq data. RESULTS: We compare the performance of the methods using 14 real scRNA-seq datasets and over 170 additional simulated datasets. Methods are compared on their ability to recover simulated and expert-annotated marker genes, the predictive performance and characteristics of the gene sets they select, their memory usage and speed, and their implementation quality. In addition, various case studies are used to scrutinize the most commonly used methods, highlighting issues and inconsistencies. CONCLUSIONS: Overall, we present a comprehensive evaluation of methods for selecting marker genes in scRNA-seq data. Our results highlight the efficacy of simple methods, especially the Wilcoxon rank-sum test, Student's t-test, and logistic regression.
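As an illustration of how simple a rank-based marker ranking can be, the sketch below scores each gene for one cluster against all remaining cells using the Mann-Whitney U statistic of the Wilcoxon rank-sum test, scaled to the range 0 to 1 (equivalent to an AUC). It is a minimal sketch and not the benchmark's code: it computes no p-values, and the function names `rank_sum_auc` and `one_vs_rest_markers`, the cluster labels, and the expression values are hypothetical.

```python
def rank_sum_auc(in_group, out_group):
    """Scaled U statistic: chance that an in-group value exceeds an out-group value.

    Ties count as one half, so a gene with identical values everywhere scores 0.5.
    """
    wins = 0.0
    for x in in_group:
        for y in out_group:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(in_group) * len(out_group))


def one_vs_rest_markers(expr, labels, cluster, top_n=1):
    """Rank genes for `cluster` against all other cells, highest score first."""
    scores = {}
    for gene, values in expr.items():
        inside = [v for v, lab in zip(values, labels) if lab == cluster]
        outside = [v for v, lab in zip(values, labels) if lab != cluster]
        scores[gene] = rank_sum_auc(inside, outside)
    return sorted(scores, key=lambda g: (-scores[g], g))[:top_n]


# Toy data (made up): two T cells and two B cells.
labels = ["T", "T", "B", "B"]
expr = {
    "CD3E": [5, 4, 0, 0],
    "MS4A1": [0, 0, 6, 3],
    "ACTB": [9, 9, 9, 9],  # uniformly expressed, so uninformative
}
print(one_vs_rest_markers(expr, labels, "T"))  # ['CD3E']
print(one_vs_rest_markers(expr, labels, "B"))  # ['MS4A1']
```

The pairwise loop is quadratic in the number of cells and is only meant for reading; a real analysis would use a sorted-rank implementation from a statistics library.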
Affiliation(s)
- Jeffrey M Pullin
  - Bioinformatics and Cellular Genomics, St Vincent's Institute of Medical Research, 9 Princes St, Fitzroy, 3065, VIC, Australia
  - School of Mathematics and Statistics, University of Melbourne, Parkville, 3010, VIC, Australia
  - Melbourne Integrative Genomics, University of Melbourne, Parkville, 3010, VIC, Australia
- Davis J McCarthy
  - Bioinformatics and Cellular Genomics, St Vincent's Institute of Medical Research, 9 Princes St, Fitzroy, 3065, VIC, Australia.
  - School of Mathematics and Statistics, University of Melbourne, Parkville, 3010, VIC, Australia.
  - Melbourne Integrative Genomics, University of Melbourne, Parkville, 3010, VIC, Australia.
|
17
|
Yafi MA, Hisham MHH, Grisanti F, Martin JF, Rahman A, Samee MAH. scGIST: gene panel design for spatial transcriptomics with prioritized gene sets. Genome Biol 2024; 25:57. [PMID: 38408997 PMCID: PMC10895727 DOI: 10.1186/s13059-024-03185-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2022] [Accepted: 02/14/2024] [Indexed: 02/28/2024] Open
Abstract
A critical challenge of single-cell spatial transcriptomics (sc-ST) technologies is their panel size. Being based on fluorescence in situ hybridization, they are typically limited to panels of about a thousand genes. This constrains researchers to build panels from only the marker genes of different cell types and forgo other genes of interest, e.g., genes encoding ligand-receptor complexes or those in specific pathways. We propose scGIST, a constrained feature selection tool that designs sc-ST panels prioritizing user-specified genes without compromising cell type detection accuracy. We demonstrate scGIST's efficacy in diverse use cases, highlighting it as a valuable addition to sc-ST's algorithmic toolbox.
Affiliation(s)
- Mashrur Ahmed Yafi
  - Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, 1205, Bangladesh
- Md Hasibul Husain Hisham
  - Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, 1205, Bangladesh
- Francisco Grisanti
  - Department of Integrative Physiology, Baylor College of Medicine, Houston, 77030, TX, USA
- James F Martin
  - Department of Integrative Physiology, Baylor College of Medicine, Houston, 77030, TX, USA
  - Cardiomyocyte Renewal Laboratory, Texas Heart Institute, Houston, 77030, TX, USA
- Atif Rahman
  - Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, 1205, Bangladesh.
- Md Abul Hassan Samee
  - Department of Integrative Physiology, Baylor College of Medicine, Houston, 77030, TX, USA.
|
18
|
Gregory W, Sarwar N, Kevrekidis G, Villar S, Dumitrascu B. MarkerMap: nonlinear marker selection for single-cell studies. NPJ Syst Biol Appl 2024; 10:17. [PMID: 38351188 PMCID: PMC10864304 DOI: 10.1038/s41540-024-00339-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2023] [Accepted: 01/17/2024] [Indexed: 02/16/2024] Open
Abstract
Single-cell RNA-seq data allow the quantification of cell type differences across a growing set of biological contexts. However, pinpointing a small subset of genomic features explaining this variability can be ill-defined and computationally intractable. Here we introduce MarkerMap, a generative model for selecting minimal gene sets which are maximally informative of cell type origin and enable whole transcriptome reconstruction. MarkerMap provides a scalable framework for both supervised marker selection, aimed at identifying specific cell type populations, and unsupervised marker selection, aimed at gene expression imputation and reconstruction. We benchmark MarkerMap's competitive performance against previously published approaches on real single cell gene expression data sets. MarkerMap is available as a pip installable package, as a community resource aimed at developing explainable machine learning techniques for enhancing interpretability in single-cell studies.
Affiliation(s)
- Wilson Gregory
  - Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, MD, 21218, USA
- Nabeel Sarwar
  - Center for Data Science, New York University, New York, NY, 10012, USA
- George Kevrekidis
  - Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, MD, 21218, USA
- Soledad Villar
  - Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, MD, 21218, USA.
  - Mathematical Institute for Data Science, Johns Hopkins University, Baltimore, MD, 21218, USA.
- Bianca Dumitrascu
  - Department of Statistics, Columbia University, New York, NY, 10027, USA.
  - Irving Institute for Cancer Dynamics, Columbia University, New York, NY, 10027, USA.
|
19
|
Zhang Y, Petukhov V, Biederstedt E, Que R, Zhang K, Kharchenko PV. Gene panel selection for targeted spatial transcriptomics. Genome Biol 2024; 25:35. [PMID: 38273415 PMCID: PMC10811939 DOI: 10.1186/s13059-024-03174-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 03/24/2023] [Accepted: 01/12/2024] [Indexed: 01/27/2024] Open
Abstract
Targeted spatial transcriptomics hold particular promise in analyzing complex tissues. Most such methods, however, measure only a limited panel of transcripts, which need to be selected in advance to inform on the cell types or processes being studied. A limitation of existing gene selection methods is their reliance on scRNA-seq data, ignoring platform effects between technologies. Here we describe gpsFISH, a computational method performing gene selection through optimizing detection of known cell types. By modeling and adjusting for platform effects, gpsFISH outperforms other methods. Furthermore, gpsFISH can incorporate cell type hierarchies and custom gene preferences to accommodate diverse design requirements.
Affiliation(s)
- Yida Zhang
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
  - Department of Neurobiology, Duke University, Durham, NC, USA
- Viktor Petukhov
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Evan Biederstedt
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Richard Que
  - Department of Bioengineering, University of California San Diego, La Jolla, CA, USA
- Kun Zhang
  - Department of Bioengineering, University of California San Diego, La Jolla, CA, USA
  - San Diego Institute of Science, Altos Labs, San Diego, CA, USA
- Peter V Kharchenko
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
  - San Diego Institute of Science, Altos Labs, San Diego, CA, USA.
|
20
|
Wei Z, Chenjun W, Feiyang X, Mingfeng J, Yixuan Z, Qi L, Zhuoxing S, Qi D. scHybridBERT: integrating gene regulation and cell graph for spatiotemporal dynamics in single-cell clustering. Brief Bioinform 2024; 25:bbae018. [PMID: 38517692 PMCID: PMC10959234 DOI: 10.1093/bib/bbae018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2023] [Revised: 12/19/2023] [Accepted: 01/09/2024] [Indexed: 03/24/2024] Open
Abstract
Graph learning models have received increasing attention in the computational analysis of single-cell RNA sequencing (scRNA-seq) data. Compared with conventional deep neural networks, graph neural networks and language models have exhibited superior performance by extracting graph-structured data from raw gene count matrices. Established deep neural network-based clustering approaches generally focus on temporal expression patterns while ignoring inherent interactions at the gene level and the cell level, which can be regarded as spatial dynamics in single-cell data. Both gene-gene and cell-cell interactions can boost the performance of cell type detection under the framework of multi-view modeling. In this study, spatiotemporal embeddings and cell graphs are extracted to capture spatial dynamics at the molecular level. To enhance the accuracy of cell type detection, this study proposes the scHybridBERT architecture, which conducts multi-view modeling of scRNA-seq data using the extracted spatiotemporal patterns. In scHybridBERT, graph learning models are applied to the cell graphs and the Performer model is applied to the spatiotemporal embeddings. Experimental results on benchmark scRNA-seq datasets indicate that the proposed scHybridBERT method enhances the accuracy of single-cell clustering tasks by integrating spatiotemporal embeddings and cell graphs.
Affiliation(s)
- Zhang Wei
- Zhejiang Sci-Tech University, 310028, Hangzhou, China
| | - Wu Chenjun
- Zhejiang Sci-Tech University, 310028, Hangzhou, China
| | - Xing Feiyang
- Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, 200092, Shanghai, China
| | | | - Zhang Yixuan
- Zhejiang Sci-Tech University, 310028, Hangzhou, China
| | - Liu Qi
- Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, 200092, Shanghai, China
| | - Shi Zhuoxing
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-Sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, 510060, Guangzhou, China
| | - Dai Qi
- Zhejiang Sci-Tech University, 310028, Hangzhou, China
| |
|
21
|
Yao Q, Jia Y, Ma P. cellMarkerPipe: Cell Marker Identification and Evaluation Pipeline in Single Cell Transcriptomes. RESEARCH SQUARE 2024:rs.3.rs-3844718. [PMID: 38313296 PMCID: PMC10836098 DOI: 10.21203/rs.3.rs-3844718/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/06/2024]
Abstract
Assessing marker genes from all cell clusters can be time-consuming and often lacks a systematic strategy. Streamlining this process through a unified computational platform that automates identification and benchmarking will greatly enhance efficiency and ensure a fair evaluation. We therefore developed a novel computational platform, cellMarkerPipe (https://github.com/yao-laboratory/cellMarkerPipe), for automated cell-type specific marker gene identification from scRNA-seq data, coupled with a comprehensive evaluation schema. CellMarkerPipe adaptively wraps around a collection of commonly used and state-of-the-art tools, including Seurat, COSG, SC3, SCMarker, COMET, and scGeneFit. From rigorous testing across diverse samples, we ascertain SCMarker's overall reliable performance in single marker gene selection, with COSG showing commendable speed and comparable efficacy. Furthermore, we demonstrate the pivotal role of our approach in real-world medical datasets. This general and open-source pipeline stands as a significant advancement in streamlining cell marker gene identification and evaluation, fitting broad applications in the field of cellular biology and medical research.
|
22
|
Deng Y, Lu Y, Li M, Shen J, Qin S, Zhang W, Zhang Q, Shen Z, Li C, Jia T, Chen P, Peng L, Chen Y, Zhang W, Liu H, Zhang L, Rong L, Wang X, Chen D. SCAN: Spatiotemporal Cloud Atlas for Neural cells. Nucleic Acids Res 2024; 52:D998-D1009. [PMID: 37930842 PMCID: PMC10767991 DOI: 10.1093/nar/gkad895] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2023] [Revised: 09/20/2023] [Accepted: 10/05/2023] [Indexed: 11/08/2023]
Abstract
The nervous system is one of the most complicated and enigmatic systems within the animal kingdom. Recently, the emergence and development of spatial transcriptomics (ST) and single-cell RNA sequencing (scRNA-seq) technologies have provided an unprecedented ability to systematically decipher the cellular heterogeneity and spatial locations of the nervous system from multiple unbiased aspects. However, efficiently integrating, presenting and analyzing massive multiomic data remains a huge challenge. Here, we manually collected and comprehensively analyzed high-quality scRNA-seq and ST data from the nervous system, covering 10 679 684 cells. In addition, multi-omic datasets from more than 900 species were included for extensive data mining from an evolutionary perspective. Furthermore, over 100 neurological diseases (e.g. Alzheimer's disease, Parkinson's disease, Down syndrome) were systematically analyzed for high-throughput screening of putative biomarkers. Differential expression patterns across developmental time points, cell types and ST spots were discerned and subsequently subjected to extensive interpretation. To provide researchers with efficient data exploration, we created a new database with interactive interfaces and integrated functions called the Spatiotemporal Cloud Atlas for Neural cells (SCAN), freely accessible at http://47.98.139.124:8799 or http://scanatlas.net. SCAN will benefit the neuroscience research community to better exploit the spatiotemporal atlas of the neural system and promote the development of diagnostic strategies for various neurological disorders.
Affiliation(s)
- Yushan Deng
- State Key Laboratory of Common Mechanism Research for Major Diseases, Suzhou Institute of Systems Medicine, Chinese Academy of Medical Sciences & Peking Union Medical College, Suzhou 215123, China
| | - Yubao Lu
- Department of Spine Surgery, The Third Affiliated Hospital, Sun Yat-sen University, Guangzhou 510630, China
| | - Mengrou Li
- State Key Laboratory of Common Mechanism Research for Major Diseases, Suzhou Institute of Systems Medicine, Chinese Academy of Medical Sciences & Peking Union Medical College, Suzhou 215123, China
- Institutes of Biology and Medical Sciences (IBMS), Soochow University, Suzhou 215123, China
| | - Jiayi Shen
- State Key Laboratory of Common Mechanism Research for Major Diseases, Suzhou Institute of Systems Medicine, Chinese Academy of Medical Sciences & Peking Union Medical College, Suzhou 215123, China
- Peninsula Cancer Research Center, School of Basic Medical Sciences, Binzhou Medical University, Yantai 264003, China
| | - Siying Qin
- State Key Laboratory of Common Mechanism Research for Major Diseases, Suzhou Institute of Systems Medicine, Chinese Academy of Medical Sciences & Peking Union Medical College, Suzhou 215123, China
| | - Wei Zhang
- Department of Spine Surgery, The Third Affiliated Hospital, Sun Yat-sen University, Guangzhou 510630, China
| | - Qiang Zhang
- State Key Laboratory of Common Mechanism Research for Major Diseases, Suzhou Institute of Systems Medicine, Chinese Academy of Medical Sciences & Peking Union Medical College, Suzhou 215123, China
| | - Zhaoyang Shen
- Life Sciences and Technology College, China Pharmaceutical University, Nanjing 211198, China
| | - Changxiao Li
- State Key Laboratory of Common Mechanism Research for Major Diseases, Suzhou Institute of Systems Medicine, Chinese Academy of Medical Sciences & Peking Union Medical College, Suzhou 215123, China
| | - Tengfei Jia
- State Key Laboratory of Common Mechanism Research for Major Diseases, Suzhou Institute of Systems Medicine, Chinese Academy of Medical Sciences & Peking Union Medical College, Suzhou 215123, China
- Institutes of Biology and Medical Sciences (IBMS), Soochow University, Suzhou 215123, China
| | - Peixin Chen
- State Key Laboratory of Common Mechanism Research for Major Diseases, Suzhou Institute of Systems Medicine, Chinese Academy of Medical Sciences & Peking Union Medical College, Suzhou 215123, China
- Cam-Su Genomic Resource Center, Medical College of Soochow University, Suzhou 215123, China
| | - Lingmin Peng
- State Key Laboratory of Common Mechanism Research for Major Diseases, Suzhou Institute of Systems Medicine, Chinese Academy of Medical Sciences & Peking Union Medical College, Suzhou 215123, China
| | - Yangfeng Chen
- State Key Laboratory of Common Mechanism Research for Major Diseases, Suzhou Institute of Systems Medicine, Chinese Academy of Medical Sciences & Peking Union Medical College, Suzhou 215123, China
| | - Wensheng Zhang
- Peninsula Cancer Research Center, School of Basic Medical Sciences, Binzhou Medical University, Yantai 264003, China
- Cam-Su Genomic Resource Center, Medical College of Soochow University, Suzhou 215123, China
| | - Hebin Liu
- Institutes of Biology and Medical Sciences (IBMS), Soochow University, Suzhou 215123, China
| | - Liangming Zhang
- Department of Spine Surgery, The Third Affiliated Hospital, Sun Yat-sen University, Guangzhou 510630, China
| | - Limin Rong
- Department of Spine Surgery, The Third Affiliated Hospital, Sun Yat-sen University, Guangzhou 510630, China
| | - Xiangdong Wang
- Zhongshan Hospital, Department of Pulmonary and Critical Care Medicine, Institute for Clinical Science, Shanghai Institute of Clinical Bioinformatics, Shanghai 200000, China
| | - Dongsheng Chen
- State Key Laboratory of Common Mechanism Research for Major Diseases, Suzhou Institute of Systems Medicine, Chinese Academy of Medical Sciences & Peking Union Medical College, Suzhou 215123, China
| |
|
23
|
Maden SK, Kwon SH, Huuki-Myers LA, Collado-Torres L, Hicks SC, Maynard KR. Challenges and opportunities to computationally deconvolve heterogeneous tissue with varying cell sizes using single-cell RNA-sequencing datasets. Genome Biol 2023; 24:288. [PMID: 38098055 PMCID: PMC10722720 DOI: 10.1186/s13059-023-03123-4] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 05/11/2023] [Accepted: 11/24/2023] [Indexed: 12/17/2023]
Abstract
Deconvolution of cell mixtures in "bulk" transcriptomic samples from homogenate human tissue is important for understanding disease pathologies. However, several experimental and computational challenges impede transcriptomics-based deconvolution approaches using single-cell/nucleus RNA-seq reference atlases. Cells from the brain and blood have substantially different sizes, total mRNA, and transcriptional activities, and existing approaches may quantify total mRNA instead of cell type proportions. Further, standards are lacking for the use of cell reference atlases and integrative analyses of single-cell and spatial transcriptomics data. We discuss how to approach these key challenges with orthogonal "gold standard" datasets for evaluating deconvolution methods.
Affiliation(s)
- Sean K Maden
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
| | - Sang Ho Kwon
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, USA
- The Solomon H. Snyder Department of Neuroscience, Johns Hopkins School of Medicine, Baltimore, MD, USA
| | - Louise A Huuki-Myers
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, USA
| | - Leonardo Collado-Torres
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, USA
| | - Stephanie C Hicks
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA.
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA.
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA.
- Malone Center for Engineering in Healthcare, Johns Hopkins University, Baltimore, MD, USA.
| | - Kristen R Maynard
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, USA.
- The Solomon H. Snyder Department of Neuroscience, Johns Hopkins School of Medicine, Baltimore, MD, USA.
- Department of Psychiatry and Behavioral Sciences, Johns Hopkins School of Medicine, Baltimore, MD, USA.
| |
|
24
|
Molstad AJ, Motwani K. Multiresolution categorical regression for interpretable cell-type annotation. Biometrics 2023; 79:3485-3496. [PMID: 37798600 DOI: 10.1111/biom.13926] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2022] [Accepted: 08/07/2023] [Indexed: 10/07/2023]
Abstract
In many categorical response regression applications, the response categories admit a multiresolution structure. That is, subsets of the response categories may naturally be combined into coarser response categories. In such applications, practitioners are often interested in estimating the resolution at which a predictor affects the response category probabilities. In this paper, we propose a method for fitting the multinomial logistic regression model in high dimensions that addresses this problem in a unified and data-driven way. Our method allows practitioners to identify which predictors distinguish between coarse categories but not fine categories, which predictors distinguish between fine categories, and which predictors are irrelevant. For model fitting, we propose a scalable algorithm that can be applied when the coarse categories are defined by either overlapping or nonoverlapping sets of fine categories. Statistical properties of our method reveal that it can take advantage of this multiresolution structure in a way existing estimators cannot. We use our method to model cell-type probabilities as a function of a cell's gene expression profile (i.e., cell-type annotation). Our fitted model provides novel biological insights which may be useful for future automated and manual cell-type annotation methodology.
Affiliation(s)
- Aaron J Molstad
- School of Statistics, University of Minnesota, Minneapolis, Minnesota, USA
| | - Keshav Motwani
- Department of Biostatistics, University of Washington, Seattle, Washington, USA
| |
|
25
|
Li J, Zhang H, Mu B, Zuo H, Zhou K. Identifying phenotype-associated subpopulations through LP_SGL. Brief Bioinform 2023; 25:bbad424. [PMID: 38008419 PMCID: PMC10753413 DOI: 10.1093/bib/bbad424] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/21/2023] [Revised: 09/28/2023] [Accepted: 10/31/2023] [Indexed: 11/28/2023]
Abstract
Single-cell RNA sequencing (scRNA-seq) enables the resolution of cellular heterogeneity in diseases and facilitates the identification of novel cell types and subtypes. However, the grouping effects caused by cell-cell interactions are often overlooked in the development of tools for identifying subpopulations. We proposed LP_SGL which incorporates cell group structure to identify phenotype-associated subpopulations by integrating scRNA-seq, bulk expression and bulk phenotype data. Cell groups from scRNA-seq data were obtained by the Leiden algorithm, which facilitates the identification of subpopulations and improves model robustness. LP_SGL identified a higher percentage of cancer cells, T cells and tumor-associated cells than Scissor and scAB on lung adenocarcinoma diagnosis, melanoma drug response and liver cancer survival datasets, respectively. Biological analysis on three original datasets and four independent external validation sets demonstrated that the signaling genes of this cell subset can predict cancer, immunotherapy and survival.
Affiliation(s)
- Juntao Li
- College of Mathematics and Information Science, Henan Normal University, 46 Jianshe East Road, 453007, Xinxiang, China
| | - Hongmei Zhang
- College of Mathematics and Information Science, Henan Normal University, 46 Jianshe East Road, 453007, Xinxiang, China
| | - Bingyu Mu
- College of Arts and Design, Zhengzhou University of Light Industry, No. 5 Dongfeng Road, 450000, Zhengzhou, China
| | - Hongliang Zuo
- College of Mathematics and Information Science, Henan Normal University, 46 Jianshe East Road, 453007, Xinxiang, China
| | - Kanglei Zhou
- School of Computer Science and Engineering, Beihang University, 37 Xueyuan Road, Haidian District, 100191, Beijing, China
| |
|
26
|
Tangherloni A, Riva SG, Myers B, Buffa FM, Cazzaniga P. MAGNETO: Cell type marker panel generator from single-cell transcriptomic data. J Biomed Inform 2023; 147:104510. [PMID: 37797704 DOI: 10.1016/j.jbi.2023.104510] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/28/2023] [Revised: 09/12/2023] [Accepted: 09/29/2023] [Indexed: 10/07/2023]
Abstract
Single-cell RNA sequencing experiments produce data useful to identify different cell types, including uncharacterized and rare ones. This enables us to study the specific functional roles of these cells in different microenvironments and contexts. After identifying a (novel) cell type of interest, it is essential to build succinct marker panels, composed of a few genes referring to cell surface proteins and clusters of differentiation molecules, able to discriminate the desired cells from the other cell populations. In this work, we propose a fully-automatic framework called MAGNETO, which can help construct optimal marker panels starting from a single-cell gene expression matrix and a cell type identity for each cell. MAGNETO builds effective marker panels solving a tailored bi-objective optimization problem, where the first objective regards the identification of the genes able to isolate a specific cell type, while the second conflicting objective concerns the minimization of the total number of genes included in the panel. Our results on three public datasets show that MAGNETO can identify marker panels that identify the cell populations of interest better than state-of-the-art approaches. Finally, by fine-tuning MAGNETO, our results demonstrate that it is possible to obtain marker panels with different specificity levels.
Affiliation(s)
- Andrea Tangherloni
- Department of Computing Sciences, Bocconi University, Via Guglielmo Röntgen 1, Milan, 20136, Italy; Bocconi Institute for Data Science and Analytics, Bocconi University, Via Guglielmo Röntgen 1, Milan, 20136, Italy; Department of Human and Social Sciences, University of Bergamo, Piazzale S. Agostino 2, Bergamo, 24129, Italy.
| | - Simone G Riva
- Weatherall Institute of Molecular Medicine, Radcliffe Department of Medicine, University of Oxford, Headley Way, Oxford, OX3 9DS, United Kingdom
| | - Brynelle Myers
- Wellcome Centre for Human Genetics, University of Oxford, Roosevelt Drive, Oxford, OX3 7BN, United Kingdom
| | - Francesca M Buffa
- Department of Computing Sciences, Bocconi University, Via Guglielmo Röntgen 1, Milan, 20136, Italy; Bocconi Institute for Data Science and Analytics, Bocconi University, Via Guglielmo Röntgen 1, Milan, 20136, Italy; Department of Oncology, University of Oxford, Old Road Campus Research Building, Oxford, OX3 7DQ, United Kingdom
| | - Paolo Cazzaniga
- Department of Human and Social Sciences, University of Bergamo, Piazzale S. Agostino 2, Bergamo, 24129, Italy; Bicocca Bioinformatics, Biostatistics, and Bioimaging Centre - B4, Via Follereau 3, Vedano al Lambro, 20854, Italy
| |
|
27
|
Chari T, Gorin G, Pachter L. Biophysically Interpretable Inference of Cell Types from Multimodal Sequencing Data. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.09.17.558131. [PMID: 37745403 PMCID: PMC10516047 DOI: 10.1101/2023.09.17.558131] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 09/26/2023]
Abstract
Multimodal, single-cell genomics technologies enable simultaneous capture of multiple facets of DNA and RNA processing in the cell. This creates opportunities for transcriptome-wide, mechanistic studies of cellular processing in heterogeneous cell types, with applications ranging from inferring kinetic differences between cells, to the role of stochasticity in driving heterogeneity. However, current methods for determining cell types or 'clusters' present in multimodal data often rely on ad hoc or independent treatment of modalities, and assumptions ignoring inherent properties of the count data. To enable interpretable and consistent cell cluster determination from multimodal data, we present meK-Means (mechanistic K-Means) which integrates modalities and learns underlying, shared biophysical states through a unifying model of transcription. In particular, we demonstrate how meK-Means can be used to cluster cells from unspliced and spliced mRNA count modalities. By utilizing the causal, physical relationships underlying these modalities, we identify shared transcriptional kinetics across cells, which induce the observed gene expression profiles, and provide an alternative definition for 'clusters' through the governing parameters of cellular processes.
Affiliation(s)
- Tara Chari
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California
| | - Gennady Gorin
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California
| | - Lior Pachter
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California
- Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, California
| |
|
28
|
Abstract
Dimensionality reduction is standard practice for filtering noise and identifying relevant features in large-scale data analyses. In biology, single-cell genomics studies typically begin with reduction to 2 or 3 dimensions to produce "all-in-one" visuals of the data that are amenable to the human eye, and these are subsequently used for qualitative and quantitative exploratory analysis. However, there is little theoretical support for this practice, and we show that extreme dimension reduction, from hundreds or thousands of dimensions to 2, inevitably induces significant distortion of high-dimensional datasets. We therefore examine the practical implications of low-dimensional embedding of single-cell data and find that extensive distortions and inconsistent practices make such embeddings counter-productive for exploratory, biological analyses. In lieu of this, we discuss alternative approaches for conducting targeted embedding and feature exploration to enable hypothesis-driven biological discovery.
Affiliation(s)
- Tara Chari
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California, United States of America
| | - Lior Pachter
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California, United States of America
- Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, California, United States of America
| |
|
29
|
Khodayari S, Khodayari H, Saeedi E, Mahmoodzadeh H, Sadrkhah A, Nayernia K. Single-Cell Transcriptomics for Unlocking Personalized Cancer Immunotherapy: Toward Targeting the Origin of Tumor Development Immunogenicity. Cancers (Basel) 2023; 15:3615. [PMID: 37509276 PMCID: PMC10377122 DOI: 10.3390/cancers15143615] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 06/03/2023] [Revised: 07/11/2023] [Accepted: 07/12/2023] [Indexed: 07/30/2023]
Abstract
Cancer immunotherapy is a promising approach for treating malignancies through the activation of anti-tumor immunity. However, the effectiveness and safety of immunotherapy can be limited by tumor complexity and heterogeneity, caused by the diverse molecular and cellular features of tumors and their microenvironments. Undifferentiated tumor cell niches, which we refer to as the "Origin of Tumor Development" (OTD) cellular population, are believed to be the source of these variations and cellular heterogeneity. From our perspective, the existence of distinct features within the OTD is expected to play a significant role in shaping the unique tumor characteristics observed in each patient. Single-cell transcriptomics is a high-resolution and high-throughput technique that provides insights into the genetic signatures of individual tumor cells, revealing mechanisms of tumor development, progression, and immune evasion. In this review, we explain how single-cell transcriptomics can be used to develop personalized cancer immunotherapy by identifying potential biomarkers and targets specific to each patient, such as immune checkpoint and tumor-infiltrating lymphocyte function, for targeting the OTD. Furthermore, in addition to offering a possible workflow, we discuss the future directions of, and perspectives on, single-cell transcriptomics, such as the development of powerful analytical tools and databases, that will aid in unlocking personalized cancer immunotherapy through the targeting of the patient's cellular OTD.
Affiliation(s)
- Saeed Khodayari
- International Center for Personalized Medicine (P7MEDICINE), Luise-Rainer-Str. 6-12, 40235 Düsseldorf, Germany
| | - Hamid Khodayari
- International Center for Personalized Medicine (P7MEDICINE), Luise-Rainer-Str. 6-12, 40235 Düsseldorf, Germany
| | - Elnaz Saeedi
- Oxford Clinical Trials Research Unit, Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences (NDORMS), University of Oxford, Oxford OX3 7LD, UK
| | - Habibollah Mahmoodzadeh
- Breast Disease Research Center, Tehran University of Medical Sciences, Tehran 1819613844, Iran
| | | | - Karim Nayernia
- International Center for Personalized Medicine (P7MEDICINE), Luise-Rainer-Str. 6-12, 40235 Düsseldorf, Germany
| |
|
30
|
Davalos OA, Heydari AA, Fertig EJ, Sindi SS, Hoyer KK. Boosting Single-Cell RNA Sequencing Analysis with Simple Neural Attention. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.05.29.542760. [PMID: 37398136 PMCID: PMC10312486 DOI: 10.1101/2023.05.29.542760] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/04/2023]
Abstract
A limitation of current deep learning (DL) approaches for single-cell RNA sequencing (scRNAseq) analysis is the lack of interpretability. Moreover, existing pipelines are designed and trained for specific tasks used disjointly for different stages of analysis. We present scANNA, a novel interpretable DL model for scRNAseq studies that leverages neural attention to learn gene associations. After training, the learned gene importance (interpretability) is used to perform downstream analyses (e.g., global marker selection and cell-type classification) without retraining. ScANNA's performance is comparable to or better than state-of-the-art methods designed and trained for specific standard scRNAseq analyses even though scANNA was not trained for these tasks explicitly. ScANNA enables researchers to discover meaningful results without extensive prior knowledge or training separate task-specific models, saving time and enhancing scRNAseq analyses.
Affiliation(s)
- Oscar A. Davalos
- Quantitative and Systems Biology Graduate Program, University of California, Merced, CA, USA
| | - A. Ali Heydari
- Department of Applied Mathematics, University of California, Merced, CA, USA
- Health Sciences Research Institute, University of California, Merced, CA, USA
| | - Elana J. Fertig
- Department of Oncology, Division of Biostatistics and Bioinformatics, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins School of Medicine, Baltimore, MD, USA
| | - Suzanne S. Sindi
- Department of Applied Mathematics, University of California, Merced, CA, USA
- Health Sciences Research Institute, University of California, Merced, CA, USA
| | - Katrina K. Hoyer
- Health Sciences Research Institute, University of California, Merced, CA, USA
- Department of Molecular and Cell Biology, School of Natural Sciences, University of California, Merced, CA, USA
| |
|
31
|
Maden SK, Kwon SH, Huuki-Myers LA, Collado-Torres L, Hicks SC, Maynard KR. Challenges and opportunities to computationally deconvolve heterogeneous tissue with varying cell sizes using single cell RNA-sequencing datasets. ARXIV 2023:arXiv:2305.06501v1. [PMID: 37214135 PMCID: PMC10197733] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/24/2023]
Abstract
Deconvolution of cell mixtures in "bulk" transcriptomic samples from homogenate human tissue is important for understanding the pathologies of diseases. However, several experimental and computational challenges remain in developing and implementing transcriptomics-based deconvolution approaches, especially those using a single cell/nuclei RNA-seq reference atlas, which are becoming rapidly available across many tissues. Notably, deconvolution algorithms are frequently developed using samples from tissues with similar cell sizes. However, brain tissue or immune cell populations have cell types with substantially different cell sizes, total mRNA expression, and transcriptional activity. When existing deconvolution approaches are applied to these tissues, these systematic differences in cell sizes and transcriptomic activity confound accurate cell proportion estimates and instead may quantify total mRNA content. Furthermore, there is a lack of standard reference atlases and computational approaches to facilitate integrative analyses, including not only bulk and single cell/nuclei RNA-seq data, but also new data modalities from spatial -omic or imaging approaches. New multi-assay datasets need to be collected with orthogonal data types generated from the same tissue block and the same individual, to serve as a "gold standard" for evaluating new and existing deconvolution methods. Below, we discuss these key challenges and how they can be addressed with the acquisition of new datasets and approaches to analysis.
Affiliation(s)
- Sean K Maden
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
| | - Sang Ho Kwon
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, USA
- The Solomon H. Snyder Department of Neuroscience, Johns Hopkins School of Medicine, Baltimore, MD, USA
| | - Louise A Huuki-Myers
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, USA
| | | | - Stephanie C Hicks
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Malone Center for Engineering in Healthcare, Johns Hopkins University, Baltimore, MD, USA
| | - Kristen R Maynard
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, USA
- The Solomon H. Snyder Department of Neuroscience, Johns Hopkins School of Medicine, Baltimore, MD, USA
- Department of Psychiatry and Behavioral Sciences, Johns Hopkins School of Medicine, Baltimore, MD, USA
| |
|
32
|
Covert I, Gala R, Wang T, Svoboda K, Sümbül U, Lee SI. Predictive and robust gene selection for spatial transcriptomics. Nat Commun 2023; 14:2091. [PMID: 37045821 PMCID: PMC10097645 DOI: 10.1038/s41467-023-37392-1] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 06/08/2022] [Accepted: 03/16/2023] [Indexed: 04/14/2023] Open
Abstract
A prominent trend in single-cell transcriptomics is providing spatial context alongside a characterization of each cell's molecular state. This typically requires targeting an a priori selection of genes, often covering less than 1% of the genome, and a key question is how to optimally determine the small gene panel. We address this challenge by introducing a flexible deep learning framework, PERSIST, to identify informative gene targets for spatial transcriptomics studies by leveraging reference scRNA-seq data. Using datasets spanning different brain regions, species, and scRNA-seq technologies, we show that PERSIST reliably identifies panels that provide more accurate prediction of the genome-wide expression profile, thereby capturing more information with fewer genes. PERSIST can be adapted to specific biological goals, and we demonstrate that PERSIST's binarization of gene expression levels enables models trained on scRNA-seq data to generalize to spatial transcriptomics data, despite the complex shift between these technologies.
Affiliation(s)
- Ian Covert
- Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA, USA
| | - Rohan Gala
- Allen Institute for Brain Science, Seattle, WA, USA
| | - Tim Wang
- HHMI Janelia Research Campus, Ashburn, VA, USA
| | - Karel Svoboda
- HHMI Janelia Research Campus, Ashburn, VA, USA
- Allen Institute for Neural Dynamics, Seattle, WA, USA
| | - Uygar Sümbül
- Allen Institute for Brain Science, Seattle, WA, USA.
| | - Su-In Lee
- Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA, USA.
| |
|
33
|
Zhang Y, Petukhov V, Biederstedt E, Que R, Zhang K, Kharchenko PV. Gene panel selection for targeted spatial transcriptomics. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.02.03.527053. [PMID: 36993340 PMCID: PMC10054990 DOI: 10.1101/2023.02.03.527053] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
Targeted spatial transcriptomics holds particular promise for the analysis of complex tissues. Most such methods, however, measure only a limited panel of transcripts, which need to be selected in advance to inform on the cell types or processes being studied. A limitation of existing gene selection methods is that they rely on scRNA-seq data, ignoring platform effects between technologies. Here we describe gpsFISH, a computational method to perform gene selection through optimizing detection of known cell types. By modeling and adjusting for platform effects, gpsFISH outperforms other methods. Furthermore, gpsFISH can incorporate cell type hierarchies and custom gene preferences to accommodate diverse design requirements.
Affiliation(s)
- Yida Zhang
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Department of Neurobiology, Duke University, Durham, NC, USA
| | - Viktor Petukhov
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Evan Biederstedt
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Richard Que
- Department of Bioengineering, University of California San Diego, La Jolla, CA, USA
| | - Kun Zhang
- Department of Bioengineering, University of California San Diego, La Jolla, CA, USA
- San Diego Institute of Science, Altos Labs, San Diego, CA, USA
| | - Peter V. Kharchenko
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- San Diego Institute of Science, Altos Labs, San Diego, CA, USA
| |
|
34
|
Baran Y, Doğan B. scMAGS: Marker gene selection from scRNA-seq data for spatial transcriptomics studies. Comput Biol Med 2023; 155:106634. [PMID: 36774895 DOI: 10.1016/j.compbiomed.2023.106634] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 11/05/2022] [Revised: 01/28/2023] [Accepted: 02/04/2023] [Indexed: 02/11/2023]
Abstract
Single-cell RNA sequencing (scRNA-seq) has provided unprecedented opportunities for exploring gene expression and thus uncovering regulatory relationships between genes at the single-cell level. However, scRNA-seq relies on isolating cells from tissues. Therefore, the spatial context of the regulatory processes is lost. A recent technological innovation, spatial transcriptomics, allows for the measurement of gene expression while preserving spatial information. An initial step in spatial transcriptomic analysis is to identify the cell type, which requires a careful selection of cell-specific marker genes. For this purpose, scRNA-seq data are currently used to select, from among all genes, a limited number of marker genes that distinguish cell types from each other. This study proposes scMAGS (single-cell MArker Gene Selection), a novel method for marker gene selection from scRNA-seq data for spatial transcriptomics studies. scMAGS uses a filtering step in which the candidate genes are identified before the marker gene selection step. For the selection of marker genes, cluster validity indices are used: the Silhouette index or, for large datasets, the Calinski-Harabasz index. Experimental results showed that, in comparison to the existing methods, scMAGS is scalable, fast, and accurate. Even for large datasets with millions of cells, scMAGS could find the required number of marker genes in a reasonable amount of time with fewer memory requirements. scMAGS is made freely available at https://github.com/doganlab/scmags and can be downloaded from the Python Package Index (PyPI) software repository with the command pip install scmags.
Affiliation(s)
- Yusuf Baran
- Department of Biomedical Engineering, Inonu University, Malatya, Turkey
| | - Berat Doğan
- Department of Biomedical Engineering, Inonu University, Malatya, Turkey.
| |
|
35
|
Hasanaj E, Alavi A, Gupta A, Póczos B, Bar-Joseph Z. Multiset multicover methods for discriminative marker selection. CELL REPORTS METHODS 2022; 2:100332. [PMID: 36452867 PMCID: PMC9701606 DOI: 10.1016/j.crmeth.2022.100332] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2022] [Revised: 08/12/2022] [Accepted: 10/18/2022] [Indexed: 06/17/2023]
Abstract
Markers are increasingly being used for several high-throughput data analysis and experimental design tasks. Examples include the use of markers for assigning cell types in scRNA-seq studies, for deconvolving bulk gene expression data, and for selecting marker proteins in single-cell spatial proteomics studies. Most marker selection methods focus on differential expression (DE) analysis. Although such methods work well for data with a few non-overlapping marker sets, they are not appropriate for large atlas-size datasets where several cell types and tissues are considered. To address this, we define the phenotype cover (PC) problem for marker selection and present algorithms that can improve the discriminative power of marker sets. Analysis of these sets on several marker-selection tasks suggests that these methods can lead to solutions that accurately distinguish different phenotypes in the data.
Affiliation(s)
- Euxhen Hasanaj
- Machine Learning Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| | - Amir Alavi
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| | - Anupam Gupta
- Computer Science Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| | - Barnabás Póczos
- Machine Learning Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| | - Ziv Bar-Joseph
- Machine Learning Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| |
|
36
|
Niles-Weed J, Rigollet P. Estimation of Wasserstein distances in the Spiked Transport Model. BERNOULLI 2022. [DOI: 10.3150/21-bej1433] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/25/2023]
Affiliation(s)
- Jonathan Niles-Weed
- Courant Institute of Mathematical Sciences & Center for Data Science, New York University, 251 Mercer Street, New York, NY 10012-1185, USA
| | - Philippe Rigollet
- Department of Mathematics, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139-4307, USA
| |
|
37
|
Nelson ME, Riva SG, Cvejic A. SMaSH: a scalable, general marker gene identification framework for single-cell RNA-sequencing. BMC Bioinformatics 2022; 23:328. [PMID: 35941549 PMCID: PMC9361618 DOI: 10.1186/s12859-022-04860-2] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 01/10/2022] [Accepted: 07/12/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Single-cell RNA-sequencing is revolutionising the study of cellular and tissue-wide heterogeneity in a large number of biological scenarios, from highly tissue-specific studies of disease to human-wide cell atlases. A central task in single-cell RNA-sequencing analysis design is the calculation of cell type-specific genes in order to study the differential impact of different replicates (e.g. tumour vs. non-tumour environment) on the regulation of those genes and their associated networks. The crucial task is the efficient and reliable calculation of such cell type-specific 'marker' genes. These optimise the ability of the experiment to isolate highly-specific cell phenotypes of interest to the analyser. However, while methods exist that can calculate marker genes from single-cell RNA-sequencing, no such method places emphasis on specific cell phenotypes for downstream study in e.g. differential gene expression or other experimental protocols (spatial transcriptomics protocols for example). Here we present SMaSH, a general computational framework for extracting key marker genes from single-cell RNA-sequencing data which reliably characterise highly-specific and niche populations of cells in numerous different biological data-sets. RESULTS SMaSH extracts robust and biologically well-motivated marker genes, which characterise a given single-cell RNA-sequencing data-set better than existing computational approaches for general marker gene calculation. We demonstrate the utility of SMaSH through its substantial performance improvement over several existing methods in the field. Furthermore, we evaluate the SMaSH markers on spatial transcriptomics data, demonstrating they identify highly localised compartments of the mouse cortex. CONCLUSION SMaSH is a new methodology for calculating robust marker genes from large single-cell RNA-sequencing data-sets, and has implications for e.g. effective gene identification for probe design in downstream spatial transcriptomics experiments. SMaSH has been fully integrated with the ScanPy framework and provides a valuable bioinformatics tool for cell type characterisation and validation in ever-growing data-sets spanning over 50 different cell types across hundreds of thousands of cells.
Affiliation(s)
- M E Nelson
- European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, CB10 1SD, UK
- Department of Haematology, University of Cambridge, Cambridge, CB2 0AW, UK
- Wellcome - Medical Research Council Cambridge Stem Cell Institute, Cambridge, CB2 0AW, UK
- Open Targets, Wellcome Genome Campus, Cambridge, CB10 1SA, UK
| | - S G Riva
- Department of Haematology, University of Cambridge, Cambridge, CB2 0AW, UK
- Wellcome - Medical Research Council Cambridge Stem Cell Institute, Cambridge, CB2 0AW, UK
- Open Targets, Wellcome Genome Campus, Cambridge, CB10 1SA, UK
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, CB10 1RQ, UK
| | - A Cvejic
- Department of Haematology, University of Cambridge, Cambridge, CB2 0AW, UK
- Wellcome - Medical Research Council Cambridge Stem Cell Institute, Cambridge, CB2 0AW, UK
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, CB10 1RQ, UK
| |
|
38
|
Garg T, Weiss CR, Sheth RA. Techniques for Profiling the Cellular Immune Response and Their Implications for Interventional Oncology. Cancers (Basel) 2022; 14:3628. [PMID: 35892890 PMCID: PMC9332307 DOI: 10.3390/cancers14153628] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2022] [Revised: 07/19/2022] [Accepted: 07/20/2022] [Indexed: 12/07/2022] Open
Abstract
In recent years there has been increased interest in using the immune contexture of the primary tumors to predict the patient's prognosis. The tumor microenvironment of patients with cancers consists of different types of lymphocytes, tumor-infiltrating leukocytes, dendritic cells, and others. Different technologies can be used for the evaluation of the tumor microenvironment, all of which require a tissue or cell sample. Image-guided tissue sampling is a cornerstone in the diagnosis, stratification, and longitudinal evaluation of therapeutic efficacy for cancer patients receiving immunotherapies. Therefore, interventional radiologists (IRs) play an essential role in the evaluation of patients treated with systemically administered immunotherapies. This review provides a detailed description of different technologies used for immune assessment and analysis of the data collected from the use of these technologies. The detailed approach provided herein is intended to provide the reader with the knowledge necessary to not only interpret studies containing such data but also design and apply these tools for clinical practice and future research studies.
Affiliation(s)
- Tushar Garg
- Division of Vascular and Interventional Radiology, Russell H. Morgan Department of Radiology and Radiological Science, The Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
| | - Clifford R. Weiss
- Division of Vascular and Interventional Radiology, Russell H. Morgan Department of Radiology and Radiological Science, The Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
| | - Rahul A. Sheth
- Department of Interventional Radiology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
| |
|
39
|
Mauracher AA, Henrickson SE. Leveraging Systems Immunology to Optimize Diagnosis and Treatment of Inborn Errors of Immunity. FRONTIERS IN SYSTEMS BIOLOGY 2022; 2:910243. [PMID: 37670772 PMCID: PMC10477056 DOI: 10.3389/fsysb.2022.910243] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 09/07/2023]
Abstract
Inborn errors of immunity (IEI) are monogenic disorders that can cause diverse symptoms, including recurrent infections, autoimmunity and malignancy. While many factors have contributed, the increased availability of next-generation sequencing has been central in the remarkable increase in identification of novel monogenic IEI over the past years. Throughout this phase of disease discovery, it has also become evident that a given gene variant does not always yield a consistent phenotype, while variants in seemingly disparate genes can lead to similar clinical presentations. Thus, it is increasingly clear that the clinical phenotype of an IEI patient is not defined by genetics alone, but is also impacted by a myriad of factors. Accordingly, we need methods to amplify our current diagnostic algorithms to better understand mechanisms underlying the variability in our patients and to optimize treatment. In this review, we will explore how systems immunology can contribute to optimizing both diagnosis and treatment of IEI patients by focusing on identifying and quantifying key dysregulated pathways. To improve mechanistic understanding in IEI we must deeply evaluate our rare IEI patients using multimodal strategies, allowing both the quantification of altered immune cell subsets and their functional evaluation. By studying representative controls and patients, we can identify causative pathways underlying immune cell dysfunction and move towards functional diagnosis. Attaining this deeper understanding of IEI will require a stepwise strategy. First, we need to broadly apply these methods to IEI patients to identify patterns of dysfunction. Next, using multimodal data analysis, we can identify key dysregulated pathways. Then, we must develop a core group of simple, effective functional tests that target those pathways to increase efficiency of initial diagnostic investigations, provide evidence for therapeutic selection and contribute to the mechanistic evaluation of genetic results. This core group of simple, effective functional tests, targeting key pathways, can then be equitably provided to our rare patients. Systems biology is thus poised to reframe IEI diagnosis and therapy, fostering research today that will provide streamlined diagnosis and treatment choices for our rare and complex patients in the future, as well as providing a better understanding of basic immunology.
Affiliation(s)
- Andrea A. Mauracher
- Division of Allergy and Immunology, Department of Pediatrics, Children’s Hospital of Philadelphia, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
| | - Sarah E. Henrickson
- Division of Allergy and Immunology, Department of Pediatrics, Children’s Hospital of Philadelphia, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Department of Microbiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Institute for Immunology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
| |
|
40
|
Murphy M, Jegelka S, Fraenkel E. Self-supervised learning of cell type specificity from immunohistochemical images. Bioinformatics 2022; 38:i395-i403. [PMID: 35758799 PMCID: PMC9235491 DOI: 10.1093/bioinformatics/btac263] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 12/15/2022] Open
Abstract
MOTIVATION Advances in bioimaging now permit in situ proteomic characterization of cell-cell interactions in complex tissues, with important applications across a spectrum of biological problems from development to disease. These methods depend on selection of antibodies targeting proteins that are expressed specifically in particular cell types. Candidate marker proteins are often identified from single-cell transcriptomic data, with variable rates of success, in part due to divergence between expression levels of proteins and the genes that encode them. In principle, marker identification could be improved by using existing databases of immunohistochemistry for thousands of antibodies in human tissue, such as the Human Protein Atlas. However, these data lack detailed annotations of the types of cells in each image. RESULTS We develop a method to predict cell type specificity of protein markers from unlabeled images. We train a convolutional neural network with a self-supervised objective to generate embeddings of the images. Using non-linear dimensionality reduction, we observe that the model clusters images according to cell types and anatomical regions for which the stained proteins are specific. We then use estimates of cell type specificity derived from an independent single-cell transcriptomics dataset to train an image classifier, without requiring any human labelling of images. Our scheme demonstrates superior classification of known proteomic markers in kidney compared to selection via single-cell transcriptomics. AVAILABILITY AND IMPLEMENTATION Code and trained model are available at www.github.com/murphy17/HPA-SimCLR. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Michael Murphy
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Stefanie Jegelka
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Ernest Fraenkel
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
| |
|
41
|
Lall S, Ray S, Bandyopadhyay S. A copula based topology preserving graph convolution network for clustering of single-cell RNA-seq data. PLoS Comput Biol 2022; 18:e1009600. [PMID: 35271564 PMCID: PMC8979455 DOI: 10.1371/journal.pcbi.1009600] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/30/2021] [Revised: 04/04/2022] [Accepted: 01/27/2022] [Indexed: 11/18/2022] Open
Abstract
Annotation of cells in single-cell clustering requires a homogeneous grouping of cell populations. Several issues in single-cell sequencing affect the homogeneous grouping (clustering) of cells, such as the small amount of starting RNA, limited per-cell sequenced reads, cell-to-cell variability due to cell cycle, cellular morphology, and variable reagent concentrations. Moreover, single-cell data are susceptible to technical noise, which affects the quality of genes (or features) selected/extracted prior to clustering. Here we introduce sc-CGconv (copula based graph convolution network for single-cell clustering), a stepwise robust unsupervised feature extraction and clustering approach that formulates and aggregates cell–cell relationships using copula correlation (Ccor), followed by a graph convolution network based clustering approach. sc-CGconv formulates a cell-cell graph using Ccor that is learned by a graph-based artificial intelligence model, graph convolution network. The learned representation (low dimensional embedding) is utilized for cell clustering. sc-CGconv features the following advantages. a. sc-CGconv works with substantially smaller sample sizes to identify homogeneous clusters. b. sc-CGconv can model the expression co-variability of a large number of genes, thereby outperforming state-of-the-art gene selection/extraction methods for clustering. c. sc-CGconv preserves the cell-to-cell variability within the selected gene set by constructing a cell-cell graph through copula correlation measure. d. sc-CGconv provides a topology-preserving embedding of cells in low dimensional space.
One of the important aspects of single-cell downstream analysis is to classify cells into subpopulations. This leads directly to clustering of cells into homogeneous groups, which faces many issues due to (i) small amount of starting RNA, (ii) cell-to-cell variability, (iii) technical noise incorporated within the single-cell sequencing technology, and (iv) unavailability of discriminating selected/extracted genes (features) in the preprocessing step of downstream analysis. We propose sc-CGconv, a stepwise feature extraction and clustering framework that leverages the advantages of copula and graph convolution networks in the single-cell analysis domain. sc-CGconv outperforms the state-of-the-art feature selection/extraction methods in the preprocessing steps, performs well with small sample size data, preserves the cell-to-cell variability within the extracted features, and provides a topology-preserving embedding of cells in low dimensional space. sc-CGconv therefore successfully addresses the above-mentioned key challenges.
Affiliation(s)
- Snehalika Lall
- Machine Intelligence Unit, Indian Statistical Institute, Kolkata, India
| | - Sumanta Ray
- Department of Computer Science and Engineering, Aliah University, Kolkata, India
- Health Analytics Network, Pittsburgh, Pennsylvania, United States of America
| | | |
|
42
|
Ding J, Sharon N, Bar-Joseph Z. Temporal modelling using single-cell transcriptomics. Nat Rev Genet 2022; 23:355-368. [PMID: 35102309 DOI: 10.1038/s41576-021-00444-7] [Citation(s) in RCA: 88] [Impact Index Per Article: 29.3] [Accepted: 12/14/2021] [Indexed: 12/16/2022]
Abstract
Methods for profiling genes at the single-cell level have revolutionized our ability to study several biological processes and systems including development, differentiation, response programmes and disease progression. In many of these studies, cells are profiled over time in order to infer dynamic changes in cell states and types, sets of expressed genes, active pathways and key regulators. However, time-series single-cell RNA sequencing (scRNA-seq) also raises several new analysis and modelling issues. These issues range from determining when and how deep to profile cells, linking cells within and between time points, learning continuous trajectories, and integrating bulk and single-cell data for reconstructing models of dynamic networks. In this Review, we discuss several approaches for the analysis and modelling of time-series scRNA-seq, highlighting their steps, key assumptions, and the types of data and biological questions they are most appropriate for.
|
43
|
Liu B, Li Y, Zhang L. Analysis and Visualization of Spatial Transcriptomic Data. Front Genet 2022; 12:785290. [PMID: 35154244 PMCID: PMC8829434 DOI: 10.3389/fgene.2021.785290] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Received: 09/29/2021] [Accepted: 12/24/2021] [Indexed: 12/21/2022] Open
Abstract
Human and animal tissues consist of heterogeneous cell types that organize and interact in highly structured manners. Bulk and single-cell sequencing technologies remove cells from their original microenvironments, resulting in a loss of spatial information. Spatial transcriptomics is a recent technological innovation that measures transcriptomic information while preserving spatial information. Spatial transcriptomic data can be generated in several ways. RNA molecules are measured by in situ sequencing, in situ hybridization, or spatial barcoding to recover original spatial coordinates. The inclusion of spatial information expands the range of possibilities for analysis and visualization, and has spurred the development of numerous novel methods. In this review, we summarize the core concepts of spatial genomics technology and provide a comprehensive review of current analysis and visualization methods for spatial transcriptomics.
|
44
|
Grisanti Canozo FJ, Zuo Z, Martin JF, Samee MAH. Cell-type modeling in spatial transcriptomics data elucidates spatially variable colocalization and communication between cell-types in mouse brain. Cell Syst 2022; 13:58-70.e5. [PMID: 34626538 PMCID: PMC8776574 DOI: 10.1016/j.cels.2021.09.004] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 03/31/2021] [Revised: 08/06/2021] [Accepted: 09/10/2021] [Indexed: 01/21/2023]
Abstract
Single-cell spatial transcriptomics (sc-ST) holds the promise to elucidate architectural aspects of complex tissues. Such analyses require modeling cell types in sc-ST datasets through their integration with single-cell RNA-seq datasets. However, this integration is nontrivial since the two technologies differ widely in the number of profiled genes, and the datasets often do not share many marker genes for given cell types. We developed a neural network model, spatial transcriptomics cell-types assignment using neural networks (STANN), to overcome these challenges. Analysis of STANN's predicted cell types in mouse olfactory bulb (MOB) sc-ST data delineated MOB architecture beyond its morphological layer-based conventional description. We find that cell-type proportions remain consistent within individual morphological layers but vary significantly between layers. Notably, even within a layer, cellular colocalization patterns and intercellular communication mechanisms show high spatial variations. These observations imply a refinement of major cell types into subtypes characterized by spatially localized gene regulatory networks and receptor-ligand usage.
Affiliation(s)
| | - Zhen Zuo
- Baylor College of Medicine, Houston, TX 77030, USA
| | - James F Martin
- Baylor College of Medicine, Houston, TX 77030, USA; Texas Heart Institute, Houston, TX 77030, USA
| | | |
|
45
|
Wang A, Liu H, Yang J, Chen G. Ensemble feature selection for stable biomarker identification and cancer classification from microarray expression data. Comput Biol Med 2022; 142:105208. [PMID: 35016102 DOI: 10.1016/j.compbiomed.2021.105208] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 11/29/2021] [Revised: 12/19/2021] [Accepted: 12/31/2021] [Indexed: 01/31/2023]
Abstract
Microarray technology facilitates the simultaneous measurement of expression of tens of thousands of genes and enables us to study cancers and tumors at the molecular level. Because microarray data are typically characterized by small sample size and high dimensionality, accurate and stable feature selection is of fundamental importance to diagnostic accuracy and to a deep understanding of disease mechanisms. Hence, in this study we present an ensemble feature selection framework to improve the discrimination and stability of the finally selected features. Specifically, we utilize sampling techniques to obtain multiple sampled datasets, from each of which we use a base feature selector to select a subset of features. Afterwards, we develop two aggregation strategies to combine multiple feature subsets into one set. Finally, comparative experiments are conducted on four publicly available microarray datasets covering both binary and multi-class cases in terms of classification accuracy and three stability metrics. Results show that the proposed method obtains better stability scores and achieves classification performance comparable to, and in some cases better than, that of its competitors.
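The sample, select, and aggregate pattern described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the base selector (absolute difference of class means), the bootstrap resampling, the vote-count aggregation, and the toy data are all hypothetical stand-ins, since the abstract does not specify which sampling technique, base selector, or aggregation strategies were used.

```python
import random
from collections import Counter


def score_features(rows, labels):
    """Hypothetical base selector: absolute difference of class means (binary labels)."""
    scores = []
    for j in range(len(rows[0])):
        pos = [r[j] for r, y in zip(rows, labels) if y == 1]
        neg = [r[j] for r, y in zip(rows, labels) if y == 0]
        if not pos or not neg:
            scores.append(0.0)  # resample contained a single class only
            continue
        scores.append(abs(sum(pos) / len(pos) - sum(neg) / len(neg)))
    return scores


def ensemble_select(rows, labels, k, n_rounds=50, seed=0):
    """Select k features by voting over bootstrap resamples."""
    rng = random.Random(seed)
    votes = Counter()
    n = len(rows)
    for _ in range(n_rounds):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap resample
        scores = score_features([rows[i] for i in idx], [labels[i] for i in idx])
        top = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]
        votes.update(top)
    # Aggregate: rank features by how often they were selected (ties by index).
    return sorted(votes, key=lambda j: (-votes[j], j))[:k]


# Toy data: feature 0 separates the two classes, features 1 and 2 are noise.
rows = [
    [0.0, 1.00, 0.30], [0.1, 1.20, 0.50], [0.2, 0.90, 0.40],
    [5.0, 1.10, 0.45], [5.1, 1.00, 0.35], [5.2, 0.95, 0.50],
]
labels = [0, 0, 0, 1, 1, 1]
print(ensemble_select(rows, labels, k=1))
```

Counting how often each feature appears across resamples is one simple way to favor features whose selection is stable under perturbation of the training set, which is the property the abstract's stability metrics are meant to capture.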
Affiliation(s)
- Aiguo Wang
- School of Electronic Information Engineering, Foshan University, Foshan, China.
- Huancheng Liu
- School of Electronic Information Engineering, Foshan University, Foshan, China.
- Jing Yang
- School of Computer Science and Information Engineering, Hefei University of Technology, Hefei, China.
- Guilin Chen
- School of Computer and Information Engineering, Chuzhou University, Chuzhou, China.
46
|
Missarova A, Jain J, Butler A, Ghazanfar S, Stuart T, Brusko M, Wasserfall C, Nick H, Brusko T, Atkinson M, Satija R, Marioni JC. geneBasis: an iterative approach for unsupervised selection of targeted gene panels from scRNA-seq. Genome Biol 2021; 22:333. [PMID: 34872616 PMCID: PMC8650258 DOI: 10.1186/s13059-021-02548-z] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 07/20/2021] [Accepted: 11/19/2021] [Indexed: 12/13/2022]
Abstract
scRNA-seq datasets are increasingly used to identify gene panels that can be probed using alternative technologies, such as spatial transcriptomics, where choosing the best subset of genes is vital. Existing methods are limited by a reliance on pre-existing cell type labels or by difficulties in identifying markers of rare cells. We introduce an iterative approach, geneBasis, for selecting an optimal gene panel, where each newly added gene captures the maximum distance between the true manifold and the manifold constructed using the currently selected gene panel. Our approach outperforms existing strategies and can resolve cell types and subtle cell state differences.
Affiliation(s)
- Alsu Missarova
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, UK
- Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
- Andrew Butler
- New York Genome Center, New York, USA
- Center for Genomics and Systems Biology, NYU, New York, USA
- Shila Ghazanfar
- Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
- Tim Stuart
- New York Genome Center, New York, USA
- Center for Genomics and Systems Biology, NYU, New York, USA
- Maigan Brusko
- Department of Pathology, Immunology, and Laboratory Medicine, College of Medicine, University of Florida, Jacksonville, USA
- Clive Wasserfall
- Department of Pathology, Immunology, and Laboratory Medicine, College of Medicine, University of Florida, Jacksonville, USA
- Harry Nick
- Department of Neuroscience, College of Medicine, University of Florida, Jacksonville, USA
- Todd Brusko
- Department of Pathology, Immunology, and Laboratory Medicine, College of Medicine, University of Florida, Jacksonville, USA
- Mark Atkinson
- Department of Pathology, Immunology, and Laboratory Medicine, College of Medicine, University of Florida, Jacksonville, USA
- Rahul Satija
- New York Genome Center, New York, USA.
- Center for Genomics and Systems Biology, NYU, New York, USA.
- John C Marioni
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, UK.
- Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK.
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK.
47
|
Yang P, Huang H, Liu C. Feature selection revisited in the single-cell era. Genome Biol 2021; 22:321. [PMID: 34847932 PMCID: PMC8638336 DOI: 10.1186/s13059-021-02544-3] [Citation(s) in RCA: 41] [Impact Index Per Article: 10.3] [Received: 07/26/2021] [Accepted: 11/15/2021] [Indexed: 12/13/2022]
Abstract
Recent advances in single-cell biotechnologies have resulted in high-dimensional datasets with increased complexity, making feature selection an essential technique for single-cell data analysis. Here, we revisit feature selection techniques and summarise recent developments. We review their application to a range of single-cell data types generated from traditional cytometry and imaging technologies and the latest array of single-cell omics technologies. We highlight some of the challenges and future directions and finally consider their scalability and make general recommendations on each type of feature selection method. We hope this review stimulates future research and application of feature selection in the single-cell era.
Affiliation(s)
- Pengyi Yang
- School of Mathematics and Statistics, University of Sydney, Sydney, NSW, 2006, Australia.
- Computational Systems Biology Group, Children's Medical Research Institute, University of Sydney, Westmead, NSW, 2145, Australia.
- Charles Perkins Centre, University of Sydney, Sydney, NSW, 2006, Australia.
- Hao Huang
- School of Mathematics and Statistics, University of Sydney, Sydney, NSW, 2006, Australia
- Computational Systems Biology Group, Children's Medical Research Institute, University of Sydney, Westmead, NSW, 2145, Australia
- Chunlei Liu
- Computational Systems Biology Group, Children's Medical Research Institute, University of Sydney, Westmead, NSW, 2145, Australia
48
|
Fischer S, Gillis J. How many markers are needed to robustly determine a cell's type? iScience 2021; 24:103292. [PMID: 34765918 PMCID: PMC8571500 DOI: 10.1016/j.isci.2021.103292] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 05/04/2021] [Revised: 09/16/2021] [Accepted: 10/13/2021] [Indexed: 12/30/2022]
Abstract
Our understanding of cell types has advanced considerably with the publication of single-cell atlases. Marker genes play an essential role for experimental validation and computational analyses such as physiological characterization, annotation, and deconvolution. However, a framework for quantifying marker replicability and selecting replicable markers is currently lacking. Here, using high-quality data from the Brain Initiative Cell Census Network (BICCN), we systematically investigate marker replicability for 85 neuronal cell types. We show that, due to dataset-specific noise, we need to combine 5 datasets to obtain robust differentially expressed (DE) genes, particularly for rare populations and lowly expressed genes. We estimate that 10 to 200 meta-analytic markers provide optimal downstream performance and make available replicable marker lists for the 85 BICCN cell types. Replicable marker lists condense interpretable and generalizable information about cell types, opening avenues for downstream applications, including cell type annotation, selection of gene panels, and bulk data deconvolution.
Affiliation(s)
- Stephan Fischer
- Cold Spring Harbor Laboratory, Stanley Institute for Cognitive Genomics, Cold Spring Harbor, NY 11724, USA
- Jesse Gillis
- Cold Spring Harbor Laboratory, Stanley Institute for Cognitive Genomics, Cold Spring Harbor, NY 11724, USA
- Cold Spring Harbor Laboratory, Watson School of Biological Sciences, Cold Spring Harbor, NY 11724, USA