1
Dong M, Su DG, Kluger H, Fan R, Kluger Y. SIMVI disentangles intrinsic and spatial-induced cellular states in spatial omics data. Nat Commun 2025; 16:2990. [PMID: 40148341 PMCID: PMC11950362 DOI: 10.1038/s41467-025-58089-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/15/2025] [Accepted: 03/05/2025] [Indexed: 03/29/2025] Open
Abstract
Spatial omics technologies enable analysis of gene expression and interaction dynamics in relation to tissue structure and function. However, existing computational methods may not properly distinguish cellular intrinsic variability and intercellular interactions, and may thus fail to reliably capture spatial regulations. Here, we present Spatial Interaction Modeling using Variational Inference (SIMVI), an annotation-free deep learning framework that disentangles cell intrinsic and spatial-induced latent variables in spatial omics data with rigorous theoretical support. By this disentanglement, SIMVI enables estimation of spatial effects at a single-cell resolution, and empowers various downstream analyses. We demonstrate the superior performance of SIMVI across datasets from diverse platforms and tissues. SIMVI illuminates the cyclical spatial dynamics of germinal center B cells in human tonsil. Applying SIMVI to multiome melanoma data reveals potential tumor epigenetic reprogramming states. On our newly-collected cohort-level CosMx melanoma data, SIMVI uncovers space-and-outcome-dependent macrophage states and cellular communication machinery in tumor microenvironments.
Affiliation(s)
- Mingze Dong
  - Interdepartmental Program in Computational Biology & Bioinformatics, Yale University, New Haven, CT, USA
  - Department of Pathology, Yale School of Medicine, New Haven, CT, USA
  - Department of Biomedical Engineering, Yale University, New Haven, CT, USA
- David G Su
  - Department of Medicine, Yale School of Medicine, New Haven, CT, USA
  - Yale Cancer Center, Yale School of Medicine, New Haven, CT, USA
  - Yale Center for Immuno-Oncology, Yale School of Medicine, New Haven, CT, USA
  - Department of Surgery, Yale School of Medicine, New Haven, CT, USA
- Harriet Kluger
  - Department of Medicine, Yale School of Medicine, New Haven, CT, USA
  - Yale Cancer Center, Yale School of Medicine, New Haven, CT, USA
  - Yale Center for Immuno-Oncology, Yale School of Medicine, New Haven, CT, USA
- Rong Fan
  - Department of Pathology, Yale School of Medicine, New Haven, CT, USA
  - Department of Biomedical Engineering, Yale University, New Haven, CT, USA
  - Yale Cancer Center, Yale School of Medicine, New Haven, CT, USA
- Yuval Kluger
  - Interdepartmental Program in Computational Biology & Bioinformatics, Yale University, New Haven, CT, USA
  - Department of Pathology, Yale School of Medicine, New Haven, CT, USA
  - Applied Mathematics Program, Yale University, New Haven, CT, USA
2
Pavel A, Grønberg MG, Clemmensen LH. The impact of dropouts in scRNAseq dense neighborhood analysis. Comput Struct Biotechnol J 2025; 27:1278-1285. [PMID: 40225837 PMCID: PMC11992407 DOI: 10.1016/j.csbj.2025.03.033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2024] [Revised: 03/19/2025] [Accepted: 03/20/2025] [Indexed: 04/15/2025] Open
Abstract
Single cell RNA sequencing (scRNAseq) makes it possible to investigate transcriptomic profiles at the single-cell level. However, the data pose unique challenges in comparison to bulk transcriptomic data, one being high dropout rates, which yield highly sparse data. Many classical analysis and preprocessing pipelines are based on the assumption that poor data can be counteracted by quantity and that similar cells (samples) are close to each other in space. Clustering is commonly used to detect clusters (dense local cell neighborhoods) under the assumption that similar cells are close to each other in space (where close depends on the distance metric used). The most commonly used clustering methodologies to detect dense local neighborhoods are based on graph clustering on a nearest neighbor graph. However, high dropout rates may break this assumption and make it difficult to reliably detect such dense local neighborhoods. We assess the cluster homogeneity and stability under increasing degrees of dropouts in one of the most popular clustering pipelines (dimensionality reduction + graph based clustering), as provided by the scRNAseq analysis packages Seurat and Scanpy. Our study showcases that while the default pipeline performs well in terms of cluster homogeneity (i.e., cells in a cluster are of the same type), even with increasing dropout rates, the stability of clusters (i.e., cell pairs consistently being in the same cluster) decreases. This implies that sub-populations within cell types are increasingly difficult to identify under increasing dropout rates because observations are not consistently close. Our results challenge the current practice of using default clustering pipelines and the general assumption of identifiable local neighborhoods on high dropout data. Hence, these results suggest that careful consideration needs to be given to interpretation and downstream analysis when relying on local neighborhoods and clusters on scRNAseq data. In addition, these results call for extensive benchmarking, to identify and provide methods robust in their local neighborhood relationships on data containing low to high dropout rates.
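The two quantities the abstract relies on, artificially increased dropout and the stability of cell pairs across clusterings, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `apply_dropout` and `pair_stability` are hypothetical, the clustering step itself (Seurat or Scanpy in the study) is left out, and the stability score shown is just one simple way to count consistently co-clustered pairs.

```python
import random
from itertools import combinations


def apply_dropout(counts, rate, rng):
    """Zero out each nonzero count with probability `rate` (assumed dropout model)."""
    return [[0 if (v and rng.random() < rate) else v for v in row] for row in counts]


def pair_stability(labels_a, labels_b):
    """Fraction of cell pairs co-clustered in run A that are also co-clustered in run B."""
    together_a = [
        (i, j)
        for i, j in combinations(range(len(labels_a)), 2)
        if labels_a[i] == labels_a[j]
    ]
    if not together_a:
        return 1.0  # nothing to lose when no pair shares a cluster in run A
    kept = sum(labels_b[i] == labels_b[j] for i, j in together_a)
    return kept / len(together_a)


if __name__ == "__main__":
    rng = random.Random(0)
    counts = [[3, 0, 5], [1, 2, 0], [0, 4, 4]]
    print(apply_dropout(counts, 0.5, rng))
    # Cluster labels from two hypothetical runs on the same six cells.
    print(pair_stability([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 2, 2]))
```

In a real experiment the two label vectors would come from clustering the original matrix and a dropout-augmented copy; a score that falls as `rate` rises would correspond to the loss of stability the abstract describes.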
Affiliation(s)
- Alisa Pavel
  - Department of Applied Mathematics and Computer Science, Technical University of Denmark, 2800, Kongens Lyngby, Denmark
- Manja Gersholm Grønberg
  - Department of Applied Mathematics and Computer Science, Technical University of Denmark, 2800, Kongens Lyngby, Denmark
- Line H. Clemmensen
  - Department of Applied Mathematics and Computer Science, Technical University of Denmark, 2800, Kongens Lyngby, Denmark
  - Department of Mathematical Sciences, University of Copenhagen, 2100, Copenhagen, Denmark
3
Millard N, Chen JH, Palshikar MG, Pelka K, Spurrell M, Price C, He J, Hacohen N, Raychaudhuri S, Korsunsky I. Batch correcting single-cell spatial transcriptomics count data with Crescendo improves visualization and detection of spatial gene patterns. Genome Biol 2025; 26:36. [PMID: 40001084 PMCID: PMC11863647 DOI: 10.1186/s13059-025-03479-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2024] [Accepted: 01/21/2025] [Indexed: 02/27/2025] Open
Abstract
Spatial transcriptomics facilitates gene expression analysis of cells in their spatial anatomical context. Batch effects hinder visualization of gene spatial patterns across samples. We present the Crescendo algorithm to correct for batch effects at the gene expression level and enable accurate visualization of gene expression patterns across multiple samples. We show Crescendo's utility and scalability across three datasets ranging from 170,000 to 7 million single cells across spatial and single-cell RNA sequencing technologies. By correcting for batch effects, Crescendo enhances spatial transcriptomics analyses to detect gene colocalization and ligand-receptor interactions and enables cross-technology information transfer.
Affiliation(s)
- Nghia Millard
  - Division of Rheumatology, Inflammation and Immunity, Brigham and Women's Hospital, Boston, MA, USA
  - Division of Genetics, Brigham and Women's Hospital, Boston, MA, USA
  - Center for Data Sciences, Brigham and Women's Hospital, Boston, MA, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
  - Department of Medicine, Harvard Medical School, Boston, MA, USA
  - Harvard Medical School, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Jonathan H Chen
  - Harvard Medical School, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Massachusetts General Hospital (MGH) Cancer Center, Harvard Medical School, Boston, MA, USA
  - Department of Pathology, MGH, Boston, MA, USA
- Mukta G Palshikar
  - Division of Genetics, Brigham and Women's Hospital, Boston, MA, USA
  - Center for Data Sciences, Brigham and Women's Hospital, Boston, MA, USA
  - Harvard Medical School, Boston, MA, USA
- Karin Pelka
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Massachusetts General Hospital (MGH) Cancer Center, Harvard Medical School, Boston, MA, USA
  - UCSF Institute of Genomic Immunology, Gladstone Institutes, San Francisco, CA, USA
- Maxwell Spurrell
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Massachusetts General Hospital (MGH) Cancer Center, Harvard Medical School, Boston, MA, USA
  - Department of Pathology, MGH, Boston, MA, USA
- Nir Hacohen
  - Department of Immunology, Harvard Medical School, Boston, MA, USA
  - Harvard Medical School, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Massachusetts General Hospital (MGH) Cancer Center, Harvard Medical School, Boston, MA, USA
- Soumya Raychaudhuri
  - Division of Rheumatology, Inflammation and Immunity, Brigham and Women's Hospital, Boston, MA, USA
  - Division of Genetics, Brigham and Women's Hospital, Boston, MA, USA
  - Center for Data Sciences, Brigham and Women's Hospital, Boston, MA, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
  - Department of Medicine, Harvard Medical School, Boston, MA, USA
  - Harvard Medical School, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Ilya Korsunsky
  - Division of Genetics, Brigham and Women's Hospital, Boston, MA, USA
  - Center for Data Sciences, Brigham and Women's Hospital, Boston, MA, USA
  - Harvard Medical School, Boston, MA, USA
4
Tu JJ, Yan H, Zhang XF, Lin Z. Precise gene expression deconvolution in spatial transcriptomics with STged. Nucleic Acids Res 2025; 53:gkaf087. [PMID: 39970279 PMCID: PMC11838043 DOI: 10.1093/nar/gkaf087] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2024] [Revised: 01/07/2025] [Accepted: 02/02/2025] [Indexed: 02/21/2025] Open
Abstract
Spatially resolved transcriptomics (SRT) has transformed tissue biology by linking gene expression profiles with spatial information. However, sequencing-based SRT methods aggregate signals from multiple cell types within capture locations ("spots"), masking cell-type-specific gene expression patterns. Traditional cell-type deconvolution methods estimate cell compositions within spots but fail to resolve cell-type-specific gene expression, limiting their ability to uncover critical biological processes such as cellular interactions and microenvironmental dynamics. Here, we present STged (spatial transcriptomic gene expression deconvolution), a novel computational framework that goes beyond traditional deconvolution by reconstructing cell-type-specific gene expression profiles from mixed spots. STged integrates graph-based spatial correlations and reference-derived gene signatures using a non-negative least-squares regression framework, achieving precise and biologically meaningful deconvolution. Comprehensive simulations show that STged consistently outperforms existing methods in accuracy and robustness. Applications to human pancreatic ductal adenocarcinoma and human squamous cell carcinoma datasets reveal its capacity to identify microenvironment-specific highly variable genes, reconstruct spatial cell-cell communication networks, and resolve tissue architecture at near-single-cell resolution. In mouse kidney tissues, STged uncovers dynamic spatial gene expression patterns and distinct gene programs, advancing our understanding of tissue heterogeneity and cellular dynamics.
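The abstract names non-negative least-squares regression as the core fitting framework. The sketch below shows only that generic building block on a toy problem: it estimates non-negative weights for one spot from a reference signature matrix using projected gradient descent. It is an illustration under stated assumptions, not STged itself; the function name `nnls_projected_gradient`, the toy matrix, and the solver choice are all hypothetical, and the graph-based spatial terms described in the abstract are omitted.

```python
def nnls_projected_gradient(S, y, n_iter=2000):
    """Minimise 0.5 * ||S w - y||^2 subject to w >= 0.

    S is a genes x cell-types signature matrix (list of rows), y is the
    observed spot expression. The step size 1 / ||S||_F^2 is a safe bound
    because the squared Frobenius norm is at least the largest eigenvalue
    of S^T S.
    """
    n_genes, n_types = len(S), len(S[0])
    step = 1.0 / sum(v * v for row in S for v in row)
    w = [0.0] * n_types
    for _ in range(n_iter):
        residual = [
            sum(S[g][k] * w[k] for k in range(n_types)) - y[g]
            for g in range(n_genes)
        ]
        grad = [
            sum(S[g][k] * residual[g] for g in range(n_genes))
            for k in range(n_types)
        ]
        # Gradient step followed by projection onto the non-negative orthant.
        w = [max(0.0, w[k] - step * grad[k]) for k in range(n_types)]
    return w


if __name__ == "__main__":
    # Toy reference signatures: 4 genes x 3 cell types (made-up numbers).
    S = [[10, 1, 0], [0, 8, 1], [1, 0, 9], [2, 2, 2]]
    w_true = [0.7, 0.3, 0.0]
    y = [sum(S[g][k] * w_true[k] for k in range(3)) for g in range(4)]
    print([round(v, 4) for v in nnls_projected_gradient(S, y)])
```

Projected gradient descent is used here only because it fits in a few dependency-free lines; a production analysis would more likely call an established solver.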
Affiliation(s)
- Jia-Juan Tu
  - School of Science, Hubei University of Technology, Wuhan 430079, China
  - Department of Statistics, The Chinese University of Hong Kong, Hong Kong 999077, China
- Hong Yan
  - Centre for Intelligent Multidimensional Data Analysis, Hong Kong 999077, China
  - Department of Electrical Engineering, City University of Hong Kong, Hong Kong, China
- Xiao-Fei Zhang
  - School of Mathematics and Statistics, and Hubei Key Laboratory of Mathematical Sciences, Central China Normal University, Wuhan 430079, China
  - Key Laboratory of Nonlinear Analysis & Applications (Ministry of Education), Central China Normal University, Wuhan 430079, China
- Zhixiang Lin
  - Department of Statistics, The Chinese University of Hong Kong, Hong Kong 999077, China
5
Shang L, Wu P, Zhou X. Statistical identification of cell type-specific spatially variable genes in spatial transcriptomics. Nat Commun 2025; 16:1059. [PMID: 39865128 PMCID: PMC11770176 DOI: 10.1038/s41467-025-56280-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2024] [Accepted: 01/06/2025] [Indexed: 01/28/2025] Open
Abstract
An essential task in spatial transcriptomics is identifying spatially variable genes (SVGs). Here, we present Celina, a statistical method for systematically detecting cell type-specific SVGs (ct-SVGs), a subset of SVGs exhibiting distinct spatial expression patterns within specific cell types. Celina utilizes a spatially varying coefficient model to accurately capture each gene's spatial expression pattern in relation to the distribution of cell types across tissue locations, ensuring effective type I error control and high power. Celina proves powerful compared to existing methods in single-cell resolution spatial transcriptomics and stands as the only effective solution for spot-resolution spatial transcriptomics. Applied to five real datasets, Celina uncovers ct-SVGs associated with tumor progression and patient survival in lung cancer, identifies metagenes with unique spatial patterns linked to cell proliferation and immune response in kidney cancer, and detects genes preferentially expressed near amyloid-β plaques in an Alzheimer's model.
Affiliation(s)
- Lulu Shang
  - Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Peijun Wu
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA
  - Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA
- Xiang Zhou
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA
  - Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA
6
Sarkar H, Lee E, Lopez-Darwin SL, Kang Y. Deciphering normal and cancer stem cell niches by spatial transcriptomics: opportunities and challenges. Genes Dev 2025; 39:64-85. [PMID: 39496456 PMCID: PMC11789490 DOI: 10.1101/gad.351956.124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2024]
Abstract
Cancer stem cells (CSCs) often exhibit stem-like attributes that depend on an intricate stemness-promoting cellular ecosystem within their niche. The interplay between CSCs and their niche has been implicated in tumor heterogeneity and therapeutic resistance. Normal stem cells (NSCs) and CSCs share stemness features and common microenvironmental components, displaying significant phenotypic and functional plasticity. Investigating these properties across diverse organs during normal development and tumorigenesis is of paramount research interest and translational potential. Advancements in next-generation sequencing (NGS), single-cell transcriptomics, and spatial transcriptomics have ushered in a new era in cancer research, providing high-resolution and comprehensive molecular maps of diseased tissues. Various spatial technologies, with their unique ability to measure the location and molecular profile of a cell within tissue, have enabled studies on intratumoral architecture and cellular cross-talk within the specific niches. Moreover, delineation of spatial patterns for niche-specific properties such as hypoxia, glucose deprivation, and other microenvironmental remodeling are revealed through multilevel spatial sequencing. This tremendous progress in technology has also been paired with the advent of computational tools to mitigate technology-specific bottlenecks. Here we discuss how different spatial technologies are used to identify NSCs and CSCs, as well as their associated niches. Additionally, by exploring related public data sets, we review the current challenges in characterizing such niches, which are often hindered by technological limitations, and the computational solutions used to address them.
Affiliation(s)
- Hirak Sarkar
  - Department of Molecular Biology, Princeton University, Princeton, New Jersey 08544, USA
  - Ludwig Institute for Cancer Research Princeton Branch, Princeton, New Jersey 08544, USA
  - Department of Computer Science, Princeton, New Jersey 08544, USA
- Eunmi Lee
  - Department of Molecular Biology, Princeton University, Princeton, New Jersey 08544, USA
- Sereno L Lopez-Darwin
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey 08544, USA
- Yibin Kang
  - Department of Molecular Biology, Princeton University, Princeton, New Jersey 08544, USA
  - Ludwig Institute for Cancer Research Princeton Branch, Princeton, New Jersey 08544, USA
  - Cancer Metabolism and Growth Program, Rutgers Cancer Institute of New Jersey, New Brunswick, New Jersey 08903, USA
7
Das Adhikari S, Yang J, Wang J, Cui Y. Recent advances in spatially variable gene detection in spatial transcriptomics. Comput Struct Biotechnol J 2024; 23:883-891. [PMID: 38370977 PMCID: PMC10869304 DOI: 10.1016/j.csbj.2024.01.016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2023] [Revised: 01/22/2024] [Accepted: 01/22/2024] [Indexed: 02/20/2024] Open
Abstract
With the emergence of advanced spatial transcriptomic technologies, there has been a surge in research papers dedicated to analyzing spatial transcriptomics data, resulting in significant contributions to our understanding of biology. The initial stage of downstream analysis of spatial transcriptomic data has centered on identifying spatially variable genes (SVGs) or genes expressed with specific spatial patterns across the tissue. SVG detection is an important task since many downstream analyses depend on these selected SVGs. Over the past few years, a plethora of new methods have been proposed for the detection of SVGs, accompanied by numerous innovative concepts and discussions. This article provides a selective review of methods and their practical implementations, offering valuable insights into the current literature in this field.
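To make "genes expressed with specific spatial patterns" concrete, here is a minimal sketch of Moran's I, a classical spatial autocorrelation statistic, computed for one gene over spot coordinates. It is offered as a generic illustration and is not taken from the review; the function name `morans_i`, the binary radius-based neighbour weights, and the toy grid are assumptions.

```python
def morans_i(coords, values, radius=1.0):
    """Moran's I with binary weights: w_ij = 1 if spots i and j lie within `radius`."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    denom = sum(d * d for d in dev)
    num = 0.0
    w_total = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dx = coords[i][0] - coords[j][0]
            dy = coords[i][1] - coords[j][1]
            if dx * dx + dy * dy <= radius * radius:
                num += dev[i] * dev[j]
                w_total += 1.0
    if denom == 0.0 or w_total == 0.0:
        raise ValueError("constant expression or no neighbouring spots")
    return (n / w_total) * (num / denom)


if __name__ == "__main__":
    grid = [(x, y) for x in range(4) for y in range(4)]
    gradient = [float(x) for x, _ in grid]            # smooth left-to-right trend
    checker = [float((x + y) % 2) for x, y in grid]   # alternating pattern
    print(morans_i(grid, gradient), morans_i(grid, checker))
```

On this 4 x 4 grid the smooth gradient gives a positive value (2/3) and the checkerboard gives -1, so the statistic separates spatially coherent expression from alternating expression. Real SVG methods add a significance test and usually a richer spatial model.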
Affiliation(s)
- Sikta Das Adhikari
  - Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
  - Department of Statistics and Probability, Michigan State University, East Lansing, MI 48824, USA
- Jiaxin Yang
  - Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
- Jianrong Wang
  - Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
- Yuehua Cui
  - Department of Statistics and Probability, Michigan State University, East Lansing, MI 48824, USA
8
Li H, Zhu B, Jiang X, Guo L, Xie Y, Xu L, Li Q. An interpretable Bayesian clustering approach with feature selection for analyzing spatially resolved transcriptomics data. Biometrics 2024; 80:ujae066. [PMID: 39073775 DOI: 10.1093/biomtc/ujae066] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2023] [Revised: 05/13/2024] [Accepted: 07/07/2024] [Indexed: 07/30/2024]
Abstract
Recent breakthroughs in spatially resolved transcriptomics (SRT) technologies have enabled comprehensive molecular characterization at the spot or cellular level while preserving spatial information. Cells are the fundamental building blocks of tissues, organized into distinct yet connected components. Although many non-spatial and spatial clustering approaches have been used to partition the entire region into mutually exclusive spatial domains based on the SRT high-dimensional molecular profile, most require an ad hoc selection of less interpretable dimensional-reduction techniques. To overcome this challenge, we propose a zero-inflated negative binomial mixture model to cluster spots or cells based on their molecular profiles. To increase interpretability, we employ a feature selection mechanism to provide a low-dimensional summary of the SRT molecular profile in terms of discriminating genes that shed light on the clustering result. We further incorporate the SRT geospatial profile via a Markov random field prior. We demonstrate how this joint modeling strategy improves clustering accuracy, compared with alternative state-of-the-art approaches, through simulation studies and 3 real data applications.
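The zero-inflated negative binomial (ZINB) distribution at the heart of the proposed mixture model can be written down in a few lines. The sketch below assumes the common mean/dispersion parameterisation (mean `mu`, dispersion `phi`, zero-inflation probability `pi`); these symbols and the function names are illustrative choices rather than the paper's notation, and the feature selection and Markov random field components are not shown.

```python
import math


def nb_log_pmf(y, mu, phi):
    """Log-pmf of a negative binomial with mean mu > 0 and dispersion phi > 0."""
    return (
        math.lgamma(y + phi) - math.lgamma(phi) - math.lgamma(y + 1)
        + phi * math.log(phi / (phi + mu))
        + y * math.log(mu / (phi + mu))
    )


def zinb_pmf(y, mu, phi, pi):
    """ZINB pmf: an extra point mass `pi` at zero mixed with a negative binomial."""
    nb = math.exp(nb_log_pmf(y, mu, phi))
    if y == 0:
        return pi + (1.0 - pi) * nb
    return (1.0 - pi) * nb


if __name__ == "__main__":
    mu, phi, pi = 3.0, 2.0, 0.3
    print(zinb_pmf(0, mu, phi, pi), math.exp(nb_log_pmf(0, mu, phi)))
```

With `pi > 0` the probability of observing a zero is strictly larger than under the plain negative binomial, which is how the model accommodates the excess zeros typical of SRT count data.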
Affiliation(s)
- Huimin Li
  - Department of Mathematical Sciences, The University of Texas at Dallas, Richardson, TX 75080, United States
- Bencong Zhu
  - Department of Mathematical Sciences, The University of Texas at Dallas, Richardson, TX 75080, United States
  - Department of Statistics, The Chinese University of Hong Kong, Hong Kong SAR, China
- Xi Jiang
  - Department of Statistics and Data Science, Southern Methodist University, Dallas, TX 75205, United States
  - Quantitative Biomedical Research Center, Peter O'Donnell Jr. School of Public Health, The University of Texas Southwestern Medical Center, Dallas, TX 75390, United States
- Lei Guo
  - Quantitative Biomedical Research Center, Peter O'Donnell Jr. School of Public Health, The University of Texas Southwestern Medical Center, Dallas, TX 75390, United States
- Yang Xie
  - Quantitative Biomedical Research Center, Peter O'Donnell Jr. School of Public Health, The University of Texas Southwestern Medical Center, Dallas, TX 75390, United States
- Lin Xu
  - Quantitative Biomedical Research Center, Peter O'Donnell Jr. School of Public Health, The University of Texas Southwestern Medical Center, Dallas, TX 75390, United States
- Qiwei Li
  - Department of Mathematical Sciences, The University of Texas at Dallas, Richardson, TX 75080, United States
9
Lin S, Cui Y, Zhao F, Yang Z, Song J, Yao J, Zhao Y, Qian BZ, Zhao Y, Yuan Z. Complete spatially resolved gene expression is not necessary for identifying spatial domains. Cell Genomics 2024; 4:100565. [PMID: 38781966 PMCID: PMC11228956 DOI: 10.1016/j.xgen.2024.100565] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2023] [Revised: 02/29/2024] [Accepted: 04/30/2024] [Indexed: 05/25/2024]
Abstract
Spatially resolved transcriptomics (SRT) technologies have revolutionized the study of tissue organization. We introduce a graph convolutional network with an attention and positive emphasis mechanism, termed BINARY, relying exclusively on binarized SRT data to accurately delineate spatial domains. BINARY outperforms existing methods across various SRT data types while using significantly less input information. Our study suggests that precise gene expression quantification may not always be essential, inspiring further exploration of the broader applications of spatially resolved binarized gene expression data.
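The input transformation the abstract refers to, reducing a count matrix to detected/not-detected values, might look like the sketch below. The function name `binarize` and the `threshold` parameter are assumptions for illustration; the abstract does not state the exact rule BINARY applies, and the graph convolutional network itself is not shown.

```python
def binarize(counts, threshold=0):
    """Map each count to 1 if it exceeds `threshold`, otherwise 0."""
    return [[1 if v > threshold else 0 for v in row] for row in counts]


if __name__ == "__main__":
    # Rows are spots or cells, columns are genes (toy values).
    counts = [[0, 3, 12], [1, 0, 0], [7, 7, 0]]
    print(binarize(counts))
```

After this step a gene with 3 reads and a gene with 12 reads in the same spot become indistinguishable, which is the information loss the study argues is often acceptable for delineating spatial domains.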
Affiliation(s)
- Senlin Lin
  - Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
- Yan Cui
  - Institute of Science and Technology for Brain-Inspired Intelligence, MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, MOE Frontiers Center for Brain Science, Fudan University, Shanghai, China
  - Center for Medical Research and Innovation, Shanghai Pudong Hospital, Fudan University Pudong Medical Center, Fudan University, Shanghai, China
- Fangyuan Zhao
  - Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
- Zhidong Yang
  - Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
- Jiangning Song
  - Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Clayton, VIC 3800, Australia
- Yu Zhao
  - AI Lab, Tencent, Shenzhen, China
- Bin-Zhi Qian
  - Fudan University Shanghai Cancer Center, Department of Oncology, Shanghai Medical College, The Human Phenome Institute, Zhangjiang-Fudan International Innovation Center, Fudan University, Shanghai, China
- Yi Zhao
  - Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
- Zhiyuan Yuan
  - Institute of Science and Technology for Brain-Inspired Intelligence, MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, MOE Frontiers Center for Brain Science, Fudan University, Shanghai, China
  - Center for Medical Research and Innovation, Shanghai Pudong Hospital, Fudan University Pudong Medical Center, Fudan University, Shanghai, China
10
Duo H, Li Y, Lan Y, Tao J, Yang Q, Xiao Y, Sun J, Li L, Nie X, Zhang X, Liang G, Liu M, Hao Y, Li B. Systematic evaluation with practical guidelines for single-cell and spatially resolved transcriptomics data simulation under multiple scenarios. Genome Biol 2024; 25:145. [PMID: 38831386 PMCID: PMC11149245 DOI: 10.1186/s13059-024-03290-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2023] [Accepted: 05/28/2024] [Indexed: 06/05/2024] Open
Abstract
BACKGROUND: Single-cell RNA sequencing (scRNA-seq) and spatially resolved transcriptomics (SRT) have led to groundbreaking advancements in life sciences. To develop bioinformatics tools for scRNA-seq and SRT data and perform unbiased benchmarks, data simulation has been widely adopted by providing explicit ground truth and generating customized datasets. However, the performance of simulation methods under multiple scenarios has not been comprehensively assessed, making it challenging to choose suitable methods without practical guidelines.
RESULTS: We systematically evaluated 49 simulation methods developed for scRNA-seq and/or SRT data in terms of accuracy, functionality, scalability, and usability using 152 reference datasets derived from 24 platforms. SRTsim, scDesign3, ZINB-WaVE, and scDesign2 have the best accuracy performance across various platforms. Unexpectedly, some methods tailored to scRNA-seq data have potential compatibility for simulating SRT data. Lun, SPARSim, and scDesign3-tree outperform other methods under corresponding simulation scenarios. Phenopath, Lun, Simple, and MFA yield high scalability scores but they cannot generate realistic simulated data. Users should consider the trade-offs between method accuracy and scalability (or functionality) when making decisions. Additionally, execution errors are mainly caused by failed parameter estimations and appearance of missing or infinite values in calculations. We provide practical guidelines for method selection, a standard pipeline Simpipe (https://github.com/duohongrui/simpipe; https://doi.org/10.5281/zenodo.11178409), and an online tool Simsite (https://www.ciblab.net/software/simshiny/) for data simulation.
CONCLUSIONS: No method performs best on all criteria, thus a good-yet-not-the-best method is recommended if it solves problems effectively and reasonably. Our comprehensive work provides crucial insights for developers on modeling gene expression data and fosters the simulation process for users.
Affiliation(s)
- Hongrui Duo
  - College of Life Sciences, Chongqing Normal University, Chongqing, 401331, People's Republic of China
- Yinghong Li
  - Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, 400065, People's Republic of China
- Yang Lan
  - Institute of Pathology and Southwest Cancer Center, Southwest Hospital, Army Medical University, Chongqing, 400038, People's Republic of China
- Jingxin Tao
  - College of Life Sciences, Chongqing Normal University, Chongqing, 401331, People's Republic of China
- Qingxia Yang
  - Zhejiang Provincial Key Laboratory of Precision Diagnosis and Therapy for Major Gynecological Diseases, Women's Hospital, Zhejiang University School of Medicine, Hangzhou, 310058, People's Republic of China
- Yingxue Xiao
  - College of Life Sciences, Chongqing Normal University, Chongqing, 401331, People's Republic of China
- Jing Sun
  - College of Life Sciences, Chongqing Normal University, Chongqing, 401331, People's Republic of China
- Lei Li
  - College of Life Sciences, Chongqing Normal University, Chongqing, 401331, People's Republic of China
- Xiner Nie
  - Key Laboratory of Biorheological Science and Technology, Ministry of Education, Bioengineering College, Chongqing University, Chongqing, 400044, People's Republic of China
- Xiaoxi Zhang
  - College of Life Sciences, Chongqing Normal University, Chongqing, 401331, People's Republic of China
- Guizhao Liang
  - Key Laboratory of Biorheological Science and Technology, Ministry of Education, Bioengineering College, Chongqing University, Chongqing, 400044, People's Republic of China
- Mingwei Liu
  - Key Laboratory of Clinical Laboratory Diagnostics, College of Laboratory Medicine, Chongqing Medical University, Chongqing, 400016, People's Republic of China
- Youjin Hao
  - College of Life Sciences, Chongqing Normal University, Chongqing, 401331, People's Republic of China
- Bo Li
  - College of Life Sciences, Chongqing Normal University, Chongqing, 401331, People's Republic of China
11
Li X, Qiu P. Gene representation bias in spatial transcriptomics. J Bioinform Comput Biol 2024; 22:2450007. [PMID: 39036848 DOI: 10.1142/s0219720024500070] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/23/2024]
Abstract
For sequencing-based spatial transcriptomics data, the gene-spot count matrix is highly sparse. This feature is similar to scRNA-seq. The goal of this paper is to identify whether there exist genes that are frequently under-detected in Visium compared to bulk RNA-seq, and the underlying potential mechanism of under-detection in Visium. We collected paired Visium and bulk RNA-seq data for 28 human samples and 19 mouse samples, which covered diverse tissue sources. We compared the two data types and observed that there indeed exists a collection of genes frequently under-detected in Visium compared to bulk RNA-seq. We performed a motif search to examine the last 350 bp of the frequently under-detected genes, and we observed that the poly (T) motif was significantly enriched in genes identified from both human and mouse data, which matches with our previous finding about frequently under-detected genes in scRNA-seq. We hypothesized that the poly (T) motif may be able to form a hairpin structure with the poly (A) tails of their mRNA transcripts, making it difficult for their mRNA transcripts to be captured during Visium library preparation.
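The idea of scanning the last 350 bp of a transcript for a poly(T) stretch can be sketched with a plain regular expression. This is a simplified stand-in for the motif search described in the abstract, whose tool and settings are not given here; the function names and the `min_run` cut-off of 10 consecutive T bases are hypothetical.

```python
import re


def longest_t_run(seq, window=350):
    """Length of the longest run of T within the last `window` bases of `seq`."""
    tail = seq.upper()[-window:]
    return max((len(run) for run in re.findall(r"T+", tail)), default=0)


def has_poly_t(seq, window=350, min_run=10):
    """True if the 3' window contains at least `min_run` consecutive T bases."""
    return longest_t_run(seq, window) >= min_run


if __name__ == "__main__":
    with_motif = "ACGT" * 50 + "T" * 12 + "ACG" * 20
    far_upstream = "T" * 20 + "ACGT" * 100  # T stretch lies outside the last 350 bp
    print(has_poly_t(with_motif), has_poly_t(far_upstream))
```

Restricting the search to the 3' window matters: the second toy sequence contains a long T stretch, but it sits more than 350 bp from the end and is therefore not reported.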
Affiliation(s)
- Xinling Li
- The Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, United States
- Peng Qiu
- The Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, United States
|
12
|
Laubscher E, Wang X, Razin N, Dougherty T, Xu RJ, Ombelets L, Pao E, Graf W, Moffitt JR, Yue Y, Van Valen D. Accurate single-molecule spot detection for image-based spatial transcriptomics with weakly supervised deep learning. Cell Syst 2024; 15:475-482.e6. [PMID: 38754367 PMCID: PMC11995858 DOI: 10.1016/j.cels.2024.04.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2023] [Revised: 02/05/2024] [Accepted: 04/18/2024] [Indexed: 05/18/2024]
Abstract
Image-based spatial transcriptomics methods enable transcriptome-scale gene expression measurements with spatial information but require complex, manually tuned analysis pipelines. We present Polaris, an analysis pipeline for image-based spatial transcriptomics that combines deep-learning models for cell segmentation and spot detection with a probabilistic gene decoder to quantify single-cell gene expression accurately. Polaris offers a unifying, turnkey solution for analyzing spatial transcriptomics data from multiplexed error-robust FISH (MERFISH), sequential fluorescence in situ hybridization (seqFISH), or in situ RNA sequencing (ISS) experiments. Polaris is available through the DeepCell software library (https://github.com/vanvalenlab/deepcell-spots) and https://www.deepcell.org.
Affiliation(s)
- Emily Laubscher
- Division of Chemistry and Chemical Engineering, Caltech, Pasadena, CA 91125, USA
- Xuefei Wang
- Division of Biology and Biological Engineering, Caltech, Pasadena, CA 91125, USA
- Nitzan Razin
- Division of Biology and Biological Engineering, Caltech, Pasadena, CA 91125, USA
- Tom Dougherty
- Division of Biology and Biological Engineering, Caltech, Pasadena, CA 91125, USA
- Rosalind J Xu
- Program in Cellular and Molecular Medicine, Boston Children's Hospital, Boston, MA 02115, USA; Department of Microbiology, Blavatnik Institute, Harvard Medical School, Boston, MA 02115, USA; Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA 02115, USA
- Lincoln Ombelets
- Division of Chemistry and Chemical Engineering, Caltech, Pasadena, CA 91125, USA
- Edward Pao
- Division of Biology and Biological Engineering, Caltech, Pasadena, CA 91125, USA
- William Graf
- Division of Biology and Biological Engineering, Caltech, Pasadena, CA 91125, USA
- Jeffrey R Moffitt
- Program in Cellular and Molecular Medicine, Boston Children's Hospital, Boston, MA 02115, USA; Department of Microbiology, Blavatnik Institute, Harvard Medical School, Boston, MA 02115, USA; Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Yisong Yue
- Division of Computational and Mathematical Sciences, Caltech, Pasadena, CA 91125, USA
- David Van Valen
- Division of Biology and Biological Engineering, Caltech, Pasadena, CA 91125, USA; Howard Hughes Medical Institute, Chevy Chase, MD 20815, USA.
|
13
|
Ospina OE, Soupir AC, Manjarres-Betancur R, Gonzalez-Calderon G, Yu X, Fridley BL. Differential gene expression analysis of spatial transcriptomic experiments using spatial mixed models. Sci Rep 2024; 14:10967. [PMID: 38744956 PMCID: PMC11094014 DOI: 10.1038/s41598-024-61758-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2024] [Accepted: 05/09/2024] [Indexed: 05/16/2024] Open
Abstract
Spatial transcriptomics (ST) assays represent a revolution in how the architecture of tissues is studied by allowing for the exploration of cells in their spatial context. A common element in the analysis is delineating tissue domains or "niches" followed by detecting differentially expressed genes to infer the biological identity of the tissue domains or cell types. However, many studies approach differential expression analysis by using statistical approaches often applied in the analysis of non-spatial scRNA data (e.g., two-sample t-tests, Wilcoxon's rank sum test), hence neglecting the spatial dependency observed in ST data. In this study, we show that applying linear mixed models with spatial correlation structures using spatial random effects effectively accounts for the spatial autocorrelation and reduces inflation of type-I error rate observed in non-spatial based differential expression testing. We also show that spatial linear models with an exponential correlation structure provide a better fit to the ST data as compared to non-spatial models, particularly for spatially resolved technologies that quantify expression at finer scales (i.e., single-cell resolution).
Affiliation(s)
- Oscar E Ospina
- Department of Biostatistics & Bioinformatics, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL, USA
- Alex C Soupir
- Department of Biostatistics & Bioinformatics, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL, USA
- Xiaoqing Yu
- Department of Biostatistics & Bioinformatics, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL, USA
- Brooke L Fridley
- Department of Biostatistics & Bioinformatics, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL, USA.
- Biostatistics and Epidemiology Core, Division of Health Services & Outcomes Research, Children's Mercy, Kansas City, MO, USA.
|
14
|
Yang J, Jiang X, Jin KW, Shin S, Li Q. Bayesian hidden mark interaction model for detecting spatially variable genes in imaging-based spatially resolved transcriptomics data. Front Genet 2024; 15:1356709. [PMID: 38725485 PMCID: PMC11079231 DOI: 10.3389/fgene.2024.1356709] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2023] [Accepted: 04/08/2024] [Indexed: 05/12/2024] Open
Abstract
Recent technology breakthroughs in spatially resolved transcriptomics (SRT) have enabled the comprehensive molecular characterization of cells whilst preserving their spatial and gene expression contexts. One of the fundamental questions in analyzing SRT data is the identification of spatially variable genes whose expressions display spatially correlated patterns. Existing approaches are built upon either the Gaussian process-based model, which relies on ad hoc kernels, or the energy-based Ising model, which requires gene expression to be measured on a lattice grid. To overcome these potential limitations, we developed a generalized energy-based framework to model gene expression measured from imaging-based SRT platforms, accommodating the irregular spatial distribution of measured cells. Our Bayesian model applies a zero-inflated negative binomial mixture model to dichotomize the raw count data, reducing noise. Additionally, we incorporate a geostatistical mark interaction model with a generalized energy function, where the interaction parameter is used to identify the spatial pattern. Auxiliary variable MCMC algorithms were employed to sample from the posterior distribution with an intractable normalizing constant. We demonstrated the strength of our method on both simulated and real data. Our simulation study showed that our method captured various spatial patterns with high accuracy; moreover, analysis of a seqFISH dataset and a STARmap dataset established that our proposed method is able to identify genes with novel and strong spatial patterns.
Affiliation(s)
- Jie Yang
- Department of Mathematical Sciences, The University of Texas at Dallas, Richardson, TX, United States
- Xi Jiang
- Department of Statistics and Data Science, Southern Methodist University, Dallas, TX, United States
- Kevin Wang Jin
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, United States
- Sunyoung Shin
- Department of Mathematics, Pohang University of Science and Technology, Pohang, Republic of Korea
- Qiwei Li
- Department of Mathematical Sciences, The University of Texas at Dallas, Richardson, TX, United States
|
15
|
Guo X, Ning J, Chen Y, Liu G, Zhao L, Fan Y, Sun S. Recent advances in differential expression analysis for single-cell RNA-seq and spatially resolved transcriptomic studies. Brief Funct Genomics 2024; 23:95-109. [PMID: 37022699 DOI: 10.1093/bfgp/elad011] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 07/08/2022] [Revised: 12/09/2022] [Accepted: 03/10/2023] [Indexed: 04/07/2023] Open
Abstract
Differential expression (DE) analysis is a necessary step in the analysis of single-cell RNA sequencing (scRNA-seq) and spatially resolved transcriptomics (SRT) data. Unlike traditional bulk RNA-seq, DE analysis for scRNA-seq or SRT data has unique characteristics that may contribute to the difficulty of detecting DE genes. However, the plethora of DE tools that work with various assumptions makes it difficult to choose an appropriate one. Furthermore, a comprehensive review on detecting DE genes for scRNA-seq data or SRT data from multi-condition, multi-sample experimental designs is lacking. To bridge such a gap, here, we first focus on the challenges of DE detection, then highlight potential opportunities that facilitate further progress in scRNA-seq or SRT analysis, and finally provide insights and guidance in selecting appropriate DE tools or developing new computational DE methods.
Affiliation(s)
- Xiya Guo
- School of Public Health, Xi'an Jiaotong University, Xi'an, Shaanxi 710061, P.R. China
- Key Laboratory of Trace Elements and Endemic Diseases, Center for Single Cell Omics and Health, Xi'an Jiaotong University, Xi'an, Shaanxi 710061, P.R. China
- Jin Ning
- School of Public Health, Xi'an Jiaotong University, Xi'an, Shaanxi 710061, P.R. China
- Key Laboratory of Trace Elements and Endemic Diseases, Center for Single Cell Omics and Health, Xi'an Jiaotong University, Xi'an, Shaanxi 710061, P.R. China
- Yuanze Chen
- School of Public Health, Xi'an Jiaotong University, Xi'an, Shaanxi 710061, P.R. China
- Key Laboratory of Trace Elements and Endemic Diseases, Center for Single Cell Omics and Health, Xi'an Jiaotong University, Xi'an, Shaanxi 710061, P.R. China
- Guoliang Liu
- School of Public Health, Xi'an Jiaotong University, Xi'an, Shaanxi 710061, P.R. China
- Key Laboratory of Trace Elements and Endemic Diseases, Center for Single Cell Omics and Health, Xi'an Jiaotong University, Xi'an, Shaanxi 710061, P.R. China
- Liyan Zhao
- School of Public Health, Xi'an Jiaotong University, Xi'an, Shaanxi 710061, P.R. China
- Key Laboratory of Trace Elements and Endemic Diseases, Center for Single Cell Omics and Health, Xi'an Jiaotong University, Xi'an, Shaanxi 710061, P.R. China
- Yue Fan
- School of Public Health, Xi'an Jiaotong University, Xi'an, Shaanxi 710061, P.R. China
- Key Laboratory of Trace Elements and Endemic Diseases, Center for Single Cell Omics and Health, Xi'an Jiaotong University, Xi'an, Shaanxi 710061, P.R. China
- Shiquan Sun
- School of Public Health, Xi'an Jiaotong University, Xi'an, Shaanxi 710061, P.R. China
- Key Laboratory of Trace Elements and Endemic Diseases, Center for Single Cell Omics and Health, Xi'an Jiaotong University, Xi'an, Shaanxi 710061, P.R. China
|
16
|
Laubscher E, Wang X, Razin N, Dougherty T, Xu RJ, Ombelets L, Pao E, Graf W, Moffitt JR, Yue Y, Van Valen D. Accurate single-molecule spot detection for image-based spatial transcriptomics with weakly supervised deep learning. bioRxiv 2024:2023.09.03.556122. [PMID: 37732188 PMCID: PMC10508757 DOI: 10.1101/2023.09.03.556122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/22/2023]
Abstract
Image-based spatial transcriptomics methods enable transcriptome-scale gene expression measurements with spatial information but require complex, manually-tuned analysis pipelines. We present Polaris, an analysis pipeline for image-based spatial transcriptomics that combines deep learning models for cell segmentation and spot detection with a probabilistic gene decoder to quantify single-cell gene expression accurately. Polaris offers a unifying, turnkey solution for analyzing spatial transcriptomics data from MERFISH, seqFISH, or ISS experiments. Polaris is available through the DeepCell software library (https://github.com/vanvalenlab/deepcell-spots) and https://www.deepcell.org.
Affiliation(s)
- Emily Laubscher
- Division of Chemistry and Chemical Engineering, Caltech, Pasadena, CA
- Nitzan Razin
- Division of Biology and Biological Engineering, Caltech, Pasadena, CA
- Tom Dougherty
- Division of Biology and Biological Engineering, Caltech, Pasadena, CA
- Rosalind J. Xu
- Program in Cellular and Molecular Medicine, Boston Children’s Hospital, Boston MA
- Department of Microbiology, Blavatnik Institute, Harvard Medical School, Boston, MA
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA
- Lincoln Ombelets
- Division of Chemistry and Chemical Engineering, Caltech, Pasadena, CA
- Edward Pao
- Division of Biology and Biological Engineering, Caltech, Pasadena, CA
- William Graf
- Division of Biology and Biological Engineering, Caltech, Pasadena, CA
- Jeffrey R. Moffitt
- Program in Cellular and Molecular Medicine, Boston Children’s Hospital, Boston MA
- Department of Microbiology, Blavatnik Institute, Harvard Medical School, Boston, MA
- Broad Institute of Harvard and MIT, Cambridge, MA
- Yisong Yue
- Division of Computational and Mathematical Sciences, Caltech, Pasadena, CA
- David Van Valen
- Division of Biology and Biological Engineering, Caltech, Pasadena, CA
- Howard Hughes Medical Institute, Chevy Chase, MD
|
17
|
Cai P, Robinson MD, Tiberi S. DESpace: spatially variable gene detection via differential expression testing of spatial clusters. Bioinformatics 2024; 40:btae027. [PMID: 38243704 PMCID: PMC10868334 DOI: 10.1093/bioinformatics/btae027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2023] [Revised: 12/23/2023] [Accepted: 01/15/2024] [Indexed: 01/21/2024] Open
Abstract
MOTIVATION: Spatially resolved transcriptomics (SRT) enables scientists to investigate spatial context of mRNA abundance, including identifying spatially variable genes (SVGs), i.e. genes whose expression varies across the tissue. Although several methods have been proposed for this task, native SVG tools cannot jointly model biological replicates, or identify the key areas of the tissue affected by spatial variability. RESULTS: Here, we introduce DESpace, a framework, based on an original application of existing methods, to discover SVGs. In particular, our approach inputs all types of SRT data, summarizes spatial information via spatial clusters, and identifies spatially variable genes by performing differential gene expression testing between clusters. Furthermore, our framework can identify (and test) the main cluster of the tissue affected by spatial variability; this allows scientists to investigate spatial expression changes in specific areas of interest. Additionally, DESpace enables joint modeling of multiple samples (i.e. biological replicates); compared to inference based on individual samples, this approach increases statistical power, and targets SVGs with consistent spatial patterns across replicates. Overall, in our benchmarks, DESpace displays good true positive rates, controls for false positive and false discovery rates, and is computationally efficient. AVAILABILITY AND IMPLEMENTATION: DESpace is freely distributed as a Bioconductor R package at https://bioconductor.org/packages/DESpace.
Affiliation(s)
- Peiying Cai
- Department of Molecular Life Sciences and Swiss Institute of Bioinformatics, University of Zurich, Zurich 8057, Switzerland
- Mark D Robinson
- Department of Molecular Life Sciences and Swiss Institute of Bioinformatics, University of Zurich, Zurich 8057, Switzerland
- Simone Tiberi
- Department of Molecular Life Sciences and Swiss Institute of Bioinformatics, University of Zurich, Zurich 8057, Switzerland
- Department of Statistical Sciences, University of Bologna, Bologna 40126, Italy
|
18
|
Zahedi R, Ghamsari R, Argha A, Macphillamy C, Beheshti A, Alizadehsani R, Lovell NH, Lotfollahi M, Alinejad-Rokny H. Deep learning in spatially resolved transcriptomics: a comprehensive technical view. Brief Bioinform 2024; 25:bbae082. [PMID: 38483255 PMCID: PMC10939360 DOI: 10.1093/bib/bbae082] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2023] [Revised: 12/22/2024] [Accepted: 02/13/2024] [Indexed: 03/17/2024] Open
Abstract
Spatially resolved transcriptomics (SRT) is a pioneering method for simultaneously studying morphological contexts and gene expression at single-cell precision. Data emerging from SRT are multifaceted, presenting researchers with intricate gene expression matrices, precise spatial details and comprehensive histology visuals. Such rich and intricate datasets, unfortunately, render many conventional methods like traditional machine learning and statistical models ineffective. The unique challenges posed by the specialized nature of SRT data have led the scientific community to explore more sophisticated analytical avenues. Recent trends indicate an increasing reliance on deep learning algorithms, especially in areas such as spatial clustering, identification of spatially variable genes and data alignment tasks. In this manuscript, we provide a rigorous critique of these advanced deep learning methodologies, probing into their merits, limitations and avenues for further refinement. Our in-depth analysis underscores that while the recent innovations in deep learning tailored for SRT have been promising, there remains a substantial potential for enhancement. A crucial area that demands attention is the development of models that can incorporate intricate biological nuances, such as phylogeny-aware processing or in-depth analysis of minuscule histology image segments. Furthermore, addressing challenges like the elimination of batch effects, perfecting data normalization techniques and countering the overdispersion and zero inflation patterns seen in gene expression is pivotal. To support the broader scientific community in their SRT endeavors, we have meticulously assembled a comprehensive directory of readily accessible SRT databases, hoping to serve as a foundation for future research initiatives.
Affiliation(s)
- Roxana Zahedi
- UNSW BioMedical Machine Learning Lab (BML), The Graduate School of Biomedical Engineering, UNSW Sydney, 2052, NSW, Australia
- Reza Ghamsari
- UNSW BioMedical Machine Learning Lab (BML), The Graduate School of Biomedical Engineering, UNSW Sydney, 2052, NSW, Australia
- Ahmadreza Argha
- The Graduate School of Biomedical Engineering, UNSW Sydney, 2052, NSW, Australia
- Tyree Institute of Health Engineering (IHealthE), UNSW Sydney, 2052, NSW, Australia
- Callum Macphillamy
- School of Animal and Veterinary Sciences, University of Adelaide, Roseworthy, 5371, Australia
- Amin Beheshti
- School of Computing, Macquarie University, Sydney, 2109, Australia
- Roohallah Alizadehsani
- Institute for Intelligent Systems Research and Innovation (IISRI), Deakin University, Waurn Ponds, Melbourne, VIC, 3216, Australia
- Nigel H Lovell
- The Graduate School of Biomedical Engineering, UNSW Sydney, 2052, NSW, Australia
- Tyree Institute of Health Engineering (IHealthE), UNSW Sydney, 2052, NSW, Australia
- Mohammad Lotfollahi
- Computational Health Center, Helmholtz Munich, Germany
- Wellcome Sanger Institute, Cambridge, UK
- Hamid Alinejad-Rokny
- UNSW BioMedical Machine Learning Lab (BML), The Graduate School of Biomedical Engineering, UNSW Sydney, 2052, NSW, Australia
- Tyree Institute of Health Engineering (IHealthE), UNSW Sydney, 2052, NSW, Australia
|
19
|
Yang J, Jiang X, Jin KW, Shin S, Li Q. Bayesian Hidden Mark Interaction Model for Detecting Spatially Variable Genes in Imaging-Based Spatially Resolved Transcriptomics Data. bioRxiv 2023:2023.12.17.572071. [PMID: 38168368 PMCID: PMC10760150 DOI: 10.1101/2023.12.17.572071] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/05/2024]
Abstract
Recent technology breakthroughs in spatially resolved transcriptomics (SRT) have enabled the comprehensive molecular characterization of cells whilst preserving their spatial and gene expression contexts. One of the fundamental questions in analyzing SRT data is the identification of spatially variable genes whose expressions display spatially correlated patterns. Existing approaches are built upon either the Gaussian process-based model, which relies on ad hoc kernels, or the energy-based Ising model, which requires gene expression to be measured on a lattice grid. To overcome these potential limitations, we developed a generalized energy-based framework to model gene expression measured from imaging-based SRT platforms, accommodating the irregular spatial distribution of measured cells. Our Bayesian model applies a zero-inflated negative binomial mixture model to dichotomize the raw count data, reducing noise. Additionally, we incorporate a geostatistical mark interaction model with a generalized energy function, where the interaction parameter is used to identify the spatial pattern. Auxiliary variable MCMC algorithms were employed to sample from the posterior distribution with an intractable normalizing constant. We demonstrated the strength of our method on both simulated and real data. Our simulation study showed that our method captured various spatial patterns with high accuracy; moreover, analysis of a seqFISH dataset and a STARmap dataset established that our proposed method is able to identify genes with novel and strong spatial patterns.
Affiliation(s)
- Jie Yang
- Department of Mathematical Sciences, The University of Texas at Dallas, Richardson, Texas, U.S.A
- Xi Jiang
- Department of Statistics and Data Science, Southern Methodist University, Dallas, Texas, U.S.A
- Kevin W. Jin
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, U.S.A
- Sunyoung Shin
- Department of Mathematics, Pohang University of Science and Technology, Pohang, South Korea
- Qiwei Li
- Department of Mathematical Sciences, The University of Texas at Dallas, Richardson, Texas, U.S.A
|
20
|
Song S, Mohsin E, Zhang R, Kuznetsov A, Shen L, Grossman RL, Weber CR, Khan AA. ATAT: Automated Tissue Alignment and Traversal in Spatial Transcriptomics with Self-Supervised Learning. bioRxiv 2023:2023.12.08.570839. [PMID: 38106010 PMCID: PMC10723486 DOI: 10.1101/2023.12.08.570839] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/19/2023]
Abstract
Spatial transcriptomics (ST) has enhanced RNA analysis in tissue biopsies, but interpreting these data is challenging without expert input. We present Automated Tissue Alignment and Traversal (ATAT), a novel computational framework designed to enhance ST analysis in the context of multiple and complex tissue architectures and morphologies, such as those found in biopsies of the gastrointestinal tract. ATAT utilizes self-supervised contrastive learning on hematoxylin and eosin (H&E) stained images to automate the alignment and traversal of ST data. This approach addresses a critical gap in current ST analysis methodologies, which rely heavily on manual annotation and pathologist expertise to delineate regions of interest for accurate gene expression modeling. Our framework not only streamlines the alignment of multiple ST samples, but also demonstrates robustness in modeling gene expression transitions across specific regions. Additionally, we highlight the ability of ATAT to traverse complex tissue topologies in real-world cases from various individuals and conditions. Our method successfully elucidates differences in immune infiltration patterns across the intestinal wall, enabling the modeling of transcriptional changes across histological layers. We show that ATAT achieves comparable performance to the state-of-the-art method, while alleviating the burden of manual annotation and enabling alignment of tissue samples with complex morphologies.
Affiliation(s)
- Steven Song
- Department of Computer Science, University of Chicago, IL 60637, USA
- Interdisciplinary Scientist Training Program, University of Chicago, Chicago, IL 60637, USA
- Emaan Mohsin
- Department of Pathology, University of Chicago, Chicago, IL 60637, USA
- Renyu Zhang
- Department of Computer Science, University of Chicago, IL 60637, USA
- Andrey Kuznetsov
- Department of Pathology, University of Chicago, Chicago, IL 60637, USA
- Le Shen
- Department of Pathology, University of Chicago, Chicago, IL 60637, USA
- Robert L. Grossman
- Department of Computer Science, University of Chicago, IL 60637, USA
- Department of Medicine, University of Chicago, Chicago, IL 60637, USA
- Aly A. Khan
- Department of Pathology, University of Chicago, Chicago, IL 60637, USA
- Committee on Immunology, University of Chicago, Chicago, IL 60637, USA
- Institute for Population and Precision Health, University of Chicago, Chicago, IL 60637, USA
- Department of Family Medicine, University of Chicago, Chicago, IL 60637, USA
|
21
|
Adhikari SD, Yang J, Wang J, Cui Y. A selective review of recent developments in spatially variable gene detection for spatial transcriptomics. arXiv 2023:arXiv:2311.13801v1. [PMID: 38045476 PMCID: PMC10690303] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/05/2023]
Abstract
With the emergence of advanced spatial transcriptomic technologies, there has been a surge in research papers dedicated to analyzing spatial transcriptomics data, resulting in significant contributions to our understanding of biology. The initial stage of downstream analysis of spatial transcriptomic data has centered on identifying spatially variable genes (SVGs) or genes expressed with specific spatial patterns across the tissue. SVG detection is an important task since many downstream analyses depend on these selected SVGs. Over the past few years, a plethora of new methods have been proposed for the detection of SVGs, accompanied by numerous innovative concepts and discussions. This article provides a selective review of methods and their practical implementations, offering valuable insights into the current literature in this field.
Affiliation(s)
- Sikta Das Adhikari
- Department of Statistics and Probability, Michigan State University, East Lansing, MI 48824, USA
- Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
- Jiaxin Yang
- Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
- Jianrong Wang
- Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
- Yuehua Cui
- Department of Statistics and Probability, Michigan State University, East Lansing, MI 48824, USA
|
22
|
Yuan Z, Yao J. Harnessing computational spatial omics to explore the spatial biology intricacies. Semin Cancer Biol 2023; 95:25-41. [PMID: 37400044 DOI: 10.1016/j.semcancer.2023.06.006] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 12/02/2022] [Revised: 05/09/2023] [Accepted: 06/19/2023] [Indexed: 07/05/2023]
Abstract
Spatially resolved transcriptomics (SRT) has unlocked new dimensions in our understanding of intricate tissue architectures. However, this rapidly expanding field produces a wealth of diverse and voluminous data, necessitating the evolution of sophisticated computational strategies to unravel inherent patterns. Two distinct methodologies, gene spatial pattern recognition (GSPR) and tissue spatial pattern recognition (TSPR), have emerged as vital tools in this process. GSPR methodologies are designed to identify and classify genes exhibiting noteworthy spatial patterns, while TSPR strategies aim to understand intercellular interactions and recognize tissue domains with molecular and spatial coherence. In this review, we provide a comprehensive exploration of SRT, highlighting crucial data modalities and resources that are instrumental for the development of methods and biological insights. We address the complexities and challenges posed by the use of heterogeneous data in developing GSPR and TSPR methodologies and propose an optimal workflow for both. We delve into the latest advancements in GSPR and TSPR, examining their interrelationships. Lastly, we peer into the future, envisaging the potential directions and perspectives in this dynamic field.
Affiliation(s)
- Zhiyuan Yuan
- Center for Medical Research and Innovation, Shanghai Pudong Hospital, Fudan University Pudong Medical Center, Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China.
|
23
|
Zyla J, Papiez A, Zhao J, Qu R, Li X, Kluger Y, Polanska J, Hatzis C, Pusztai L, Marczyk M. Evaluation of zero counts to better understand the discrepancies between bulk and single-cell RNA-Seq platforms. Comput Struct Biotechnol J 2023; 21:4663-4674. [PMID: 37841335 PMCID: PMC10568495 DOI: 10.1016/j.csbj.2023.09.035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/19/2023] [Revised: 09/26/2023] [Accepted: 09/27/2023] [Indexed: 10/17/2023] Open
Abstract
Recent advances in sample preparation and sequencing technology have made it possible to profile the transcriptomes of individual cells using single-cell RNA sequencing (scRNA-Seq). Compared to bulk RNA-Seq data, single-cell data often contain a higher percentage of zero reads, mainly due to lower sequencing depth per cell, which affects mostly measurements of low-expression genes. However, discrepancies between platforms are observed regardless of expression level. Using four paired datasets with multiple samples each, we investigated technical and biological factors that can contribute to this expression shift. Using two separate machine learning models we found that, in addition to expression level, RNA integrity, gene or UTR3 length, and the number of transcripts potentially also influence the occurrence of zeros. These findings could enable the development of novel analytical methods for cross-platform expression shift correction. We also identified genes and biological pathways in our diverse datasets that consistently showed differences when assessed at the single cell versus bulk level to assist in interpreting analysis across transcriptomic platforms. At the gene level, 25 genes (0.12%) were found in all datasets as discordant, but at the pathway level, 7 pathways (2.02%) showed shared enrichment in discordant genes.
Affiliation(s)
- Joanna Zyla
- Department of Data Science and Engineering, Silesian University of Technology, Gliwice 44-100, Poland
- Anna Papiez
- Department of Data Science and Engineering, Silesian University of Technology, Gliwice 44-100, Poland
- Jun Zhao
- Computational Biology and Bioinformatics Program, Yale University, New Haven, CT 06510, USA
- Department of Pathology, Yale School of Medicine, Yale University, New Haven, CT 06510, USA
- Rihao Qu
- Computational Biology and Bioinformatics Program, Yale University, New Haven, CT 06510, USA
- Department of Pathology, Yale School of Medicine, Yale University, New Haven, CT 06510, USA
- Xiaotong Li
- Breast Medical Oncology, Yale Cancer Center, Yale School of Medicine, New Haven, CT 06520, USA
- Yuval Kluger
- Computational Biology and Bioinformatics Program, Yale University, New Haven, CT 06510, USA
- Department of Pathology, Yale School of Medicine, Yale University, New Haven, CT 06510, USA
- Applied Mathematics Program, Yale University, New Haven, CT, USA
- Joanna Polanska
- Department of Data Science and Engineering, Silesian University of Technology, Gliwice 44-100, Poland
- Christos Hatzis
- Breast Medical Oncology, Yale Cancer Center, Yale School of Medicine, New Haven, CT 06520, USA
- Lajos Pusztai
- Breast Medical Oncology, Yale Cancer Center, Yale School of Medicine, New Haven, CT 06520, USA
- Michal Marczyk
- Department of Data Science and Engineering, Silesian University of Technology, Gliwice 44-100, Poland
- Breast Medical Oncology, Yale Cancer Center, Yale School of Medicine, New Haven, CT 06520, USA
|
24
|
Liu Z, Wu D, Zhai W, Ma L. SONAR enables cell type deconvolution with spatially weighted Poisson-Gamma model for spatial transcriptomics. Nat Commun 2023; 14:4727. [PMID: 37550279 PMCID: PMC10406862 DOI: 10.1038/s41467-023-40458-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/10/2023] [Accepted: 07/25/2023] [Indexed: 08/09/2023] Open
Abstract
Recent advancements in spatial transcriptomic technologies have enabled the measurement of whole transcriptome profiles with preserved spatial context. However, limited by spatial resolution, the measured expressions at each spot are often from a mixture of multiple cells. Computational deconvolution methods designed for spatial transcriptomic data rarely make use of the valuable spatial information as well as the neighboring similarity information. Here, we propose SONAR, a Spatially weighted pOissoN-gAmma Regression model for cell-type deconvolution with spatial transcriptomic data. SONAR directly models the raw counts of spatial transcriptomic data and applies a geographically weighted regression framework that incorporates neighboring information to enhance local estimation of regional cell type composition. In addition, SONAR applies an additional elastic weighting step to adaptively filter dissimilar neighbors, which effectively prevents the introduction of local estimation bias in transition regions with sharp boundaries. We demonstrate the performance of SONAR over other state-of-the-art methods on synthetic data with various spatial patterns. We find that SONAR can accurately map region-specific cell types in real spatial transcriptomic data including mouse brain, human heart and human pancreatic ductal adenocarcinoma. We further show that SONAR can reveal the detailed distributions and fine-grained co-localization of immune cells within the microenvironment at the tumor-normal tissue margin in human liver cancer.
Affiliation(s)
- Zhiyuan Liu
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, 100101, Beijing, China
- University of the Chinese Academy of Sciences, 100049, Beijing, China
- Dafei Wu
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, 100101, Beijing, China
- Weiwei Zhai
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, 100101, Beijing, China
- University of the Chinese Academy of Sciences, 100049, Beijing, China
- Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, 650223, Kunming, China
- Liang Ma
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, 100101, Beijing, China
|
25
|
Cheng C, Chen W, Jin H, Chen X. A Review of Single-Cell RNA-Seq Annotation, Integration, and Cell-Cell Communication. Cells 2023; 12:1970. [PMID: 37566049 PMCID: PMC10417635 DOI: 10.3390/cells12151970] [Citation(s) in RCA: 37] [Impact Index Per Article: 18.5] [Received: 06/13/2023] [Revised: 07/10/2023] [Accepted: 07/21/2023] [Indexed: 08/12/2023] Open
Abstract
Single-cell RNA sequencing (scRNA-seq) has emerged as a powerful tool for investigating cellular biology at an unprecedented resolution, enabling the characterization of cellular heterogeneity, identification of rare but significant cell types, and exploration of cell-cell communications and interactions. Its broad applications span both basic and clinical research domains. In this comprehensive review, we survey the current landscape of scRNA-seq analysis methods and tools, focusing on count modeling, cell-type annotation, data integration, including spatial transcriptomics, and the inference of cell-cell communication. We review the challenges encountered in scRNA-seq analysis, including issues of sparsity or low expression, reliability of cell annotation, and assumptions in data integration, and discuss the potential impact of suboptimal clustering and differential expression analysis tools on downstream analyses, particularly in identifying cell subpopulations. Finally, we discuss recent advancements and future directions for enhancing scRNA-seq analysis. Specifically, we highlight the development of novel tools for annotating single-cell data, integrating and interpreting multimodal datasets covering transcriptomics, epigenomics, and proteomics, and inferring cellular communication networks. By elucidating the latest progress and innovation, we provide a comprehensive overview of the rapidly advancing field of scRNA-seq analysis.
Affiliation(s)
- Changde Cheng
- Department of Computational Biology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Wenan Chen
- Center for Applied Bioinformatics, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Hongjian Jin
- Center for Applied Bioinformatics, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Xiang Chen
- Department of Computational Biology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
|
26
|
Hu Y, Zhao Y, Schunk CT, Ma Y, Derr T, Zhou XM. ADEPT: Autoencoder with differentially expressed genes and imputation for robust spatial transcriptomics clustering. iScience 2023; 26:106792. [PMID: 37235055 PMCID: PMC10205785 DOI: 10.1016/j.isci.2023.106792] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2023] [Revised: 04/06/2023] [Accepted: 04/26/2023] [Indexed: 05/28/2023] Open
Abstract
Advancements in spatial transcriptomics (ST) have enabled an in-depth understanding of complex tissues by quantifying gene expression at spatially localized spots. Several notable clustering methods have been introduced to utilize both spatial and transcriptional information in the analysis of ST datasets. However, data quality across different ST sequencing techniques and types of datasets influences the performance of different methods and benchmarks. To harness spatial context and transcriptional profile in ST data, we developed a graph-based, multi-stage framework for robust clustering, called ADEPT. To control and stabilize data quality, ADEPT relies on a graph autoencoder backbone and performs an iterative clustering on imputed, differentially expressed genes-based matrices to minimize the variance of clustering results. ADEPT outperformed other popular methods on ST data generated by different platforms across analyses such as spatial domain identification, visualization, spatial trajectory inference, and data denoising.
Affiliation(s)
- Yunfei Hu
- Department of Computer Science, Vanderbilt University, Nashville, TN, USA
- Yuying Zhao
- Department of Computer Science, Vanderbilt University, Nashville, TN, USA
- Curtis T. Schunk
- Department of Biomedical Engineering, Vanderbilt University, Nashville, TN, USA
- Yingxiang Ma
- Data Science Institute, Vanderbilt University, Nashville, TN, USA
- Tyler Derr
- Department of Computer Science, Vanderbilt University, Nashville, TN, USA
- Data Science Institute, Vanderbilt University, Nashville, TN, USA
- Xin Maizie Zhou
- Department of Computer Science, Vanderbilt University, Nashville, TN, USA
- Department of Biomedical Engineering, Vanderbilt University, Nashville, TN, USA
- Data Science Institute, Vanderbilt University, Nashville, TN, USA
|
27
|
Zhu J, Shang L, Zhou X. SRTsim: spatial pattern preserving simulations for spatially resolved transcriptomics. Genome Biol 2023; 24:39. [PMID: 36869394 PMCID: PMC9983268 DOI: 10.1186/s13059-023-02879-z] [Citation(s) in RCA: 36] [Impact Index Per Article: 18.0] [Received: 06/25/2022] [Accepted: 02/16/2023] [Indexed: 03/05/2023] Open
Abstract
Spatially resolved transcriptomics (SRT)-specific computational methods are often developed, tested, validated, and evaluated in silico using simulated data. Unfortunately, existing simulated SRT data are often poorly documented, hard to reproduce, or unrealistic. Single-cell simulators are not directly applicable for SRT simulation as they cannot incorporate spatial information. We present SRTsim, an SRT-specific simulator for scalable, reproducible, and realistic SRT simulations. SRTsim not only maintains various expression characteristics of SRT data but also preserves spatial patterns. We illustrate the benefits of SRTsim in benchmarking methods for spatial clustering, spatial expression pattern detection, and cell-cell communication identification.
Affiliation(s)
- Jiaqiang Zhu
- Department of Biostatistics, University of Michigan, Ann Arbor, MI, 48109, USA
- Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, 48109, USA
- Lulu Shang
- Department of Biostatistics, University of Michigan, Ann Arbor, MI, 48109, USA
- Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, 48109, USA
- Xiang Zhou
- Department of Biostatistics, University of Michigan, Ann Arbor, MI, 48109, USA
- Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, 48109, USA
|
28
|
Gan D, Li J. SCIBER: a simple method for removing batch effects from single-cell RNA-sequencing data. Bioinformatics 2023; 39:6957084. [PMID: 36548380 PMCID: PMC9848058 DOI: 10.1093/bioinformatics/btac819] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2022] [Revised: 11/27/2022] [Accepted: 12/21/2022] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION: Integrative analysis of multiple single-cell RNA-sequencing datasets allows for more comprehensive characterizations of cell types, but systematic technical differences between datasets, known as 'batch effects', need to be removed before integration to avoid misleading interpretation of the data. Although many batch-effect-removal methods have been developed, there is still considerable room for improvement: most existing methods only give dimension-reduced data instead of expression data of individual genes, are based on computationally demanding models, and are black-box models that are difficult to interpret or tune.
RESULTS: Here, we present a new batch-effect-removal method called SCIBER (Single-Cell Integrator and Batch Effect Remover) and study its performance on real datasets. SCIBER matches cell clusters across batches according to the overlap of their differentially expressed genes. As a simple algorithm that has better scalability to data with a large number of cells and is easy to tune, SCIBER shows comparable and sometimes better accuracy in removing batch effects on real datasets compared to the state-of-the-art methods, which are much more complicated. Moreover, SCIBER outputs expression data in the original space, that is, the expression of individual genes, which can be used directly for downstream analyses. Additionally, SCIBER is a reference-based method, which assigns one of the batches as the reference batch and keeps it untouched during the process, making it especially suitable for integrating user-generated datasets with standard reference data such as the Human Cell Atlas.
AVAILABILITY AND IMPLEMENTATION: SCIBER is publicly available as an R package on CRAN: https://cran.r-project.org/web/packages/SCIBER/. A vignette is included in the CRAN R package.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Dailin Gan
- Department of Applied and Computational Mathematics and Statistics, University of Notre Dame, Notre Dame, IN 46556, USA
- Jun Li
- To whom correspondence should be addressed.
|
29
|
Ospina O, Soupir A, Fridley BL. A Primer on Preprocessing, Visualization, Clustering, and Phenotyping of Barcode-Based Spatial Transcriptomics Data. Methods Mol Biol 2023; 2629:115-140. [PMID: 36929076 DOI: 10.1007/978-1-0716-2986-4_7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/18/2023]
Abstract
Recent developments in spatially resolved transcriptomics (ST) have resulted in a large number of studies characterizing the architecture of tissues, the spatial distribution of cell types, and their interactions. Furthermore, ST promises to enable the discovery of more accurate drug targets while also providing a better understanding of the etiology and evolution of complex diseases. The analysis of ST brings challenges similar to those seen in other gene expression assays such as scRNA-seq; however, the additional spatial information warrants the development of suitable algorithms for quality control, preprocessing, visualization, and other discovery-enabling approaches (e.g., clustering, cell phenotyping). In this chapter, we review some of the existing algorithms to perform these analytical tasks and highlight some of the unmet analytical challenges in the analysis of ST data. Given the diversity of available ST technologies, we focus this chapter on the analysis of barcode-based RNA quantitation techniques.
Affiliation(s)
- Oscar Ospina
- Department of Biostatistics and Bioinformatics, Moffitt Cancer Center, Tampa, FL, USA
- Alex Soupir
- Department of Biostatistics and Bioinformatics, Moffitt Cancer Center, Tampa, FL, USA
- Brooke L Fridley
- Department of Biostatistics and Bioinformatics, H. Lee Moffitt Cancer Center & Research Institute, Tampa, FL, USA
|
30
|
Liu J, Tran V, Vemuri VNP, Byrne A, Borja M, Kim YJ, Agarwal S, Wang R, Awayan K, Murti A, Taychameekiatchai A, Wang B, Emanuel G, He J, Haliburton J, Oliveira Pisco A, Neff NF. Concordance of MERFISH spatial transcriptomics with bulk and single-cell RNA sequencing. Life Sci Alliance 2023; 6:e202201701. [PMID: 36526371 PMCID: PMC9760489 DOI: 10.26508/lsa.202201701] [Citation(s) in RCA: 47] [Impact Index Per Article: 23.5] [Received: 08/31/2022] [Revised: 09/27/2022] [Accepted: 09/28/2022] [Indexed: 12/23/2022] Open
Abstract
Spatial transcriptomics extends single-cell RNA sequencing (scRNA-seq) by providing spatial context for cell type identification and analysis. Imaging-based spatial technologies such as multiplexed error-robust fluorescence in situ hybridization (MERFISH) can achieve single-cell resolution, directly mapping single-cell identities to spatial positions. MERFISH produces a different data type than scRNA-seq, and a technical comparison between the two modalities is necessary to ascertain how to best integrate them. We performed MERFISH on the mouse liver and kidney and compared the resulting bulk and single-cell RNA statistics with those from the Tabula Muris Senis cell atlas and from two Visium datasets. MERFISH quantitatively reproduced the bulk RNA-seq and scRNA-seq results with improvements in overall dropout rates and sensitivity. Finally, we found that MERFISH independently resolved distinct cell types and spatial structure in both the liver and kidney. Computational integration with the Tabula Muris Senis atlas did not enhance these results. We conclude that MERFISH provides a quantitatively comparable method for single-cell gene expression and can identify cell types without the need for computational integration with scRNA-seq atlases.
Affiliation(s)
- Ruofan Wang
- Chan Zuckerberg Biohub, San Francisco, CA, USA
- Kyle Awayan
- Chan Zuckerberg Biohub, San Francisco, CA, USA
- Abhishek Murti
- School of Medicine, University of California, San Francisco, CA, USA
- Bruce Wang
- School of Medicine, University of California, San Francisco, CA, USA
|