1
Liu Y, Yang C. Computational methods for alignment and integration of spatially resolved transcriptomics data. Comput Struct Biotechnol J 2024; 23:1094-1105. [PMID: 38495555 PMCID: PMC10940867 DOI: 10.1016/j.csbj.2024.03.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/06/2024] [Revised: 03/02/2024] [Accepted: 03/04/2024] [Indexed: 03/19/2024] [Open Access]
Abstract
Most complex biological regulatory activities occur in three dimensions (3D). To better analyze biological processes, it is essential not only to decipher the molecular information of numerous cells but also to understand how their spatial contexts influence their behavior. With the development of spatially resolved transcriptomics (SRT) technologies, SRT datasets are being generated to simultaneously characterize gene expression and spatial arrangement within tissues, organs or organisms. To fully leverage spatial information, the focus extends beyond individual two-dimensional (2D) slices. Two tasks, slice alignment and data integration, have been introduced to establish correlations between multiple slices, enhancing the effectiveness of downstream tasks. Numerous related methods have now been developed. In this review, we first elucidate the details and principles behind several representative methods. We then report the results of testing these methods on various SRT datasets and assess their performance in representative downstream tasks. We discuss the strengths and weaknesses of each method and the reasons behind their performance. Finally, we provide an outlook on future developments. The code and experimental details are publicly available at https://github.com/YangLabHKUST/SRT_alignment_and_integration.
Affiliation(s)
- Yuyao Liu
- Department of Automation, School of Information Science and Technology, Tsinghua University, Beijing, China
- Can Yang
- Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong, China
2
Lin S, Cui Y, Zhao F, Yang Z, Song J, Yao J, Zhao Y, Qian BZ, Zhao Y, Yuan Z. Complete spatially resolved gene expression is not necessary for identifying spatial domains. Cell Genomics 2024; 4:100565. [PMID: 38781966 DOI: 10.1016/j.xgen.2024.100565] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2023] [Revised: 02/29/2024] [Accepted: 04/30/2024] [Indexed: 05/25/2024]
Abstract
Spatially resolved transcriptomics (SRT) technologies have revolutionized the study of tissue organization. We introduce a graph convolutional network with an attention and positive emphasis mechanism, termed BINARY, relying exclusively on binarized SRT data to accurately delineate spatial domains. BINARY outperforms existing methods across various SRT data types while using significantly less input information. Our study suggests that precise gene expression quantification may not always be essential, inspiring further exploration of the broader applications of spatially resolved binarized gene expression data.
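As a rough illustration of what "binarized SRT data" could look like, the sketch below turns a spots-by-genes count matrix into presence/absence values. The function name `binarize` and the default threshold of zero are assumptions made for illustration; the abstract does not state how BINARY performs the binarization.

```python
def binarize(counts, threshold=0):
    """Map each expression count to 1 if it exceeds `threshold`, else 0.

    `counts` is a list of rows (one row per spot, one value per gene).
    The threshold of 0 is a hypothetical default, not taken from the paper.
    """
    return [[1 if value > threshold else 0 for value in row] for row in counts]


# Toy matrix: 2 spots x 3 genes.
toy_counts = [[0, 3, 1], [2, 0, 0]]
binary_counts = binarize(toy_counts)
```

A representation like this keeps only whether a gene was detected at a spot, which is the reduced input the abstract describes.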
Affiliation(s)
- Senlin Lin
- Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
- Yan Cui
- Institute of Science and Technology for Brain-Inspired Intelligence, MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, MOE Frontiers Center for Brain Science, Fudan University, Shanghai, China; Center for Medical Research and Innovation, Shanghai Pudong Hospital, Fudan University Pudong Medical Center, Fudan University, Shanghai, China
- Fangyuan Zhao
- Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
- Zhidong Yang
- Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
- Jiangning Song
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Clayton, VIC 3800, Australia
- Yu Zhao
- AI Lab, Tencent, Shenzhen, China
- Bin-Zhi Qian
- Fudan University Shanghai Cancer Center, Department of Oncology, Shanghai Medical College, The Human Phenome Institute, Zhangjiang-Fudan International Innovation Center, Fudan University, Shanghai, China
- Yi Zhao
- Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China.
- Zhiyuan Yuan
- Institute of Science and Technology for Brain-Inspired Intelligence, MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, MOE Frontiers Center for Brain Science, Fudan University, Shanghai, China; Center for Medical Research and Innovation, Shanghai Pudong Hospital, Fudan University Pudong Medical Center, Fudan University, Shanghai, China.
3
Swain AK, Pandit V, Sharma J, Yadav P. SpatialPrompt: spatially aware scalable and accurate tool for spot deconvolution and domain identification in spatial transcriptomics. Commun Biol 2024; 7:639. [PMID: 38796505 PMCID: PMC11127982 DOI: 10.1038/s42003-024-06349-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2024] [Accepted: 05/17/2024] [Indexed: 05/28/2024] [Open Access]
Abstract
Efficient mapping of cell types in situ remains a major challenge in spatial transcriptomics. Most spot deconvolution tools ignore spatial coordinate information and run extremely slowly on large datasets. Here, we introduce SpatialPrompt, a spatially aware and scalable tool for spot deconvolution and domain identification. SpatialPrompt integrates gene expression, spatial location, and a single-cell RNA sequencing (scRNA-seq) dataset as a reference to accurately infer cell-type proportions of spatial spots. SpatialPrompt uses non-negative ridge regression and a graph neural network to efficiently capture local microenvironment information. Our extensive benchmarking analysis on Visium, Slide-seq, and MERFISH datasets demonstrated superior performance of SpatialPrompt over 15 existing tools. On a mouse hippocampus dataset, SpatialPrompt achieves spot deconvolution and domain identification within 2 minutes for 50,000 spots. Overall, domain identification using SpatialPrompt was 44 to 150 times faster than existing methods. We have built a database housing more than 40 curated scRNA-seq datasets for seamless integration with SpatialPrompt for spot deconvolution.
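To make the regression step concrete, here is a minimal sketch of non-negative ridge regression applied to a single spot, solved by coordinate descent. It is not SpatialPrompt's implementation: the function names, the toy signatures, the penalty `lam`, and the final normalisation to proportions are all assumptions for illustration, and the spatial and graph neural network components mentioned in the abstract are left out.

```python
def nonneg_ridge(signatures, spot, lam=1.0, n_iter=200):
    """Minimise ||spot - sum_k w_k * signatures[k]||^2 + lam * ||w||^2 with w >= 0.

    `signatures` holds one reference expression profile per cell type
    (hypothetical scRNA-seq averages); `spot` is the observed profile.
    """
    weights = [0.0] * len(signatures)
    residual = list(spot)  # spot minus fitted values; all weights start at 0
    for _ in range(n_iter):
        for k, sig in enumerate(signatures):
            # Inner product of signature k with the residual that excludes
            # signature k's own current contribution.
            rho = sum(s * (r + weights[k] * s) for s, r in zip(sig, residual))
            # Closed-form ridge update for one coordinate, clipped at zero.
            new_w = max(0.0, rho / (sum(s * s for s in sig) + lam))
            delta = new_w - weights[k]
            residual = [r - delta * s for r, s in zip(residual, sig)]
            weights[k] = new_w
    return weights


def to_proportions(weights):
    """Rescale non-negative weights so they sum to 1 (left as-is if all zero)."""
    total = sum(weights)
    return [w / total for w in weights] if total > 0 else list(weights)


# Toy example: three cell types, three genes, a spot mixing types 0 and 1.
toy_signatures = [[10.0, 0.0, 1.0], [0.0, 8.0, 1.0], [1.0, 1.0, 9.0]]
toy_spot = [7.0, 2.4, 1.0]  # 0.7 * type 0 + 0.3 * type 1
toy_props = to_proportions(nonneg_ridge(toy_signatures, toy_spot, lam=0.01))
```

The ridge penalty shrinks the weights slightly, so the recovered proportions are close to, rather than exactly, the mixing fractions used to build the toy spot.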
Affiliation(s)
- Asish Kumar Swain
- Department of Bioscience & Bioengineering, Indian Institute of Technology, Jodhpur, Rajasthan, 342030, India
- Vrushali Pandit
- Department of Bioscience & Bioengineering, Indian Institute of Technology, Jodhpur, Rajasthan, 342030, India
- Jyoti Sharma
- Department of Bioscience & Bioengineering, Indian Institute of Technology, Jodhpur, Rajasthan, 342030, India
- Pankaj Yadav
- Department of Bioscience & Bioengineering, Indian Institute of Technology, Jodhpur, Rajasthan, 342030, India.
- School of Artificial Intelligence and Data Science, Indian Institute of Technology, Jodhpur, Rajasthan, 342030, India.
4
Lu Y, Chen QM, An L. SPADE: spatial deconvolution for domain specific cell-type estimation. Commun Biol 2024; 7:469. [PMID: 38632414 PMCID: PMC11024133 DOI: 10.1038/s42003-024-06172-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2023] [Accepted: 04/10/2024] [Indexed: 04/19/2024] [Open Access]
Abstract
Understanding gene expression in different cell types within their spatial context is a key goal in genomics research. SPADE (SPAtial DEconvolution), our proposed method, addresses this by integrating spatial patterns into the analysis of cell type composition. This approach uses a combination of single-cell RNA sequencing, spatial transcriptomics, and histological data to accurately estimate the proportions of cell types in various locations. Our analyses of synthetic data have demonstrated SPADE's capability to discern cell type-specific spatial patterns effectively. When applied to real-life datasets, SPADE provides insights into cellular dynamics and the composition of tumor tissues. This enhances our comprehension of complex biological systems and aids in exploring cellular diversity. SPADE represents a significant advancement in deciphering spatial gene expression patterns, offering a powerful tool for the detailed investigation of cell types in spatial transcriptomics.
Affiliation(s)
- Yingying Lu
- Interdisciplinary Program in Statistics and Data Science, University of Arizona, Tucson, AZ, 85721, USA
- Qin M Chen
- College of Pharmacy, University of Arizona, Tucson, AZ, 85721, USA
- Lingling An
- Interdisciplinary Program in Statistics and Data Science, University of Arizona, Tucson, AZ, 85721, USA.
- Department of Biosystems Engineering, University of Arizona, Tucson, AZ, 85721, USA.
- Department of Epidemiology and Biostatistics, University of Arizona, Tucson, AZ, 85721, USA.
5
Yuan Z, Zhao F, Lin S, Zhao Y, Yao J, Cui Y, Zhang XY, Zhao Y. Benchmarking spatial clustering methods with spatially resolved transcriptomics data. Nat Methods 2024; 21:712-722. [PMID: 38491270 DOI: 10.1038/s41592-024-02215-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2023] [Accepted: 02/16/2024] [Indexed: 03/18/2024]
Abstract
Spatial clustering, which shares an analogy with single-cell clustering, has expanded the scope of tissue physiology studies from cell-centroid to structure-centroid with spatially resolved transcriptomics (SRT) data. Computational methods have undergone remarkable development in recent years, but a comprehensive benchmark study is still lacking. Here we present a benchmark study of 13 computational methods on 34 SRT data (7 datasets). Performance was evaluated on the basis of accuracy, spatial continuity, marker gene detection, scalability, and robustness. We found that existing methods were complementary in terms of their performance and functionality, and we provide guidance for selecting appropriate methods for given scenarios. On testing 22 additional challenging datasets, we identified challenges in identifying noncontinuous spatial domains and limitations of existing methods, highlighting their inadequacies in handling recent large-scale tasks. Furthermore, with 145 simulated datasets, we examined the robustness of these methods against four different factors and assessed the impact of pre- and postprocessing approaches. Our study offers a comprehensive evaluation of existing spatial clustering methods with SRT data, paving the way for future advancements in this rapidly evolving field.
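The abstract does not name the accuracy metric it uses. As one common way to score agreement between predicted spatial domains and annotated labels, the sketch below computes the adjusted Rand index (ARI) from scratch; choosing ARI here is an assumption, and the function name and toy labels are illustrative only.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same spots.

    1.0 means identical partitions (up to label renaming); values near 0
    indicate chance-level agreement.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    n = len(labels_true)
    if n < 2:
        return 1.0  # a single spot admits only one partition
    # Pairs of spots grouped together in both labelings, in each one alone.
    sum_both = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_true * sum_pred / comb(n, 2)
    maximum = (sum_true + sum_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both labelings form one cluster
    return (sum_both - expected) / (maximum - expected)


# Toy example: the same two domains with swapped label names.
toy_ari = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
```

Because ARI ignores label names, a method is not penalised for numbering its domains differently from the annotation.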
Affiliation(s)
- Zhiyuan Yuan
- Center for Medical Research and Innovation, Shanghai Pudong Hospital, Fudan University Pudong Medical Center, Fudan University, Shanghai, China.
- Institute of Science and Technology for Brain-Inspired Intelligence; MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence; MOE Frontiers Center for Brain Science, Fudan University, Shanghai, China.
- Fangyuan Zhao
- Research Center for Ubiquitous Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Senlin Lin
- Research Center for Ubiquitous Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Yu Zhao
- Tencent AI Lab, Shenzhen, China
- Yan Cui
- Center for Medical Research and Innovation, Shanghai Pudong Hospital, Fudan University Pudong Medical Center, Fudan University, Shanghai, China
- Institute of Science and Technology for Brain-Inspired Intelligence; MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence; MOE Frontiers Center for Brain Science, Fudan University, Shanghai, China
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto, Japan
- Xiao-Yong Zhang
- Center for Medical Research and Innovation, Shanghai Pudong Hospital, Fudan University Pudong Medical Center, Fudan University, Shanghai, China
- Yi Zhao
- Research Center for Ubiquitous Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China.
- University of Chinese Academy of Sciences, Beijing, China.
6
Lin S, Zhao F, Wu Z, Yao J, Zhao Y, Yuan Z. Streamlining spatial omics data analysis with Pysodb. Nat Protoc 2024; 19:831-895. [PMID: 38135744 DOI: 10.1038/s41596-023-00925-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2023] [Accepted: 10/02/2023] [Indexed: 12/24/2023]
Abstract
Advances in spatial omics technologies have improved the understanding of cellular organization in tissues, leading to the generation of complex and heterogeneous data and prompting the development of specialized tools for managing, loading and visualizing spatial omics data. The Spatial Omics Database (SODB) was established to offer a unified format for data storage and interactive visualization modules. Here we detail the use of Pysodb, a Python-based tool designed to enable the efficient exploration and loading of spatial datasets from SODB within a Python environment. We present seven case studies using Pysodb, detailing the interaction with various computational methods, ensuring reproducibility of experimental data and facilitating the integration of new data and alternative applications in SODB. The approach offers a reference for method developers by outlining label and metadata availability in representative spatial data that can be loaded by Pysodb. The tool is supplemented by a website (https://protocols-pysodb.readthedocs.io/) with detailed information for benchmarking analysis, and allows method developers to focus on computational models by facilitating data processing. This protocol is designed for researchers with limited experience in computational biology. Depending on the dataset complexity, the protocol typically requires ~12 h to complete.
Affiliation(s)
- Senlin Lin
- Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Fangyuan Zhao
- Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Yi Zhao
- Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China.
- University of Chinese Academy of Sciences, Beijing, China.
- Zhiyuan Yuan
- Institute of Science and Technology for Brain-Inspired Intelligence, MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, MOE Frontiers Center for Brain Science, Fudan University, Shanghai, China.
- Center for Medical Research and Innovation, Shanghai Pudong Hospital, Fudan University Pudong Medical Center, Fudan University, Shanghai, China.
7
Liu T, Li K, Wang Y, Li H, Zhao H. Evaluating the Utilities of Foundation Models in Single-cell Data Analysis. bioRxiv 2024:2023.09.08.555192. [PMID: 38464157 PMCID: PMC10925156 DOI: 10.1101/2023.09.08.555192] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 03/12/2024]
Abstract
Foundation Models (FMs) have made significant strides in both industrial and scientific domains. In this paper, we evaluate the performance of FMs in single-cell sequencing data analysis through comprehensive experiments across eight downstream tasks pertinent to single-cell data. By comparing ten different single-cell FMs with task-specific methods, we found that single-cell FMs may not consistently outperform task-specific methods in all tasks. However, the emergent abilities and the successful applications of cross-species/cross-modality transfer learning of FMs are promising. In addition, we present a systematic evaluation of the effects of hyper-parameters, initial settings, and stability for training single-cell FMs based on a proposed scEval framework, and provide guidelines for pre-training and fine-tuning. Our work summarizes the current state of single-cell FMs and points to their constraints and avenues for future development.
8
Yang S, Zhou X. SRT-Server: powering the analysis of spatial transcriptomic data. Genome Med 2024; 16:18. [PMID: 38279156 PMCID: PMC10811909 DOI: 10.1186/s13073-024-01288-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2023] [Accepted: 01/15/2024] [Indexed: 01/28/2024] [Open Access]
Abstract
BACKGROUND: Spatially resolved transcriptomics (SRT) encompasses a rapidly developing set of technologies that enable the measurement of gene expression in tissue while retaining spatial localization information. SRT technologies and the SRT studies they enable have provided unprecedented insights into the structural and functional underpinnings of complex tissues. As SRT technologies have advanced and an increasing number of SRT studies have emerged, numerous sophisticated statistical and computational methods have been developed to facilitate the analysis and interpretation of SRT data. However, despite the growing popularity of SRT studies and the widespread availability of SRT analysis methods, analysis of large-scale and complex SRT datasets remains challenging and not easily accessible to researchers with limited statistical and computational backgrounds.
RESULTS: Here, we present SRT-Server, the first webserver designed to carry out comprehensive SRT analyses for a wide variety of SRT technologies while requiring minimal prior computational knowledge. Implemented with cutting-edge web development technologies, SRT-Server is user-friendly and features multiple analytic modules that can perform a range of SRT analyses. With a flowchart-style interface, these different analytic modules on the SRT-Server can be dragged into the main panel and connected to each other to create custom analytic pipelines. SRT-Server then automatically executes the desired analyses, generates corresponding figures, and outputs results, all without requiring prior programming knowledge. We demonstrate the advantages of SRT-Server through three case studies utilizing SRT data collected from two common platforms, highlighting its versatility and value to researchers with varying analytic expertise.
CONCLUSIONS: Overall, SRT-Server presents a user-friendly, efficient, effective, secure, and expandable solution for SRT data analysis, opening new doors for researchers in the field. SRT-Server is freely available at https://spatialtranscriptomicsanalysis.com/.
Affiliation(s)
- Sheng Yang
- Department of Biostatistics, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu, 211166, China.
- Xiang Zhou
- Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, MI, 48109, USA.
- Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, 48109, USA.
9
Liang Y, Shi G, Cai R, Yuan Y, Xie Z, Yu L, Huang Y, Shi Q, Wang L, Li J, Tang Z. PROST: quantitative identification of spatially variable genes and domain detection in spatial transcriptomics. Nat Commun 2024; 15:600. [PMID: 38238417 PMCID: PMC10796707 DOI: 10.1038/s41467-024-44835-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/04/2023] [Accepted: 12/19/2023] [Indexed: 01/22/2024] [Open Access]
Abstract
Computational methods have been proposed to leverage spatially resolved transcriptomic data, pinpointing genes with spatial expression patterns and delineating tissue domains. However, existing approaches fall short in uniformly quantifying spatially variable genes (SVGs). Moreover, from a methodological viewpoint, while SVGs are naturally associated with depicting spatial domains, they are technically dissociated in most methods. Here, we present a framework (PROST) for the quantitative recognition of spatial transcriptomic patterns, consisting of (i) quantitatively characterizing spatial variations in gene expression patterns through the PROST Index; and (ii) unsupervised clustering of spatial domains via a self-attention mechanism. We demonstrate that PROST achieves superior SVG identification and domain segmentation across various spatial resolutions, from multicellular to cellular levels. Importantly, the PROST Index can be applied to prioritize spatial expression variations, facilitating the exploration of biological insights. Together, our study provides a flexible and robust framework for analyzing diverse spatial transcriptomic data.
Affiliation(s)
- Yuchen Liang
- School of Geography and Planning, Sun Yat-sen University, Guangzhou, 510275, China
- Guowei Shi
- Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, 510080, China
- Runlin Cai
- School of Geography and Planning, Sun Yat-sen University, Guangzhou, 510275, China
- Yuchen Yuan
- Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, 510080, China
- Ziying Xie
- Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, 510080, China
- Long Yu
- School of Geography and Planning, Sun Yat-sen University, Guangzhou, 510275, China
- Yingjian Huang
- School of Geography and Planning, Sun Yat-sen University, Guangzhou, 510275, China
- Qian Shi
- School of Geography and Planning, Sun Yat-sen University, Guangzhou, 510275, China
- Lizhe Wang
- School of Computer Science, China University of Geosciences, Wuhan, 430078, China
- Jun Li
- School of Computer Science, China University of Geosciences, Wuhan, 430078, China.
- Zhonghui Tang
- Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, 510080, China.
10
Yuan Z. MENDER: fast and scalable tissue structure identification in spatial omics data. Nat Commun 2024; 15:207. [PMID: 38182575 PMCID: PMC10770058 DOI: 10.1038/s41467-023-44367-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2023] [Accepted: 12/11/2023] [Indexed: 01/07/2024] [Open Access]
Abstract
Tissue structure identification is a crucial task in spatial omics data analysis, for which increasingly complex models, such as Graph Neural Networks and Bayesian networks, are employed. However, whether increased model complexity effectively leads to improved performance remains an open question in the field. Inspired by the consistent observation of cellular neighborhood structures across various spatial technologies, we propose Multi-range cEll coNtext DEciphereR (MENDER) for tissue structure identification. Applied to datasets of 3 brain regions and a whole-brain atlas, MENDER, with its biology-driven design, offers substantial improvements over modern complex models while automatically aligning labels across slices, and it requires much less running time than the second-fastest method. MENDER's identification power allows the uncovering of previously overlooked spatial domains that exhibit strong associations with brain aging. MENDER's scalability makes it readily applicable to a million-level brain spatial atlas. MENDER's discriminative power enables the differentiation of breast cancer patient subtypes obscured by single-cell analysis.
Affiliation(s)
- Zhiyuan Yuan
- Institute of Science and Technology for Brain-Inspired Intelligence, MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, MOE Frontiers Center for Brain Science, Center for Medical Research and Innovation, Shanghai Pudong Hospital, Fudan University Pudong Medical Center, Fudan University, Shanghai, 200433, China.
11
Li J, Wang J, Lin Z. SGCAST: symmetric graph convolutional auto-encoder for scalable and accurate study of spatial transcriptomics. Brief Bioinform 2023; 25:bbad490. [PMID: 38171928 PMCID: PMC10782917 DOI: 10.1093/bib/bbad490] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/21/2023] [Revised: 08/02/2023] [Accepted: 12/07/2023] [Indexed: 01/05/2024] [Open Access]
Abstract
Recent advances in spatial transcriptomics (ST) have enabled comprehensive profiling of gene expression with spatial information in the context of the tissue microenvironment. However, with the improvements in the resolution and scale of ST data, deciphering spatial domains precisely while ensuring efficiency and scalability is still challenging. Here, we develop SGCAST, an efficient auto-encoder framework to identify spatial domains. SGCAST adopts a symmetric graph convolutional auto-encoder to learn aggregated latent embeddings by integrating gene expression similarity and the proximity of spatial spots. This framework enables a mini-batch training strategy, which makes SGCAST memory-efficient and scalable to high-resolution spatial transcriptomic data with a large number of spots. SGCAST improves the overall accuracy of spatial domain identification on benchmarking data. We also validated the performance of SGCAST on ST datasets at various scales across multiple platforms. Our study illustrates the superior capacity of SGCAST for analyzing spatial transcriptomic data.
Affiliation(s)
- Jinzhao Li
- Department of Statistics, The Chinese University of Hong Kong, Sha Tin, Hong Kong, China
- Jiong Wang
- School of Science and Engineering, The Chinese University of Hong Kong (Shenzhen), Shenzhen, 518172, China
- Zhixiang Lin
- Department of Statistics, The Chinese University of Hong Kong, Sha Tin, Hong Kong, China
12
Guo T, Yuan Z, Pan Y, Wang J, Chen F, Zhang MQ, Li X. SPIRAL: integrating and aligning spatially resolved transcriptomics data across different experiments, conditions, and technologies. Genome Biol 2023; 24:241. [PMID: 37864231 PMCID: PMC10590036 DOI: 10.1186/s13059-023-03078-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2022] [Accepted: 09/29/2023] [Indexed: 10/22/2023] [Open Access]
Abstract
Properly integrating spatially resolved transcriptomics (SRT) generated from different batches into a unified gene-spatial coordinate system could enable the construction of a comprehensive spatial transcriptome atlas. Here, we propose SPIRAL, consisting of two consecutive modules: SPIRAL-integration, with graph domain adaptation-based data integration, and SPIRAL-alignment, with cluster-aware optimal transport-based coordination alignment. We verify SPIRAL with both synthetic and real SRT datasets. By encoding spatial correlations to gene expressions, SPIRAL-integration surpasses state-of-the-art methods in both batch effect removal and joint spatial domain identification. By aligning spots cluster-wise, SPIRAL-alignment achieves more accurate coordinate alignments than existing methods.
Affiliation(s)
- Tiantian Guo
- School of Software Engineering, Beijing Jiaotong University, Beijing, 100044, China
- MOE Key Laboratory of Bioinformatics, Bioinformatics Division and Center for Synthetic & Systems Biology, BNRist, Department of Automation, Tsinghua University, Beijing, 100084, China
- Zhiyuan Yuan
- Institute of Science and Technology for Brain-Inspired Intelligence, Center for Medical Research and Innovation, Shanghai Pudong Hospital, Fudan University Pudong Medical Center, MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, Fudan University, Shanghai, 200433, China
- Yan Pan
- School of Biomedical Sciences, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
- Jiakang Wang
- School of Software Engineering, Beijing Jiaotong University, Beijing, 100044, China
- Fengling Chen
- Center for Stem Cell Biology and Regenerative Medicine, MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua-Peking Center for Life Sciences, Tsinghua University, Beijing, 100084, China
- Michael Q Zhang
- Department of Biological Sciences, Center for Systems Biology, The University of Texas, Richardson, TX, 75080-3021, USA.
- Xiangyu Li
- School of Software Engineering, Beijing Jiaotong University, Beijing, 100044, China.
13
Lu Y, Chen Q, An L. SPADE: Spatial Deconvolution for Domain Specific Cell-type Estimation. bioRxiv 2023:2023.04.14.536924. [PMID: 37131788 PMCID: PMC10153127 DOI: 10.1101/2023.04.14.536924] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
The advent of spatial transcriptomics technology has allowed for the acquisition of gene expression profiles with multi-cellular resolution in a spatially resolved manner, presenting a new milestone in the field of genomics. However, the aggregate gene expression from heterogeneous cell types obtained by these technologies poses a significant challenge for a comprehensive delineation of cell type-specific spatial patterns. Here, we propose SPADE (SPAtial DEconvolution), an in-silico method designed to address this challenge by incorporating spatial patterns during cell type decomposition. SPADE utilizes a combination of single-cell RNA sequencing data, spatial location information, and histological information to computationally estimate the proportion of cell types present at each spatial location. In our study, we showcased the effectiveness of SPADE by conducting analyses on synthetic data. Our results indicated that SPADE was able to successfully identify cell type-specific spatial patterns that were not previously identified by existing deconvolution methods. Furthermore, we applied SPADE to a real-world dataset analyzing the developmental chicken heart, where we observed that SPADE was able to accurately capture the intricate processes of cellular differentiation and morphogenesis within the heart. Specifically, we were able to reliably estimate changes in cell type compositions over time, which is a critical aspect of understanding the underlying mechanisms of complex biological systems. These findings underscore the potential of SPADE as a valuable tool for analyzing complex biological systems and shedding light on their underlying mechanisms. Taken together, our results suggest that SPADE represents a significant advancement in the field of spatial transcriptomics, providing a powerful tool for characterizing complex spatial gene expression patterns in heterogeneous tissues.
|
14
|
Zhu J, Shang L, Zhou X. SRTsim: spatial pattern preserving simulations for spatially resolved transcriptomics. Genome Biol 2023; 24:39. [PMID: 36869394 PMCID: PMC9983268 DOI: 10.1186/s13059-023-02879-z] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Received: 06/25/2022] [Accepted: 02/16/2023] [Indexed: 03/05/2023] Open
Abstract
Spatially resolved transcriptomics (SRT)-specific computational methods are often developed, tested, validated, and evaluated in silico using simulated data. Unfortunately, existing simulated SRT data are often poorly documented, hard to reproduce, or unrealistic. Single-cell simulators are not directly applicable for SRT simulation as they cannot incorporate spatial information. We present SRTsim, an SRT-specific simulator for scalable, reproducible, and realistic SRT simulations. SRTsim not only maintains various expression characteristics of SRT data but also preserves spatial patterns. We illustrate the benefits of SRTsim in benchmarking methods for spatial clustering, spatial expression pattern detection, and cell-cell communication identification.
Affiliation(s)
- Jiaqiang Zhu
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI, 48109, USA
  - Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, 48109, USA
- Lulu Shang
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI, 48109, USA
  - Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, 48109, USA
- Xiang Zhou
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI, 48109, USA
  - Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, 48109, USA
|
15
|
Zhang X, Liu W, Song F, Liu J. iSC.MEB: an R package for multi-sample spatial clustering analysis of spatial transcriptomics data. Bioinformatics Advances 2023; 3:vbad019. [PMID: 36845201 PMCID: PMC9945056 DOI: 10.1093/bioadv/vbad019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2022] [Revised: 12/24/2022] [Accepted: 02/16/2023] [Indexed: 02/19/2023]
Abstract
Summary: Emerging spatially resolved transcriptomics (SRT) technologies are powerful in measuring gene expression profiles while retaining tissue spatial localization information and typically provide data from multiple tissue sections. We have previously developed the tool SC.MEB, an empirical Bayes approach for SRT data analysis using a hidden Markov random field. Here, we introduce an extension to SC.MEB, denoted as integrated spatial clustering with hidden Markov random field using empirical Bayes (iSC.MEB), that permits users to simultaneously estimate the batch effect and perform spatial clustering for low-dimensional representations of multiple SRT datasets. We demonstrate that iSC.MEB can provide accurate cell/domain detection results using two SRT datasets. Availability and implementation: iSC.MEB is implemented in an open-source R package, and source code is freely available at https://github.com/XiaoZhangryy/iSC.MEB. Documentation and vignettes are provided on our package website (https://xiaozhangryy.github.io/iSC.MEB/index.html). Supplementary information: Supplementary data are available at Bioinformatics Advances online.
Affiliation(s)
- Xiao Zhang
  - Centre for Quantitative Medicine Health Services & Systems Research, Duke-NUS Medical School, 169857 Singapore, Singapore
- Wei Liu
  - Centre for Quantitative Medicine Health Services & Systems Research, Duke-NUS Medical School, 169857 Singapore, Singapore
- Fangda Song
  - School of Data Science, The Chinese University of Hong Kong-Shenzhen, Shenzhen 518172, Guangdong, China
- Jin Liu
  - To whom correspondence should be addressed.
|
16
|
Jeon H, Xie J, Jeon Y, Jung KJ, Gupta A, Chang W, Chung D. Statistical Power Analysis for Designing Bulk, Single-Cell, and Spatial Transcriptomics Experiments: Review, Tutorial, and Perspectives. Biomolecules 2023; 13:biom13020221. [PMID: 36830591 PMCID: PMC9952882 DOI: 10.3390/biom13020221] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 12/26/2022] [Revised: 01/20/2023] [Accepted: 01/21/2023] [Indexed: 01/26/2023] Open
Abstract
Gene expression profiling technologies have been used in various applications such as cancer biology. The development of gene expression profiling has expanded the scope of target discovery in transcriptomic studies, and each technology produces data with distinct characteristics. In order to guarantee biologically meaningful findings using transcriptomic experiments, it is important to consider various experimental factors in a systematic way through statistical power analysis. In this paper, we review and discuss the power analysis for three types of gene expression profiling technologies from a practical standpoint, including bulk RNA-seq, single-cell RNA-seq, and high-throughput spatial transcriptomics. Specifically, we describe the existing power analysis tools for each research objective for each of the bulk RNA-seq and scRNA-seq experiments, along with recommendations. On the other hand, since there are no power analysis tools for high-throughput spatial transcriptomics at this point, we instead investigate the factors that can influence power analysis.
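As a rough illustration of what a power analysis computes, the sketch below estimates the power of a two-group comparison for a single gene and searches for the smallest per-group sample size that reaches a target power. It is not one of the tools reviewed in the paper: it assumes a two-sided z-test on approximately normal, log-scale expression with a known standard deviation, which is a strong simplification of count-based RNA-seq models. The function names `two_sample_power` and `min_samples` are hypothetical.

```python
from statistics import NormalDist


def two_sample_power(effect, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    Assumption: log-scale expression is roughly normal with a known,
    common standard deviation `sd`; `effect` is the true mean difference.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Standardized mean difference for equal group sizes.
    ncp = effect / (sd * (2 / n_per_group) ** 0.5)
    # Probability that the test statistic falls in either rejection tail.
    return std_normal.cdf(ncp - z_crit) + std_normal.cdf(-ncp - z_crit)


def min_samples(effect, sd, target_power=0.8, alpha=0.05, max_n=10_000):
    """Smallest per-group sample size reaching `target_power`, or None."""
    for n in range(2, max_n + 1):
        if two_sample_power(effect, sd, n, alpha) >= target_power:
            return n
    return None


if __name__ == "__main__":
    # A one-standard-deviation difference at alpha = 0.05 and 80% power.
    print(min_samples(effect=1.0, sd=1.0))
```

Under these assumptions, halving the effect size roughly quadruples the required sample size, which is why the expected effect size and variability are usually the first factors to pin down when designing an experiment.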
Affiliation(s)
- Hyeongseon Jeon
  - Department of Biomedical Informatics, The Ohio State University, Columbus, OH 43210, USA
  - Pelotonia Institute for Immuno-Oncology, The James Comprehensive Cancer Center, The Ohio State University, Columbus, OH 43210, USA
- Juan Xie
  - Department of Biomedical Informatics, The Ohio State University, Columbus, OH 43210, USA
  - Pelotonia Institute for Immuno-Oncology, The James Comprehensive Cancer Center, The Ohio State University, Columbus, OH 43210, USA
  - The Interdisciplinary Ph.D. Program in Biostatistics, The Ohio State University, Columbus, OH 43210, USA
- Yeseul Jeon
  - Department of Biomedical Informatics, The Ohio State University, Columbus, OH 43210, USA
  - Department of Statistics and Data Science, Yonsei University, Seoul 03722, Republic of Korea
  - Department of Applied Statistics, Yonsei University, Seoul 03722, Republic of Korea
- Kyeong Joo Jung
  - Department of Computer Science and Engineering, The Ohio State University, Columbus, OH 43210, USA
- Arkobrato Gupta
  - Department of Biomedical Informatics, The Ohio State University, Columbus, OH 43210, USA
  - Pelotonia Institute for Immuno-Oncology, The James Comprehensive Cancer Center, The Ohio State University, Columbus, OH 43210, USA
  - The Interdisciplinary Ph.D. Program in Biostatistics, The Ohio State University, Columbus, OH 43210, USA
- Won Chang
  - Division of Statistics and Data Science, University of Cincinnati, Cincinnati, OH 45221, USA
- Dongjun Chung
  - Department of Biomedical Informatics, The Ohio State University, Columbus, OH 43210, USA
  - Pelotonia Institute for Immuno-Oncology, The James Comprehensive Cancer Center, The Ohio State University, Columbus, OH 43210, USA
  - The Interdisciplinary Ph.D. Program in Biostatistics, The Ohio State University, Columbus, OH 43210, USA
  - Correspondence:
|
17
|
Liu W, Liao X, Luo Z, Yang Y, Lau MC, Jiao Y, Shi X, Zhai W, Ji H, Yeong J, Liu J. Probabilistic embedding, clustering, and alignment for integrating spatial transcriptomics data with PRECAST. Nat Commun 2023; 14:296. [PMID: 36653349 PMCID: PMC9849443 DOI: 10.1038/s41467-023-35947-w] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Received: 08/05/2022] [Accepted: 01/09/2023] [Indexed: 01/19/2023] Open
Abstract
Spatially resolved transcriptomics involves a set of emerging technologies that enable the transcriptomic profiling of tissues with the physical location of expressions. Although a variety of methods have been developed for data integration, most of them are for single-cell RNA-seq datasets without consideration of spatial information. Thus, methods that can integrate spatial transcriptomics data from multiple tissue slides, possibly from multiple individuals, are needed. Here, we present PRECAST, a data integration method for multiple spatial transcriptomics datasets with complex batch effects and/or biological effects between slides. PRECAST unifies spatial factor analysis simultaneously with spatial clustering and embedding alignment, while requiring only partially shared cell/domain clusters across datasets. Using both simulated and four real datasets, we show improved cell/domain detection with outstanding visualization, and the estimated aligned embeddings and cell/domain labels facilitate many downstream analyses. We demonstrate that PRECAST is computationally scalable and applicable to spatial transcriptomics datasets from different platforms.
Affiliation(s)
- Wei Liu
  - Centre for Quantitative Medicine, Health Services & Systems Research, Duke-NUS Medical School, Singapore, Singapore
- Xu Liao
  - Centre for Quantitative Medicine, Health Services & Systems Research, Duke-NUS Medical School, Singapore, Singapore
- Ziye Luo
  - Centre for Quantitative Medicine, Health Services & Systems Research, Duke-NUS Medical School, Singapore, Singapore
  - School of Statistics, Renmin University, Beijing, China
- Yi Yang
  - Centre for Quantitative Medicine, Health Services & Systems Research, Duke-NUS Medical School, Singapore, Singapore
- Mai Chan Lau
  - Institute of Molecular and Cell Biology (IMCB), Agency of Science, Technology and Research (A*STAR), Singapore, Singapore
- Yuling Jiao
  - School of Mathematics and Statistics, Wuhan University, Wuhan, China
- Xingjie Shi
  - Academy of Statistics and Interdisciplinary Sciences, East China Normal University, Shanghai, China
- Weiwei Zhai
  - Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Hongkai Ji
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Joe Yeong
  - Institute of Molecular and Cell Biology (IMCB), Agency of Science, Technology and Research (A*STAR), Singapore, Singapore
  - Department of Anatomical Pathology, Singapore General Hospital, Singapore, Singapore
- Jin Liu
  - Centre for Quantitative Medicine, Health Services & Systems Research, Duke-NUS Medical School, Singapore, Singapore
  - School of Data Science, The Chinese University of Hong Kong-Shenzhen, Shenzhen, China
|
18
|
A Framework for Comparison and Assessment of Synthetic RNA-Seq Data. Genes (Basel) 2022; 13:genes13122362. [PMID: 36553629 PMCID: PMC9778097 DOI: 10.3390/genes13122362] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/09/2022] [Revised: 12/05/2022] [Accepted: 12/06/2022] [Indexed: 12/16/2022] Open
Abstract
The ever-growing number of methods for the generation of synthetic bulk and single cell RNA-seq data have multiple and diverse applications. They are often aimed at benchmarking bioinformatics algorithms for purposes such as sample classification, differential expression analysis, correlation and network studies and the optimization of data integration and normalization techniques. Here, we propose a general framework to compare synthetically generated RNA-seq data and select a data-generating tool that is suitable for a set of specific study goals. As there are multiple methods for synthetic RNA-seq data generation, researchers can use the proposed framework to make an informed choice of an RNA-seq data simulation algorithm and software that are best suited for their specific scientific questions of interest.
|
19
|
Spatially aware dimension reduction for spatial transcriptomics. Nat Commun 2022; 13:7203. [PMID: 36418351 PMCID: PMC9684472 DOI: 10.1038/s41467-022-34879-1] [Citation(s) in RCA: 36] [Impact Index Per Article: 18.0] [Received: 03/10/2022] [Accepted: 11/10/2022] [Indexed: 11/27/2022] Open
Abstract
Spatial transcriptomics is a collection of genomic technologies that have enabled transcriptomic profiling on tissues with spatial localization information. Analyzing spatial transcriptomic data is computationally challenging, as the data collected from various spatial transcriptomic technologies are often noisy and display substantial spatial correlation across tissue locations. Here, we develop a spatially-aware dimension reduction method, SpatialPCA, that can extract a low dimensional representation of the spatial transcriptomics data with biological signal and preserved spatial correlation structure, thus unlocking many existing computational tools previously developed in single-cell RNAseq studies for tailored analysis of spatial transcriptomics. We illustrate the benefits of SpatialPCA for spatial domain detection and explore its utility for trajectory inference on the tissue and for high-resolution spatial map construction. In the real data applications, SpatialPCA identifies key molecular and immunological signatures in a detected tumor surrounding microenvironment, including a tertiary lymphoid structure that shapes the gradual transcriptomic transition during tumorigenesis and metastasis. In addition, SpatialPCA detects the past neuronal developmental history that underlies the current transcriptomic landscape across tissue locations in the cortex.
|
20
|
Yu Q, Jiang M, Wu L. Spatial transcriptomics technology in cancer research. Front Oncol 2022; 12:1019111. [PMID: 36313703 PMCID: PMC9606570 DOI: 10.3389/fonc.2022.1019111] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 08/14/2022] [Accepted: 09/21/2022] [Indexed: 08/25/2023] Open
Abstract
In recent years, spatial transcriptomics (ST) technologies have developed rapidly and have been widely used in constructing spatial tissue atlases and characterizing spatiotemporal heterogeneity of cancers. Currently, ST has been used to profile spatial heterogeneity in multiple cancer types. In addition, ST is beneficial for identifying and comprehensively understanding special spatial areas such as the tumor interface and tertiary lymphoid structures (TLSs), which exhibit unique tumor microenvironments (TMEs). Therefore, ST has also shown great potential to improve pathological diagnosis and identify novel prognostic factors in cancer. This review presents recent advances in and prospects for applications of ST technologies in cancer research, as well as the challenges.
Affiliation(s)
- Qichao Yu
  - Beijing Genomics Institute (BGI)-Shenzhen, Shenzhen, China
  - College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
- Miaomiao Jiang
  - Beijing Genomics Institute (BGI)-Shenzhen, Shenzhen, China
- Liang Wu
  - Beijing Genomics Institute (BGI)-Shenzhen, Shenzhen, China
|