1
Zheng H, Sarkar H, Raphael BJ. Joint imputation and deconvolution of gene expression across spatial transcriptomics platforms. bioRxiv 2025:2025.02.17.638195. [PMID: 40027720 PMCID: PMC11870578 DOI: 10.1101/2025.02.17.638195] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/05/2025]
Abstract
Spatially resolved transcriptomics (SRT) technologies measure gene expression across thousands of spatial locations within a tissue slice. Multiple SRT technologies are currently available and others are in active development, each with a different spatial resolution (subcellular, single-cell, or multicellular regions), gene coverage (targeted vs. whole-transcriptome), and sequencing depth per location. For example, the widely used 10x Genomics Visium platform measures whole transcriptomes from multiple-cell-sized spots, while the 10x Genomics Xenium platform measures a few hundred genes at subcellular resolution. A number of studies apply multiple SRT technologies to slices that originate from the same biological tissue. Integrating data from different SRT technologies can overcome the limitations of the individual technologies, enabling the imputation of expression for unmeasured genes in targeted technologies and/or the deconvolution of admixed expression from technologies with lower spatial resolution. We introduce Spatial Integration for Imputation and Deconvolution (SIID), an algorithm that reconstructs a latent spatial gene expression matrix from a pair of observations from different SRT technologies. SIID leverages a spatial alignment and uses a joint non-negative factorization model to accurately impute missing gene expression and infer gene expression signatures of cell types from admixed SRT data. In simulations involving paired SRT datasets from different technologies (e.g., Xenium and Visium), SIID shows superior performance in reconstructing spot-to-cell-type assignments, recovering cell-type-specific gene expression, and imputing missing data compared to contemporary tools. When applied to real-world 10x Xenium-Visium pairs from human breast and colon cancer tissues, SIID achieves the highest performance in imputing holdout gene expression. A PyTorch implementation of SIID is available at https://github.com/raphael-group/siid .
Affiliation(s)
- Hongyu Zheng
  - Department of Computer Science, Princeton University, Princeton, NJ, USA
- Hirak Sarkar
  - Department of Computer Science, Princeton University, Princeton, NJ, USA
  - Ludwig Cancer Institute, Princeton Branch, Princeton University, Princeton, NJ, USA
2
Kim N, Park S, Jo A, Eum HH, Kim HK, Lee K, Cho JH, Ku BM, Jung HA, Sun JM, Lee SH, Ahn JS, Lee JI, Choi JW, Jeong D, Na M, Kang H, Kim JY, Choi JK, Lee HO, Ahn MJ. Unveiling the influence of tumor and immune signatures on immune checkpoint therapy in advanced lung cancer. eLife 2024; 13:RP98366. [PMID: 39514276 PMCID: PMC11548875 DOI: 10.7554/elife.98366] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2024]
Abstract
This study investigates the variability among patients with non-small cell lung cancer (NSCLC) in their responses to immune checkpoint inhibitors (ICIs). Because patients with advanced-stage NSCLC rarely qualify for surgical interventions, it is crucial to identify biomarkers that influence responses to ICI therapy. We conducted an analysis of single-cell transcriptomes from 33 lung cancer biopsy samples, with a particular focus on 14 core samples taken before the initiation of palliative ICI treatment. Our objective was to link tumor and immune cell profiles with patient responses to ICI. We discovered that ICI non-responders exhibited a higher presence of CD4+ regulatory T cells, resident memory T cells, and TH17 cells. This contrasts with the diverse activated CD8+ T cells found in responders. Furthermore, tumor cells in non-responders frequently showed heightened transcriptional activity in the NF-κB and STAT3 pathways, suggesting a potential inherent resistance to ICI therapy. Through the integration of immune cell profiles and tumor molecular signatures, we achieved a discriminative power (area under the curve [AUC]) exceeding 95% in identifying patient responses to ICI treatment. These results underscore the crucial importance of the interplay between tumor and immune microenvironment, including within metastatic sites, in affecting the effectiveness of ICIs in NSCLC.
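To make the AUC figure quoted in this abstract concrete, here is a minimal sketch that computes the area under the ROC curve as the fraction of responder/non-responder pairs that a score ranks correctly (ties count as half). The function name `auc`, the toy scores, and the label coding (1 = responder, 0 = non-responder) are assumptions for illustration only; they are not taken from the study, whose scoring model is not described here.

```python
def auc(scores, labels):
    """Area under the ROC curve via the pairwise-ranking (Mann-Whitney) form.

    scores: predicted scores, higher means "more likely responder".
    labels: 1 for responder, 0 for non-responder (hypothetical coding).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # pair ranked correctly
            elif p == n:
                wins += 0.5      # tie counts as half
    return wins / (len(pos) * len(neg))


# Toy example: one of four responder/non-responder pairs is ranked wrongly.
print(auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0]))  # 0.75
```

An AUC above 0.95 would mean that more than 95% of such pairs are ordered correctly by the combined signature.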
Affiliation(s)
- Nayoung Kim
  - Department of Microbiology, College of Medicine, The Catholic University of Korea, Seoul, Republic of Korea
  - Department of Biomedicine and Health Sciences, Graduate School, The Catholic University of Korea, Seoul, Republic of Korea
- Sehhoon Park
  - Division of Haematology-Oncology, Department of Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
- Areum Jo
  - Department of Microbiology, College of Medicine, The Catholic University of Korea, Seoul, Republic of Korea
  - Department of Biomedicine and Health Sciences, Graduate School, The Catholic University of Korea, Seoul, Republic of Korea
- Hye Hyeon Eum
  - Department of Microbiology, College of Medicine, The Catholic University of Korea, Seoul, Republic of Korea
  - Department of Biomedicine and Health Sciences, Graduate School, The Catholic University of Korea, Seoul, Republic of Korea
- Hong Kwan Kim
  - Department of Thoracic and Cardiovascular Surgery, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
- Kyungjong Lee
  - Division of Pulmonary and Critical Care Medicine, Department of Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
- Jong Ho Cho
  - Department of Thoracic and Cardiovascular Surgery, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
- Bo Mi Ku
  - Research Institute for Future Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
- Hyun Ae Jung
  - Division of Haematology-Oncology, Department of Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
- Jong-Mu Sun
  - Division of Haematology-Oncology, Department of Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
- Se-Hoon Lee
  - Division of Haematology-Oncology, Department of Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
- Jin Seok Ahn
  - Division of Haematology-Oncology, Department of Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
- Jung-Il Lee
  - Department of Neurosurgery, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
- Jung Won Choi
  - Department of Neurosurgery, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
- Dasom Jeong
  - Department of Microbiology, College of Medicine, The Catholic University of Korea, Seoul, Republic of Korea
  - Department of Biomedicine and Health Sciences, Graduate School, The Catholic University of Korea, Seoul, Republic of Korea
- Minsu Na
  - Department of Microbiology, College of Medicine, The Catholic University of Korea, Seoul, Republic of Korea
  - Department of Biomedicine and Health Sciences, Graduate School, The Catholic University of Korea, Seoul, Republic of Korea
- Huiram Kang
  - Department of Microbiology, College of Medicine, The Catholic University of Korea, Seoul, Republic of Korea
  - Department of Biomedicine and Health Sciences, Graduate School, The Catholic University of Korea, Seoul, Republic of Korea
- Jeong Yeon Kim
  - Department of Bio and Brain Engineering, KAIST, Daejeon, Republic of Korea
- Jung Kyoon Choi
  - Department of Bio and Brain Engineering, KAIST, Daejeon, Republic of Korea
- Hae-Ock Lee
  - Department of Microbiology, College of Medicine, The Catholic University of Korea, Seoul, Republic of Korea
  - Department of Biomedicine and Health Sciences, Graduate School, The Catholic University of Korea, Seoul, Republic of Korea
  - Precision Medicine Research Center, College of Medicine, The Catholic University of Korea, Seoul, Republic of Korea
- Myung-Ju Ahn
  - Division of Haematology-Oncology, Department of Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
3
Zhao K, So HC, Lin Z. scParser: sparse representation learning for scalable single-cell RNA sequencing data analysis. Genome Biol 2024; 25:223. [PMID: 39152499 PMCID: PMC11328435 DOI: 10.1186/s13059-024-03345-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2023] [Accepted: 07/23/2024] [Indexed: 08/19/2024]
Abstract
The rapid rise in the availability and scale of scRNA-seq data needs scalable methods for integrative analysis. Though many methods for data integration have been developed, few focus on understanding the heterogeneous effects of biological conditions across different cell populations in integrative analysis. Our proposed scalable approach, scParser, models the heterogeneous effects from biological conditions, which unveils the key mechanisms by which gene expression contributes to phenotypes. Notably, the extended scParser pinpoints biological processes in cell subpopulations that contribute to disease pathogenesis. scParser achieves favorable performance in cell clustering compared to state-of-the-art methods and has a broad and diverse applicability.
Affiliation(s)
- Kai Zhao
  - Department of Statistics, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China
- Hon-Cheong So
  - School of Biomedical Sciences, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China
  - KIZ-CUHK Joint Laboratory of Bioresources and Molecular Research of Common Diseases, Kunming Institute of Zoology and The Chinese University of Hong Kong, Hong Kong SAR, China
  - Department of Psychiatry, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China
  - Margaret K.L. Cheung Research Centre for Management of Parkinsonism, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China
  - Brain and Mind Institute, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China
  - Hong Kong Branch of the Chinese Academy of Sciences Center for Excellence in Animal Evolution and Genetics, The Chinese University of Hong Kong, Hong Kong SAR, China
- Zhixiang Lin
  - Department of Statistics, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China
4
Ozturk K, Panwala R, Sheen J, Ford K, Jayne N, Portell A, Zhang DE, Hutter S, Haferlach T, Ideker T, Mali P, Carter H. Interface-guided phenotyping of coding variants in the transcription factor RUNX1. Cell Rep 2024; 43:114436. [PMID: 38968069 PMCID: PMC11345852 DOI: 10.1016/j.celrep.2024.114436] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2023] [Revised: 05/15/2024] [Accepted: 06/19/2024] [Indexed: 07/07/2024]
Abstract
Single-gene missense mutations remain challenging to interpret. Here, we deploy scalable functional screening by sequencing (SEUSS), a Perturb-seq method, to generate mutations at protein interfaces of RUNX1 and quantify their effect on activities of downstream cellular programs. We evaluate single-cell RNA profiles of 115 mutations in myelogenous leukemia cells and categorize them into three functionally distinct groups, wild-type (WT)-like, loss-of-function (LoF)-like, and hypomorphic, that we validate in orthogonal assays. LoF-like variants dominate the DNA-binding site and are recurrent in cancer; however, recurrence alone does not predict functional impact. Hypomorphic variants share characteristics with LoF-like but favor protein interactions, promoting gene expression indicative of nerve growth factor (NGF) response and cytokine recruitment of neutrophils. Accessible DNA near differentially expressed genes frequently contains RUNX1-binding motifs. Finally, we reclassify 16 variants of uncertain significance and train a classifier to predict 103 more. Our work demonstrates the potential of targeting protein interactions to better define the landscape of phenotypes reachable by missense mutations.
Affiliation(s)
- Kivilcim Ozturk
  - Division of Medical Genetics, Department of Medicine, University of California, San Diego, La Jolla, CA, USA; Bioinformatics and Systems Biology Program, University of California, San Diego, La Jolla, CA, USA
- Rebecca Panwala
  - Department of Bioengineering, University of California, San Diego, La Jolla, CA, USA
- Jeanna Sheen
  - School of Biological Sciences, University of California, San Diego, La Jolla, CA, USA
- Kyle Ford
  - Department of Bioengineering, University of California, San Diego, La Jolla, CA, USA
- Nathan Jayne
  - School of Biological Sciences, University of California, San Diego, La Jolla, CA, USA; Moores Cancer Center, University of California, San Diego, La Jolla, CA, USA
- Andrew Portell
  - Department of Bioengineering, University of California, San Diego, La Jolla, CA, USA
- Dong-Er Zhang
  - Moores Cancer Center, University of California, San Diego, La Jolla, CA, USA
- Stephan Hutter
  - MLL Munich Leukemia Laboratory, Max-Lebsche-Platz 31, 81377 Munich, Germany
- Torsten Haferlach
  - MLL Munich Leukemia Laboratory, Max-Lebsche-Platz 31, 81377 Munich, Germany
- Trey Ideker
  - Division of Medical Genetics, Department of Medicine, University of California, San Diego, La Jolla, CA, USA; Bioinformatics and Systems Biology Program, University of California, San Diego, La Jolla, CA, USA; Moores Cancer Center, University of California, San Diego, La Jolla, CA, USA
- Prashant Mali
  - Department of Bioengineering, University of California, San Diego, La Jolla, CA, USA
- Hannah Carter
  - Division of Medical Genetics, Department of Medicine, University of California, San Diego, La Jolla, CA, USA; Bioinformatics and Systems Biology Program, University of California, San Diego, La Jolla, CA, USA; Moores Cancer Center, University of California, San Diego, La Jolla, CA, USA
5
Bilous M, Hérault L, Gabriel AA, Teleman M, Gfeller D. Building and analyzing metacells in single-cell genomics data. Mol Syst Biol 2024; 20:744-766. [PMID: 38811801 PMCID: PMC11220014 DOI: 10.1038/s44320-024-00045-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/04/2024] [Revised: 05/03/2024] [Accepted: 05/08/2024] [Indexed: 05/31/2024]
Abstract
The advent of high-throughput single-cell genomics technologies has fundamentally transformed biological sciences. Currently, millions of cells from complex biological tissues can be phenotypically profiled across multiple modalities. The scaling of computational methods to analyze and visualize such data is a constant challenge, and tools need to be regularly updated, if not redesigned, to cope with ever-growing numbers of cells. Over the last few years, metacells have been introduced to reduce the size and complexity of single-cell genomics data while preserving biologically relevant information and improving interpretability. Here, we review recent studies that capitalize on the concept of metacells, and the many variants in nomenclature that have been used. We further outline how and when metacells should (or should not) be used to analyze single-cell genomics data and what should be considered when analyzing such data at the metacell level. To facilitate the exploration of metacells, we provide a comprehensive tutorial on the construction and analysis of metacells from single-cell RNA-seq data ( https://github.com/GfellerLab/MetacellAnalysisTutorial ) as well as a fully integrated pipeline to rapidly build, visualize and evaluate metacells with different methods ( https://github.com/GfellerLab/MetacellAnalysisToolkit ).
Affiliation(s)
- Mariia Bilous
  - Department of Oncology, Ludwig Institute for Cancer Research Lausanne, University of Lausanne, 1011, Lausanne, Switzerland
  - Agora Cancer Research Centre, 1011, Lausanne, Switzerland
  - Swiss Cancer Center Leman (SCCL), Lausanne, Switzerland
  - Swiss Institute of Bioinformatics (SIB), 1015, Lausanne, Switzerland
- Léonard Hérault
  - Department of Oncology, Ludwig Institute for Cancer Research Lausanne, University of Lausanne, 1011, Lausanne, Switzerland
  - Agora Cancer Research Centre, 1011, Lausanne, Switzerland
  - Swiss Cancer Center Leman (SCCL), Lausanne, Switzerland
  - Swiss Institute of Bioinformatics (SIB), 1015, Lausanne, Switzerland
- Aurélie Ag Gabriel
  - Department of Oncology, Ludwig Institute for Cancer Research Lausanne, University of Lausanne, 1011, Lausanne, Switzerland
  - Agora Cancer Research Centre, 1011, Lausanne, Switzerland
  - Swiss Cancer Center Leman (SCCL), Lausanne, Switzerland
  - Swiss Institute of Bioinformatics (SIB), 1015, Lausanne, Switzerland
- Matei Teleman
  - Department of Oncology, Ludwig Institute for Cancer Research Lausanne, University of Lausanne, 1011, Lausanne, Switzerland
  - Agora Cancer Research Centre, 1011, Lausanne, Switzerland
  - Swiss Cancer Center Leman (SCCL), Lausanne, Switzerland
  - Swiss Institute of Bioinformatics (SIB), 1015, Lausanne, Switzerland
- David Gfeller
  - Department of Oncology, Ludwig Institute for Cancer Research Lausanne, University of Lausanne, 1011, Lausanne, Switzerland
  - Agora Cancer Research Centre, 1011, Lausanne, Switzerland
  - Swiss Cancer Center Leman (SCCL), Lausanne, Switzerland
  - Swiss Institute of Bioinformatics (SIB), 1015, Lausanne, Switzerland
6
Zhang W, Yu R, Xu Z, Li J, Gao W, Jiang M, Dai Q. scCompressSA: dual-channel self-attention based deep autoencoder model for single-cell clustering by compressing gene-gene interactions. BMC Genomics 2024; 25:423. [PMID: 38684946 PMCID: PMC11059774 DOI: 10.1186/s12864-024-10286-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Accepted: 04/04/2024] [Indexed: 05/02/2024]
Abstract
BACKGROUND: Single-cell clustering has played an important role in exploring the molecular mechanisms of cell differentiation and human diseases. Because transcriptomics data are highly stochastic, accurate detection of cell types remains challenging, especially for RNA-sequencing data from humans. Deep neural networks have therefore been increasingly employed to mine cell-type-specific patterns and have outperformed statistical approaches in cell clustering. RESULTS: Using cross-correlation to capture gene-gene interactions, this study proposes the scCompressSA method to integrate topological patterns from scRNA-seq data, supported by a self-attention (SA) based coefficient compression (CC) block. This SA-based CC block is able to extract and employ static gene-gene interactions from scRNA-seq data. The proposed scCompressSA method enhanced clustering accuracy on multiple benchmark scRNA-seq datasets by integrating topological and temporal features. CONCLUSION: Static gene-gene interactions have been extracted as temporal features to boost clustering performance in single-cell clustering. For the scCompressSA method, the dual-channel SA-based CC block is able to integrate topological features and has exhibited extraordinary detection accuracy compared with previous clustering approaches that only employ temporal patterns.
Affiliation(s)
- Wei Zhang
  - Zhejiang Sci-Tech University, Second Street 928, Hangzhou, Zhejiang, 310018, China
- Ruochen Yu
  - Zhejiang Sci-Tech University, Second Street 928, Hangzhou, Zhejiang, 310018, China
- Zeqi Xu
  - Zhejiang Sci-Tech University, Second Street 928, Hangzhou, Zhejiang, 310018, China
- Junnan Li
  - Zhejiang Sci-Tech University, Second Street 928, Hangzhou, Zhejiang, 310018, China
- Wenhao Gao
  - Zhejiang Sci-Tech University, Second Street 928, Hangzhou, Zhejiang, 310018, China
- Mingfeng Jiang
  - Zhejiang Sci-Tech University, Second Street 928, Hangzhou, Zhejiang, 310018, China
- Qi Dai
  - Zhejiang Sci-Tech University, Second Street 928, Hangzhou, Zhejiang, 310018, China
7
Liu R, Qian K, He X, Li H. Integration of scRNA-seq data by disentangled representation learning with condition domain adaptation. BMC Bioinformatics 2024; 25:116. [PMID: 38493095 PMCID: PMC10944609 DOI: 10.1186/s12859-024-05706-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2023] [Accepted: 02/15/2024] [Indexed: 03/18/2024]
Abstract
BACKGROUND: The integration of single-cell RNA sequencing data from multiple experimental batches and diverse biological conditions holds significant importance in the study of cellular heterogeneity. RESULTS: To expedite the exploration of systematic disparities under various biological contexts, we propose an scRNA-seq integration method called scDisco, which involves a domain-adaptive decoupling representation learning strategy for the integration of dissimilar single-cell RNA data. It constructs a condition-specific domain-adaptive network founded on variational autoencoders. scDisco not only effectively reduces batch effects but also successfully disentangles biological effects and condition-specific effects, and further augments condition-specific representations by using condition-specific Domain-Specific Batch Normalization layers. This enhancement enables the identification of genes specific to particular conditions. The effectiveness and robustness of scDisco as an integration method were analyzed using both simulated and real datasets, and the results demonstrate that scDisco can yield high-quality visualizations and quantitative outcomes. Furthermore, scDisco has been validated using real datasets, affirming its proficiency in cell clustering quality, retaining batch-specific cell types and identifying condition-specific genes. CONCLUSION: scDisco is an effective integration method based on variational autoencoders, which improves the analytical tasks of reducing batch effects, cell clustering, retaining batch-specific cell types and identifying condition-specific genes.
Affiliation(s)
- Renjing Liu
  - School of Mathematics and Physics, China University of Geosciences (Wuhan), Wuhan, 430074, China
- Kun Qian
  - School of Mathematics and Physics, China University of Geosciences (Wuhan), Wuhan, 430074, China
- Xinwei He
  - School of Mathematics and Physics, China University of Geosciences (Wuhan), Wuhan, 430074, China
- Hongwei Li
  - School of Mathematics and Physics, China University of Geosciences (Wuhan), Wuhan, 430074, China
8
Fan Z, Sun J, Thorpe H, Lee S, Kim S, Park HJ. Deep neural network learning biological condition information refines gene-expression-based cell subtypes. Brief Bioinform 2023; 25:bbad512. [PMID: 38233089 PMCID: PMC10794113 DOI: 10.1093/bib/bbad512] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/24/2023] [Revised: 11/18/2023] [Accepted: 12/05/2023] [Indexed: 01/19/2024]
Abstract
With the recent advent of single-cell level biological understanding, there is growing interest in identifying cell states or subtypes that are homogeneous in terms of gene expression and are also enriched in certain biological conditions, such as disease samples versus normal samples (condition-specific cell subtypes). Despite the importance of identifying condition-specific cell subtypes, existing methods have the following limitations: since they train models separately on gene expression and on the biological condition information, (1) they do not consider potential interactions between them, and (2) the weights of the two types of information are not properly controlled. Also, (3) they do not consider non-linear relationships between the gene expression and the biological condition. To address these limitations and accurately identify such condition-specific cell subtypes, we develop scDeepJointClust, the first method that jointly trains both types of information via a deep neural network. scDeepJointClust takes the results of state-of-the-art gene-expression-based clustering methods as input, incorporating their sophistication and accuracy. We evaluated scDeepJointClust on both simulation data in diverse scenarios and biological data of different diseases (melanoma and non-small-cell lung cancer) and showed that scDeepJointClust outperforms existing methods in terms of sensitivity and specificity. scDeepJointClust exhibits significant promise in advancing our understanding of cellular states and their implications in complex biological systems.
Affiliation(s)
- Zhenjiang Fan
  - Department of Computer Science, University of Pittsburgh, Pittsburgh, Pennsylvania 15213, United States
- Jie Sun
  - Department of Human Genetics, University of Pittsburgh, Pittsburgh, Pennsylvania 15213, United States
- Henry Thorpe
  - Department of Biostatistics, University of Pittsburgh, Pittsburgh, Pennsylvania 15213, United States
- Stephen Lee
  - Department of Computer Science, University of Pittsburgh, Pittsburgh, Pennsylvania 15213, United States
- Soyeon Kim
  - Division of Pulmonary Medicine, Department of Pediatrics, UPMC Children’s Hospital of Pittsburgh, University of Pittsburgh, Pittsburgh, PA, USA
  - Department of Pediatrics, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
- Hyun Jung Park
  - Department of Human Genetics, University of Pittsburgh, Pittsburgh, Pennsylvania 15213, United States
9
Liu J, Kreimer A, Li WV. Differential variability analysis of single-cell gene expression data. Brief Bioinform 2023; 24:bbad294. [PMID: 37598422 PMCID: PMC10516347 DOI: 10.1093/bib/bbad294] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/10/2023] [Revised: 07/18/2023] [Accepted: 07/29/2023] [Indexed: 08/22/2023]
Abstract
The advent of single-cell RNA sequencing (scRNA-seq) technologies has enabled gene expression profiling at the single-cell resolution, thereby enabling the quantification and comparison of transcriptional variability among individual cells. Although alterations in transcriptional variability have been observed in various biological states, statistical methods for quantifying and testing differential variability between groups of cells are still lacking. To identify the best practices in differential variability analysis of single-cell gene expression data, we propose and compare 12 statistical pipelines using different combinations of methods for normalization, feature selection, dimensionality reduction and variability calculation. Using high-quality synthetic scRNA-seq datasets, we benchmarked the proposed pipelines and found that the most powerful and accurate pipeline performs simple library size normalization, retains all genes in analysis and uses denSNE-based distances to cluster medoids as the variability measure. By applying this pipeline to scRNA-seq datasets of COVID-19 and autism patients, we have identified cellular variability changes between patients with different severity status or between patients and healthy controls.
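The "distance to cluster medoid" idea named in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes each cell is already represented by low-dimensional coordinates (the paper uses a denSNE embedding, which is not computed here), it uses Euclidean distance, and it summarizes a group by the mean distance to its medoid. The function names `medoid` and `variability` and the toy coordinates are hypothetical.

```python
import math


def medoid(points):
    """Return the point with the smallest total distance to all other points."""
    return min(points, key=lambda p: sum(math.dist(p, q) for q in points))


def variability(points):
    """Mean Euclidean distance from each cell to the medoid of its group.

    Assumption: `points` are embedding coordinates of the cells in one group.
    """
    m = medoid(points)
    return sum(math.dist(p, m) for p in points) / len(points)


# Toy example: two groups of cells with the same layout but different spread.
tight = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
spread = [(0.0, 0.0), (0.0, 4.0), (4.0, 0.0), (4.0, 4.0)]
print(variability(tight) < variability(spread))  # True
```

A group whose cells sit farther from their medoid receives a larger value, which is the sense in which the measure captures transcriptional variability; a formal test for differences between groups would be a separate step.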
Affiliation(s)
- Jiayi Liu
  - Graduate Programs in Molecular Biosciences, Rutgers, The State University of New Jersey, 604 Allison Rd, Piscataway, 08854, NJ, USA
  - Department of Biochemistry and Molecular Biology, Rutgers, The State University of New Jersey, 604 Allison Road, Piscataway, 08854, NJ, USA
  - Center for Advanced Biotechnology and Medicine, Rutgers, The State University of New Jersey, 679 Hoes Lane West, Piscataway, 08854, NJ, USA
- Anat Kreimer
  - Department of Biochemistry and Molecular Biology, Rutgers, The State University of New Jersey, 604 Allison Road, Piscataway, 08854, NJ, USA
  - Center for Advanced Biotechnology and Medicine, Rutgers, The State University of New Jersey, 679 Hoes Lane West, Piscataway, 08854, NJ, USA
- Wei Vivian Li
  - Department of Statistics, University of California, Riverside, 900 University Ave, Riverside, 92521, CA, USA
  - Previous affiliation where part of the work was completed: Department of Biostatistics and Epidemiology, Rutgers, The State University of New Jersey, 683 Hoes Lane West, Piscataway, 08854, NJ, USA
10
Ozturk K, Panwala R, Sheen J, Ford K, Jayne N, Zhang DE, Hutter S, Haferlach T, Ideker T, Mali P, Carter H. Interface-guided phenotyping of coding variants in the transcription factor RUNX1 with SEUSS. bioRxiv 2023:2023.08.03.551876. [PMID: 37577681 PMCID: PMC10418284 DOI: 10.1101/2023.08.03.551876] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/15/2023]
Abstract
Understanding the consequences of single amino acid substitutions in cancer driver genes remains an unmet need. Perturb-seq provides a tool to investigate the effects of individual mutations on cellular programs. Here we deploy SEUSS, a Perturb-seq-like approach, to generate and assay mutations at physical interfaces of the RUNX1 Runt domain. We measured the impact of 115 mutations on RNA profiles in single myelogenous leukemia cells and used the profiles to categorize mutations into three functionally distinct groups: wild-type (WT)-like, loss-of-function (LOF)-like and hypomorphic. Notably, the largest concentration of functional mutations (non-WT-like) clustered at the DNA binding site and contained many of the more frequently observed mutations in human cancers. Hypomorphic variants shared characteristics with loss-of-function variants but had gene expression profiles indicative of response to nerve growth factor and cytokine recruitment of neutrophils. Additionally, DNA accessibility changes upon perturbations were enriched for RUNX1 binding motifs, particularly near differentially expressed genes. Overall, our work demonstrates the potential of targeting protein interaction interfaces to better define the landscape of prospective phenotypes reachable by amino acid substitutions.
11
Zhang Z, Sun H, Mariappan R, Chen X, Chen X, Jain MS, Efremova M, Teichmann SA, Rajan V, Zhang X. scMoMaT jointly performs single cell mosaic integration and multi-modal bio-marker detection. Nat Commun 2023; 14:384. [PMID: 36693837 PMCID: PMC9873790 DOI: 10.1038/s41467-023-36066-2] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 09/07/2022] [Accepted: 01/13/2023] [Indexed: 01/26/2023]
Abstract
Single cell data integration methods aim to integrate cells across data batches and modalities, and data integration tasks can be categorized into horizontal, vertical, diagonal, and mosaic integration, where mosaic integration is the most general and challenging case with few methods developed. We propose scMoMaT, a method that is able to integrate single cell multi-omics data under the mosaic integration scenario using matrix tri-factorization. During integration, scMoMaT is also able to uncover the cluster specific bio-markers across modalities. These multi-modal bio-markers are used to interpret and annotate the clusters to cell types. Moreover, scMoMaT can integrate cell batches with unequal cell type compositions. Applying scMoMaT to multiple real and simulated datasets demonstrated these features of scMoMaT and showed that scMoMaT has superior performance compared to existing methods. Specifically, we show that integrated cell embedding combined with learned bio-markers lead to cell type annotations of higher quality or resolution compared to their original annotations.
Affiliation(s)
- Ziqi Zhang
  - School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA, USA
- Haoran Sun
  - School of Mathematics, Georgia Institute of Technology, Atlanta, GA, USA
- Ragunathan Mariappan
  - Department of Information Systems and Analytics, National University of Singapore, Singapore, Singapore
- Xi Chen
  - Department of Biology, Southern University of Science and Technology, Shenzhen, Guangdong, China
- Xinyu Chen
  - Bioengineering Program, Georgia Institute of Technology, Atlanta, GA, USA
- Vaibhav Rajan
  - Department of Information Systems and Analytics, National University of Singapore, Singapore, Singapore
- Xiuwei Zhang
  - School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA, USA
|
12
|
Qian K, Fu S, Li H, Li WV. The scINSIGHT Package for Integrating Single-Cell RNA-Seq Data from Different Biological Conditions. J Comput Biol 2022; 29:1233-1236. [PMID: 35920848 PMCID: PMC9700338 DOI: 10.1089/cmb.2022.0244] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022] Open
Abstract
Data integration is a critical step in the analysis of multiple single-cell RNA sequencing samples to account for heterogeneity due to both biological and technical variability. scINSIGHT is a new integration method for single-cell gene expression data, and can effectively use the information of biological condition to improve the integration of multiple single-cell samples. scINSIGHT is based on a novel non-negative matrix factorization model that learns common and condition-specific gene modules in samples from different biological or experimental conditions. Using these gene modules, scINSIGHT can further identify cellular identities and active biological processes in different cell types or conditions. Here we introduce the installation and main functionality of the scINSIGHT R package, including how to preprocess the data, apply the scINSIGHT algorithm, and analyze the output.
Affiliation(s)
- Kun Qian
  - School of Mathematics and Physics, China University of Geosciences, Wuhan, China
- Shiwei Fu
  - Department of Biostatistics and Epidemiology, Rutgers School of Public Health, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
  - Department of Statistics, University of California, Riverside, California, USA
- Hongwei Li
  - School of Mathematics and Physics, China University of Geosciences, Wuhan, China
- Wei Vivian Li
  - Department of Biostatistics and Epidemiology, Rutgers School of Public Health, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
  - Department of Statistics, University of California, Riverside, California, USA
|
13
|
Qian K, Fu S, Li H, Li WV. Author Correction: scINSIGHT for interpreting single-cell gene expression from biologically heterogeneous data. Genome Biol 2022; 23:104. [PMID: 35449066 PMCID: PMC9027830 DOI: 10.1186/s13059-022-02672-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/14/2022] Open
Affiliation(s)
- Kun Qian
  - School of Mathematics and Physics, China University of Geosciences, Wuhan, 430074, Hubei, China
- Shiwei Fu
  - Department of Biostatistics and Epidemiology, Rutgers School of Public Health, Rutgers, The State University of New Jersey, Piscataway, NJ, 08854, USA
- Hongwei Li
  - School of Mathematics and Physics, China University of Geosciences, Wuhan, 430074, Hubei, China
- Wei Vivian Li
  - Department of Biostatistics and Epidemiology, Rutgers School of Public Health, Rutgers, The State University of New Jersey, Piscataway, NJ, 08854, USA
|