1
Liu T, Liu Z, Sun W, Shankar A, Zhao Y, Wang X. M-band wavelet-based multi-view clustering of cells. PLoS Comput Biol 2025; 21:e1013060. [PMID: 40408513 DOI: 10.1371/journal.pcbi.1013060] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2024] [Accepted: 04/18/2025] [Indexed: 05/25/2025] Open
Abstract
Wavelet analysis is a widely used and promising tool in signal processing and data analysis. However, the application of wavelet-based methods to single-cell RNA sequencing (scRNA-seq) data has been little explored. Here, we present M-band wavelet-based scRNA-seq multi-view clustering of cells (WMC). We integrated M-band wavelet analysis with uniform manifold approximation and projection (UMAP) and applied the combination to a panel of single-cell sequencing datasets, breaking up the data matrix into an approximation (low-resolution) component and M-1 detail (high-resolution) components. Our method supports multi-view clustering of cell types, identities, and functional states, enabling visualization of missing cell types and discovery of new cell types. Distinct from the standard scRNA-seq workflow, our wavelet-based approach is a new addition for uncovering rare cell types at fine resolution.
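The decomposition described in this abstract can be illustrated with the simplest special case, M = 2, using the Haar filter pair. This is only a sketch under stated assumptions: the function name `haar_split`, the toy count matrix, and the choice to filter along the gene axis are all hypothetical and are not taken from the WMC paper, which uses general M-band filters with one approximation and M-1 detail components.

```python
import numpy as np

def haar_split(X):
    """One-level 2-band Haar split along the gene axis (columns).

    Hypothetical helper: returns one approximation view and one detail view.
    """
    X = np.asarray(X, dtype=float)
    if X.shape[1] % 2:                        # pad to an even number of columns
        X = np.hstack([X, X[:, -1:]])
    even, odd = X[:, 0::2], X[:, 1::2]
    approx = (even + odd) / np.sqrt(2.0)      # low-resolution component
    detail = (even - odd) / np.sqrt(2.0)      # high-resolution component
    return approx, detail

rng = np.random.default_rng(0)
counts = rng.poisson(2.0, size=(6, 8))        # toy matrix: 6 cells x 8 genes
approx, detail = haar_split(counts)
```

Because the Haar pair is orthonormal, the two views together preserve the total energy of the input. In a multi-view workflow each view would then be embedded and clustered separately; that step is not shown here.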
Affiliation(s)
- Tong Liu
  - Department of Mathematical Sciences, Tsinghua University, Beijing, China
- Zihuan Liu
  - Data and Statistical Science, AbbVie, Chicago, Illinois, United States of America
- Wenke Sun
  - School of Economics and Management, Dalian University of Technology, Dalian, China
- Adeethyia Shankar
  - Brown University, Providence, Rhode Island, United States of America
- Yongzhong Zhao
  - Frontage Labs, Exton, Pennsylvania, United States of America
- Xiaodi Wang
  - Department of Mathematics, Western Connecticut State University, Danbury, Connecticut, United States of America
2
Hsieh CH, Chen YX, Tseng TY, Li A, Huang HC, Juan HF. Transcriptionally distinct malignant neuroblastoma populations show selective response to adavosertib treatment. Neurotherapeutics 2025; 22:e00575. [PMID: 40118716 PMCID: PMC12047484 DOI: 10.1016/j.neurot.2025.e00575] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/12/2024] [Accepted: 03/08/2025] [Indexed: 03/23/2025] Open
Abstract
Neuroblastoma is an aggressive childhood cancer that arises from the sympathetic nervous system. Despite advances in treatment, high-risk neuroblastoma remains difficult to manage due to its heterogeneous nature and frequent development of drug resistance. Drug repurposing guided by single-cell analysis presents a promising strategy for identifying new therapeutic options. Here, we aim to characterize high-risk neuroblastoma subpopulations and identify effective repurposed drugs for targeted treatment. We performed single-cell transcriptomic analysis of neuroblastoma samples, integrating bulk RNA-seq data deconvolution with clinical outcomes to define distinct malignant cell states. Using a systematic drug repurposing pipeline, we identified and validated potential therapeutic agents targeting specific high-risk neuroblastoma subpopulations. Single-cell analysis revealed 17 transcriptionally distinct neuroblastoma subpopulations. Survival analysis identified a highly aggressive subpopulation characterized by elevated UBE2C/PTTG1 expression and poor patient outcomes, distinct from a less aggressive subpopulation with favorable prognosis. Drug repurposing screening identified Adavosertib as particularly effective against the aggressive subpopulation, validated using SK-N-DZ cells as a representative model. Mechanistically, Adavosertib suppressed cell proliferation through AKT/mTOR pathway disruption, induced G2/M phase cell cycle arrest, and promoted apoptosis. Further analysis revealed UBE2C and PTTG1 as key molecular drivers of drug resistance, where their overexpression enhanced proliferation, Adavosertib resistance, and cell migration. This study establishes a single-cell-based drug repurposing strategy for high-risk neuroblastoma treatment. 
Our approach successfully identified Adavosertib as a promising repurposed therapeutic agent for targeting specific high-risk neuroblastoma subpopulations, providing a framework for developing more effective personalized treatment strategies.
Affiliation(s)
- Chiao-Hui Hsieh
  - Department of Life Science, National Taiwan University, Taipei, Taiwan
- Yi-Xuan Chen
  - Department of Life Science, National Taiwan University, Taipei, Taiwan
- Tzu-Yang Tseng
  - Department of Life Science, National Taiwan University, Taipei, Taiwan
- Albert Li
  - Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei, Taiwan
- Hsuan-Cheng Huang
  - Institute of Biomedical Informatics, National Yang Ming Chiao Tung University, Taipei, Taiwan
- Hsueh-Fen Juan
  - Department of Life Science, National Taiwan University, Taipei, Taiwan
  - Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei, Taiwan
  - Center for Computational and Systems Biology, National Taiwan University, Taipei, Taiwan
  - Center for Advanced Computing and Imaging in Biomedicine, Taipei, Taiwan
3
Hackenberg M, Brunn N, Vogel T, Binder H. Infusing structural assumptions into dimensionality reduction for single-cell RNA sequencing data to identify small gene sets. Commun Biol 2025; 8:414. [PMID: 40069486 PMCID: PMC11897155 DOI: 10.1038/s42003-025-07872-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2024] [Accepted: 03/03/2025] [Indexed: 03/15/2025] Open
Abstract
Dimensionality reduction greatly facilitates the exploration of cellular heterogeneity in single-cell RNA sequencing data. While most of such approaches are data-driven, it can be useful to incorporate biologically plausible assumptions about the underlying structure or the experimental design. We propose the boosting autoencoder (BAE) approach, which combines the advantages of unsupervised deep learning for dimensionality reduction and boosting for formalizing assumptions. Specifically, our approach selects small sets of genes that explain latent dimensions. As illustrative applications, we explore the diversity of neural cell identities and temporal patterns of embryonic development.
Grants
- Deutsche Forschungsgemeinschaft (DFG, German Research Foundation): Project-ID 322977937, GRK 2344
- Deutsche Forschungsgemeinschaft (DFG, German Research Foundation): Project-ID 499552394, SFB 1597
Affiliation(s)
- Maren Hackenberg
  - Institute of Medical Biometry and Statistics (IMBI), Faculty of Medicine and Medical Center, University of Freiburg, Freiburg, Germany
  - Freiburg Center for Data Analysis, Modeling and AI, University of Freiburg, Freiburg, Germany
- Niklas Brunn
  - Institute of Medical Biometry and Statistics (IMBI), Faculty of Medicine and Medical Center, University of Freiburg, Freiburg, Germany
  - Freiburg Center for Data Analysis, Modeling and AI, University of Freiburg, Freiburg, Germany
- Tanja Vogel
  - Institute of Anatomy and Cell Biology, Department Molecular Embryology, Faculty of Medicine, University of Freiburg, Freiburg, Germany
- Harald Binder
  - Institute of Medical Biometry and Statistics (IMBI), Faculty of Medicine and Medical Center, University of Freiburg, Freiburg, Germany
  - Freiburg Center for Data Analysis, Modeling and AI, University of Freiburg, Freiburg, Germany
  - Centre for Integrative Biological Signaling Studies (CIBSS), University of Freiburg, Freiburg, Germany
4
Dong S, Cui Z, Liu D, Lei J. scRDiT: Generating Single-cell RNA-seq Data by Diffusion Transformers and Accelerating Sampling. Interdiscip Sci 2025:10.1007/s12539-025-00688-5. [PMID: 39982678 DOI: 10.1007/s12539-025-00688-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/04/2024] [Revised: 01/07/2025] [Accepted: 01/08/2025] [Indexed: 02/22/2025]
Abstract
Single-cell RNA sequencing (scRNA-seq) is a groundbreaking technology extensively utilized in biological research, facilitating the examination of gene expression at the individual cell level within a given tissue sample. While numerous tools have been developed for scRNA-seq data analysis, the challenge persists in capturing the distinct features of such data and generating virtual datasets that share analogous statistical properties. Our study introduces a generative approach termed scRNA-seq Diffusion Transformer (scRDiT). This method generates virtual scRNA-seq data by leveraging a real dataset. The method is a neural network constructed based on Denoising Diffusion Probabilistic Models (DDPMs) and Diffusion Transformers (DiTs). It involves adding Gaussian noise to the real dataset through iterative noise-adding steps and ultimately denoising the result to form scRNA-seq samples. This scheme allows us to learn data features from actual scRNA-seq samples during model training. Our experiments, conducted on two distinct scRNA-seq datasets, demonstrate superior performance. Additionally, the model sampling process is expedited by incorporating Denoising Diffusion Implicit Models (DDIMs). scRDiT presents a unified methodology empowering users to train neural network models with their unique scRNA-seq datasets, enabling the generation of numerous high-quality scRNA-seq samples.
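The noise-adding half of a DDPM, which this abstract refers to, has a standard closed form: x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps. The sketch below illustrates only that generic forward process. The function name `forward_noise`, the linear beta schedule, and the toy data are assumptions for illustration and are not drawn from the scRDiT implementation.

```python
import numpy as np

def forward_noise(x0, t, betas, rng):
    """Sample x_t from q(x_t | x_0) in closed form (generic DDPM forward process)."""
    alpha_bar = np.cumprod(1.0 - betas)       # cumulative fraction of signal retained
    eps = rng.standard_normal(x0.shape)       # Gaussian noise the model learns to predict
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)         # assumed linear schedule with 1000 steps
x0 = rng.standard_normal((4, 16))             # toy stand-in for 4 normalized expression profiles
xt, eps = forward_noise(x0, 999, betas, rng)  # at the last step x_t is almost pure noise
```

Training a denoiser to predict `eps` from `xt` and `t`, and the reverse sampling loop (DDPM or the faster DDIM variant), are outside the scope of this sketch.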
Affiliation(s)
- Shengze Dong
  - School of Computer Science and Technology, Tiangong University, Tianjin, 300387, China
- Zhuorui Cui
  - School of Computer Science and Technology, Tiangong University, Tianjin, 300387, China
- Ding Liu
  - School of Computer Science and Technology, Tiangong University, Tianjin, 300387, China
- Jinzhi Lei
  - School of Mathematical Sciences, Tiangong University, Tianjin, 300387, China
5
Chockalingam SP, Aluru M, Aluru S. SCEMENT: scalable and memory efficient integration of large-scale single-cell RNA-sequencing data. Bioinformatics 2025; 41:btaf057. [PMID: 39985442 PMCID: PMC12013815 DOI: 10.1093/bioinformatics/btaf057] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2024] [Revised: 11/18/2024] [Accepted: 02/20/2025] [Indexed: 02/24/2025] Open
Abstract
MOTIVATION Integrative analysis of large-scale single-cell data collected from diverse cell populations promises an improved understanding of complex biological systems. While several algorithms have been developed for single-cell RNA-sequencing data integration, many lack the scalability to handle large numbers of datasets and/or millions of cells due to their memory and run time requirements. The few tools that can handle large data do so by reducing the computational burden through strategies such as subsampling of the data or selecting a reference dataset to improve computational efficiency and scalability. Such shortcuts, however, hamper the accuracy of downstream analyses, especially those requiring quantitative gene expression information. RESULTS We present SCEMENT, a SCalablE and Memory-Efficient iNTegration method, to overcome these limitations. Our new parallel algorithm builds upon and extends the linear regression model previously applied in ComBat to an unsupervised sparse matrix setting to enable accurate integration of diverse and large collections of single-cell RNA-sequencing data. Using tens to hundreds of real single-cell RNA-seq datasets, we show that SCEMENT outperforms ComBat as well as FastIntegration and Scanorama in runtime (up to 214× faster) and memory usage (up to 17.5× less). It not only performs batch correction and integration of millions of cells in under 25 min, but also facilitates the discovery of new rare cell types and more robust reconstruction of gene regulatory networks with full quantitative gene expression information. AVAILABILITY AND IMPLEMENTATION Source code freely available for download at https://github.com/AluruLab/scement, implemented in C++ and supported on Linux.
Affiliation(s)
- Sriram P Chockalingam
  - Institute for Data Engineering and Science, Georgia Institute of Technology, Atlanta, GA-30332, United States
- Maneesha Aluru
  - School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA-30332, United States
- Srinivas Aluru
  - School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA-30332, United States
6
Gulati GS, D'Silva JP, Liu Y, Wang L, Newman AM. Profiling cell identity and tissue architecture with single-cell and spatial transcriptomics. Nat Rev Mol Cell Biol 2025; 26:11-31. [PMID: 39169166 DOI: 10.1038/s41580-024-00768-2] [Citation(s) in RCA: 25] [Impact Index Per Article: 25.0] [Accepted: 07/16/2024] [Indexed: 08/23/2024]
Abstract
Single-cell transcriptomics has broadened our understanding of cellular diversity and gene expression dynamics in healthy and diseased tissues. Recently, spatial transcriptomics has emerged as a tool to contextualize single cells in multicellular neighbourhoods and to identify spatially recurrent phenotypes, or ecotypes. These technologies have generated vast datasets with targeted-transcriptome and whole-transcriptome profiles of hundreds to millions of cells. Such data have provided new insights into developmental hierarchies, cellular plasticity and diverse tissue microenvironments, and spurred a burst of innovation in computational methods for single-cell analysis. In this Review, we discuss recent advancements, ongoing challenges and prospects in identifying and characterizing cell states and multicellular neighbourhoods. We discuss recent progress in sample processing, data integration, identification of subtle cell states, trajectory modelling, deconvolution and spatial analysis. Furthermore, we discuss the increasing application of deep learning, including foundation models, in analysing single-cell and spatial transcriptomics data. Finally, we discuss recent applications of these tools in the fields of stem cell biology, immunology, and tumour biology, and the future of single-cell and spatial transcriptomics in biological research and its translation to the clinic.
Affiliation(s)
- Gunsagar S Gulati
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Yunhe Liu
  - Department of Genomic Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Linghua Wang
  - Department of Genomic Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
  - The University of Texas MD Anderson Cancer Center UTHealth Houston Graduate School of Biomedical Sciences, Houston, TX, USA
- Aaron M Newman
  - Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
  - Institute for Stem Cell Biology and Regenerative Medicine, Stanford University, Stanford, CA, USA
  - Stanford Cancer Institute, Stanford University, Stanford, CA, USA
  - Chan Zuckerberg Biohub - San Francisco, San Francisco, CA, USA
7
Wang S, Li H, Zhang K, Wu H, Pang S, Wu W, Ye L, Su J, Zhang Y. scSID: A lightweight algorithm for identifying rare cell types by capturing differential expression from single-cell sequencing data. Comput Struct Biotechnol J 2024; 23:589-600. [PMID: 38274993 PMCID: PMC10809081 DOI: 10.1016/j.csbj.2023.12.043] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2023] [Revised: 12/27/2023] [Accepted: 12/27/2023] [Indexed: 01/27/2024] Open
Abstract
Single-cell RNA sequencing (scRNA-seq) is currently an important technology for identifying cell types and studying diseases at the genetic level. Identifying rare cell types is biologically important as one of the downstream data analyses of single-cell RNA sequencing. Although rare cell identification methods have been developed, most of these suffer from insufficient mining of intercellular similarities, low scalability, and being time-consuming. In this paper, we propose a single-cell similarity division algorithm (scSID) for identifying rare cells. It takes cell-to-cell similarity into consideration by analyzing both inter-cluster and intra-cluster similarities, and discovers rare cell types based on the similarity differences. We show that scSID outperforms other existing methods by benchmarking it on different experimental datasets. Application of scSID to multiple datasets, including 68K PBMC and intestine, highlights its exceptional scalability and remarkable ability to identify rare cell populations.
Affiliation(s)
- Shudong Wang
  - Qingdao Institute of Software, College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, 266580, China
- Hengxiao Li
  - Qingdao Institute of Software, College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, 266580, China
- Kuijie Zhang
  - Qingdao Institute of Software, College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, 266580, China
- Hao Wu
  - College of Information Engineering, Northwest A&F University, 712100, Yangling, China
  - School of Software, Shandong University, 250100, Jinan, China
- Shanchen Pang
  - Qingdao Institute of Software, College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, 266580, China
- Wenhao Wu
  - Qingdao Institute of Software, College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, 266580, China
- Lan Ye
  - Cancer Center, the Second Hospital of Shandong University, Jinan, 250033, China
- Jionglong Su
  - School of AI and Advanced Computing, XJTLU Entrepreneur College (Taicang), Xi'an Jiaotong-Liverpool University, Suzhou, 215123, Jiangsu, China
- Yulin Zhang
  - College of Mathematics and Systems Science, Shandong University of Science and Technology, Qingdao, 266590, China
8
Vo DHT, Thorne T. Shrinkage estimation of gene interaction networks in single-cell RNA sequencing data. BMC Bioinformatics 2024; 25:339. [PMID: 39462345 PMCID: PMC11515282 DOI: 10.1186/s12859-024-05946-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/25/2024] [Accepted: 09/23/2024] [Indexed: 10/29/2024] Open
Abstract
BACKGROUND Gene interaction networks are graphs in which nodes represent genes and edges represent functional interactions between them. These interactions can be at multiple levels, for instance, gene regulation, protein-protein interaction, or metabolic pathways. To analyse gene interaction networks at a large scale, gene co-expression network analysis is often applied to high-throughput gene expression data such as RNA sequencing data. With advances in sequencing technology, expression of genes can be measured in individual cells. Single-cell RNA sequencing (scRNAseq) provides insights into cellular development, differentiation and characteristics at the transcriptomic level. High sparsity and high-dimensional data structures pose challenges in scRNAseq data analysis. RESULTS In this study, a sparse inverse covariance matrix estimation framework for scRNAseq data is developed to capture direct functional interactions between genes. Comparative analyses using simulated scRNAseq data highlight the high performance and fast computation of Stein-type shrinkage in high-dimensional data. Data transformation approaches also improve the performance of shrinkage methods on non-Gaussian distributed data. Zero-inflated modelling of scRNAseq data based on a negative binomial distribution enhances shrinkage performance in zero-inflated data without interfering with non-zero-inflated count data. CONCLUSION The proposed framework broadens the application of graphical models in scRNAseq analysis, offering flexibility with respect to the sparsity of count data resulting from dropout events, high performance, and fast computation. The framework is implemented in a reproducible Snakemake workflow (https://github.com/calathea24/ZINBGraphicalModel) and the R package ZINBStein (https://github.com/calathea24/ZINBStein).
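The basic idea behind shrinkage estimation in the p > n setting can be sketched as a convex combination of the sample covariance and a diagonal target, which makes the estimate invertible so that a precision (inverse covariance) matrix exists. This is a generic illustration only: the function name `shrink_covariance`, the diagonal target, and the fixed intensity `lam` are assumptions. Stein-type estimators, as named in the abstract, choose the intensity analytically from the data, and the ZINBStein package may differ in every detail.

```python
import numpy as np

def shrink_covariance(X, lam):
    """Shrink the sample covariance of X (cells x genes) toward its diagonal.

    lam in [0, 1] is a hand-picked intensity here, not an estimated one.
    """
    S = np.cov(X, rowvar=False)               # genes x genes sample covariance
    target = np.diag(np.diag(S))              # keep variances, drop covariances
    return lam * target + (1.0 - lam) * S

rng = np.random.default_rng(0)
X = rng.standard_normal((10, 30))             # 10 cells, 30 genes: sample covariance is singular
S_shrunk = shrink_covariance(X, lam=0.3)
precision = np.linalg.inv(S_shrunk)           # invertible after shrinkage
```

Nonzero off-diagonal entries of `precision` correspond to partial correlations between genes, which is what a graphical model reads as direct interactions; thresholding or testing those entries is not shown.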
Affiliation(s)
- Duong H T Vo
  - Computer Science Research Centre, University of Surrey, Guildford, UK
- Thomas Thorne
  - Computer Science Research Centre, University of Surrey, Guildford, UK
9
Gui B, Wang Q, Wang J, Li X, Wu Q, Chen H. Cross-species comparison of airway epithelium transcriptomics. Heliyon 2024; 10:e38259. [PMID: 39391497 PMCID: PMC11466595 DOI: 10.1016/j.heliyon.2024.e38259] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2024] [Revised: 09/19/2024] [Accepted: 09/20/2024] [Indexed: 10/12/2024] Open
Abstract
Studies of lung transcriptomics across species are essential for understanding the complex biology and disease mechanisms of this vital organ. Single-cell RNA sequencing (scRNA-seq) has emerged as a key tool for understanding cell dynamics across various species. However, comprehensive cross-species comparisons are limited. Therefore, the aim of this study was to investigate the transcriptomic similarities and differences in lung cells across four species (humans, monkeys, mice, and rats) in healthy and asthma conditions using scRNA-seq. The results revealed significant transcriptomic similarities between monkeys and humans and significant cross-species conservation of cell-specific marker genes, transcription factors (TFs), and biological pathways. Additionally, we explored sex differences, identifying distinct sex-specific expression patterns that may influence disease susceptibility. These insights refine our understanding of the mechanisms underlying airway cell biology across species and have important implications for studying lung diseases, particularly the mechanisms of mucus clearance in asthma.
Affiliation(s)
- Biyu Gui
  - Department of Respiratory Medicine, Haihe Clinical School, Tianjin Medical University, Tianjin, 300350, China
  - Department of Basic Medicine, Haihe Hospital, Tianjin University, Tianjin, 300350, China
- Qi Wang
  - Department of Basic Medicine, Haihe Hospital, Tianjin University, Tianjin, 300350, China
  - Department of Stomatology, Haihe Hospital, Tianjin University, Tianjin, 300350, China
- Jianhai Wang
  - Department of Basic Medicine, Haihe Hospital, Tianjin University, Tianjin, 300350, China
  - Tianjin Institute of Respiratory Diseases, 300350, Tianjin, China
- Xue Li
  - Department of Basic Medicine, Haihe Hospital, Tianjin University, Tianjin, 300350, China
  - Tianjin Institute of Respiratory Diseases, 300350, Tianjin, China
- Qi Wu
  - Department of Respiratory Medicine, Haihe Clinical School, Tianjin Medical University, Tianjin, 300350, China
- Huaiyong Chen
  - Department of Respiratory Medicine, Haihe Clinical School, Tianjin Medical University, Tianjin, 300350, China
  - Department of Basic Medicine, Haihe Hospital, Tianjin University, Tianjin, 300350, China
  - Tianjin Institute of Respiratory Diseases, 300350, Tianjin, China
  - Tianjin Key Laboratory of Lung Regenerative Medicine, Haihe Hospital, Tianjin University, Tianjin, 300350, China
10
Silkwood K, Dollinger E, Gervin J, Atwood S, Nie Q, Lander AD. Leveraging gene correlations in single cell transcriptomic data. BMC Bioinformatics 2024; 25:305. [PMID: 39294560 PMCID: PMC11411778 DOI: 10.1186/s12859-024-05926-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2023] [Accepted: 09/09/2024] [Indexed: 09/20/2024] Open
Abstract
BACKGROUND Many approaches have been developed to overcome technical noise in single cell RNA-sequencing (scRNAseq). As researchers dig deeper into data, looking for rare cell types, subtleties of cell states, and details of gene regulatory networks, there is a growing need for algorithms with controllable accuracy and fewer ad hoc parameters and thresholds. Impeding this goal is the fact that an appropriate null distribution for scRNAseq cannot simply be extracted from data in which ground truth about biological variation is unknown (i.e., usually). RESULTS We approach this problem analytically, assuming that scRNAseq data reflect only cell heterogeneity (what we seek to characterize), transcriptional noise (temporal fluctuations randomly distributed across cells), and sampling error (i.e., Poisson noise). We analyze scRNAseq data without normalization (a step that skews distributions, particularly for sparse data) and calculate p values associated with key statistics. We develop an improved method for selecting features for cell clustering and identifying gene-gene correlations, both positive and negative. Using simulated data, we show that this method, which we call BigSur (Basic Informatics and Gene Statistics from Unnormalized Reads), captures even weak yet significant correlation structures in scRNAseq data. Applying BigSur to data from a clonal human melanoma cell line, we identify thousands of correlations that, when clustered without supervision into gene communities, align with known cellular components and biological processes, and highlight potentially novel cell biological relationships. CONCLUSIONS New insights into functionally relevant gene regulatory networks can be obtained using a statistically grounded approach to the identification of gene-gene correlations.
Affiliation(s)
- Kai Silkwood
  - Center for Complex Biological Systems, University of California, Irvine, Irvine, CA, USA
  - Department of Developmental and Cell Biology, University of California, Irvine, Irvine, CA, USA
- Emmanuel Dollinger
  - Center for Complex Biological Systems, University of California, Irvine, Irvine, CA, USA
  - Department of Developmental and Cell Biology, University of California, Irvine, Irvine, CA, USA
  - Department of Mathematics, University of California, Irvine, Irvine, CA, USA
- Joshua Gervin
  - Center for Complex Biological Systems, University of California, Irvine, Irvine, CA, USA
  - Department of Developmental and Cell Biology, University of California, Irvine, Irvine, CA, USA
- Scott Atwood
  - Center for Complex Biological Systems, University of California, Irvine, Irvine, CA, USA
  - Department of Developmental and Cell Biology, University of California, Irvine, Irvine, CA, USA
- Qing Nie
  - Center for Complex Biological Systems, University of California, Irvine, Irvine, CA, USA
  - Department of Developmental and Cell Biology, University of California, Irvine, Irvine, CA, USA
  - Department of Mathematics, University of California, Irvine, Irvine, CA, USA
- Arthur D Lander
  - Center for Complex Biological Systems, University of California, Irvine, Irvine, CA, USA
  - Department of Developmental and Cell Biology, University of California, Irvine, Irvine, CA, USA
11
Xu Y, Wang S, Feng Q, Xia J, Li Y, Li HD, Wang J. scCAD: Cluster decomposition-based anomaly detection for rare cell identification in single-cell expression data. Nat Commun 2024; 15:7561. [PMID: 39215003 PMCID: PMC11364754 DOI: 10.1038/s41467-024-51891-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2024] [Accepted: 08/15/2024] [Indexed: 09/04/2024] Open
Abstract
Single-cell RNA sequencing (scRNA-seq) technologies have become essential tools for characterizing cellular landscapes within complex tissues. Large-scale single-cell transcriptomics holds great potential for identifying rare cell types critical to the pathogenesis of diseases and biological processes. Existing methods for identifying rare cell types often rely on one-time clustering using partial or global gene expression. However, these rare cell types may be overlooked during the clustering phase, posing challenges for their accurate identification. In this paper, we propose a Cluster decomposition-based Anomaly Detection method (scCAD), which iteratively decomposes clusters based on the most differential signals in each cluster to effectively separate rare cell types and achieve accurate identification. We benchmark scCAD on 25 real-world scRNA-seq datasets, demonstrating its superior performance compared to 10 state-of-the-art methods. In-depth case studies across diverse datasets, including mouse airway, brain, intestine, human pancreas, immunology data, and clear cell renal cell carcinoma, showcase scCAD's efficiency in identifying rare cell types in complex biological scenarios. Furthermore, scCAD can correct the annotation of rare cell types and identify immune cell subtypes associated with disease, thereby offering valuable insights into disease progression.
Affiliation(s)
- Yunpei Xu
  - School of Computer Science and Engineering, Central South University, Changsha, China
  - Xiangjiang Laboratory, Changsha, China
  - Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, China
- Shaokai Wang
  - David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, ON, Canada
- Qilong Feng
  - School of Computer Science and Engineering, Central South University, Changsha, China
  - Xiangjiang Laboratory, Changsha, China
  - Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, China
- Jiazhi Xia
  - School of Computer Science and Engineering, Central South University, Changsha, China
  - Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, China
- Yaohang Li
  - Department of Computer Science, Old Dominion University, Norfolk, VA, USA
- Hong-Dong Li
  - School of Computer Science and Engineering, Central South University, Changsha, China
  - Xiangjiang Laboratory, Changsha, China
  - Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, China
- Jianxin Wang
  - School of Computer Science and Engineering, Central South University, Changsha, China
  - Xiangjiang Laboratory, Changsha, China
  - Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, China
12
Wang S, Li H, Liu Y, Pang S, Qiao S, Su J, Wang S, Zhang Y. Connectivity Network Feature Sharing in Single-Cell RNA Sequencing Data Identifies Rare Cells. J Chem Inf Model 2024; 64:6596-6609. [PMID: 39096508 DOI: 10.1021/acs.jcim.4c00796] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/05/2024]
Abstract
Single-cell RNA sequencing is a valuable technique for identifying diverse cell subtypes. A key challenge is that conventional methods often miss rare cells because of their low abundance and subtle features. To overcome this, we developed SCLCNF (Local Connectivity Network Feature Sharing in Single-Cell RNA sequencing), a novel approach that identifies rare cells by analyzing features uniquely expressed in these cells. SCLCNF creates a cellular connectivity network, considering how each cell relates to its neighbors. This network helps to pinpoint coexpression patterns unique to rare cells, utilizing a rarity score to confirm their presence. Our method performs better in detecting rare cells than existing techniques, offering enhanced robustness. It has proven to be effective in human gastrula data sets for accurately pinpointing rare cells, and in sepsis data sets where it uncovers previously unidentified rare cell populations.
Affiliation(s)
- Shudong Wang
- Qingdao Institute of Software, College of Computer Science and Technology, China University of Petroleum (East China), Qingdao 266580, China
| | - Hengxiao Li
- Qingdao Institute of Software, College of Computer Science and Technology, China University of Petroleum (East China), Qingdao 266580, China
| | - Yahui Liu
- College of Science, China University of Petroleum (East China), Qingdao 266580, China
| | - Shanchen Pang
- Qingdao Institute of Software, College of Computer Science and Technology, China University of Petroleum (East China), Qingdao 266580, China
| | - Sibo Qiao
- The College of Software, Tiangong University, Tianjin 300387, China
| | - Jionglong Su
- School of AI and Advanced Computing, XJTLU Entrepreneur College (Taicang), Xi'an Jiaotong-Liverpool University, Suzhou 215123, Jiangsu, China
| | - Shaoqiang Wang
- School of Information and Control Engineering, Qingdao University of Technology, Qingdao 266525, China
| | - Yulin Zhang
- College of Mathematics and Systems Science, Shandong University of Science and Technology, Qingdao 266590, China
| |
|
13
|
Li J, Shyr Y, Liu Q. aKNNO: single-cell and spatial transcriptomics clustering with an optimized adaptive k-nearest neighbor graph. Genome Biol 2024; 25:203. [PMID: 39090647 PMCID: PMC11293182 DOI: 10.1186/s13059-024-03339-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2023] [Accepted: 07/16/2024] [Indexed: 08/04/2024] Open
Abstract
Typical clustering methods for single-cell and spatial transcriptomics struggle to identify rare cell types, while approaches tailored to detect rare cell types gain this ability at the cost of poorer performance for grouping abundant ones. Here, we develop aKNNO to simultaneously identify abundant and rare cell types based on an adaptive k-nearest neighbor graph with optimization. Benchmarking on 38 simulated and 20 single-cell and spatial transcriptomics datasets demonstrates that aKNNO identifies both abundant and rare cell types more accurately than general and specialized methods. Using only gene expression, aKNNO maps abundant and rare cells more precisely than integrative approaches.
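To illustrate why an adaptive neighbourhood size can help rare cells, here is a minimal Python sketch. It is not the aKNNO algorithm: the cutoff rule (keep neighbours no farther than `jump` times the nearest-neighbour distance) and the names `adaptive_knn`, `k_max`, and `jump` are assumptions made for illustration only.

```python
import math


def adaptive_knn(points, k_max=5, jump=2.0):
    """Build a neighbour list per point with a locally chosen k.

    Hypothetical rule: among the k_max closest points, keep only those whose
    distance is at most `jump` times the distance to the nearest neighbour.
    """
    graph = {}
    for i, p in enumerate(points):
        # Distances to every other point, closest first, capped at k_max.
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )[:k_max]
        nearest = dists[0][0]
        graph[i] = [j for d, j in dists if d <= jump * nearest]
    return graph


# Four "abundant" cells in a tight square and two "rare" cells far away.
cells = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0), (10.0, 10.5)]
for cell, neighbours in adaptive_knn(cells).items():
    print(cell, neighbours)
```

With a fixed k of 3, each rare cell would be forced to link to cells in the abundant group; with the adaptive cutoff, the two rare cells only link to each other.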
Affiliation(s)
- Jia Li
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, 37203, USA
- Center for Quantitative Sciences, Vanderbilt University Medical Center, Nashville, TN, 37203, USA
| | - Yu Shyr
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, 37203, USA.
- Center for Quantitative Sciences, Vanderbilt University Medical Center, Nashville, TN, 37203, USA.
| | - Qi Liu
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, 37203, USA.
- Center for Quantitative Sciences, Vanderbilt University Medical Center, Nashville, TN, 37203, USA.
| |
|
14
|
Chen R, Nie P, Wang J, Wang GZ. Deciphering brain cellular and behavioral mechanisms: Insights from single-cell and spatial RNA sequencing. Wiley Interdisciplinary Reviews. RNA 2024; 15:e1865. [PMID: 38972934 DOI: 10.1002/wrna.1865] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2024] [Revised: 05/05/2024] [Accepted: 05/14/2024] [Indexed: 07/09/2024]
Abstract
The brain is a complex computing system composed of a multitude of interacting neurons. The computational outputs of this system determine the behavior and perception of every individual. Each brain cell expresses thousands of genes that dictate the cell's function and physiological properties. Therefore, deciphering the molecular expression of each cell is of great significance for understanding its characteristics and role in brain function. Additionally, the positional information of each cell can provide crucial insights into their involvement in local brain circuits. In this review, we briefly overview the principles of single-cell RNA sequencing and spatial transcriptomics, the potential issues and challenges in their data processing, and their applications in brain research. We further outline several promising directions in neuroscience that could be integrated with single-cell RNA sequencing, including neurodevelopment, the identification of novel brain microstructures, cognition and behavior, neuronal cell positioning, molecules and cells related to advanced brain functions, sleep-wake cycles/circadian rhythms, and computational modeling of brain function. We believe that the deep integration of these directions with single-cell and spatial RNA sequencing can contribute significantly to understanding the roles of individual cells or cell types in these specific functions, thereby making important contributions to addressing critical questions in those fields. This article is categorized under: RNA Evolution and Genomics > Computational Analyses of RNA; RNA in Disease and Development > RNA in Development; RNA in Disease and Development > RNA in Disease.
Affiliation(s)
- Renrui Chen
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
| | - Pengxing Nie
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
| | - Jing Wang
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
| | - Guang-Zhong Wang
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
| |
|
15
|
Chen M. Beyond variability: a novel gene expression stability metric to unveil homeostasis and regulation. bioRxiv 2024:2024.05.28.596283. [PMID: 38854149 PMCID: PMC11160662 DOI: 10.1101/2024.05.28.596283] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2024]
Abstract
The concept of gene expression stability within a homeostatic cell is explored through the gene homeostasis Z-index, a measure that highlights genes under active regulation in response to internal and external stimuli. This index reveals distinct regulatory activities and patterns in different organs, such as enhanced synaptic transmission in pancreatic islets. The research indicates that traditional mean-based methods may miss these nuances, underlining the significance of new metrics in identifying gene regulation specifics in cellular adaptation.
|
16
|
Sang-aram C, Browaeys R, Seurinck R, Saeys Y. Spotless, a reproducible pipeline for benchmarking cell type deconvolution in spatial transcriptomics. eLife 2024; 12:RP88431. [PMID: 38787371 PMCID: PMC11126312 DOI: 10.7554/elife.88431] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/25/2024] Open
Abstract
Spatial transcriptomics (ST) technologies allow the profiling of the transcriptome of cells while keeping their spatial context. Since most commercial untargeted ST technologies do not yet operate at single-cell resolution, computational methods such as deconvolution are often used to infer the cell type composition of each sequenced spot. We benchmarked 11 deconvolution methods using 63 silver standards, 3 gold standards, and 2 case studies on liver and melanoma tissues. We developed a simulation engine called synthspot to generate silver standards from single-cell RNA-sequencing data, while gold standards are generated by pooling single cells from targeted ST data. We evaluated methods based on their performance, stability across different reference datasets, and scalability. We found that cell2location and RCTD are the top-performing methods, but surprisingly, a simple regression model outperforms almost half of the dedicated spatial deconvolution methods. Furthermore, we observe that the performance of all methods significantly decreased in datasets with highly abundant or rare cell types. Our results are reproducible in a Nextflow pipeline, which also allows users to generate synthetic data, run deconvolution methods and optionally benchmark them on their dataset (https://github.com/saeyslab/spotless-benchmark).
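The abstract does not specify the "simple regression model", so the following Python sketch shows one plausible reading: non-negative least squares of a spot's expression on reference cell-type signatures, solved by projected gradient descent. The function name `deconvolve_spot`, the toy signatures, and the choice of solver are assumptions, not the benchmark's implementation.

```python
def deconvolve_spot(signatures, spot, n_iter=500):
    """Estimate non-negative cell-type proportions for one spot.

    signatures: one expression profile per cell type (lists over the same genes).
    spot: observed expression of the spot over those genes.
    Returns weights rescaled to sum to 1.
    """
    n_types = len(signatures)
    n_genes = len(spot)
    # Step size from a simple upper bound (the trace) on the largest
    # eigenvalue of S^T S, which keeps the gradient iteration stable.
    step = 1.0 / sum(v * v for profile in signatures for v in profile)
    weights = [1.0 / n_types] * n_types
    for _ in range(n_iter):
        # Residual of the current reconstruction, gene by gene.
        residual = [
            sum(weights[t] * signatures[t][g] for t in range(n_types)) - spot[g]
            for g in range(n_genes)
        ]
        # Gradient step on 0.5 * ||S w - y||^2, then projection onto w >= 0.
        for t in range(n_types):
            grad = sum(signatures[t][g] * residual[g] for g in range(n_genes))
            weights[t] = max(0.0, weights[t] - step * grad)
    total = sum(weights)
    return [w / total for w in weights] if total > 0 else weights


# Toy spot mixed from 70% of type A and 30% of type B (hypothetical values).
type_a = [10.0, 0.0, 5.0, 1.0]
type_b = [0.0, 8.0, 5.0, 1.0]
mixed = [0.7 * a + 0.3 * b for a, b in zip(type_a, type_b)]
print(deconvolve_spot([type_a, type_b], mixed))
```

On this noise-free toy input the estimate approaches the true mixing proportions; real spots add noise, platform effects, and cell types missing from the reference, which is where the dedicated methods in the benchmark differ.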
Affiliation(s)
- Chananchida Sang-aram
- Data Mining and Modelling for Biomedicine, VIB Center for Inflammation Research, Ghent, Belgium
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium
| | - Robin Browaeys
- Data Mining and Modelling for Biomedicine, VIB Center for Inflammation Research, Ghent, Belgium
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium
| | - Ruth Seurinck
- Data Mining and Modelling for Biomedicine, VIB Center for Inflammation Research, Ghent, Belgium
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium
| | - Yvan Saeys
- Data Mining and Modelling for Biomedicine, VIB Center for Inflammation Research, Ghent, Belgium
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium
| |
|
17
|
Beigi YZ, Lanjanian H, Fayazi R, Salimi M, Hoseyni BHM, Noroozizadeh MH, Masoudi-Nejad A. Heterogeneity and molecular landscape of melanoma: implications for targeted therapy. Molecular Biomedicine 2024; 5:17. [PMID: 38724687 PMCID: PMC11082128 DOI: 10.1186/s43556-024-00182-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/19/2023] [Accepted: 04/08/2024] [Indexed: 05/12/2024] Open
Abstract
Uveal melanoma (UM) presents a complex molecular landscape characterized by substantial heterogeneity at both the genetic and epigenetic levels. This heterogeneity plays a critical role in shaping the behavior of this uncommon ocular malignancy and its response to therapy. Targeted treatments with gene-specific therapeutic molecules may prove useful in overcoming radiation resistance; however, the diverse molecular makeup of UM calls for a patient-specific approach to therapy. We need to understand the intricate molecular landscape of UM to develop targeted treatments customized to each patient's specific genetic mutations. One promising approach is the use of liquid biopsies, such as circulating tumor cells (CTCs) and circulating tumor DNA (ctDNA), to detect and monitor the disease at early stages. These non-invasive methods can help identify the most effective treatment strategy for each patient. Single-cell analysis is a new platform that gives valuable insights into diagnosis, prognosis, and treatment. Integrating these data with known clinical and genomic information will give a better understanding of the complicated molecular mechanisms that UM exploits. In this review, we focus on the heterogeneity and molecular landscape of UM; to this end, we conducted an exhaustive literature review spanning 1998 to 2023, using keywords such as "uveal melanoma," "heterogeneity," "targeted therapies," "CTCs," and "single-cellular analysis".
Affiliation(s)
- Yasaman Zohrab Beigi
- Laboratory of System Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
| | - Hossein Lanjanian
- Software Engineering Department, Engineering Faculty, Istanbul Topkapi University, Istanbul, Turkey
| | - Reyhane Fayazi
- Laboratory of System Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
| | - Mahdieh Salimi
- Department of Medical Genetics, Institute of Medical Biotechnology, National Institute of Genetic Engineering and Biotechnology (NIGEB), Tehran, Iran
| | - Behnaz Haji Molla Hoseyni
- Laboratory of System Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
| | | | - Ali Masoudi-Nejad
- Laboratory of System Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran.
| |
|
18
|
Gao Y, Dong K, Gao Y, Jin X, Yang J, Yan G, Liu Q. Unified cross-modality integration and analysis of T cell receptors and T cell transcriptomes by low-resource-aware representation learning. Cell Genomics 2024; 4:100553. [PMID: 38688285 PMCID: PMC11099349 DOI: 10.1016/j.xgen.2024.100553] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2023] [Revised: 03/09/2024] [Accepted: 04/06/2024] [Indexed: 05/02/2024]
Abstract
Single-cell RNA sequencing (scRNA-seq) and T cell receptor sequencing (TCR-seq) are pivotal for investigating T cell heterogeneity. Integrating these modalities, which is expected to uncover profound insights in immunology that might otherwise go unnoticed with a single modality, faces computational challenges due to the low-resource characteristics of the multimodal data. Herein, we present UniTCR, a novel low-resource-aware multimodal representation learning framework designed for unified cross-modality integration, enabling comprehensive T cell analysis. By designing a dual-modality contrastive learning module and a single-modality preservation module to effectively embed each modality into a common latent space, UniTCR demonstrates versatility in connecting TCR sequences with T cell transcriptomes across various tasks, including single-modality analysis, modality gap analysis, epitope-TCR binding prediction, and TCR profile cross-modality generation, in a low-resource-aware way. Extensive evaluations conducted on multiple scRNA-seq/TCR-seq paired datasets showed the superior performance of UniTCR, demonstrating its ability to explore the complexity of the immune system.
Affiliation(s)
- Yicheng Gao
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Tongji Hospital, School of Medicine, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China; State Key Laboratory of Cardiology and Medical Innovation Center, Shanghai East Hospital, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
| | - Kejing Dong
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Tongji Hospital, School of Medicine, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China; State Key Laboratory of Cardiology and Medical Innovation Center, Shanghai East Hospital, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
| | - Yuli Gao
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Tongji Hospital, School of Medicine, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China; State Key Laboratory of Cardiology and Medical Innovation Center, Shanghai East Hospital, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
| | - Xuan Jin
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Tongji Hospital, School of Medicine, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China; State Key Laboratory of Cardiology and Medical Innovation Center, Shanghai East Hospital, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
| | - Jingya Yang
- Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai 201804, China
| | - Gang Yan
- Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai 201804, China.
| | - Qi Liu
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Tongji Hospital, School of Medicine, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China; State Key Laboratory of Cardiology and Medical Innovation Center, Shanghai East Hospital, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China; Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai 201804, China; Research Institute of Intelligent Computing, Zhejiang Lab, Hangzhou 311121, China.
| |
|
19
|
Bao S, Fan Y, Mei Y, Gao J. Integrating single-cell and bulk expression data to identify and analyze cancer prognosis-related genes. Heliyon 2024; 10:e25640. [PMID: 38379985 PMCID: PMC10877256 DOI: 10.1016/j.heliyon.2024.e25640] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2023] [Revised: 01/03/2024] [Accepted: 01/31/2024] [Indexed: 02/22/2024] Open
Abstract
Compared with traditional evaluation methods of cancer prognosis based on tissue samples, single-cell sequencing technology can provide information on cell type heterogeneity for predicting biomarkers related to cancer prognosis. Therefore, the bulk and single-cell expression profiles of breast cancer and normal cells were comprehensively analyzed to identify malignant and non-malignant markers and construct a reliable prognosis model. We first screened highly reliable differentially expressed genes from bulk expression profiles of multiple breast cancer tissues and normal tissues, and inferred genes related to cell malignancy from single-cell data. Then we identified eight critical genes related to breast cancer to conduct Cox regression analysis, calculate a polygenic risk score (PRS), and verify the predictive ability of the PRS in two data groups. The results show that the PRS can divide breast cancer patients into high-risk and low-risk groups. The PRS is related to overall survival time and relapse-free interval and is a prognostic factor independent of conventional clinicopathological characteristics. Breast cancer is usually regarded as a cancer with a relatively good prognosis. To further explore whether this workflow can be applied to cancers with poor prognosis, we selected lung cancer for a comparative study. The results show that this workflow can also build a reasonable prognosis model for lung cancer. This study provides new insight and practical source code for further research on cancer biomarkers and drug targets. It also provides a basis for survival prediction, treatment response prediction, and personalized treatment.
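The scoring step described above can be sketched as a weighted sum of gene expression values, using Cox regression coefficients as weights. The Python sketch below is an assumption-laden illustration: the gene names, coefficient values, and the median cutoff for the high-risk and low-risk split are hypothetical and are not taken from the paper.

```python
from statistics import median


def polygenic_risk_scores(expression, coefficients):
    """Score each patient as sum(beta_gene * expression_gene).

    expression: {patient_id: {gene: expression value}}
    coefficients: {gene: Cox regression coefficient} (hypothetical here)
    """
    return {
        patient: sum(beta * genes[gene] for gene, beta in coefficients.items())
        for patient, genes in expression.items()
    }


def split_by_median(scores):
    """Label patients above the median score as high risk, the rest as low risk."""
    cutoff = median(scores.values())
    return {p: ("high" if s > cutoff else "low") for p, s in scores.items()}


# Placeholder coefficients and expression values, for illustration only.
betas = {"GENE_A": 0.5, "GENE_B": -0.25}
cohort = {
    "P1": {"GENE_A": 2.0, "GENE_B": 4.0},
    "P2": {"GENE_A": 6.0, "GENE_B": 0.0},
    "P3": {"GENE_A": 4.0, "GENE_B": 4.0},
}
scores = polygenic_risk_scores(cohort, betas)
print(scores)
print(split_by_median(scores))
```

In practice the coefficients would come from a fitted Cox model and the cutoff would be validated on an independent cohort, as the abstract describes for its two data groups.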
Affiliation(s)
- Shengbao Bao
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
| | - Yaxin Fan
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
| | - Yichao Mei
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
| | - Junxiang Gao
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
| |
|
20
|
Zhang W, Huckaby B, Talburt J, Weissman S, Yang MQ. cnnImpute: missing value recovery for single cell RNA sequencing data. Sci Rep 2024; 14:3946. [PMID: 38365936 PMCID: PMC10873334 DOI: 10.1038/s41598-024-53998-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Accepted: 02/07/2024] [Indexed: 02/18/2024] Open
Abstract
The advent of single-cell RNA sequencing (scRNA-seq) technology has revolutionized our ability to explore cellular diversity and unravel the complexities of intricate diseases. However, due to the inherently low signal-to-noise ratio and the presence of an excessive number of missing values, scRNA-seq data analysis encounters unique challenges. Here, we present cnnImpute, a novel convolutional neural network (CNN) based method designed to address the issue of missing data in scRNA-seq. Our approach starts by estimating missing probabilities, followed by constructing a CNN-based model to recover expression values with a high likelihood of being missing. Through comprehensive evaluations, cnnImpute demonstrates its effectiveness in accurately imputing missing values while preserving the integrity of cell clusters in scRNA-seq data analysis. It achieved superior performance in various benchmarking experiments. cnnImpute offers an accurate and scalable method for recovering missing values, providing a useful resource for scRNA-seq data analysis.
Affiliation(s)
- Wenjuan Zhang
- MidSouth Bioinformatics Center and Joint Bioinformatics Graduate Program, University of Arkansas at Little Rock, University of Arkansas for Medical Sciences, Little Rock, 72204, AR, USA
- Department of Information Science, University of Arkansas at Little Rock, Little Rock, 72204, AR, USA
| | - Brandon Huckaby
- Department of Computer Science, University of Arkansas at Little Rock, Little Rock, 72204, AR, USA
| | - John Talburt
- Department of Information Science, University of Arkansas at Little Rock, Little Rock, 72204, AR, USA
| | - Sherman Weissman
- Department of Genetics, Yale School of Medicine, New Haven, 06520, CT, USA
| | - Mary Qu Yang
- MidSouth Bioinformatics Center and Joint Bioinformatics Graduate Program, University of Arkansas at Little Rock, University of Arkansas for Medical Sciences, Little Rock, 72204, AR, USA.
- Department of Information Science, University of Arkansas at Little Rock, Little Rock, 72204, AR, USA.
| |
|
21
|
Zhou S, Li Y, Wu W, Li L. scMMT: a multi-use deep learning approach for cell annotation, protein prediction and embedding in single-cell RNA-seq data. Brief Bioinform 2024; 25:bbad523. [PMID: 38300515 PMCID: PMC10833085 DOI: 10.1093/bib/bbad523] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/04/2023] [Revised: 11/27/2023] [Accepted: 12/19/2023] [Indexed: 02/02/2024] Open
Abstract
Accurate cell type annotation in single-cell RNA-sequencing data is essential for advancing biological and medical research, particularly in understanding disease progression and tumor microenvironments. However, existing methods are constrained by single feature extraction approaches, lack of adaptability to immune cell types with similar molecular profiles but distinct functions and a failure to account for the impact of cell label noise on model accuracy, all of which compromise the precision of annotation. To address these challenges, we developed a supervised approach called scMMT. We proposed a novel feature extraction technique to uncover more valuable information. Additionally, we constructed a multi-task learning framework based on the GradNorm method to enhance the recognition of challenging immune cells and reduce the impact of label noise by facilitating mutual reinforcement between cell type annotation and protein prediction tasks. Furthermore, we introduced logarithmic weighting and label smoothing mechanisms to enhance the recognition ability of rare cell types and prevent model overconfidence. Through comprehensive evaluations on multiple public datasets, scMMT has demonstrated state-of-the-art performance in various aspects including cell type annotation, rare cell identification, dropout and label noise resistance, protein expression prediction and low-dimensional embedding representation.
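Two of the mechanisms named above, label smoothing and logarithmic class weighting, can be sketched in a few lines of Python. Label smoothing is shown in its standard form; the weighting formula `log(1 + N / n_c)` is an assumption chosen for illustration, since the abstract does not give scMMT's exact definition, and the class names and counts are made up.

```python
import math


def smooth_labels(label_index, n_classes, epsilon=0.1):
    """Standard label smoothing: move a fraction epsilon of the probability
    mass from the true class to a uniform distribution over all classes."""
    uniform = epsilon / n_classes
    return [
        (1.0 - epsilon) + uniform if k == label_index else uniform
        for k in range(n_classes)
    ]


def log_class_weights(class_counts):
    """Assumed weighting: log(1 + N / n_c). Rare classes receive larger
    weights, but the growth is logarithmic rather than linear in N / n_c."""
    total = sum(class_counts.values())
    return {c: math.log(1.0 + total / n) for c, n in class_counts.items()}


# Illustrative class counts (not from the paper).
print(smooth_labels(label_index=1, n_classes=4))
print(log_class_weights({"abundant type": 900, "rare type": 10}))
```

The smoothed target discourages overconfident predictions, while the logarithmic weights raise the loss contribution of rare cell types without letting a handful of cells dominate training.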
Affiliation(s)
- Songqi Zhou
- Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences, Chongqing, China
- Chongqing School, University of Chinese Academy of Sciences, Chongqing, China
| | - Yang Li
- Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences, Chongqing, China
- Chongqing School, University of Chinese Academy of Sciences, Chongqing, China
- Chongqing Research Institute of Big Data, Peking University, Chongqing, China
| | - Wenyuan Wu
- Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences, Chongqing, China
- Chongqing School, University of Chinese Academy of Sciences, Chongqing, China
| | - Li Li
- Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences, Chongqing, China
- Chongqing School, University of Chinese Academy of Sciences, Chongqing, China
| |
|
22
|
Wang X, Duan M, Li J, Ma A, Xin G, Xu D, Li Z, Liu B, Ma Q. MarsGT: Multi-omics analysis for rare population inference using single-cell graph transformer. Nat Commun 2024; 15:338. [PMID: 38184630 PMCID: PMC10771517 DOI: 10.1038/s41467-023-44570-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2023] [Accepted: 12/14/2023] [Indexed: 01/08/2024] Open
Abstract
Rare cell populations are key in neoplastic progression and therapeutic response, offering potential intervention targets. However, their computational identification and analysis often lag behind major cell types. To fill this gap, we introduce MarsGT: Multi-omics Analysis for Rare population inference using a Single-cell Graph Transformer. It identifies rare cell populations using a probability-based heterogeneous graph transformer on single-cell multi-omics data. MarsGT outperforms existing tools in identifying rare cells across 550 simulated and four real human datasets. In mouse retina data, it reveals unique subpopulations of rare bipolar cells and a Müller glia cell subpopulation. In human lymph node data, MarsGT detects an intermediate B cell population potentially acting as lymphoma precursors. In human melanoma data, it identifies a rare MAIT-like population impacted by a high IFN-I response and reveals the mechanism of immunotherapy. Hence, MarsGT offers biological insights and suggests potential strategies for early detection and therapeutic intervention of disease.
Affiliation(s)
- Xiaoying Wang
- School of Mathematics, Shandong University, Jinan, Shandong, 250100, China
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, 43210, USA
- Pelotonia Institute for Immuno-Oncology, The James Comprehensive Cancer Center, The Ohio State University, Columbus, OH, 43210, USA
| | - Maoteng Duan
- School of Mathematics, Shandong University, Jinan, Shandong, 250100, China
| | - Jingxian Li
- School of Mathematics, Shandong University, Jinan, Shandong, 250100, China
| | - Anjun Ma
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, 43210, USA
- Pelotonia Institute for Immuno-Oncology, The James Comprehensive Cancer Center, The Ohio State University, Columbus, OH, 43210, USA
| | - Gang Xin
- Pelotonia Institute for Immuno-Oncology, The James Comprehensive Cancer Center, The Ohio State University, Columbus, OH, 43210, USA
| | - Dong Xu
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, 65211, USA
- Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO, 65211, USA
| | - Zihai Li
- Pelotonia Institute for Immuno-Oncology, The James Comprehensive Cancer Center, The Ohio State University, Columbus, OH, 43210, USA
| | - Bingqiang Liu
- School of Mathematics, Shandong University, Jinan, Shandong, 250100, China.
| | - Qin Ma
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, 43210, USA.
- Pelotonia Institute for Immuno-Oncology, The James Comprehensive Cancer Center, The Ohio State University, Columbus, OH, 43210, USA.
| |
|
23
|
Märtens K, Bortolomeazzi M, Montorsi L, Spencer J, Ciccarelli F, Yau C. Rarity: discovering rare cell populations from single-cell imaging data. Bioinformatics 2023; 39:btad750. [PMID: 38092048 PMCID: PMC10751233 DOI: 10.1093/bioinformatics/btad750] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2023] [Revised: 11/24/2023] [Accepted: 12/11/2023] [Indexed: 12/28/2023] Open
Abstract
MOTIVATION: Cell type identification plays an important role in the analysis and interpretation of single-cell data and can be carried out via supervised or unsupervised clustering approaches. Supervised methods are best suited where we can list all cell types and their respective marker genes a priori, while unsupervised clustering algorithms look for groups of cells with similar expression properties. This property permits the identification of both known and unknown cell populations, making unsupervised methods suitable for discovery. Success is dependent on the relative strength of the expression signature of each group as well as the number of cells. Rare cell types therefore present a particular challenge that is magnified when they are defined by differentially expressing a small number of genes. RESULTS: Typical unsupervised approaches fail to identify such rare subpopulations, and these cells tend to be absorbed into more prevalent cell types. In order to balance these competing demands, we have developed a novel statistical framework for unsupervised clustering, named Rarity, that enables the discovery process for rare cell types to be more robust, consistent, and interpretable. We achieve this by devising a novel clustering method based on a Bayesian latent variable model in which we assign cells to inferred latent binary on/off expression profiles. This lets us achieve increased sensitivity to rare cell populations while also allowing us to control and interpret potential false positive discoveries. We systematically study the challenges associated with rare cell type identification and demonstrate the utility of Rarity on various IMC datasets. AVAILABILITY AND IMPLEMENTATION: Implementation of Rarity together with examples is available from the GitHub repository (https://github.com/kasparmartens/rarity).
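The idea of grouping cells by binary on/off expression profiles can be illustrated with a deliberately simplified Python sketch. Rarity infers the profiles with a Bayesian latent variable model; the version below swaps that for fixed per-marker thresholds, and the marker names, threshold values, and `max_fraction` parameter are all hypothetical.

```python
from collections import Counter


def binary_profiles(cells, thresholds):
    """Map each cell to a tuple of 0/1 flags, one per marker (sorted by name).

    A hard threshold stands in for the inferred latent on/off state.
    """
    markers = sorted(thresholds)
    return [tuple(int(cell[m] > thresholds[m]) for m in markers) for cell in cells]


def rare_profiles(profiles, max_fraction=0.05):
    """Return the profiles shared by at most max_fraction of all cells."""
    counts = Counter(profiles)
    n_cells = len(profiles)
    return {p: c for p, c in counts.items() if c / n_cells <= max_fraction}


# 19 cells expressing only marker_b and one cell expressing both markers.
cells = [{"marker_a": 0.1, "marker_b": 5.0}] * 19 + [{"marker_a": 4.0, "marker_b": 5.0}]
cutoffs = {"marker_a": 1.0, "marker_b": 1.0}
print(rare_profiles(binary_profiles(cells, cutoffs), max_fraction=0.1))
```

Because every cell is assigned to an explicit on/off signature, a small group is reported under its own profile instead of being absorbed into a larger cluster, and the signature itself says which markers define the group.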
Affiliation(s)
- Kaspar Märtens
- The Alan Turing Institute, London NW1 2DB, United Kingdom
| | - Michele Bortolomeazzi
- Francis Crick Institute, London NW1 1AT, United Kingdom
- King’s College London, London WC2R 2LS, United Kingdom
| | - Lucia Montorsi
- Francis Crick Institute, London NW1 1AT, United Kingdom
- King’s College London, London WC2R 2LS, United Kingdom
| | - Jo Spencer
- King’s College London, London WC2R 2LS, United Kingdom
| | - Francesca Ciccarelli
- Francis Crick Institute, London NW1 1AT, United Kingdom
- Bart’s Cancer Institute - Centre for Cancer Genomics & Computational Biology, Queen Mary University of London, Charterhouse Square, London, EC1M 6BQ, United Kingdom
| | - Christopher Yau
- The Alan Turing Institute, London NW1 2DB, United Kingdom
- Nuffield Department for Women’s & Reproductive Health, University of Oxford, Women’s Centre (Level 3), John Radcliffe Hospital, Oxford OX3 9DU, United Kingdom
|
24
|
Gu Y, Hu Y, Zhang H, Wang S, Xu K, Su J. Single-cell RNA sequencing in osteoarthritis. Cell Prolif 2023; 56:e13517. [PMID: 37317049 PMCID: PMC10693192 DOI: 10.1111/cpr.13517] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 03/28/2023] [Revised: 04/30/2023] [Accepted: 05/26/2023] [Indexed: 06/16/2023] Open
Abstract
Osteoarthritis is a progressive and heterogeneous joint disease with complex pathogenesis. The various phenotypes associated with each patient suggest that better subgrouping of tissues associated with genotypes in different phases of osteoarthritis may provide new insights into the onset and progression of the disease. Recently, single-cell RNA sequencing has been used to describe osteoarthritis pathogenesis at a resolution surpassing that of traditional technologies. This review summarizes the microstructural changes in articular cartilage, meniscus, synovium and subchondral bone that are mainly due to crosstalk amongst chondrocytes, osteoblasts, fibroblasts and endothelial cells during osteoarthritis progression. Next, we focus on the promising targets discovered by single-cell RNA sequencing and its potential applications in targeted drugs and tissue engineering. Additionally, the limited amount of research on the evaluation of bone-related biomaterials is reviewed. Based on the pre-clinical findings, we elaborate on the potential clinical value of single-cell RNA sequencing for therapeutic strategies in osteoarthritis. Finally, we discuss a perspective on the future development of patient-centred medicine for osteoarthritis therapy that combines other single-cell multi-omics technologies. This review provides new insights into osteoarthritis pathogenesis at the cellular level and into future applications of single-cell RNA sequencing in personalized therapeutics for osteoarthritis.
Affiliation(s)
- Yuyuan Gu
- Institute of Translational Medicine, Shanghai University, Shanghai, China
- Organoid Research Center, Shanghai University, Shanghai, China
- School of Medicine, Shanghai University, Shanghai, China
| | - Yan Hu
- Institute of Translational Medicine, Shanghai University, Shanghai, China
- Organoid Research Center, Shanghai University, Shanghai, China
| | - Hao Zhang
- Institute of Translational Medicine, Shanghai University, Shanghai, China
- Organoid Research Center, Shanghai University, Shanghai, China
| | - Sicheng Wang
- Institute of Translational Medicine, Shanghai University, Shanghai, China
- Organoid Research Center, Shanghai University, Shanghai, China
- Department of Orthopedics, Shanghai Zhongye Hospital, Shanghai, China
| | - Ke Xu
- Institute of Translational Medicine, Shanghai University, Shanghai, China
- Organoid Research Center, Shanghai University, Shanghai, China
- Wenzhou Institute of Shanghai University, Wenzhou, China
| | - Jiacan Su
- Institute of Translational Medicine, Shanghai University, Shanghai, China
- Organoid Research Center, Shanghai University, Shanghai, China
|
25
|
Dezem FS, Marção M, Ben-Cheikh B, Nikulina N, Omotoso A, Burnett D, Coelho P, Hurley J, Gomez C, Phan-Everson T, Ong G, Martelotto L, Lewis ZR, George S, Braubach O, Malta TM, Plummer J. A machine learning one-class logistic regression model to predict stemness for single cell transcriptomics and spatial omics. BMC Genomics 2023; 24:717. [PMID: 38017371 PMCID: PMC10683105 DOI: 10.1186/s12864-023-09722-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2023] [Accepted: 10/07/2023] [Indexed: 11/30/2023] Open
Abstract
Cell annotation is a crucial methodological component to interpreting single cell and spatial omics data. These approaches were developed for single cell analysis but are often biased, manually curated and yet unproven in spatial omics. Here we apply a stemness model for assessing oncogenic states to single cell and spatial omic cancer datasets. This one-class logistic regression machine learning algorithm is used to extract transcriptomic features from non-transformed stem cells to identify dedifferentiated cell states in tumors. We found this method identifies single cell states in metastatic tumor cell populations without the requirement of cell annotation. This machine learning model identified stem-like cell populations not identified in single cell or spatial transcriptomic analysis using existing methods. For the first time, we demonstrate the application of a ML tool across five emerging spatial transcriptomic and proteomic technologies to identify oncogenic stem-like cell types in the tumor microenvironment.
Affiliation(s)
- Felipe Segato Dezem
- Center for Spatial Omics, St Jude Children's Research Hospital, Memphis, TN, USA
- Department of Developmental Neurobiology, St Jude Children's Research Hospital, Memphis, TN, USA
- Department of Clinical Analysis, Toxicology and Food Sciences, School of Pharmaceutical Sciences of Ribeirao Preto, University of Sao Paulo, Sao Paulo, SP, Brazil
| | - Maycon Marção
- Department of Developmental Neurobiology, St Jude Children's Research Hospital, Memphis, TN, USA
- Department of Clinical Analysis, Toxicology and Food Sciences, School of Pharmaceutical Sciences of Ribeirao Preto, University of Sao Paulo, Sao Paulo, SP, Brazil
| | - Bassem Ben-Cheikh
- Akoya Biosciences, The Spatial Biology Company, Marlborough, MA, USA
| | - Nadya Nikulina
- Akoya Biosciences, The Spatial Biology Company, Marlborough, MA, USA
| | - Ayodele Omotoso
- Department of Obstetrics, Gynecology and Reproductive Sciences, University of Miami Miller School of Medicine, Miami, FL, USA
- Sylvester Comprehensive Cancer Center, UHealth Medical Systems, Miami, FL, USA
| | - Destiny Burnett
- Department of Obstetrics, Gynecology and Reproductive Sciences, University of Miami Miller School of Medicine, Miami, FL, USA
- Sylvester Comprehensive Cancer Center, UHealth Medical Systems, Miami, FL, USA
| | - Priscila Coelho
- Department of Obstetrics, Gynecology and Reproductive Sciences, University of Miami Miller School of Medicine, Miami, FL, USA
- Sylvester Comprehensive Cancer Center, UHealth Medical Systems, Miami, FL, USA
| | - Judith Hurley
- Department of Obstetrics, Gynecology and Reproductive Sciences, University of Miami Miller School of Medicine, Miami, FL, USA
- Sylvester Comprehensive Cancer Center, UHealth Medical Systems, Miami, FL, USA
| | - Carmen Gomez
- Department of Obstetrics, Gynecology and Reproductive Sciences, University of Miami Miller School of Medicine, Miami, FL, USA
- Sylvester Comprehensive Cancer Center, UHealth Medical Systems, Miami, FL, USA
| | | | - Giang Ong
- Nanostring Technologies, Seattle, WA, USA
| | | | | | - Sophia George
- Department of Obstetrics, Gynecology and Reproductive Sciences, University of Miami Miller School of Medicine, Miami, FL, USA
- Sylvester Comprehensive Cancer Center, UHealth Medical Systems, Miami, FL, USA
| | - Oliver Braubach
- Akoya Biosciences, The Spatial Biology Company, Marlborough, MA, USA
| | - Tathiane M Malta
- Department of Clinical Analysis, Toxicology and Food Sciences, School of Pharmaceutical Sciences of Ribeirao Preto, University of Sao Paulo, Sao Paulo, SP, Brazil
| | - Jasmine Plummer
- Center for Spatial Omics, St Jude Children's Research Hospital, Memphis, TN, USA
- Department of Developmental Neurobiology, St Jude Children's Research Hospital, Memphis, TN, USA
- Department of Cellular & Molecular Biology, St Jude Children's Research Hospital, Memphis, TN, USA
- Comprehensive Cancer Center, St Jude Children's Research Hospital, Memphis, TN, USA
|
26
|
Liu J, Zeng W, Kan S, Li M, Zheng R. CAKE: a flexible self-supervised framework for enhancing cell visualization, clustering and rare cell identification. Brief Bioinform 2023; 25:bbad475. [PMID: 38145950 PMCID: PMC10749894 DOI: 10.1093/bib/bbad475] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2023] [Revised: 11/13/2023] [Accepted: 11/30/2023] [Indexed: 12/27/2023] Open
Abstract
Single cell sequencing technology has provided unprecedented opportunities for comprehensively deciphering cell heterogeneity. Nevertheless, the high dimensionality and intricate nature of cell heterogeneity have presented substantial challenges to computational methods. Numerous novel clustering methods have been proposed to address this issue. However, none of these methods achieves consistently better performance across different biological scenarios. In this study, we developed CAKE, a novel and scalable self-supervised clustering method, which consists of a contrastive learning model with a mixture neighborhood augmentation for cell representation learning, and a self-Knowledge Distiller model for the refinement of clustering results. These designs provide more condensed and cluster-friendly cell representations and improve clustering performance in terms of accuracy and robustness. Furthermore, in addition to accurately identifying the major cell types, CAKE can also find more biologically meaningful cell subgroups and rare cell types. Comprehensive experiments on real single-cell RNA sequencing datasets demonstrated the superiority of CAKE in visualization and clustering over the compared methods, and indicated its broad applicability in the field of cell heterogeneity analysis. Contact: Ruiqing Zheng (rqzheng@csu.edu.cn).
Affiliation(s)
- Jin Liu
- School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, P.R. China
| | - Weixing Zeng
- School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, P.R. China
| | - Shichao Kan
- School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, P.R. China
| | - Min Li
- School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, P.R. China
| | - Ruiqing Zheng
- School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, P.R. China
|
27
|
Heller G, Fuereder T, Grandits AM, Wieser R. New perspectives on biology, disease progression, and therapy response of head and neck cancer gained from single cell RNA sequencing and spatial transcriptomics. Oncol Res 2023; 32:1-17. [PMID: 38188682 PMCID: PMC10767240 DOI: 10.32604/or.2023.044774] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2023] [Accepted: 10/12/2023] [Indexed: 01/09/2024] Open
Abstract
Head and neck squamous cell carcinoma (HNSCC) is one of the most frequent cancers worldwide. The main risk factors are consumption of tobacco products and alcohol, as well as infection with human papilloma virus. Approved therapeutic options comprise surgery, radiation, chemotherapy, targeted therapy through epidermal growth factor receptor inhibition, and immunotherapy, but outcome has remained unsatisfactory due to recurrence rates of ~50% and the frequent occurrence of second primaries. The availability of the human genome sequence at the beginning of the millennium heralded the omics era, in which rapid technological progress has advanced our knowledge of the molecular biology of malignant diseases, including HNSCC, at an unprecedented pace. Initially, microarray-based methods, followed by approaches based on next-generation sequencing, were applied to study the genetics, epigenetics, and gene expression patterns of bulk tumors. More recently, the advent of single-cell RNA sequencing (scRNAseq) and spatial transcriptomics methods has facilitated the investigation of the heterogeneity between and within different cell populations in the tumor microenvironment (e.g., cancer cells, fibroblasts, immune cells, endothelial cells), led to the discovery of novel cell types, and advanced the discovery of cell-cell communication within tumors. This review provides an overview of scRNAseq, spatial transcriptomics, and the associated bioinformatics methods, and summarizes how their application has promoted our understanding of the emergence, composition, progression, and therapy responsiveness of, and intercellular signaling within, HNSCC.
Affiliation(s)
- Gerwin Heller
- Division of Oncology, Department of Medicine I, Medical University of Vienna, Vienna, 1090, Austria
| | - Thorsten Fuereder
- Division of Oncology, Department of Medicine I, Medical University of Vienna, Vienna, 1090, Austria
| | | | - Rotraud Wieser
- Division of Oncology, Department of Medicine I, Medical University of Vienna, Vienna, 1090, Austria
- Ludwig Boltzmann Institute for Hematology and Oncology, Medical University of Vienna, Vienna, 1090, Austria
|
28
|
Silkwood K, Dollinger E, Gervin J, Atwood S, Nie Q, Lander AD. Leveraging gene correlations in single cell transcriptomic data. bioRxiv 2023:2023.03.14.532643. [PMID: 36993765 PMCID: PMC10055147 DOI: 10.1101/2023.03.14.532643] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 05/20/2023]
Abstract
BACKGROUND: Many approaches have been developed to overcome technical noise in single cell RNA-sequencing (scRNAseq). As researchers dig deeper into data (looking for rare cell types, subtleties of cell states, and details of gene regulatory networks), there is a growing need for algorithms with controllable accuracy and fewer ad hoc parameters and thresholds. Impeding this goal is the fact that an appropriate null distribution for scRNAseq cannot simply be extracted from data when ground truth about biological variation is unknown (i.e., usually). RESULTS: We approach this problem analytically, assuming that scRNAseq data reflect only cell heterogeneity (what we seek to characterize), transcriptional noise (temporal fluctuations randomly distributed across cells), and sampling error (i.e., Poisson noise). We analyze scRNAseq data without normalization (a step that skews distributions, particularly for sparse data) and calculate p-values associated with key statistics. We develop an improved method for selecting features for cell clustering and identifying gene-gene correlations, both positive and negative. Using simulated data, we show that this method, which we call BigSur (Basic Informatics and Gene Statistics from Unnormalized Reads), captures even weak yet significant correlation structures in scRNAseq data. Applying BigSur to data from a clonal human melanoma cell line, we identify thousands of correlations that, when clustered without supervision into gene communities, align with known cellular components and biological processes, and highlight potentially novel cell biological relationships. CONCLUSIONS: New insights into functionally relevant gene regulatory networks can be obtained using a statistically grounded approach to the identification of gene-gene correlations.
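The noise decomposition in this abstract (cell heterogeneity on top of Poisson sampling error, evaluated on unnormalized counts) can be illustrated with a small simulation. The sketch below is not BigSur and does not reproduce its statistics; it only relies on the standard fact that pure Poisson sampling gives a variance equal to the mean, so a variance-to-mean ratio (Fano factor) well above 1 on raw counts hints at heterogeneity. The helper names, the simulated rates, and the 10% rare-population fraction are all hypothetical.

```python
import math
import random
from statistics import mean, pvariance


def poisson_sample(rng, lam):
    """Draw one Poisson(lam) count with Knuth's method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def fano_factor(counts):
    """Variance-to-mean ratio of raw, unnormalized counts for one gene."""
    m = mean(counts)
    return pvariance(counts) / m if m > 0 else float("nan")


rng = random.Random(0)

# Hypothetical gene A: every cell shares one underlying rate (sampling noise only).
uniform_gene = [poisson_sample(rng, 2.0) for _ in range(5000)]

# Hypothetical gene B: 10% of cells (a rare population) express at a much higher rate.
mixed_gene = [poisson_sample(rng, 15.0 if i % 10 == 0 else 0.5) for i in range(5000)]

print(f"uniform gene Fano factor: {fano_factor(uniform_gene):.2f}")  # close to 1
print(f"mixed gene Fano factor:   {fano_factor(mixed_gene):.2f}")    # well above 1
```

A real analysis would also have to account for differences in sequencing depth between cells and for transcriptional noise, both of which this toy example ignores.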
Affiliation(s)
- Kai Silkwood
- Center for Complex Biological Systems, University of California, Irvine, Irvine CA
- Department of Developmental and Cell Biology, University of California, Irvine, Irvine CA
| | - Emmanuel Dollinger
- Center for Complex Biological Systems, University of California, Irvine, Irvine CA
- Department of Developmental and Cell Biology, University of California, Irvine, Irvine CA
- Department of Mathematics, University of California, Irvine, Irvine CA
| | - Josh Gervin
- Center for Complex Biological Systems, University of California, Irvine, Irvine CA
- Department of Developmental and Cell Biology, University of California, Irvine, Irvine CA
| | - Scott Atwood
- Center for Complex Biological Systems, University of California, Irvine, Irvine CA
- Department of Developmental and Cell Biology, University of California, Irvine, Irvine CA
| | - Qing Nie
- Center for Complex Biological Systems, University of California, Irvine, Irvine CA
- Department of Developmental and Cell Biology, University of California, Irvine, Irvine CA
- Department of Mathematics, University of California, Irvine, Irvine CA
| | - Arthur D. Lander
- Center for Complex Biological Systems, University of California, Irvine, Irvine CA
- Department of Developmental and Cell Biology, University of California, Irvine, Irvine CA
|
29
|
Li J, Shyr Y, Liu Q. Single-cell and Spatial Transcriptomics Clustering with an Optimized Adaptive K-Nearest Neighbor Graph. bioRxiv 2023:2023.10.13.562261. [PMID: 37905097 PMCID: PMC10614787 DOI: 10.1101/2023.10.13.562261] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2023]
Abstract
Single-cell and spatial transcriptomics have been widely used to characterize the cellular landscape in complex tissues. To understand cellular heterogeneity, one essential step is to define cell types through unsupervised clustering. While typical clustering methods have difficulty in identifying rare cell types, approaches specifically tailored to detect rare cell types gain their ability at the cost of poorer performance for grouping abundant ones. Here, we developed aKNNO, a method to identify abundant and rare cell types simultaneously based on an adaptive k-nearest neighbor graph with optimization. Benchmarked on 38 simulated and 20 single-cell and spatial transcriptomics datasets, aKNNO identified both abundant and rare cell types accurately. Without sacrificing performance for clustering abundant cell types, aKNNO discovered known and novel rare cell types that typical and even specifically tailored methods failed to detect. aKNNO, using the transcriptome alone, stereotyped fine-grained anatomical structures more precisely than integrative approaches that combine expression with spatial locations and histology images.
Affiliation(s)
- Jia Li
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, 37203, USA
- Center for Quantitative Sciences, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
| | - Yu Shyr
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, 37203, USA
- Center for Quantitative Sciences, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
| | - Qi Liu
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, 37203, USA
- Center for Quantitative Sciences, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
|
30
|
Pandian K, Matsui M, Hankemeier T, Ali A, Okubo-Kurihara E. Advances in single-cell metabolomics to unravel cellular heterogeneity in plant biology. Plant Physiol 2023; 193:949-965. [PMID: 37338502 PMCID: PMC10517197 DOI: 10.1093/plphys/kiad357] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2023] [Revised: 05/03/2023] [Accepted: 05/17/2023] [Indexed: 06/21/2023]
Abstract
Single-cell metabolomics is a powerful tool that can reveal cellular heterogeneity and elucidate the mechanisms of biological phenomena in detail. It is a promising approach for studying plants, especially where cellular heterogeneity has an impact on different biological processes. In addition, metabolomics, which can be regarded as a detailed phenotypic analysis, is expected to answer previously unanswered questions, leading to expanded crop production, a better understanding of disease resistance, and other applications. In this review, we introduce the workflow of sample acquisition and single-cell techniques to facilitate the adoption of single-cell metabolomics. Furthermore, the applications of single-cell metabolomics are summarized and reviewed.
Affiliation(s)
- Kanchana Pandian
- Metabolomics and Analytics Centre, Leiden Academic Centre for Drug Research, Leiden University, Einstein Road 55, 2333 CC Leiden, The Netherlands
| | - Minami Matsui
- RIKEN, Center for Sustainable Resource Science, Kanagawa 230-0045, Japan
| | - Thomas Hankemeier
- Metabolomics and Analytics Centre, Leiden Academic Centre for Drug Research, Leiden University, Einstein Road 55, 2333 CC Leiden, The Netherlands
| | - Ahmed Ali
- Metabolomics and Analytics Centre, Leiden Academic Centre for Drug Research, Leiden University, Einstein Road 55, 2333 CC Leiden, The Netherlands
| | - Emiko Okubo-Kurihara
- RIKEN, Center for Sustainable Resource Science, Kanagawa 230-0045, Japan
- College of Science, Rikkyo University, Tokyo 171-8501, Japan
|
31
|
Yang Y, Lin YT, Li G, Zhong Y, Xu Q, Cai JJ. Interpretable modeling of time-resolved single-cell gene-protein expression with CrossmodalNet. Brief Bioinform 2023; 24:bbad342. [PMID: 37798250 DOI: 10.1093/bib/bbad342] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2023] [Revised: 08/15/2023] [Accepted: 09/07/2023] [Indexed: 10/07/2023] Open
Abstract
Cell-surface proteins play a critical role in cell function and are primary targets for therapeutics. CITE-seq is a single-cell technique that enables simultaneous measurement of gene and surface protein expression. It is powerful but costly and technically challenging. Computational methods have been developed to predict surface protein expression from gene expression information, such as single-cell RNA sequencing (scRNA-seq) data. Existing methods, however, are computationally demanding and lack the interpretability needed to reveal underlying biological processes. We propose CrossmodalNet, an interpretable machine learning model, to predict surface protein expression from scRNA-seq data. Our model, with a customized adaptive loss, accurately predicts surface protein abundances. When samples from multiple time points are given, our model encodes temporal information into an easy-to-interpret time embedding to make predictions in a time-point-specific manner, and is able to uncover noise-free causal gene-protein relationships. Using three publicly available time-resolved CITE-seq data sets, we validate the performance of our model by comparing it with benchmarking methods and evaluate its interpretability. Together, we show that our method accurately and interpretably profiles surface protein expression using scRNA-seq data, thereby expanding the capacity of CITE-seq experiments for investigating molecular mechanisms involving surface proteins.
Affiliation(s)
- Yongjian Yang
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA
| | - Yu-Te Lin
- Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei, Taiwan
| | - Guanxun Li
- Department of Statistics, Texas A&M University, College Station, TX, USA
| | - Yan Zhong
- Key Laboratory of Advanced Theory and Application in Statistics and Data Science-MOE, School of Statistics, East China Normal University, Shanghai, China
| | - Qian Xu
- Department of Veterinary Integrative Biosciences, Texas A&M University, College Station, TX, USA
| | - James J Cai
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA
- Department of Veterinary Integrative Biosciences, Texas A&M University, College Station, TX, USA
|
32
|
Lei T, Chen R, Zhang S, Chen Y. Self-supervised deep clustering of single-cell RNA-seq data to hierarchically detect rare cell populations. Brief Bioinform 2023; 24:bbad335. [PMID: 37769630 PMCID: PMC10539043 DOI: 10.1093/bib/bbad335] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2023] [Revised: 09/05/2023] [Accepted: 09/06/2023] [Indexed: 10/02/2023] Open
Abstract
Single-cell RNA sequencing (scRNA-seq) is a widely used technique for characterizing individual cells and studying gene expression at the single-cell level. Clustering plays a vital role in grouping similar cells together for various downstream analyses. However, the high sparsity and dimensionality of large scRNA-seq data pose challenges to clustering performance. Although several deep learning-based clustering algorithms have been proposed, most existing clustering methods have limitations in capturing the precise distribution types of the data or fully utilizing the relationships between cells, leaving a considerable scope for improving the clustering performance, particularly in detecting rare cell populations from large scRNA-seq data. We introduce DeepScena, a novel single-cell hierarchical clustering tool that fully incorporates nonlinear dimension reduction, negative binomial-based convolutional autoencoder for data fitting, and a self-supervision model for cell similarity enhancement. In comprehensive evaluation using multiple large-scale scRNA-seq datasets, DeepScena consistently outperformed seven popular clustering tools in terms of accuracy. Notably, DeepScena exhibits high proficiency in identifying rare cell populations within large datasets that contain large numbers of clusters. When applied to scRNA-seq data of multiple myeloma cells, DeepScena successfully identified not only previously labeled large cell types but also subpopulations in CD14 monocytes, T cells and natural killer cells, respectively.
Affiliation(s)
- Tianyuan Lei
- College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China
| | - Ruoyu Chen
- Moorestown High School, Moorestown, NJ 08057, USA
| | - Shaoqiang Zhang
- College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China
| | - Yong Chen
- Department of Biological and Biomedical Sciences, Rowan University, NJ 08028, USA
|
33
|
Chen K, Wang Z. A Micropillar Array Based Microfluidic Device for Rare Cell Detection and Single-Cell Proteomics. Methods Protoc 2023; 6:80. [PMID: 37736963 PMCID: PMC10514859 DOI: 10.3390/mps6050080] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2023] [Revised: 08/25/2023] [Accepted: 08/31/2023] [Indexed: 09/23/2023] Open
Abstract
Advancements in single-cell-related technologies have opened new possibilities for analyzing rare cells, such as circulating tumor cells (CTCs) and rare immune cells. Among these techniques, single-cell proteomics, particularly single-cell mass spectrometric analysis (scMS), has gained significant attention due to its ability to directly measure transcripts without the need for specific reagents. However, the success of single-cell proteomics relies heavily on efficient sample preparation, as protein loss in low-concentration samples can profoundly impact the analysis. To address this challenge, an effective handling system for rare cells is essential for single-cell proteomic analysis. Herein, we propose a microfluidics-based method that offers highly efficient isolation, detection, and collection of rare cells (e.g., CTCs). The detailed fabrication process of the micropillar array-based microfluidic device is presented, along with its application for CTC isolation, identification, and collection for subsequent proteomic analysis.
Affiliation(s)
- Kangfu Chen
- Department of Biomedical Engineering, McCormick School of Engineering, Northwestern University, Evanston, IL 60208, USA
- Chan Zuckerberg Biohub Chicago, Chicago, IL 60607, USA
| | - Zongjie Wang
- Department of Biomedical Engineering, McCormick School of Engineering, Northwestern University, Evanston, IL 60208, USA
- Chan Zuckerberg Biohub Chicago, Chicago, IL 60607, USA
|
34
|
DeMeo B, Berger B. SCA: recovering single-cell heterogeneity through information-based dimensionality reduction. Genome Biol 2023; 24:195. [PMID: 37626411 PMCID: PMC10464206 DOI: 10.1186/s13059-023-02998-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2022] [Accepted: 06/28/2023] [Indexed: 08/27/2023] Open
Abstract
Dimensionality reduction summarizes the complex transcriptomic landscape of single-cell datasets for downstream analyses. Current approaches favor large cellular populations defined by many genes, at the expense of smaller and more subtly defined populations. Here, we present surprisal component analysis (SCA), a technique that newly leverages the information-theoretic notion of surprisal for dimensionality reduction to promote more meaningful signal extraction. For example, SCA uncovers clinically important cytotoxic T-cell subpopulations that are indistinguishable using existing pipelines. We also demonstrate that SCA substantially improves downstream imputation. SCA's efficient information-theoretic paradigm has broad applications to the study of complex biological tissues in health and disease.
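For readers unfamiliar with the term, surprisal (self-information) is the quantity -log p for an event of probability p, so unlikely observations carry more information. The sketch below illustrates only that general notion and is not the SCA algorithm, whose scoring the abstract does not spell out. The binomial model for "a gene detected in at least k of n neighboring cells" and both function names are assumptions made for illustration.

```python
import math


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def surprisal_bits(probability):
    """Surprisal (self-information) of an event, in bits: -log2(p)."""
    return -math.log2(probability)


# Hypothetical setting: a gene is detected in 5% of all cells.
global_rate = 0.05
neighborhood_size = 10

for detected in (1, 4, 8):
    tail = binomial_tail(detected, neighborhood_size, global_rate)
    print(f"detected in >= {detected}/10 neighbors: {surprisal_bits(tail):.1f} bits")
```

Under these assumptions, a gene that is rare globally but detected in most cells of a small neighborhood receives a high surprisal, which is the intuition behind favoring small, subtly defined populations over large ones.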
Affiliation(s)
- Benjamin DeMeo
- Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, 02139, MA, USA
- Department of Biomedical Informatics, Harvard University, Cambridge, 02138, MA, USA
- Department of Mathematics, MIT, Cambridge, 02139, MA, USA
| | - Bonnie Berger
- Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, 02139, MA, USA
- Department of Mathematics, MIT, Cambridge, 02139, MA, USA
|
35
|
Wang X, Duan M, Li J, Ma A, Xu D, Li Z, Liu B, Ma Q. MarsGT: Multi-omics analysis for rare population inference using single-cell graph transformer. bioRxiv 2023:2023.08.15.553454. [PMID: 37645917 PMCID: PMC10462017 DOI: 10.1101/2023.08.15.553454] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/31/2023]
Abstract
Rare cell populations are key in neoplastic progression and therapeutic response, offering potential intervention targets. However, their computational identification and analysis often lag behind major cell types. To fill this gap, we introduced MarsGT: Multi-omics Analysis for Rare population inference using Single-cell Graph Transformer. It identifies rare cell populations using a probability-based heterogeneous graph transformer on single-cell multi-omics data. MarsGT outperformed existing tools in identifying rare cells across 400 simulated and four real human datasets. In mouse retina data, it revealed unique subpopulations of rare bipolar cells and a Müller glia cell subpopulation. In human lymph node data, MarsGT detected an intermediate B cell population potentially acting as lymphoma precursors. In human melanoma data, it identified a rare MAIT-like population impacted by a high IFN-I response and revealed the mechanism of immunotherapy. Hence, MarsGT offers biological insights and suggests potential strategies for early detection and therapeutic intervention of disease.
Affiliation(s)
- Xiaoying Wang
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, 43210, USA
- Pelotonia Institute for Immuno-Oncology, The James Comprehensive Cancer Center, The Ohio State University, Columbus, OH 43210, USA
| | - Maoteng Duan
- School of Mathematics, Shandong University, Jinan, Shandong, 250100, China
| | - Jingxian Li
- School of Mathematics, Shandong University, Jinan, Shandong, 250100, China
| | - Anjun Ma
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, 43210, USA
- Pelotonia Institute for Immuno-Oncology, The James Comprehensive Cancer Center, The Ohio State University, Columbus, OH 43210, USA
| | - Dong Xu
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
- Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
| | - Zihai Li
- Pelotonia Institute for Immuno-Oncology, The James Comprehensive Cancer Center, The Ohio State University, Columbus, OH 43210, USA
| | - Bingqiang Liu
- School of Mathematics, Shandong University, Jinan, Shandong, 250100, China
| | - Qin Ma
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, 43210, USA
- Pelotonia Institute for Immuno-Oncology, The James Comprehensive Cancer Center, The Ohio State University, Columbus, OH 43210, USA
| |
|
36
|
Chen S, Zheng R, Tian L, Wu FX, Li M. A posterior probability based Bayesian method for single-cell RNA-seq data imputation. Methods 2023; 216:21-38. [PMID: 37315825 DOI: 10.1016/j.ymeth.2023.06.004] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/28/2023] [Revised: 05/19/2023] [Accepted: 06/07/2023] [Indexed: 06/16/2023] Open
Abstract
Single-cell RNA-sequencing (scRNA-seq) data contain a large proportion of zeros. Such dropout events impede downstream data analyses. We propose BayesImpute to infer and impute dropouts from scRNA-seq data. Using the expression rate and coefficient of variation of the genes within the cell subpopulation, BayesImpute first determines likely dropouts, and then constructs the posterior distribution for each gene and uses the posterior mean to impute dropout values. Experiments on simulated and real datasets show that BayesImpute can effectively identify dropout events and reduce the introduction of false positive signals. Additionally, BayesImpute successfully recovers the true expression levels of missing values, restores the gene-to-gene and cell-to-cell correlation coefficients, and maintains the biological information in bulk RNA-seq data. Furthermore, BayesImpute boosts the clustering and visualization of cell subpopulations and improves the identification of differentially expressed genes. We further demonstrate that, in comparison to other statistics-based imputation methods, BayesImpute is scalable and fast with minimal memory usage.
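The two-step idea in this abstract (flag likely dropouts from expression rate and coefficient of variation, then fill them with a posterior mean) can be illustrated with a toy sketch. This is not the BayesImpute implementation: the abstract does not state the probability model or any thresholds, so the Gamma-Poisson conjugate pair, the prior parameters, and the cut-offs `min_rate` and `max_cv` below are all assumptions chosen only for illustration.

```python
from statistics import mean, pstdev

def flag_dropout_genes(counts, min_rate=0.5, max_cv=1.0):
    """Return indices of genes whose zeros are treated as likely dropouts.

    counts: one row per gene, one column per cell of a single subpopulation.
    min_rate and max_cv are hypothetical thresholds, not values from the paper.
    """
    flagged = []
    for g, row in enumerate(counts):
        nonzero = [x for x in row if x > 0]
        if not nonzero:
            continue  # gene never observed: nothing to base an imputation on
        rate = len(nonzero) / len(row)          # fraction of cells expressing the gene
        cv = pstdev(nonzero) / mean(nonzero)    # coefficient of variation of expressed values
        if rate >= min_rate and cv <= max_cv:
            flagged.append(g)
    return flagged

def posterior_mean(nonzero, prior_shape=1.0, prior_rate=1.0):
    # Assumed model: Gamma(shape, rate) prior on a Poisson rate.
    # The posterior is Gamma(shape + sum(x), rate + n); its mean is returned.
    return (prior_shape + sum(nonzero)) / (prior_rate + len(nonzero))

def impute(counts, **thresholds):
    """Replace zeros in flagged genes with the gene's posterior mean."""
    out = [list(row) for row in counts]
    for g in flag_dropout_genes(counts, **thresholds):
        nonzero = [x for x in counts[g] if x > 0]
        fill = posterior_mean(nonzero)
        out[g] = [fill if x == 0 else x for x in counts[g]]
    return out

counts = [
    [4, 5, 0, 6],  # expressed in 3 of 4 cells with low CV: the zero is treated as a dropout
    [0, 0, 0, 9],  # expressed in 1 of 4 cells: zeros are kept as true zeros
]
imputed = impute(counts)
```

In this toy input only the first gene is imputed; the second gene is left untouched because its expression rate falls below the assumed threshold, which is the kind of guard against false positive signals that the abstract refers to.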
Affiliation(s)
- Siqi Chen
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
| | - Ruiqing Zheng
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
| | - Luyi Tian
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Fang-Xiang Wu
- Department of Mechanical Engineering and Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, SK S7N 5A9, Canada
| | - Min Li
- School of Computer Science and Engineering, Central South University, Changsha 410083, China.
| |
|
37
|
Tosoni G, Ayyildiz D, Bryois J, Macnair W, Fitzsimons CP, Lucassen PJ, Salta E. Mapping human adult hippocampal neurogenesis with single-cell transcriptomics: Reconciling controversy or fueling the debate? Neuron 2023; 111:1714-1731.e3. [PMID: 37015226 DOI: 10.1016/j.neuron.2023.03.010] [Citation(s) in RCA: 39] [Impact Index Per Article: 19.5] [Received: 10/11/2022] [Revised: 02/06/2023] [Accepted: 03/08/2023] [Indexed: 04/05/2023]
Abstract
The notion of exploiting the regenerative potential of the human brain in physiological aging or neurological diseases represents a particularly attractive alternative to conventional strategies for enhancing or restoring brain function. However, a major first question to address is whether the human brain does possess the ability to regenerate. The existence of human adult hippocampal neurogenesis (AHN) has been at the center of a fierce scientific debate for many years. The advent of single-cell transcriptomic technologies was initially viewed as a panacea to resolving this controversy. However, recent single-cell RNA sequencing studies in the human hippocampus yielded conflicting results. Here, we critically discuss and re-analyze previously published AHN-related single-cell transcriptomic datasets. We argue that, although promising, the single-cell transcriptomic profiling of AHN in the human brain can be confounded by methodological, conceptual, and biological factors that need to be consistently addressed across studies and openly discussed within the scientific community.
Affiliation(s)
- Giorgia Tosoni
- Laboratory of Neurogenesis and Neurodegeneration, Netherlands Institute for Neuroscience, 1105 BA, Amsterdam, the Netherlands
| | - Dilara Ayyildiz
- Laboratory of Neurogenesis and Neurodegeneration, Netherlands Institute for Neuroscience, 1105 BA, Amsterdam, the Netherlands
| | - Julien Bryois
- Roche Pharma Research and Early Development, Neuroscience and Rare Diseases, Roche Innovation Center, CH-4070, Basel, Switzerland
| | - Will Macnair
- Roche Pharma Research and Early Development, Neuroscience and Rare Diseases, Roche Innovation Center, CH-4070, Basel, Switzerland
| | - Carlos P Fitzsimons
- Brain Plasticity group, Swammerdam Institute for Life Sciences, Faculty of Science, University of Amsterdam, 1098 XH, Amsterdam, the Netherlands
| | - Paul J Lucassen
- Brain Plasticity group, Swammerdam Institute for Life Sciences, Faculty of Science, University of Amsterdam, 1098 XH, Amsterdam, the Netherlands; Center for Urban Mental Health, University of Amsterdam, 1098 SM, Amsterdam, the Netherlands
| | - Evgenia Salta
- Laboratory of Neurogenesis and Neurodegeneration, Netherlands Institute for Neuroscience, 1105 BA, Amsterdam, the Netherlands.
| |
|
38
|
Lubatti G, Stock M, Iturbide A, Ruiz Tejada Segura ML, Riepl M, Tyser RCV, Danese A, Colomé-Tatché M, Theis FJ, Srinivas S, Torres-Padilla ME, Scialdone A. CIARA: a cluster-independent algorithm for identifying markers of rare cell types from single-cell sequencing data. Development 2023; 150:dev201264. [PMID: 37294170 DOI: 10.1242/dev.201264] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2022] [Accepted: 04/25/2023] [Indexed: 05/18/2023]
Abstract
A powerful feature of single-cell genomics is the possibility of identifying cell types from their molecular profiles. In particular, identifying novel rare cell types and their marker genes is a key potential of single-cell RNA sequencing. Standard clustering approaches perform well in identifying relatively abundant cell types, but tend to miss rarer cell types. Here, we have developed CIARA (Cluster Independent Algorithm for the identification of markers of RAre cell types), a cluster-independent computational tool designed to select genes that are likely to be markers of rare cell types. Genes selected by CIARA are subsequently integrated with common clustering algorithms to single out groups of rare cell types. CIARA outperforms existing methods for rare cell type detection, and we use it to find previously uncharacterized rare populations of cells in a human gastrula and among mouse embryonic stem cells treated with retinoic acid. Moreover, CIARA can be applied more generally to any type of single-cell omic data, thus allowing the identification of rare cells across multiple data modalities. We provide implementations of CIARA in user-friendly packages available in R and Python.
Affiliation(s)
- Gabriele Lubatti
- Institute of Epigenetics and Stem Cells, Helmholtz Munich, D-81377 Munich, Germany
- Institute of Functional Epigenetics, Helmholtz Munich, D-85764 Neuherberg, Germany
- Institute of Computational Biology, Helmholtz Munich, D-85764 Neuherberg, Germany
| | - Marco Stock
- Institute of Epigenetics and Stem Cells, Helmholtz Munich, D-81377 Munich, Germany
- Institute of Functional Epigenetics, Helmholtz Munich, D-85764 Neuherberg, Germany
- Institute of Computational Biology, Helmholtz Munich, D-85764 Neuherberg, Germany
- TUM School of Life Sciences Weihenstephan, Technical University of Munich, D-85354 Freising, Germany
| | - Ane Iturbide
- Institute of Epigenetics and Stem Cells, Helmholtz Munich, D-81377 Munich, Germany
| | - Mayra L Ruiz Tejada Segura
- Institute of Epigenetics and Stem Cells, Helmholtz Munich, D-81377 Munich, Germany
- Institute of Functional Epigenetics, Helmholtz Munich, D-85764 Neuherberg, Germany
- Institute of Computational Biology, Helmholtz Munich, D-85764 Neuherberg, Germany
| | - Melina Riepl
- Institute of Epigenetics and Stem Cells, Helmholtz Munich, D-81377 Munich, Germany
- Institute of Functional Epigenetics, Helmholtz Munich, D-85764 Neuherberg, Germany
- Institute of Computational Biology, Helmholtz Munich, D-85764 Neuherberg, Germany
| | - Richard C V Tyser
- Wellcome-MRC Cambridge Stem Cell Institute, University of Cambridge, Cambridge CB2 0AW, UK
| | - Anna Danese
- Biomedical Center Munich (BMC), Physiological Genomics, Faculty of Medicine, Ludwig Maximilians University, D-82152 Munich, Germany
| | - Maria Colomé-Tatché
- Institute of Computational Biology, Helmholtz Munich, D-85764 Neuherberg, Germany
- Biomedical Center (BMC), Physiological Chemistry, Faculty of Medicine, Ludwig Maximilians University, D-82152 Munich, Germany
| | - Fabian J Theis
- Institute of Computational Biology, Helmholtz Munich, D-85764 Neuherberg, Germany
- Department of Mathematics, Technical University of Munich, D-85748 Munich, Germany
| | - Shankar Srinivas
- Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford OX1 3PT, UK
| | - Maria-Elena Torres-Padilla
- Institute of Epigenetics and Stem Cells, Helmholtz Munich, D-81377 Munich, Germany
- Faculty of Biology, Ludwig-Maximilians University, D-82152 Munich, Germany
| | - Antonio Scialdone
- Institute of Epigenetics and Stem Cells, Helmholtz Munich, D-81377 Munich, Germany
- Institute of Functional Epigenetics, Helmholtz Munich, D-85764 Neuherberg, Germany
- Institute of Computational Biology, Helmholtz Munich, D-85764 Neuherberg, Germany
| |
|
39
|
Lee S, Vu HM, Lee JH, Lim H, Kim MS. Advances in Mass Spectrometry-Based Single Cell Analysis. Biology (Basel) 2023; 12:395. [PMID: 36979087 PMCID: PMC10045136 DOI: 10.3390/biology12030395] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 12/30/2022] [Revised: 02/27/2023] [Accepted: 03/01/2023] [Indexed: 03/06/2023]
Abstract
Technological developments and improvements in single-cell isolation and analytical platforms allow for advanced molecular profiling at the single-cell level, which reveals cell-to-cell variation among the admixed cells in complex biological or clinical systems. This helps to understand the cellular heterogeneity of normal or diseased tissues and organs. However, most studies have focused on the analysis of nucleic acids (e.g., DNA and RNA), whereas mass spectrometry (MS)-based analysis of proteins and metabolites in single cells lagged behind until recently. Undoubtedly, MS-based single-cell analysis will provide a deeper insight into cellular mechanisms related to health and disease. This review summarizes recent advances in MS-based single-cell analysis methods and their applications in biology and medicine.
Affiliation(s)
- Siheun Lee
- School of Undergraduate Studies, Daegu Gyeongbuk Institute of Science and Technology (DGIST), Daegu 42988, Republic of Korea
| | - Hung M. Vu
- Department of New Biology, Daegu Gyeongbuk Institute of Science and Technology (DGIST), Daegu 42988, Republic of Korea
| | - Jung-Hyun Lee
- Department of New Biology, Daegu Gyeongbuk Institute of Science and Technology (DGIST), Daegu 42988, Republic of Korea
| | - Heejin Lim
- Center for Scientific Instrumentation, Korea Basic Science Institute (KBSI), Cheongju 28119, Republic of Korea
| | - Min-Sik Kim
- Department of New Biology, Daegu Gyeongbuk Institute of Science and Technology (DGIST), Daegu 42988, Republic of Korea
- New Biology Research Center, Daegu Gyeongbuk Institute of Science and Technology (DGIST), Daegu 42988, Republic of Korea
- Center for Cell Fate Reprogramming and Control, Daegu Gyeongbuk Institute of Science and Technology (DGIST), Daegu 42988, Republic of Korea
| |
|
40
|
Cedillo-Alcantar DF, Rodriguez-Moncayo R, Maravillas-Montero JL, Garcia-Cordero JL. On-Chip Analysis of Protein Secretion from Single Cells Using Microbead Biosensors. ACS Sens 2023; 8:655-664. [PMID: 36710459 DOI: 10.1021/acssensors.2c02148] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/31/2023]
Abstract
The profiling of the effector functions of single immune cells, including cytokine secretion, can lead to a deeper understanding of how the immune system operates and to potential diagnostic and therapeutic applications. Here, we report a microfluidic device that pairs single cells and antibody-functionalized microbeads in hydrodynamic traps to quantitate cytokine secretion. The device contains 1008 microchambers, each with a volume of ∼500 pL, divided into six different sections individually addressed to deliver an equal number of chemical stimuli. Integrating microvalves allowed us to isolate cell/bead pairs, preventing cross-contamination with factors secreted by adjacent cells. We implemented a fluorescence sandwich immunoassay on the biosensing microbeads with a limit of detection of 9 pg/mL and were able to detect interleukin-8 (IL-8) secreted by single blood-derived human monocytes in response to different concentrations of LPS. Finally, our platform allowed us to observe a significant decrease in the number of IL-8-secreting monocytes when paracrine signaling becomes disrupted. Overall, our platform could have a variety of applications for which the analysis of cellular function heterogeneity is necessary, such as cancer research, antibody discovery, or rare cell screening.
Affiliation(s)
- Diana F Cedillo-Alcantar
- Laboratory of Microtechnologies for Biomedicine, Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional (Cinvestav), Monterrey 66628, Nuevo León, Mexico
| | - Roberto Rodriguez-Moncayo
- Laboratory of Microtechnologies for Biomedicine, Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional (Cinvestav), Monterrey 66628, Nuevo León, Mexico
| | - Jose L Maravillas-Montero
- Red de Apoyo a la Investigación, Universidad Nacional Autónoma de México e Instituto Nacional de Ciencias Médicas y Nutrición Salvador Zubirán, Mexico City 14080, Mexico
| | - Jose L Garcia-Cordero
- Laboratory of Microtechnologies for Biomedicine, Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional (Cinvestav), Monterrey 66628, Nuevo León, Mexico; Roche Institute for Translational Bioengineering (ITB), Roche Pharma Research and Early Development, Roche Innovation Center Basel, Basel 4058, Switzerland
| |
|
41
|
Watson ER, Mora A, Taherian Fard A, Mar JC. How does the structure of data impact cell-cell similarity? Evaluating how structural properties influence the performance of proximity metrics in single cell RNA-seq data. Brief Bioinform 2022; 23:bbac387. [PMID: 36151725 PMCID: PMC9677483 DOI: 10.1093/bib/bbac387] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/09/2022] [Revised: 07/26/2022] [Accepted: 08/11/2022] [Indexed: 12/14/2022] Open
Abstract
Accurately identifying cell populations is paramount to the quality of downstream analyses and overall interpretations of single-cell RNA-seq (scRNA-seq) datasets but remains a challenge. The quality of single-cell clustering depends on the proximity metric used to generate cell-to-cell distances. Accordingly, proximity metrics have been benchmarked for scRNA-seq clustering, typically with results averaged across datasets to identify a highest-performing metric. However, the 'best-performing' metric varies between studies, with the performance differing significantly between datasets. This suggests that the unique structural properties of an scRNA-seq dataset, specific to the biological system under study, have a substantial impact on proximity metric performance. Previous benchmarking studies have not factored these structural properties into their evaluations. To address this gap, we developed a framework for the in-depth evaluation of the performance of 17 proximity metrics with respect to core structural properties of scRNA-seq data, including sparsity, dimensionality, cell-population distribution and rarity. We find that clustering performance can be improved substantially by the selection of an appropriate proximity metric and neighbourhood size for the structural properties of a dataset, in addition to performing suitable pre-processing and dimensionality reduction. Furthermore, popular metrics such as Euclidean and Manhattan distance performed poorly in comparison to several less commonly applied metrics, suggesting that the default metric for many scRNA-seq methods should be re-evaluated. Our findings highlight the critical nature of tailoring scRNA-seq analysis pipelines to the dataset under study and provide practical guidance for researchers looking to optimize cell-similarity search for the structural properties of their own data.
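To make concrete how the choice of proximity metric can change which cells count as neighbours, the following toy sketch compares three distances on made-up expression vectors. It is not the evaluation framework described in the abstract. Euclidean and Manhattan distance are named there; cosine distance is included here only as one example of an alternative and is an assumption, as are the cell names and values.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_distance(a, b):
    # 1 - cosine similarity; insensitive to overall scale (e.g. sequencing depth)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def nearest(query, cells, metric):
    """Name of the cell in `cells` closest to `query` under `metric`."""
    return min(cells, key=lambda name: metric(query, cells[name]))

# Hypothetical expression vectors over four genes.
cells = {
    "same_profile_deeper": [10.0, 0.0, 20.0, 0.0],  # same gene proportions as the query, 10x the depth
    "different_profile": [0.0, 3.0, 4.0, 3.0],      # different genes expressed, similar depth
}
query = [1.0, 0.0, 2.0, 0.0]

for metric in (euclidean, manhattan, cosine_distance):
    print(metric.__name__, "->", nearest(query, cells, metric))
```

With these values, the two magnitude-sensitive distances pick the cell with a different expression profile but similar depth, while cosine distance picks the cell with the same profile sequenced more deeply. Which behaviour is preferable depends on the normalization and structure of the dataset, which is the point the abstract makes.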
Affiliation(s)
- Ebony Rose Watson
- Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, Brisbane, QLD, Australia
| | - Ariane Mora
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, QLD, Australia
| | - Atefeh Taherian Fard
- Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, Brisbane, QLD, Australia
| | - Jessica Cara Mar
- Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, Brisbane, QLD, Australia
| |
|
42
|
Rossi T, Angeli D, Martinelli G, Fabbri F, Gallerani G. From phenotypical investigation to RNA-sequencing for gene expression analysis: A workflow for single and pooled rare cells. Front Genet 2022; 13:1012191. [PMID: 36452152 PMCID: PMC9703136 DOI: 10.3389/fgene.2022.1012191] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/05/2022] [Accepted: 10/28/2022] [Indexed: 08/30/2023] Open
Abstract
Combining phenotypical and molecular characterization of rare cells is challenging due to their scarcity and difficult handling. In oncology, circulating tumor cells (CTCs) are considered among the most important rare cell populations. Their phenotypic and molecular characterization is necessary to define the molecular mechanisms underlying their metastatic potential. Several approaches that require cell fixation make downstream molecular investigations on RNA difficult. Conversely, the DEPArray technology allows phenotypic analysis and handling of both fixed and unfixed cells, enabling a wider range of applications. Here, we describe an experimental workflow that allows the transcriptomic investigation of single and pooled OE33 cells that have undergone DEPArray analysis and recovery. In addition, cells were tested under different conditions (unfixed, CellSearch fixative (CSF)-fixed, and ethanol (EtOH)-fixed cells). In a forward-looking perspective, this workflow will pave the way for novel strategies to characterize gene expression profiles of rare cells, both single-cell and low-resolution input.
Affiliation(s)
- Tania Rossi
- Biosciences Laboratory, IRCCS Istituto Romagnolo per lo Studio dei Tumori (IRST) “Dino Amadori”, Meldola, Italy
| | - Davide Angeli
- Unit of Biostatistics and Clinical Trials, IRCCS Istituto Romagnolo per lo Studio dei Tumori (IRST) “Dino Amadori”, Meldola, Italy
| | - Giovanni Martinelli
- Scientific Directorate, IRCCS Istituto Romagnolo per lo Studio dei Tumori (IRST) “Dino Amadori”, Meldola, Italy
| | - Francesco Fabbri
- Biosciences Laboratory, IRCCS Istituto Romagnolo per lo Studio dei Tumori (IRST) “Dino Amadori”, Meldola, Italy
| | - Giulia Gallerani
- Biosciences Laboratory, IRCCS Istituto Romagnolo per lo Studio dei Tumori (IRST) “Dino Amadori”, Meldola, Italy
| |
|
43
|
Yidian C, Chen L, Hongxia D, Yanguo L, Zhisen S. Single-cell sequencing reveals the cell map and transcriptional network of sporadic vestibular schwannoma. Front Mol Neurosci 2022; 15:984529. [PMID: 36304995 PMCID: PMC9592810 DOI: 10.3389/fnmol.2022.984529] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 07/02/2022] [Accepted: 09/01/2022] [Indexed: 11/13/2022] Open
Abstract
In this study, based on three tumor samples obtained from patients with sporadic vestibular schwannoma, 32,011 cells were obtained by single-cell transcriptome sequencing, and 22,309 high-quality cells were retained after quality control and doublet removal. Then, 18 cell clusters were obtained after cluster analysis, and the clusters were annotated as six cell types. Afterward, an in-depth analysis was conducted based on the six defined cell types, including characterizing the functional characteristics of each cell subtype, describing the cell development and differentiation pathway, exploring the interaction between cells, and analyzing the transcriptional regulatory network within the clusters. Based on these four dimensions, various types of cells in sporadic vestibular schwannoma tumor tissues were described in detail. For the first time, we expanded on the functional state of cell clusters that have been reported and described Schwann cells in the peripheral nervous system, which have not been reported in previous studies. Combined with the data of sporadic vestibular schwannoma and normal tissues in the Gene Expression Omnibus (GEO) database, the candidate biomarkers of sporadic vestibular schwannoma were explored. Overall, this study described the single-cell map of sporadic vestibular schwannoma for the first time, revealing the functional state and development trajectory of different cell types. Combined with the analysis of data in the GEO database and immunohistochemical verification, it was concluded that HLA-DPB1 and VSIG4 may be candidate biomarkers and potential therapeutic targets for patients with sporadic vestibular schwannoma.
Affiliation(s)
- Chu Yidian
- The Affiliated Lihuili Hospital, Ningbo University, Ningbo, China
- School of Medicine, Ningbo University, Ningbo, China
| | - Lin Chen
- The Affiliated Lihuili Hospital, Ningbo University, Ningbo, China
- School of Medicine, Ningbo University, Ningbo, China
| | - Deng Hongxia
- The Affiliated Lihuili Hospital, Ningbo University, Ningbo, China
- School of Medicine, Ningbo University, Ningbo, China
| | - Li Yanguo
- Institute of Drug Discovery Technology, Ningbo University, Ningbo, China
| | - Shen Zhisen
- The Affiliated Lihuili Hospital, Ningbo University, Ningbo, China
- School of Medicine, Ningbo University, Ningbo, China
| |
|
44
|
Hagos YB, Akarca AU, Ramsay A, Rossi RL, Pomplun S, Ngai V, Moioli A, Gianatti A, Mcnamara C, Rambaldi A, Quezada SA, Linch D, Gritti G, Yuan Y, Marafioti T. High inter-follicular spatial co-localization of CD8+FOXP3+ with CD4+CD8+ cells predicts favorable outcome in follicular lymphoma. Hematol Oncol 2022; 40:541-553. [PMID: 35451108 PMCID: PMC10577604 DOI: 10.1002/hon.3003] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 01/29/2022] [Revised: 04/04/2022] [Accepted: 04/06/2022] [Indexed: 11/07/2022]
Abstract
The spatial architecture of the lymphoid tissue in follicular lymphoma (FL) presents unique challenges to studying its immune microenvironment. We investigated the spatial interplay of T cells, macrophages, myeloid cells and natural killer T cells using multispectral immunofluorescence images of diagnostic biopsies of 32 patients. A deep learning-based image analysis pipeline was tailored to the needs of follicular lymphoma spatial histology research, enabling the identification of different immune cells within and outside neoplastic follicles. We analyzed the density and spatial co-localization of immune cells in the inter-follicular and intra-follicular regions of follicular lymphoma. Low inter-follicular density of CD8+FOXP3+ cells and co-localization of CD8+FOXP3+ with CD4+CD8+ cells were significantly associated with relapse (p = 0.0057 and p = 0.0019, respectively) and shorter time to progression after first-line treatment (log-rank p = 0.0097 and p = 0.0093, respectively). A low inter-follicular density of CD8+FOXP3+ cells is associated with increased risk of relapse independent of the follicular lymphoma international prognostic index (FLIPI) (p = 0.038, hazard ratio (HR) = 0.42 [0.19, 0.95]), but not independent of co-localization of CD8+FOXP3+ with CD4+CD8+ cells (p = 0.43). Co-localization of CD8+FOXP3+ with CD4+CD8+ cells is a predictor of time to relapse independent of the FLIPI score and density of CD8+FOXP3+ cells (p = 0.027, HR = 0.0019 [7.19 × 10⁻⁶, 0.49]). This suggests a potential role of inter-follicular CD8+FOXP3+ and CD4+CD8+ cells in the disease progression of FL, warranting further validation on larger patient cohorts.
Affiliation(s)
- Yeman B. Hagos
- Centre for Evolution and Cancer and Division of Molecular Pathology, The Institute of Cancer Research, London, UK
| | | | - Alan Ramsay
- Department of Histopathology, University College Hospitals London, London, UK
| | | | - Sabine Pomplun
- Department of Histopathology, University College Hospitals London, London, UK
| | - Victoria Ngai
- Cancer Institute, University College London, London, UK
- Department of Histopathology, University College Hospitals London, London, UK
| | | | | | | | - Alessandro Rambaldi
- Hematology Unit, Ospedale Papa Giovanni XXIII, Bergamo, Italy
- Department of Oncology and Hematology-Oncology, University of Milan, Milan, Italy
| | - Sergio A. Quezada
- Cancer Immunology Unit, University College London Cancer Institute, University College London, London, UK
- Research Department of Haematology, University College London Cancer Institute, University College London, London, UK
| | - David Linch
- Research Department of Haematology, University College London Cancer Institute, University College London, London, UK
| | | | - Yinyin Yuan
- Centre for Evolution and Cancer and Division of Molecular Pathology, The Institute of Cancer Research, London, UK
- Centre for Molecular Pathology, Royal Marsden Hospital, London, UK
| | - Teresa Marafioti
- Cancer Institute, University College London, London, UK
- Department of Histopathology, University College Hospitals London, London, UK
| |
|
45
|
Transcriptome dynamics of hippocampal neurogenesis in macaques across the lifespan and aged humans. Cell Res 2022; 32:729-743. [PMID: 35750757 PMCID: PMC9343414 DOI: 10.1038/s41422-022-00678-y] [Citation(s) in RCA: 66] [Impact Index Per Article: 22.0] [Received: 01/28/2022] [Accepted: 05/26/2022] [Indexed: 01/06/2023] Open
Abstract
Whether adult hippocampal neurogenesis (AHN) persists in adult and aged humans continues to be extensively debated. A major question is whether the markers identified in rodents are reliable enough to reveal new neurons and the neurogenic trajectory in primates. Here, to provide a better understanding of AHN in primates and to reveal more novel markers for distinct cell types, droplet-based single-nucleus RNA sequencing (snRNA-seq) is used to investigate the cellular heterogeneity and molecular characteristics of the hippocampi in macaques across the lifespan and in aged humans. All of the major cell types in the hippocampus and their expression profiles were identified. The dynamics of the neurogenic lineage were revealed and the diversity of astrocytes and microglia was delineated. In the neurogenic lineage, the regulatory continuum from adult neural stem cells (NSCs) to immature and mature granule cells was investigated. A group of primate-specific markers were identified. We validated ETNPPL as a primate-specific NSC marker and verified STMN1 and STMN2 as immature neuron markers in primates. Furthermore, we illustrate a cluster of active astrocytes and microglia exhibiting proinflammatory responses in aged samples. The interaction analysis and the comparative investigation of published datasets and our own imply that astrocytes provide signals inducing the proliferation, quiescence and inflammation of adult NSCs at different stages and that the proinflammatory status of astrocytes probably contributes to the decrease and variability of AHN in adults and elderly individuals.
|
46
|
Gagnon J, Pi L, Ryals M, Wan Q, Hu W, Ouyang Z, Zhang B, Li K. Recommendations of scRNA-seq Differential Gene Expression Analysis Based on Comprehensive Benchmarking. Life (Basel) 2022; 12:life12060850. [PMID: 35743881 PMCID: PMC9225332 DOI: 10.3390/life12060850] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 04/23/2022] [Revised: 05/31/2022] [Accepted: 06/04/2022] [Indexed: 12/13/2022]
Abstract
To help analysts select the right tool and parameters for differential gene expression analyses of single-cell RNA sequencing (scRNA-seq) data, we developed a novel simulator that recapitulates the data characteristics of real scRNA-seq datasets while accounting for all the relevant sources of variation in a multi-subject, multi-condition scRNA-seq experiment: the cell-to-cell variation within a subject, the variation across subjects, the variability across cell types, the mean/variance relationship of gene expression across genes, library size effects, group effects, and covariate effects. By applying it to benchmark 12 differential gene expression analysis methods (including cell-level and pseudo-bulk methods) on simulated multi-condition, multi-subject data of the 10x Genomics platform, we demonstrated that methods originating from the negative binomial mixed model such as glmmTMB and NEBULA-HL outperformed other methods. Utilizing NEBULA-HL in a statistical analysis pipeline for single-cell analysis will enable scientists to better understand the cell-type-specific transcriptomic response to disease or treatment effects and to discover new drug targets. Further, application to two real datasets showed the superior performance of our differential expression (DE) pipeline, with unified findings of differentially expressed genes (DEG) and a pseudo-time trajectory transcriptomic result. Finally, we made recommendations for filtering strategies of cells and genes based on simulation results to achieve optimal experimental goals.
Affiliation(s)
- Jake Gagnon: Analytics and Data Sciences, Biogen, Inc., 225 Binney St., Cambridge, MA 02142, USA
- Lira Pi: PharmaLex, 1700 District Ave., Burlington, MA 01803, USA
- Matthew Ryals: PharmaLex, 1700 District Ave., Burlington, MA 01803, USA
- Qingwen Wan: PharmaLex, 1700 District Ave., Burlington, MA 01803, USA
- Wenxing Hu: Research Department, Biogen, Inc., 225 Binney St., Cambridge, MA 02142, USA
- Zhengyu Ouyang: BioInfoRx, Inc., 510 Charmany Dr., Suite 275A, Madison, WI 53719, USA
- Baohong Zhang: Research Department, Biogen, Inc., 225 Binney St., Cambridge, MA 02142, USA (corresponding author)
- Kejie Li: Research Department, Biogen, Inc., 225 Binney St., Cambridge, MA 02142, USA (corresponding author)
|
47
|
Wei X, Li Z, Ji H, Wu H. EDClust: an EM-MM hybrid method for cell clustering in multiple-subject single-cell RNA sequencing. Bioinformatics 2022; 38:2692-2699. [PMID: 35561178 PMCID: PMC9113373 DOI: 10.1093/bioinformatics/btac168] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/30/2021] [Revised: 02/24/2022] [Accepted: 03/18/2022] [Indexed: 01/18/2023] Open
Abstract
MOTIVATION: Single-cell RNA sequencing (scRNA-seq) has revolutionized biological research by enabling the measurement of transcriptomic profiles at the single-cell level. With the increasing application of scRNA-seq in larger-scale studies, the problem of appropriately clustering cells emerges when the scRNA-seq data are from multiple subjects. One challenge is the subject-specific variation; systematic heterogeneity from multiple subjects may have a significant impact on clustering accuracy. Existing methods seeking to address such effects suffer from several limitations.
RESULTS: We develop a novel statistical method, EDClust, for multi-subject scRNA-seq cell clustering. EDClust models the sequence read counts by a mixture of Dirichlet-multinomial distributions and explicitly accounts for cell-type heterogeneity, subject heterogeneity and clustering uncertainty. An EM-MM hybrid algorithm is derived for maximizing the data likelihood and clustering the cells. We perform a series of simulation studies to evaluate the proposed method and demonstrate the outstanding performance of EDClust. Comprehensive benchmarking on four real scRNA-seq datasets with various tissue types and species demonstrates the substantial accuracy improvement of EDClust compared to existing methods.
AVAILABILITY AND IMPLEMENTATION: The R package is freely available at https://github.com/weix21/EDClust.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Xin Wei: Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA 30322, USA
- Ziyi Li: Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Hongkai Ji: Department of Biostatistics, Johns Hopkins University, Baltimore, MD 21205, USA
- Hao Wu: Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA 30322, USA
|
48
|
Li Z, Feng H. A neural network-based method for exhaustive cell label assignment using single cell RNA-seq data. Sci Rep 2022; 12:910. [PMID: 35042860 PMCID: PMC8766435 DOI: 10.1038/s41598-021-04473-4] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 07/21/2021] [Accepted: 12/21/2021] [Indexed: 02/01/2023] Open
Abstract
The fast-advancing single cell RNA sequencing (scRNA-seq) technology enables researchers to study the transcriptome of heterogeneous tissues at a single cell level. The initial important step of analyzing scRNA-seq data is usually to accurately annotate cells. The traditional approach of annotating cell types based on unsupervised clustering and marker genes is time-consuming and laborious. Taking advantage of the numerous existing scRNA-seq databases, many supervised label assignment methods have been developed. One feature that many label assignment methods share is that they label cells with low confidence as "unassigned." These unassigned cells can result from assignment difficulties due to highly similar cell types or from the presence of unknown cell types. However, when unknown cell types are not expected, existing methods still label a considerable number of cells as unassigned, which is not desirable. In this work, we develop a neural network-based cell annotation method called NeuCA (Neural network-based Cell Annotation) for scRNA-seq data obtained from well-studied tissues. NeuCA can utilize the hierarchical structure information of the cell types to improve the annotation accuracy, which is especially helpful when data contain closely correlated cell types. We show that NeuCA can achieve more accurate cell annotation results compared with existing methods. Additionally, the applications on eight real datasets show that NeuCA has stable performance for intra- and inter-study annotation, as well as cross-condition annotation. NeuCA is freely available as an R/Bioconductor package at https://bioconductor.org/packages/NeuCA.
Affiliation(s)
- Ziyi Li: Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
- Hao Feng: Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, OH, 44106, USA
|
49
|
Rai P, Sengupta D, Majumdar A. SelfE: Gene Selection via Self-Expression for Single-Cell Data. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:624-632. [PMID: 32750851 DOI: 10.1109/tcbb.2020.2997326] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
Single-cell RNA sequencing has proved advantageous in discerning molecular heterogeneity in seemingly similar cells in a tissue. Due to the paucity of starting RNA, a large fraction of transcripts fail to amplify during the polymerase chain reaction cycle. This is compounded by trivial biological noise such as variability in the cell cycle specific genes. As a result, the expression matrix obtained from a single-cell study is highly sparse, with a large number of missing values. This hinders downstream analysis of single-cell expression data. It has been observed that feature engineering significantly improves the analysis outcomes. Feature extraction methods such as principal component analysis and zero-inflated factor analysis have been shown to be useful for subsequent steps of data analysis, including clustering. However, little or no visible effort has been made to develop feature selection techniques, which offer transparency for the analyst's consumption. We propose SelfE, a novel l2,0-minimization algorithm that determines an optimal subset of feature vectors that preserves sub-space structures as observed in the data. We compared SelfE with the commonly used feature selection methods for single-cell expression data analysis.
|
50
|
|