1
Zeng W, Zhang Y, Zhong W, Chen L, Gao Y, Li C, Zhao Y, Shen C, Zhao R, Shi B, Wang Y. Deciphering immune cell heterogeneity in vascular diseases: Insights from single-cell sequencing. Int Immunopharmacol 2025; 157:114719. [PMID: 40306113 DOI: 10.1016/j.intimp.2025.114719] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/11/2025] [Revised: 04/12/2025] [Accepted: 04/21/2025] [Indexed: 05/02/2025]
Abstract
The complexity and diversity of vascular diseases highlight the urgent need to study their pathogenesis, particularly the key role of immune cell-mediated inflammatory responses in their development. While previous reviews have outlined the involvement of immune cells in vascular pathology, a comprehensive understanding of their dynamic changes, functional states, and intercellular interactions remains incomplete. Recent advances in single-cell sequencing (SCS) have provided unprecedented insights into immune cell heterogeneity, enabling the identification of novel subpopulations and their roles in disease progression. This review extends prior work by systematically summarizing the latest applications of SCS in vascular diseases, highlighting newly discovered immune cell subsets, their interactions, and their impact on vascular pathology. By addressing current gaps in the literature, such as the functional plasticity of immune cells and their temporal dynamics, this review offers new perspectives on immune-mediated mechanisms in vascular diseases and proposes novel therapeutic strategies for their prevention and treatment.
Affiliation(s)
- Weirong Zeng
- Department of Cardiology, Affiliated Hospital of Zunyi Medical University, Zunyi 563000, China
- Yu Zhang
- Department of Cardiology, Affiliated Hospital of Zunyi Medical University, Zunyi 563000, China
- Wanyue Zhong
- Department of Cardiology, Affiliated Hospital of Zunyi Medical University, Zunyi 563000, China
- Lei Chen
- Department of Cardiology, Affiliated Hospital of Zunyi Medical University, Zunyi 563000, China
- Yixuan Gao
- Department of Cardiology, Affiliated Hospital of Zunyi Medical University, Zunyi 563000, China
- Chaofu Li
- Department of Cardiology, Affiliated Hospital of Zunyi Medical University, Zunyi 563000, China
- Yongchao Zhao
- Department of Cardiology, Affiliated Hospital of Zunyi Medical University, Zunyi 563000, China
- Changyin Shen
- Department of Cardiology, Affiliated Hospital of Zunyi Medical University, Zunyi 563000, China
- Ranzun Zhao
- Department of Cardiology, Affiliated Hospital of Zunyi Medical University, Zunyi 563000, China
- Bei Shi
- Department of Cardiology, Affiliated Hospital of Zunyi Medical University, Zunyi 563000, China
- Yan Wang
- Department of Cardiology, Affiliated Hospital of Zunyi Medical University, Zunyi 563000, China
2
Tuly SR, Ranjbari S, Murat EA, Arslanturk S. From Silos to Synthesis: A comprehensive review of domain adaptation strategies for multi-source data integration in healthcare. Comput Biol Med 2025; 191:110108. [PMID: 40209575 DOI: 10.1016/j.compbiomed.2025.110108] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2024] [Revised: 03/27/2025] [Accepted: 03/27/2025] [Indexed: 04/12/2025]
Abstract
BACKGROUND: The integration of data from diverse sources is not only crucial for addressing data scarcity in health informatics but also enables the use of complementary information from multiple datasets. However, the isolated nature of data collected from disparate sources (referred to as 'Silos') presents significant challenges in multi-source data integration due to inherent heterogeneity and differences in data structures, formats, and standards. Domain adaptation emerges as a key framework to transition from 'Silos' to 'Synthesis' by measuring and mitigating such discrepancies, enabling uniform representation and harmonization of multi-source data. METHODS: This study explores different approaches to healthcare data integration, highlighting the challenges associated with each type and discussing both general-purpose and healthcare-specific adaptation methods. We examine key research challenges and evaluate leading domain adaptation approaches, demonstrating their effectiveness and limitations in advancing healthcare data integration. RESULTS: The findings highlight the potential of domain adaptation methods to significantly improve healthcare data integration while laying a foundation for future research. CONCLUSION: Current research often lacks a comprehensive analysis of how domain adaptation can effectively address the challenges associated with integrating multi-source and multi-modal healthcare datasets. This study serves as a valuable resource for healthcare professionals and researchers, providing guidance on leveraging domain adaptation techniques to mitigate domain discrepancies in healthcare data integration.
Affiliation(s)
- Shelia Rahman Tuly
- Department of Computer Science, Wayne State University, 5057 Woodward Ave, Detroit, 48201, MI, USA.
- Sima Ranjbari
- Department of Computer Science, Wayne State University, 5057 Woodward Ave, Detroit, 48201, MI, USA.
- Ekrem Alper Murat
- Department of Industrial and Systems Engineering, Wayne State University, 4th Street, Detroit, 48201, MI, USA.
- Suzan Arslanturk
- Department of Computer Science, Wayne State University, 5057 Woodward Ave, Detroit, 48201, MI, USA.
3
Junaid M, Lee EJ, Lim SB. Single-cell and spatial omics: exploring hypothalamic heterogeneity. Neural Regen Res 2025; 20:1525-1540. [PMID: 38993130 PMCID: PMC11688568 DOI: 10.4103/nrr.nrr-d-24-00231] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/26/2024] [Revised: 05/06/2024] [Accepted: 06/03/2024] [Indexed: 07/13/2024] Open
Abstract
Elucidating the complex dynamic cellular organization in the hypothalamus is critical for understanding its role in coordinating fundamental body functions. Over the past decade, single-cell and spatial omics technologies have significantly evolved, overcoming initial technical challenges in capturing and analyzing individual cells. These high-throughput omics technologies now offer a remarkable opportunity to comprehend the complex spatiotemporal patterns of transcriptional diversity and cell-type characteristics across the entire hypothalamus. Current single-cell and single-nucleus RNA sequencing methods comprehensively quantify gene expression by exploring distinct phenotypes across various subregions of the hypothalamus. However, single-cell/single-nucleus RNA sequencing requires isolating cells or nuclei from the tissue, potentially resulting in the loss of spatial information concerning neuronal networks. Spatial transcriptomics methods, by bypassing cell dissociation, can elucidate the intricate spatial organization of neural networks through their imaging and sequencing technologies. In this review, we highlight the applicative value of single-cell and spatial transcriptomics in exploring the complex molecular-genetic diversity of hypothalamic cell types, driven by recent high-throughput achievements.
Affiliation(s)
- Muhammad Junaid
- Department of Biochemistry & Molecular Biology, Ajou University School of Medicine, Suwon, South Korea
- Department of Biomedical Sciences, Graduate School of Ajou University, Suwon, South Korea
- Eun Jeong Lee
- Department of Biomedical Sciences, Graduate School of Ajou University, Suwon, South Korea
- Department of Brain Science, Ajou University School of Medicine, Suwon, South Korea
- Su Bin Lim
- Department of Biochemistry & Molecular Biology, Ajou University School of Medicine, Suwon, South Korea
- Department of Biomedical Sciences, Graduate School of Ajou University, Suwon, South Korea
4
da Silva JEH, Bernardino HS, de Oliveira IL, Camata JJ. A survey of the methodological process of modeling, inference, and evaluation of gene regulatory networks using scRNA-Seq data. Biosystems 2025; 253:105464. [PMID: 40409400 DOI: 10.1016/j.biosystems.2025.105464] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2024] [Revised: 03/20/2025] [Accepted: 04/17/2025] [Indexed: 05/25/2025]
Abstract
The advent of scRNA-Seq technology has provided unprecedented resolution in the analysis of gene regulatory networks (GRNs) at the single-cell level. However, new technical and methodological challenges have also emerged. Factors such as the large number of zeros reported in expression levels, the biological variation due to the stochastic nature of gene expression, environmental niche, and effects created by the cell cycle make it difficult to correctly interpret the data obtained in the sequencing stage. Moreover, methods developed to infer GRNs specifically from scRNA-Seq data have proved to be of similar quality to random predictors. The lack of adequate pre-processing of gene expression data (including selection of subsets of genes of interest, smoothing, and discretization of gene expression) and the different ways of modeling networks and network motifs are factors that affect the performance of inference approaches. Finally, the lack of knowledge about the ground-truth network and the absence of standardized metrics for measuring the quality of inferred networks make comparing performance between algorithms a major problem, given the unbalanced nature of the data and the interpretation bias caused by the chosen metric. This article brings these issues to light, aiming to show through comparative computational experiments how these factors influence both the inference process and the performance evaluation of inferred networks, and provides suggestions for a more robust methodological process for researchers dealing with inference of GRNs.
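The metric-interpretation issue raised in this abstract can be illustrated with a small sketch. Because true regulatory edges are rare among all candidate gene pairs, a random ranking is expected to reach an average precision close to the edge density, so a score is only meaningful relative to that baseline. The code below is an illustration under stated assumptions, not the survey's evaluation protocol: `average_precision` and `edge_density` are hypothetical helper names, and the scores and labels are made-up toy values.

```python
def average_precision(scores, labels):
    """Mean of the precision values measured at the rank of each true edge."""
    # Rank candidate edges from highest to lowest predicted score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precisions = []
    for rank, (_, is_edge) in enumerate(ranked, start=1):
        if is_edge:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def edge_density(labels):
    """Fraction of candidate edges that are true; approximate random baseline."""
    return sum(labels) / len(labels)


# Toy example (assumed values): four candidate edges, two of them true.
scores = [0.9, 0.8, 0.7, 0.1]
labels = [1, 0, 1, 0]

ap = average_precision(scores, labels)
baseline = edge_density(labels)
print(f"average precision = {ap:.3f}, random baseline ~ {baseline:.3f}")
```

Reporting the ratio of average precision to edge density is one common way to make results comparable across networks of different sparsity.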
Affiliation(s)
- José Eduardo H da Silva
- Universidade Federal de Juiz de Fora, Rua José Lourenço Kelmer, s/n, Juiz de Fora, 36036-900, Minas Gerais, Brazil.
- Heder S Bernardino
- Universidade Federal de Juiz de Fora, Rua José Lourenço Kelmer, s/n, Juiz de Fora, 36036-900, Minas Gerais, Brazil
- Itamar L de Oliveira
- Universidade Federal de Juiz de Fora, Rua José Lourenço Kelmer, s/n, Juiz de Fora, 36036-900, Minas Gerais, Brazil
- José J Camata
- Universidade Federal de Juiz de Fora, Rua José Lourenço Kelmer, s/n, Juiz de Fora, 36036-900, Minas Gerais, Brazil
5
Pandey AC, Bezney J, DeAscanis D, Kirsch EB, Ahmed F, Crinklaw A, Choudhary KS, Mandala T, Deason J, Hamidi JS, Siddique A, Ranganathan S, Brown K, Armstrong J, Head S, Ordoukhanian P, Steinmetz LM, Topol EJ. A CRISPR/Cas9-based enhancement of high-throughput single-cell transcriptomics. Nat Commun 2025; 16:4664. [PMID: 40389438 PMCID: PMC12089397 DOI: 10.1038/s41467-025-59880-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2024] [Accepted: 05/03/2025] [Indexed: 05/21/2025] Open
Abstract
Single-cell RNA-seq (scRNAseq) struggles to capture the cellular heterogeneity of transcripts within individual cells due to the prevalence of highly abundant and ubiquitous transcripts, which can obscure the detection of biologically distinct transcripts expressed at levels up to several orders of magnitude lower. To address this challenge, here we introduce single-cell CRISPRclean (scCLEAN), a molecular method that globally recomposes scRNAseq libraries, providing a benefit that cannot be recapitulated with deeper sequencing. scCLEAN utilizes the programmability of CRISPR/Cas9 to target and remove less than 1% of the transcriptome while redistributing approximately half of reads, shifting the focus toward less abundant transcripts. We experimentally apply scCLEAN to both heterogeneous immune cells and homogeneous vascular smooth muscle cells to demonstrate its ability to uncover biological signatures in different biological contexts. We further emphasize scCLEAN's versatility by applying it to a third-generation sequencing method, single-cell MAS-Seq, to increase transcript-level detection and discovery. Here we show the possible utility of scCLEAN across a wide array of human tissues and cell types, indicating in which contexts this technology proves beneficial and those in which its application is not advisable.
Affiliation(s)
- Amitabh C Pandey
- Section of Cardiology, Tulane Heart and Vascular Institute, Department of Medicine, Tulane University School of Medicine, New Orleans, LA, USA.
- Southeast Louisiana Veterans Health Care System, New Orleans, LA, USA.
- Department of Molecular Medicine, Scripps Research Translational Institute, The Scripps Research Institute, La Jolla, CA, USA.
- Jon Bezney
- Genomics Core Facility, The Scripps Research Institute, La Jolla, CA, USA
- Jumpcode Genomics, San Diego, CA, USA
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
- Ethan B Kirsch
- Department of Molecular Medicine, Scripps Research Translational Institute, The Scripps Research Institute, La Jolla, CA, USA
- Farin Ahmed
- Genomics Core Facility, The Scripps Research Institute, La Jolla, CA, USA
- Tony Mandala
- Genomics Core Facility, The Scripps Research Institute, La Jolla, CA, USA
- Jasmin S Hamidi
- Department of Molecular Medicine, Scripps Research Translational Institute, The Scripps Research Institute, La Jolla, CA, USA
- Steven Head
- Genomics Core Facility, The Scripps Research Institute, La Jolla, CA, USA
- Lars M Steinmetz
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
- Stanford Genome Technology Center, Palo Alto, CA, USA
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany
- Eric J Topol
- Department of Molecular Medicine, Scripps Research Translational Institute, The Scripps Research Institute, La Jolla, CA, USA
6
Lund A, Zeng Y, Zhang R, Li H, Zhang M. Lipopolysaccharide alters cell communication at the maternal-fetal interface revealed by single-cell RNA-sequencing. Int J Biol Macromol 2025; 311:143939. [PMID: 40328399 DOI: 10.1016/j.ijbiomac.2025.143939] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/13/2025] [Revised: 04/30/2025] [Accepted: 05/03/2025] [Indexed: 05/08/2025]
Abstract
Embryo implantation is a decisive process in pregnancy that relies heavily on effective cell communication at the maternal-fetal interface. Embryo implantation failure is frequently caused by gram-negative bacterial infection; therefore, this study aimed to investigate the effect of lipopolysaccharide (LPS)-induced inflammation on cellular composition, cell-cell interaction, and key signaling pathways at the maternal-fetal interface using single-cell RNA sequencing (scRNA-Seq). LPS exposure significantly up-regulated the expression of the pro-inflammatory cytokines CCL-2, TNF-α, and IL-1β in maternal-fetal interface tissues and triggered the recruitment of neutrophils, monocytes, and eosinophils into peripheral blood. scRNA-Seq revealed endometrial epithelial cells (EpCs), stromal cells (ESCs), fibroblasts (FiCs), and 15 other cell types. LPS administration significantly shifted the cellular proportions, increasing populations of immune cells and fibroblasts while decreasing ESCs and EpCs. Cellular differentiation analysis indicated that all ESCs originated from ESC8, while ESC2 and ESC7 were the most differentiated ESC subtypes. Cellular communication also differed notably: reversed interactions between luminal epithelial (LE) and glandular epithelial (GE) cells were observed exclusively under LPS exposure, and ESC8 was inactive in the control group but exhibited robust interactions in the LPS group. Furthermore, the communication analysis predicted significant disruptions in signaling pathways involved in embryo-maternal communication (DHEA, BMP, LIFR, EDN, and NEGR pathways), endometrial stromal-epithelial crosstalk (5αP, CAECAM, DHEAS, and HH pathways), and endometrial stromal-immune cell interactions (EGF and NCAM pathways). Our findings suggest that these signaling pathways are essential for maternal-fetal communication. The disruption of these pathways in response to LPS may provide new molecular targets for diagnosing and treating implantation failure and recurrent pregnancy loss.
Affiliation(s)
- Arab Lund
- College of Animal Science and Technology, Sichuan Agricultural University, Chengdu Campus, 611130, PR China; Shaheed Benazir Bhutto University of Veterinary and Animal Science, Sakrand 67210, Sindh, Pakistan
- Yutiang Zeng
- College of Animal Science and Technology, Sichuan Agricultural University, Chengdu Campus, 611130, PR China
- Run Zhang
- College of Animal Science and Technology, Sichuan Agricultural University, Chengdu Campus, 611130, PR China
- Hao Li
- College of Animal Science and Technology, Sichuan Agricultural University, Chengdu Campus, 611130, PR China
- Ming Zhang
- College of Animal Science and Technology, Sichuan Agricultural University, Chengdu Campus, 611130, PR China; Key Laboratory of Livestock and Poultry Multi-omics, Ministry of Agriculture and Rural Affairs, College of Animal Sciences and Technology, Sichuan Agricultural University, Chengdu Campus, 611130, PR China; Farm Animal Genetics Resources Exploration and Innovation Key Laboratory of Sichuan Province, Sichuan Agricultural University, Chengdu Campus, 611130, PR China.
7
Guo B, Ling W, Kwon SH, Panwar P, Ghazanfar S, Martinowich K, Hicks SC. Integrating Spatially-Resolved Transcriptomics Data Across Tissues and Individuals: Challenges and Opportunities. Small Methods 2025; 9:e2401194. [PMID: 39935130 PMCID: PMC12103234 DOI: 10.1002/smtd.202401194] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2024] [Revised: 12/13/2024] [Indexed: 02/13/2025]
Abstract
Advances in spatially-resolved transcriptomics (SRT) technologies have propelled the development of new computational analysis methods to unlock biological insights. The falling cost of SRT data generation presents an unprecedented opportunity to create large-scale spatial atlases and enable population-level investigation, integrating SRT data across multiple tissues, individuals, species, or phenotypes. Here, unique challenges in SRT data integration are described, and the analytic impact of varying spatial and biological resolutions is characterized and explored. A succinct review of spatially-aware integration methods and computational strategies is provided. Exciting opportunities to advance computational algorithms amenable to atlas-scale datasets, along with standardized preprocessing methods that could improve sensitivity and reproducibility in the future, are further highlighted.
Affiliation(s)
- Boyi Guo
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, USA
- Wodan Ling
- Division of Biostatistics, Department of Population Health Sciences, Weill Cornell Medicine, New York, NY 10065, USA
- Sang Ho Kwon
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD 21205, USA
- Solomon H. Snyder Department of Neuroscience, Johns Hopkins School of Medicine, Baltimore, MD 21205, USA
- Biochemistry, Cellular, and Molecular Biology Graduate Program, Johns Hopkins School of Medicine, Baltimore, MD 21205, USA
- Pratibha Panwar
- School of Mathematics and Statistics, The University of Sydney, Camperdown, NSW 2006, Australia
- Sydney Precision Data Science Centre, University of Sydney, Camperdown, NSW 2006, Australia
- Charles Perkins Centre, The University of Sydney, Camperdown, NSW 2006, Australia
- Shila Ghazanfar
- School of Mathematics and Statistics, The University of Sydney, Camperdown, NSW 2006, Australia
- Sydney Precision Data Science Centre, University of Sydney, Camperdown, NSW 2006, Australia
- Charles Perkins Centre, The University of Sydney, Camperdown, NSW 2006, Australia
- Keri Martinowich
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD 21205, USA
- Solomon H. Snyder Department of Neuroscience, Johns Hopkins School of Medicine, Baltimore, MD 21205, USA
- Department of Psychiatry and Behavioral Sciences, Johns Hopkins School of Medicine, Baltimore, MD, USA
- Johns Hopkins Kavli Neuroscience Discovery Institute, Johns Hopkins University, Baltimore, MD 21218, USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
- Stephanie C. Hicks
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, USA
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21218, USA
- Malone Center for Engineering in Healthcare, Johns Hopkins University, Baltimore, MD 21218, USA
8
Sukys A, Grima R. Cell-cycle dependence of bursty gene expression: insights from fitting mechanistic models to single-cell RNA-seq data. Nucleic Acids Res 2025; 53:gkaf295. [PMID: 40240003 PMCID: PMC12000877 DOI: 10.1093/nar/gkaf295] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2024] [Revised: 03/22/2025] [Accepted: 03/28/2025] [Indexed: 04/18/2025] Open
Abstract
Bursty gene expression is characterized by two intuitive parameters, burst frequency and burst size, the cell-cycle dependence of which has not been extensively profiled at the transcriptome level. In this study, we estimate the burst parameters per allele in the G1 and G2/M cell-cycle phases for thousands of mouse genes by fitting mechanistic models of gene expression to messenger RNA count data, obtained by sequencing of single cells whose cell-cycle position has been inferred using a deep-learning method. We find that upon DNA replication, the median burst frequency approximately halves, while the burst size remains mostly unchanged. Genome-wide distributions of the burst parameter ratios between the G2/M and G1 phases are broad, indicating substantial heterogeneity in transcriptional regulation. We also observe a significant negative correlation between the burst frequency and size ratios, suggesting that regulatory processes do not independently control the burst parameters. We show that to accurately estimate the burst parameter ratios, mechanistic models must explicitly account for gene copy number variation and extrinsic noise due to the coupling of transcription to cell age across the cell cycle, but corrections for technical noise due to imperfect capture of RNA molecules in sequencing experiments are less critical.
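For intuition about the two burst parameters, a simple moment-based estimate can be sketched. In the standard bursty model with negative binomial steady-state counts, the mean equals the burst frequency (in units of the mRNA degradation rate) times the mean burst size, and the Fano factor (variance divided by mean) equals one plus the burst size. The sketch below is an assumption made for illustration and is not the authors' fitting procedure: it ignores gene copy number, cell age, and technical noise, and the function name `burst_moments` and the toy counts are hypothetical.

```python
from statistics import mean, pvariance


def burst_moments(counts):
    """Moment estimates (burst_frequency, burst_size) from mRNA counts.

    Assumes negative binomial counts: mean = f * b and Fano = 1 + b.
    """
    m = mean(counts)
    if m == 0:
        raise ValueError("no expression detected; parameters are undefined")
    fano = pvariance(counts) / m
    burst_size = fano - 1.0
    if burst_size <= 0:
        raise ValueError("counts are not over-dispersed; bursty model does not apply")
    burst_frequency = m / burst_size
    return burst_frequency, burst_size


# Toy counts for one gene across five cells (assumed values).
counts = [0, 0, 1, 2, 7]
freq, size = burst_moments(counts)
print(f"burst frequency ~ {freq:.3f}, burst size ~ {size:.3f}")
```

Applying the same estimator separately to G1 and G2/M cells and taking the ratio would give a crude analogue of the burst parameter ratios discussed in the abstract, without the corrections the authors describe.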
Affiliation(s)
- Augustinas Sukys
- School of Biological Sciences, University of Edinburgh, Edinburgh EH9 3JH, United Kingdom
- The Alan Turing Institute, London NW1 2DB, United Kingdom
- School of BioSciences, University of Melbourne, Parkville, Victoria 3052, Australia
- Ramon Grima
- School of Biological Sciences, University of Edinburgh, Edinburgh EH9 3JH, United Kingdom
9
He C, Filippidis P, Kleinstein SH, Guan L. Partially characterized topology guides reliable anchor-free scRNA-integration. Commun Biol 2025; 8:561. [PMID: 40185996 PMCID: PMC11971424 DOI: 10.1038/s42003-025-07988-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2024] [Accepted: 03/21/2025] [Indexed: 04/07/2025] Open
Abstract
Single-cell RNA sequencing (scRNA-seq) is an important technique for obtaining biological insights at cellular resolution, with scRNA-seq batch integration a key step before downstream statistical analysis. Despite the plethora of methods proposed, achieving reliable batch correction while preserving the heterogeneity of biological signals that define cell type continues to pose a challenge. To address this, we propose scCRAFT, an autoencoder model that separates cell-type-related signals from batch effects for reliable multi-batch integration. scCRAFT integrates three key loss components: a reconstruction loss for observation reconstruction, a multi-domain adaptation loss to eliminate batch effects, and an innovative dual-resolution triplet loss to preserve intra-batch, introduced as an effective mechanism to counteract the over-correction effect of domain adaptation loss amid heterogeneous cell distributions across batches. We show that scCRAFT effectively manages unbalanced batches, rare cell types, and batch-specific cell phenotypes in simulations, and surpasses state-of-the-art methods in a diverse set of real datasets.
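To make the role of a triplet loss concrete, the generic margin-based form can be sketched as follows: an anchor cell is pulled toward a similar cell (positive) and pushed away from a dissimilar one (negative) in the embedding space. This is the textbook triplet margin loss shown under stated assumptions, not scCRAFT's dual-resolution variant; the function names, the margin of 1.0, and the toy two-dimensional embeddings are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Zero once the positive is closer than the negative by at least `margin`."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Toy embeddings (assumed values): the anchor sits near a same-type cell
# and far from a cell of another type.
anchor = [0.0, 0.0]
same_type = [0.0, 1.0]
other_type = [3.0, 4.0]

satisfied = triplet_loss(anchor, same_type, other_type)  # constraint already met
violated = triplet_loss(anchor, other_type, same_type)   # roles swapped
print(satisfied, violated)
```

In an integration model, a term of this kind would be summed with the reconstruction and domain adaptation losses, so that removing batch effects does not collapse distinct cell populations onto each other.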
Affiliation(s)
- Chuan He
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, US
- Steven H Kleinstein
- Department of Pathology, Yale School of Medicine, New Haven, CT, US
- Department of Immunobiology, Yale School of Medicine, New Haven, CT, US
- Program in Computational Biology and Biomedical Informatics, Yale University, New Haven, CT, US
- Leying Guan
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, US.
- Program in Computational Biology and Biomedical Informatics, Yale University, New Haven, CT, US.
10
Parker MT, Amar S, Campoy JA, Krause K, Tusso S, Marek M, Huettel B, Schneeberger K. Scalable eQTL mapping using single-nucleus RNA-sequencing of recombined gametes from a small number of individuals. PLoS Biol 2025; 23:e3003085. [PMID: 40279341 PMCID: PMC12119024 DOI: 10.1371/journal.pbio.3003085] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/06/2025] [Revised: 05/28/2025] [Accepted: 02/25/2025] [Indexed: 04/27/2025] Open
Abstract
Phenotypic differences between individuals of a species are often caused by differences in gene expression, which are in turn caused by genetic variation. Expression quantitative trait locus (eQTL) analysis is a methodology by which we can identify such causal variants. Scaling eQTL analysis is costly due to the expense of generating mapping populations and collecting matched transcriptomic and genomic information. We developed a rapid eQTL analysis approach using single-cell/nucleus RNA sequencing of gametes from a small number of heterozygous individuals. Patterns of inherited polymorphisms are used to infer the recombinant genomes of thousands of individual gametes and identify how different haplotypes correlate with variation in gene expression. Applied to Arabidopsis pollen nuclei, our approach uncovers both cis- and trans-eQTLs, ultimately mapping variation in a master regulator of sperm cell development that affects the expression of hundreds of genes. This establishes snRNA-sequencing as a powerful, cost-effective method for the mapping of meiotic recombination, addressing the scalability challenges of eQTL analysis and enabling eQTL mapping in specific cell types.
Affiliation(s)
- Matthew T. Parker
- Department of Chromosome Biology, Max Planck Institute for Plant Breeding Research, Cologne, Germany
- Samija Amar
- Department of Chromosome Biology, Max Planck Institute for Plant Breeding Research, Cologne, Germany
- José A. Campoy
- Department of Chromosome Biology, Max Planck Institute for Plant Breeding Research, Cologne, Germany
- Kristin Krause
- Department of Chromosome Biology, Max Planck Institute for Plant Breeding Research, Cologne, Germany
- Sergio Tusso
- Faculty of Biology, Ludwig-Maximilians-Universität München, Planegg-Martinsried, Germany
- Korbinian Schneeberger
- Department of Chromosome Biology, Max Planck Institute for Plant Breeding Research, Cologne, Germany
- Faculty of Biology, Ludwig-Maximilians-Universität München, Planegg-Martinsried, Germany
- Cluster of Excellence on Plant Sciences (CEPLAS), Heinrich-Heine University, Düsseldorf, Germany
11
Rafi FR, Heya NR, Hafiz MS, Jim JR, Kabir MM, Mridha MF. A systematic review of single-cell RNA sequencing applications and innovations. Comput Biol Chem 2025; 115:108362. [PMID: 39919386 DOI: 10.1016/j.compbiolchem.2025.108362] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2024] [Revised: 12/26/2024] [Accepted: 01/21/2025] [Indexed: 02/09/2025]
Abstract
Bulk RNA sequencing, targeted RNA sequencing, and whole-transcriptome sequencing are established RNA sequencing techniques that provide valuable insights into gene expression in specific cell populations or regions. However, these methods often miss the diversity of cells within complex tissues. Single-cell RNA sequencing (scRNA-seq) overcomes this restriction by recording gene expression at the single-cell level, offering a detailed picture of cellular diversity. It is essential for studying glucose homeostasis and offers thorough explanations of cellular variation, networks, and regulatory dynamics. This study reviews the use of scRNA-seq in islet cells, along with sample preparation, sequencing, and computational analysis. It highlights advances in understanding cell types, gene activity, and cell interactions. Along with the challenges and limitations of scRNA-seq, this review highlights its importance in understanding complex biological processes and diseases. It is an essential resource for future research and method development in this field, which will help to build personalized treatment.
Affiliation(s)
- Fahamidur Rahaman Rafi
- Department of Computer Science and Engineering, Daffodil International University, Dhaka 1340, Bangladesh.
- Nafeya Rahman Heya
- Department of Computer Science and Engineering, Daffodil International University, Dhaka 1340, Bangladesh.
- Md Sadman Hafiz
- Institute of Information and Communication Technology, Shahjalal University of Science and Technology, Sylhet 3114, Bangladesh.
- Jamin Rahman Jim
- Department of Computer Science, American International University-Bangladesh, Dhaka 1229, Bangladesh.
- Md Mohsin Kabir
- Department of Computer Science & Engineering, Bangladesh University of Business & Technology, Dhaka 1216, Bangladesh.
- M F Mridha
- Department of Computer Science, American International University-Bangladesh, Dhaka 1229, Bangladesh.
12
Zou Z, Liu Y, Bai Y, Luo J, Zhang Z. scTrans: Sparse attention powers fast and accurate cell type annotation in single-cell RNA-seq data. PLoS Comput Biol 2025; 21:e1012904. [PMID: 40184563 PMCID: PMC11970913 DOI: 10.1371/journal.pcbi.1012904] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/02/2024] [Accepted: 02/24/2025] [Indexed: 04/06/2025] Open
Abstract
Cell type annotation is crucial in single-cell RNA sequencing data analysis because it enables significant biological discoveries and deepens our understanding of tissue biology. Given the high-dimensional and highly sparse nature of single-cell RNA sequencing data, most existing annotation tools focus on highly variable genes to reduce dimensionality and computational load. However, this approach inevitably results in information loss, potentially weakening the model's generalization performance and adaptability to novel datasets. To mitigate this issue, we developed scTrans, a single-cell Transformer-based model, which employs sparse attention to utilize all non-zero genes, thereby effectively reducing the input data dimensionality while minimizing information loss. We validated the speed and accuracy of scTrans by performing cell type annotation on 31 different tissues within the Mouse Cell Atlas. Remarkably, even with datasets nearing a million cells, scTrans efficiently performs cell type annotation with limited computational resources. Furthermore, scTrans demonstrates strong generalization capabilities, accurately annotating cells in novel datasets and generating high-quality latent representations, which are essential for precise clustering and trajectory analysis.
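As an illustration of the input step described in this abstract, the following Python sketch turns one cell's expression vector into a padded list of non-zero gene tokens. It is not the scTrans code: the function name `nonzero_gene_tokens`, the `max_len` and `pad_index` parameters, and the rule of keeping the most highly expressed genes when a cell exceeds `max_len` are assumptions made for this example only.

```python
def nonzero_gene_tokens(expression, max_len, pad_index=-1):
    """Represent one cell by its non-zero genes only (hypothetical helper).

    Returns (tokens, mask): tokens is a list of (gene_index, value) pairs of
    length max_len, and mask marks real tokens with 1 and padding with 0.
    """
    # Drop every gene whose count is zero in this cell.
    tokens = [(g, v) for g, v in enumerate(expression) if v != 0]
    # Assumption: if the cell has too many non-zero genes, keep the largest values.
    tokens.sort(key=lambda t: -t[1])
    tokens = tokens[:max_len]
    # Restore gene order so the output is deterministic.
    tokens.sort(key=lambda t: t[0])
    mask = [1] * len(tokens) + [0] * (max_len - len(tokens))
    tokens = tokens + [(pad_index, 0.0)] * (max_len - len(tokens))
    return tokens, mask


cell = [0, 3.0, 0, 0, 1.5, 0, 2.0, 0]
tokens, mask = nonzero_gene_tokens(cell, max_len=4)
print(tokens)  # [(1, 3.0), (4, 1.5), (6, 2.0), (-1, 0.0)]
print(mask)    # [1, 1, 1, 0]
```

The sequence length passed to an attention layer then scales with the number of expressed genes per cell rather than with the full gene panel, which is the kind of dimensionality reduction the abstract refers to.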
Affiliation(s)
- Zhiyi Zou
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan, China
| | - Ying Liu
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan, China
| | - Yuting Bai
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan, China
| | - Jiawei Luo
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan, China
| | - Zhaolei Zhang
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
|
13
|
Leibovich N, Goyal S. Limitations and optimizations of cellular lineages tracking. PLoS Comput Biol 2025; 21:e1012880. [PMID: 40228207 PMCID: PMC11996212 DOI: 10.1371/journal.pcbi.1012880] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2024] [Accepted: 02/14/2025] [Indexed: 04/16/2025]
Abstract
Tracking cellular lineages using genetic barcodes provides insights across biology and has become an important tool. However, barcoding strategies remain ad hoc. We show that elevating the barcode insertion probability, and thus increasing the average number of barcodes within the cells, adds to the number of traceable lineages but may decrease the accuracy of lineage inference due to reading errors. We establish the trade-off between accuracy in tracing lineages and the total number of traceable lineages, and find optimal experimental parameters under limited resources concerning the population size of tracked cells and barcode pool complexity.
Affiliation(s)
- Nava Leibovich
- NRC-Fields Mathematical Sciences Collaboration Centre, National Research Council of Canada, Toronto, Ontario, Canada
- Department of Physics, University of Toronto, Toronto, Ontario, Canada
| | - Sidhartha Goyal
- Department of Physics, University of Toronto, Toronto, Ontario, Canada
- Institute for Biomedical Engineering, University of Toronto, Toronto, Ontario, Canada
|
14
|
Pavel A, Grønberg MG, Clemmensen LH. The impact of dropouts in scRNAseq dense neighborhood analysis. Comput Struct Biotechnol J 2025; 27:1278-1285. [PMID: 40225837 PMCID: PMC11992407 DOI: 10.1016/j.csbj.2025.03.033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2024] [Revised: 03/19/2025] [Accepted: 03/20/2025] [Indexed: 04/15/2025]
Abstract
Single-cell RNA sequencing (scRNAseq) makes it possible to investigate transcriptomic profiles at the single-cell level. However, the data pose unique challenges in comparison to bulk transcriptomic data, one being high dropout rates, which yield highly sparse data. Many classical analysis and preprocessing pipelines are based on the assumption that poor data can be counteracted by quantity and that similar cells (samples) are close to each other in space. Clustering is commonly used to detect clusters (dense local cell neighborhoods) under the assumption that similar cells are close to each other in space (where close is dependent on the (distance) metric used). The most commonly used clustering methodologies to detect dense local neighborhoods are based on graph clustering on a nearest neighbor graph. However, high dropout rates may break this assumption and make it difficult to reliably detect such dense local neighborhoods. We assess the cluster homogeneity and stability under increasing degrees of dropouts in one of the most popular clustering pipelines (dimensionality reduction + graph-based clustering), as provided by the scRNAseq analysis packages Seurat and Scanpy. Our study shows that while the default pipeline performs well in terms of cluster homogeneity (i.e., cells in a cluster are of the same type), even with increasing dropout rates, the stability of clusters (i.e., cell pairs consistently being in the same cluster) decreases. This implies that sub-populations within cell types are increasingly difficult to identify under increasing dropout rates because observations are not consistently close. Our results challenge the current practice of using default clustering pipelines and the general assumption of identifiable local neighborhoods on high dropout data. Hence, these results suggest that careful consideration needs to be given to interpretation and downstream analysis when relying on local neighborhoods and clusters on scRNAseq data. In addition, these results call for extensive benchmarking, to identify and provide methods robust in their local neighborhood relationships on data containing low to high dropout rates.
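The two ingredients of such an experiment, injecting additional dropouts and scoring how consistently cell pairs are clustered together, can be sketched in a few lines of Python. This is a minimal sketch and not the authors' pipeline: `apply_dropout` and `pair_stability` are hypothetical helper names, the dropout model (independent zeroing of non-zero counts) is an assumption, and the stability score used here is the plain Rand index, swapped in as a simple stand-in for the paper's stability measure.

```python
import random
from itertools import combinations


def apply_dropout(counts, rate, rng):
    """Set each non-zero count to zero with probability `rate` (assumed model)."""
    return [[0 if (v > 0 and rng.random() < rate) else v for v in row]
            for row in counts]


def pair_stability(labels_a, labels_b):
    """Rand index: fraction of cell pairs on which two clusterings agree.

    A pair agrees if it is co-clustered in both labelings or separated in both.
    """
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two cells")
    agree = sum((labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
                for i, j in pairs)
    return agree / len(pairs)


rng = random.Random(0)
counts = [[5, 0, 3], [4, 1, 0], [0, 7, 2]]   # rows are cells, columns are genes
sparser = apply_dropout(counts, 0.5, rng)     # same shape, more zeros

# Label names do not matter, only which cells share a cluster.
print(pair_stability([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0
```

In a full experiment the clustering labels would come from running the same pipeline (for example in Seurat or Scanpy) on the original and on the sparsified matrix, then comparing the two labelings with a score of this kind.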
Affiliation(s)
- Alisa Pavel
- Department of Applied Mathematics and Computer Science, Technical University of Denmark, 2800, Kongens Lyngby, Denmark
| | - Manja Gersholm Grønberg
- Department of Applied Mathematics and Computer Science, Technical University of Denmark, 2800, Kongens Lyngby, Denmark
| | - Line H. Clemmensen
- Department of Applied Mathematics and Computer Science, Technical University of Denmark, 2800, Kongens Lyngby, Denmark
- Department of Mathematical Sciences, University of Copenhagen, 2100, Copenhagen, Denmark
|
15
|
Chen J, Sun Q, Wang C, Gao C. scCCTR: An iterative selection-based semi-supervised clustering model for single-cell RNA-seq data. Comput Struct Biotechnol J 2025; 27:1090-1102. [PMID: 40165824 PMCID: PMC11957811 DOI: 10.1016/j.csbj.2025.03.018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2024] [Revised: 02/28/2025] [Accepted: 03/10/2025] [Indexed: 04/02/2025]
Abstract
Single-cell RNA sequencing (scRNA-seq) enables the analysis of the genome, transcriptome, and epigenome at the single-cell level, providing a critical tool for understanding cellular heterogeneity and diversity. Cell clustering, a key step in scRNA-seq data analysis, reveals population structure by grouping cells with similar expression patterns. However, due to the high dimensionality and sparsity of scRNA-seq data, the performance of existing clustering algorithms remains suboptimal. In this study, we propose a novel clustering algorithm, scCCTR, which performs semi-supervised classification by guiding a deep learning model through iterative selection of high-confidence cells and labels. The algorithm consists of two main components: an iterative selection module and a semi-supervised classification module. In the iterative selection module, scCCTR progressively selects high-confidence cells that exhibit core group features and iteratively optimizes feature representations, constructing a consensus clustering result throughout the iterations. In the semi-supervised classification module, scCCTR uses the selected core data to train a Transformer neural network, which leverages a multi-head attention mechanism to focus on critical information, thereby achieving higher clustering precision. We compared scCCTR with several established cell clustering methods on real datasets, and the results demonstrate that scCCTR outperforms existing methods in terms of accuracy and effectiveness for both cell clustering and visualization. (The code of scCCTR is freely available for academic use at https://github.com/chenjiejie387/scCCTR.)
Affiliation(s)
- Jie Chen
- School of Computer Science and Technology, Changchun Normal University, Changchun, 130032, China
| | - Qiucheng Sun
- School of Computer Science and Technology, Changchun Normal University, Changchun, 130032, China
| | - Chunyan Wang
- School of Computer Science and Technology, Changchun Normal University, Changchun, 130032, China
| | - Changbo Gao
- School of Computer Science and Technology, Changchun Normal University, Changchun, 130032, China
|
16
|
Chevalley M, Roohani YH, Mehrjou A, Leskovec J, Schwab P. A large-scale benchmark for network inference from single-cell perturbation data. Commun Biol 2025; 8:412. [PMID: 40069299 PMCID: PMC11897147 DOI: 10.1038/s42003-025-07764-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2024] [Accepted: 02/18/2025] [Indexed: 03/15/2025]
Abstract
Mapping biological mechanisms in cellular systems is a fundamental step in early-stage drug discovery that serves to generate hypotheses on what disease-relevant molecular targets may effectively be modulated by pharmacological interventions. With the advent of high-throughput methods for measuring single-cell gene expression under genetic perturbations, we now have effective means for generating evidence for causal gene-gene interactions at scale. However, evaluating the performance of network inference methods in real-world environments is challenging due to the lack of ground-truth knowledge. Moreover, traditional evaluations conducted on synthetic datasets do not reflect the performance in real-world systems. We thus introduce CausalBench, a benchmark suite revolutionizing network inference evaluation with real-world, large-scale single-cell perturbation data. CausalBench, distinct from existing benchmarks, offers biologically-motivated metrics and distribution-based interventional measures, providing a more realistic evaluation of network inference methods. An initial systematic evaluation of state-of-the-art causal inference methods using our CausalBench suite highlights how poor scalability of existing methods limits performance. Moreover, methods that use interventional information do not outperform those that only use observational data, contrary to what is observed on synthetic benchmarks. CausalBench subsequently enables the development of numerous promising methods through a community challenge, thus demonstrating its potential as a transformative tool in the field of computational biology, bridging the gap between theoretical innovation and practical application in drug discovery and disease understanding. Thus, CausalBench opens new avenues for method developers in causal network inference research, and provides to practitioners a principled and reliable way to track progress in network methods for real-world interventional data.
Affiliation(s)
| | - Yusuf H Roohani
- GSK.ai, Zug, Switzerland
- Stanford University, Stanford, CA, USA
|
17
|
Zhang Y, Wang Y, Liu X, Feng X. PbImpute: Precise Zero Discrimination and Balanced Imputation in Single-Cell RNA Sequencing Data. J Chem Inf Model 2025; 65:2670-2684. [PMID: 39957720 PMCID: PMC11898086 DOI: 10.1021/acs.jcim.4c02125] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2024] [Revised: 01/31/2025] [Accepted: 02/03/2025] [Indexed: 02/18/2025]
Abstract
Single-cell RNA sequencing (scRNA-seq) has emerged as a transformative technology for elucidating cellular heterogeneity at unprecedented resolution. However, technical limitations such as limited sequencing depth and mRNA capture efficiency often result in zero counts, commonly referred to as "dropout zeros" in scRNA-seq data. These zeros pose significant challenges to downstream analysis, as they can distort the interpretation of cellular transcriptomes. While numerous computational methods have been developed to address this challenge, existing approaches frequently suffer from either insufficient imputation of zeros (under-imputation) or excessive modification of zeros (over-imputation). Here, we propose a precisely balanced imputation (PbImpute) method designed to achieve optimal equilibrium between dropout recovery and biological zero preservation in scRNA-seq data. PbImpute employs a multistage approach: (1) Initial discrimination between technical dropouts and biological zeros through parameter optimization of a new zero-inflated negative binomial (ZINB) distribution model, followed by initial imputation; (2) Application of a uniquely designed static repair algorithm to enhance data fidelity; (3) Secondary dropout identification based on gene expression frequency and partition-specific coefficient of variation; (4) Graph-embedding neural network-based imputation; and (5) Implementation of a uniquely designed dynamic repair mechanism to mitigate over-imputation effects. PbImpute distinguishes itself by uniquely integrating ZINB modeling with static and dynamic repair. This advantageous combined approach achieves a balance between over- and under-imputation, while simultaneously preserving true biological zeros and reducing signal distortion. Comprehensive evaluation using both simulated and real scRNA-seq data sets demonstrated that PbImpute achieves superior performance (F1 Score = 0.88 at 83% dropout rate, ARI = 0.78 on PBMC) in discriminating between technical dropouts and biological zeros compared to state-of-the-art methods. The method significantly improves gene-gene and cell-cell correlation structures, enhances differential expression analysis sensitivity, optimizes clustering resolution and dimensional reduction visualization, and facilitates more accurate trajectory inference. Ablation studies confirmed the essential contribution of both the imputation and repair modules to the method's performance. The code is available at https://github.com/WyBioTeam/PbImpute. By enhancing the accuracy of scRNA-seq data imputation, PbImpute can improve the identification of cell subpopulations and the detection of differentially expressed genes, thereby facilitating more precise analyses of cellular heterogeneity and advancing disease research.
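To make the first stage more concrete, the sketch below shows how a standard ZINB model separates the two kinds of zeros. It uses the textbook ZINB with zero-inflation weight `pi`, mean `mu`, and dispersion `theta`; the abstract describes a new ZINB variant with its own parameter optimization, which is not reproduced here, and the function names are hypothetical.

```python
def nb_zero_prob(mu, theta):
    """P(X = 0) under a negative binomial with mean mu and dispersion theta."""
    return (theta / (theta + mu)) ** theta


def dropout_posterior(pi, mu, theta):
    """Posterior probability that an observed zero comes from the
    zero-inflation (dropout) component of a standard ZINB rather than
    from the negative binomial count component."""
    p0 = nb_zero_prob(mu, theta)
    return pi / (pi + (1.0 - pi) * p0)


# A zero in a highly expressed gene is more likely a technical dropout
# than a zero in a lowly expressed gene (same pi and theta, assumed values).
low = dropout_posterior(pi=0.3, mu=0.2, theta=1.0)
high = dropout_posterior(pi=0.3, mu=10.0, theta=1.0)
print(low < high)  # True
```

Thresholding this posterior is one simple way to decide which zeros to impute and which to preserve as biological zeros; how PbImpute makes that decision is specified in the paper, not here.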
Affiliation(s)
- Yi Zhang
- School of Computer Science and Engineering, Guilin University of Technology, 12 Jiangan Road, Qixing District, Guilin 541004, China
- Guangxi Key Laboratory of Embedded Technology and Intelligent System, Guilin University of Technology, 12 Jiangan Road, Qixing District, Guilin 541004, China
| | - Yin Wang
- School of Computer Science and Engineering, Guilin University of Technology, 12 Jiangan Road, Qixing District, Guilin 541004, China
- Guangxi Key Laboratory of Embedded Technology and Intelligent System, Guilin University of Technology, 12 Jiangan Road, Qixing District, Guilin 541004, China
| | - Xinyuan Liu
- School of Computer Science and Engineering, Guilin University of Technology, 12 Jiangan Road, Qixing District, Guilin 541004, China
- Guangxi Key Laboratory of Embedded Technology and Intelligent System, Guilin University of Technology, 12 Jiangan Road, Qixing District, Guilin 541004, China
| | - Xi Feng
- School of Computer Science and Engineering, Guilin University of Technology, 12 Jiangan Road, Qixing District, Guilin 541004, China
- Guangxi Key Laboratory of Embedded Technology and Intelligent System, Guilin University of Technology, 12 Jiangan Road, Qixing District, Guilin 541004, China
|
18
|
Liu W, Zhao Z. Scupa: single-cell unified polarization assessment of immune cells using the single-cell foundation model. Bioinformatics 2025; 41:btaf090. [PMID: 39999031 PMCID: PMC11893155 DOI: 10.1093/bioinformatics/btaf090] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2024] [Revised: 01/15/2025] [Accepted: 02/21/2025] [Indexed: 02/27/2025]
Abstract
MOTIVATION Immune cells undergo cytokine-driven polarization in response to diverse stimuli, altering their transcriptional profiles and functional states. This dynamic process is central to immune responses in health and diseases, yet a systematic approach to assess cytokine-driven polarization in single-cell RNA sequencing data has been lacking. RESULTS To address this gap, we developed single-cell unified polarization assessment (Scupa), the first computational method for comprehensive immune cell polarization assessment. Scupa leverages data from the Immune Dictionary, which characterizes cytokine-driven polarization states across 14 immune cell types. By integrating cell embeddings from the single-cell foundation model Universal Cell Embeddings, Scupa effectively identifies polarized cells across different species and experimental conditions. Applications of Scupa in independent datasets demonstrated its accuracy in classifying polarized cells and further revealed distinct polarization profiles in tumor-infiltrating myeloid cells across cancers. Scupa complements conventional single-cell data analysis by providing new insights into dynamic immune cell states, and holds potential for advancing therapeutic insights, particularly in cytokine-based therapies. AVAILABILITY AND IMPLEMENTATION The code is available at https://github.com/bsml320/Scupa.
Affiliation(s)
- Wendao Liu
- The University of Texas MD Anderson Cancer Center UTHealth Houston Graduate School of Biomedical Sciences, Houston, TX 77030, United States
- Center for Precision Health, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, United States
| | - Zhongming Zhao
- The University of Texas MD Anderson Cancer Center UTHealth Houston Graduate School of Biomedical Sciences, Houston, TX 77030, United States
- Center for Precision Health, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, United States
|
19
|
Hui HWH, Chan WX, Goh WWB. Assessing the impact of batch effect associated missing values on downstream analysis in high-throughput biomedical data. Brief Bioinform 2025; 26:bbaf168. [PMID: 40230039 PMCID: PMC12066825 DOI: 10.1093/bib/bbaf168] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2024] [Revised: 03/10/2025] [Accepted: 03/24/2025] [Indexed: 04/16/2025]
Abstract
Batch effect associated missing values (BEAMs) are batch-wide missingness induced by the integration of data with different coverage of biomedical features. BEAMs can present substantial challenges in data analysis. This study investigates how BEAMs impact missing value imputation (MVI) and batch effect (BE) correction algorithms (BECAs). Through simulations and analyses of real-world datasets including the Clinical Proteomic Tumour Analysis Consortium (CPTAC), we evaluated six MVI methods: K-nearest neighbors (KNN), Mean, MinProb, Singular Value Decomposition (SVD), Multivariate Imputation by Chained Equations (MICE), and Random Forest (RF), with ComBat and limma as the BECAs. We demonstrated that BEAMs strongly affect MVI performance, resulting in inaccurate imputed values, inflated significant P-values, and compromised BE correction. KNN, SVD, and RF were particularly prone to propagating random signals, resulting in false statistical confidence. While imputation with Mean and MinProb was less detrimental, artifacts were nonetheless introduced. Furthermore, the detrimental effect of BEAMs increased in parallel with their severity in the data. Our findings highlight the necessity of comprehensive assessments and tailored strategies to handle BEAMs in multi-batch datasets to ensure reliable data analysis and interpretation. Future work should investigate more advanced simulations and a variety of dedicated MVI methods to robustly address BEAMs.
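Before choosing an imputation method, it helps to know which features are affected. The sketch below flags features that are missing in every sample of a batch, following the definition of BEAMs given in this abstract. It assumes a samples-by-features matrix with `None` marking missing values; the function name `find_beams` and the toy data are hypothetical and not taken from the study.

```python
def find_beams(matrix, batches):
    """Return {batch: [feature indices missing in every sample of that batch]}.

    matrix  -- list of rows (samples); None marks a missing value
    batches -- batch label for each row, in the same order
    """
    n_features = len(matrix[0])
    rows_by_batch = {}
    for row, batch in zip(matrix, batches):
        rows_by_batch.setdefault(batch, []).append(row)
    return {
        batch: [j for j in range(n_features)
                if all(row[j] is None for row in rows)]
        for batch, rows in rows_by_batch.items()
    }


data = [
    [1.0, 2.0, None],   # batch A
    [1.5, None, None],  # batch A
    [0.9, 2.2, 3.1],    # batch B
    [1.1, 2.4, 2.9],    # batch B
]
batch_labels = ["A", "A", "B", "B"]
print(find_beams(data, batch_labels))  # {'A': [2], 'B': []}
```

Feature 1 is only sporadically missing in batch A, so it is not flagged; feature 2 is absent from the whole batch and would have to be imputed entirely from batch B, which is the situation the abstract warns about.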
Affiliation(s)
- Harvard Wai Hann Hui
- Lee Kong Chian School of Medicine, Nanyang Technological University, 59 Nanyang Drive, Singapore 636921, Singapore
| | - Wei Xin Chan
- Lee Kong Chian School of Medicine, Nanyang Technological University, 59 Nanyang Drive, Singapore 636921, Singapore
- Center for Biomedical Informatics, Nanyang Technological University, 59 Nanyang Drive, Singapore 636921, Singapore
| | - Wilson Wen Bin Goh
- Lee Kong Chian School of Medicine, Nanyang Technological University, 59 Nanyang Drive, Singapore 636921, Singapore
- Center for Biomedical Informatics, Nanyang Technological University, 59 Nanyang Drive, Singapore 636921, Singapore
- School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore 637551, Singapore
- Center for Artificial Intelligence in Medicine, Nanyang Technological University, 59 Nanyang Drive, Singapore 636921, Singapore
- Division of Neurology, Department of Brain Sciences, Faculty of Medicine, Imperial College London, Burlington Danes, The Hammersmith Hospital, Du Cane Road, London W12 0NN, United Kingdom
|
20
|
Chen X, Ma Y, Shi Y, Zhang B, Wu H, Gao J. Fuzzy-Based Identification of Transition Cells to Infer Cell Trajectory for Single-Cell Transcriptomics. J Comput Biol 2025; 32:253-273. [PMID: 39670822 DOI: 10.1089/cmb.2023.0432] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/14/2024]
Abstract
With the continuous evolution of single-cell RNA sequencing technology, it has become feasible to reconstruct cell development processes using computational methods. Trajectory inference is a crucial downstream analytical task that provides valuable insights into the cell cycle and differentiation. During cell development, cells exhibit both stable and transition states, which makes it challenging to accurately identify these cells. To address this challenge, we propose a novel single-cell trajectory inference method using fuzzy clustering, named scFCTI. By introducing fuzzy clustering and quantifying cell uncertainty, scFCTI can identify transition cells within unstable cell states. Moreover, scFCTI can obtain a refined cell classification by characterizing different cell stages, which yields a more accurate single-cell trajectory reconstruction containing transition paths. To validate the effectiveness of scFCTI, we conduct experiments on five real datasets and four simulated datasets with different structures, comparing it with several state-of-the-art trajectory inference methods. The results demonstrate that scFCTI outperforms these methods by successfully identifying unstable cell clusters and obtaining more accurate cell paths with transition states. In particular, the experimental results demonstrate that scFCTI can reconstruct the cell trajectory more precisely.
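The idea of using fuzzy memberships to flag uncertain cells can be illustrated with the standard fuzzy c-means membership formula. This is a minimal sketch, not scFCTI itself: the function names are hypothetical, and normalized entropy is an assumed choice of uncertainty score, since the abstract does not state how cell uncertainty is quantified.

```python
import math


def fuzzy_memberships(distances, m=2.0):
    """Fuzzy c-means memberships of one cell, given its distances to each
    cluster center and fuzzifier m > 1: u_k is proportional to d_k^(-2/(m-1))."""
    if any(d == 0 for d in distances):
        # A cell sitting exactly on a center belongs fully to that center.
        hits = [1.0 if d == 0 else 0.0 for d in distances]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    inverse = [d ** -exponent for d in distances]
    total = sum(inverse)
    return [v / total for v in inverse]


def normalized_entropy(memberships):
    """Uncertainty in [0, 1]: 0 for a crisp assignment, 1 for a uniform one."""
    h = -sum(u * math.log(u) for u in memberships if u > 0)
    return h / math.log(len(memberships))


stable = fuzzy_memberships([1.0, 3.0])      # close to one center
transition = fuzzy_memberships([1.0, 1.0])  # halfway between two centers
print(transition)                            # [0.5, 0.5]
print(normalized_entropy(stable) < normalized_entropy(transition))  # True
```

Cells whose score exceeds a chosen threshold would be candidates for transition cells lying between stable states; the threshold and the downstream path construction are left out of this sketch.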
Affiliation(s)
- Xiang Chen
- School of Science, Jiangnan University, Wuxi, China
| | - Yibing Ma
- School of Science, Jiangnan University, Wuxi, China
| | - Yongle Shi
- School of Science, Jiangnan University, Wuxi, China
| | - Bai Zhang
- School of Science, Jiangnan University, Wuxi, China
| | - Hanwen Wu
- School of Science, Jiangnan University, Wuxi, China
| | - Jie Gao
- School of Science, Jiangnan University, Wuxi, China
|
21
|
Venkatesan S, Werner JM, Li Y, Gillis J. Cell Type-Agnostic Transcriptomic Signatures Enable Uniform Comparisons of Neurodevelopment. bioRxiv 2025:2025.02.24.639936. [PMID: 40060479 PMCID: PMC11888278 DOI: 10.1101/2025.02.24.639936] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/21/2025]
Abstract
Single-cell transcriptomics has revolutionized our understanding of neurodevelopmental cell identities, yet predicting a cell type's developmental state from its transcriptome remains a challenge. We perform a meta-analysis of developing human brain datasets comprising over 2.8 million cells, identifying both tissue-level and cell-autonomous predictors of developmental age. While tissue composition predicts age within individual studies, it fails to generalize, whereas specific cell type proportions reliably track developmental time across datasets. Training regularized regression models to infer cell-autonomous maturation, we find that a cell type-agnostic model achieves the highest accuracy (error = 2.6 weeks), robustly capturing developmental dynamics across diverse cell types and datasets. This model generalizes to human neural organoids, accurately predicting normal developmental trajectories (R = 0.91) and disease-induced shifts in vitro. Furthermore, it extends to the developing mouse brain, revealing an accelerated developmental tempo relative to humans. Our work provides a unified framework for comparing neurodevelopment across contexts, model systems, and species.
Affiliation(s)
- Sridevi Venkatesan
- Department of Physiology, University of Toronto, Canada
- Developmental and Stem Cell Biology, Hospital for Sick Children, Toronto, Canada
- Terrence Donnelly Centre for Cellular and Biomolecular Research, Toronto, Canada
| | - Jonathan M Werner
- Terrence Donnelly Centre for Cellular and Biomolecular Research, Toronto, Canada
| | - Yun Li
- Developmental and Stem Cell Biology, Hospital for Sick Children, Toronto, Canada
- Department of Molecular Genetics, University of Toronto, Canada
| | - Jesse Gillis
- Department of Physiology, University of Toronto, Canada
- Terrence Donnelly Centre for Cellular and Biomolecular Research, Toronto, Canada
- Department of Molecular Genetics, University of Toronto, Canada
|
22
|
Hao G, Fan Y, Yu Z, Su Y, Zhu H, Wang F, Chen X, Yang Y, Wang G, Wong KC, Li X. Topological identification and interpretation for single-cell epigenetic regulation elucidation in multi-tasks using scAGDE. Nat Commun 2025; 16:1691. [PMID: 39956806 PMCID: PMC11830825 DOI: 10.1038/s41467-025-57027-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2024] [Accepted: 02/03/2025] [Indexed: 02/18/2025]
Abstract
Single-cell ATAC-seq technology advances our understanding of single-cell heterogeneity in gene regulation by enabling exploration of epigenetic landscapes and regulatory elements. However, low sequencing depth per cell leads to data sparsity and high dimensionality, limiting the characterization of gene regulatory elements. Here, we develop scAGDE, a single-cell chromatin accessibility model-based deep graph representation learning method that simultaneously learns representation and clustering through explicit modeling of data generation. Our evaluations demonstrated that scAGDE outperforms existing methods in cell segregation, key marker identification, and visualization across diverse datasets while mitigating dropout events and unveiling hidden chromatin-accessible regions. We find that scAGDE preferentially identifies enhancer-like regions and elucidates complex regulatory landscapes, pinpointing putative enhancers regulating the constitutive expression of CTLA4 and the transcriptional dynamics of CD8A in immune cells. When applied to human brain tissue, scAGDE successfully annotated cis-regulatory element-specified cell types and revealed functional diversity and regulatory mechanisms of glutamatergic neurons.
Affiliation(s)
- Gaoyang Hao
- School of Artificial Intelligence, Jilin University, Jilin, China
| | - Yi Fan
- School of Artificial Intelligence, Jilin University, Jilin, China
| | - Zhuohan Yu
- School of Artificial Intelligence, Jilin University, Jilin, China
| | - Yanchi Su
- School of Artificial Intelligence, Jilin University, Jilin, China
| | - Haoran Zhu
- School of Artificial Intelligence, Jilin University, Jilin, China
| | - Fuzhou Wang
- Department of Computer Science, City University of Hong Kong, Hong Kong SAR, China
| | - Xingjian Chen
- Cutaneous Biology Research Center, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
| | - Yuning Yang
- Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada
| | - Guohua Wang
- College of Computer and Control Engineering, Northeast Forestry University, Harbin, China
| | - Ka-Chun Wong
- Department of Computer Science, City University of Hong Kong, Hong Kong SAR, China
| | - Xiangtao Li
- School of Artificial Intelligence, Jilin University, Jilin, China.
|
23
|
Goss K, Horwitz EM. Single-cell multiomics to advance cell therapy. Cytotherapy 2025; 27:137-145. [PMID: 39530970 DOI: 10.1016/j.jcyt.2024.10.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2024] [Revised: 10/21/2024] [Accepted: 10/21/2024] [Indexed: 11/16/2024]
Abstract
Single-cell RNA-sequencing (scRNAseq) was first introduced in 2009 and has evolved with many technological advancements over the last decade. Not only are there several scRNAseq platforms differing in many aspects, but there are also a large number of computational pipelines available for downstream analyses which are being developed at an exponential rate. Such computational data appear in many scientific publications in virtually every field of study; thus, investigators should be able to understand and interpret data in this rapidly evolving field. Here, we discuss key differences in scRNAseq platforms, crucial steps in scRNAseq experiments, standard downstream analyses and introduce newly developed multimodal approaches. We then discuss how single-cell omics has been applied to advance the field of cell therapy.
Affiliation(s)
- Kyndal Goss
- Marcus Center for Advanced Cellular Therapy, Children's Healthcare of Atlanta, Atlanta, Georgia, USA; Aflac Cancer & Blood Disorders Center, Children's Healthcare of Atlanta, Atlanta, Georgia, USA; Graduate Division of Biology and Biomedical Sciences, Emory University Laney Graduate School, Atlanta, Georgia, USA
| | - Edwin M Horwitz
- Marcus Center for Advanced Cellular Therapy, Children's Healthcare of Atlanta, Atlanta, Georgia, USA; Aflac Cancer & Blood Disorders Center, Children's Healthcare of Atlanta, Atlanta, Georgia, USA; Department of Pediatrics, Emory University School of Medicine, Atlanta, Georgia, USA; Graduate Division of Biology and Biomedical Sciences, Emory University Laney Graduate School, Atlanta, Georgia, USA.
|
24
|
Sarkar H, Lee E, Lopez-Darwin SL, Kang Y. Deciphering normal and cancer stem cell niches by spatial transcriptomics: opportunities and challenges. Genes Dev 2025; 39:64-85. [PMID: 39496456 PMCID: PMC11789490 DOI: 10.1101/gad.351956.124] [Indexed: 11/06/2024]
Abstract
Cancer stem cells (CSCs) often exhibit stem-like attributes that depend on an intricate stemness-promoting cellular ecosystem within their niche. The interplay between CSCs and their niche has been implicated in tumor heterogeneity and therapeutic resistance. Normal stem cells (NSCs) and CSCs share stemness features and common microenvironmental components, displaying significant phenotypic and functional plasticity. Investigating these properties across diverse organs during normal development and tumorigenesis is of paramount research interest and translational potential. Advancements in next-generation sequencing (NGS), single-cell transcriptomics, and spatial transcriptomics have ushered in a new era in cancer research, providing high-resolution and comprehensive molecular maps of diseased tissues. Various spatial technologies, with their unique ability to measure the location and molecular profile of a cell within tissue, have enabled studies on intratumoral architecture and cellular cross-talk within the specific niches. Moreover, multilevel spatial sequencing reveals the spatial patterns of niche-specific properties such as hypoxia, glucose deprivation, and other forms of microenvironmental remodeling. This tremendous progress in technology has also been paired with the advent of computational tools to mitigate technology-specific bottlenecks. Here we discuss how different spatial technologies are used to identify NSCs and CSCs, as well as their associated niches. Additionally, by exploring related public data sets, we review the current challenges in characterizing such niches, which are often hindered by technological limitations, and the computational solutions used to address them.
Affiliation(s)
- Hirak Sarkar
- Department of Molecular Biology, Princeton University, Princeton, New Jersey 08544, USA
- Ludwig Institute for Cancer Research Princeton Branch, Princeton, New Jersey 08544, USA
- Department of Computer Science, Princeton, New Jersey 08544, USA
| | - Eunmi Lee
- Department of Molecular Biology, Princeton University, Princeton, New Jersey 08544, USA
| | - Sereno L Lopez-Darwin
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey 08544, USA
| | - Yibin Kang
- Department of Molecular Biology, Princeton University, Princeton, New Jersey 08544, USA
- Ludwig Institute for Cancer Research Princeton Branch, Princeton, New Jersey 08544, USA
- Cancer Metabolism and Growth Program, Rutgers Cancer Institute of New Jersey, New Brunswick, New Jersey 08903, USA
|
25
|
Juan W, Ahn KW, Chen YG, Lin CW. CCI: A Consensus Clustering-Based Imputation Method for Addressing Dropout Events in scRNA-Seq Data. Bioengineering (Basel) 2025; 12:31. [PMID: 39851305 PMCID: PMC11763284 DOI: 10.3390/bioengineering12010031] [Received: 11/22/2024] [Revised: 12/29/2024] [Accepted: 12/30/2024] [Indexed: 01/26/2025] Open
Abstract
Single-cell RNA sequencing (scRNA-seq) is a cutting-edge technique in molecular biology and genomics, revealing the cellular heterogeneity. However, scRNA-seq data often suffer from dropout events, meaning that certain genes exhibit very low or even zero expression levels due to technical limitations. Existing imputation methods for dropout events lack comprehensive evaluations in downstream analyses and do not demonstrate robustness across various scenarios. In response to this challenge, we propose a consensus clustering-based imputation (CCI) method. CCI performs clustering on each subset of data sampling across genes and summarizes clustering outcomes to define cellular similarities. CCI leverages the information from similar cells and employs the similarities to impute gene expression levels. Our comprehensive evaluations demonstrate that CCI not only reconstructs the original data pattern, but also improves the performance of downstream analyses. CCI outperforms existing methods for data imputation under different scenarios, exhibiting accuracy, robustness, and generalization.
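As a rough illustration of the step in which clustering outcomes are summarized to define cellular similarities, the sketch below computes a co-clustering frequency matrix from several clustering runs. It is not the CCI implementation: the function name `consensus_matrix` and the input format (one list of cluster labels per run) are assumptions made for this example, and the clustering itself and the imputation step are left out.

```python
def consensus_matrix(labelings):
    """Fraction of clustering runs in which each pair of cells shares a cluster.

    `labelings` is a hypothetical input: one list of cluster labels per run,
    all of the same length (one label per cell).
    """
    if not labelings:
        raise ValueError("at least one clustering run is required")
    n_cells = len(labelings[0])
    if any(len(labels) != n_cells for labels in labelings):
        raise ValueError("every run must label the same number of cells")

    # Count, for every pair of cells, how many runs put them in the same cluster.
    counts = [[0] * n_cells for _ in range(n_cells)]
    for labels in labelings:
        for i in range(n_cells):
            for j in range(n_cells):
                if labels[i] == labels[j]:
                    counts[i][j] += 1

    n_runs = len(labelings)
    return [[c / n_runs for c in row] for row in counts]


# Two runs over three cells: cells 0 and 1 agree in one run out of two.
similarity = consensus_matrix([[0, 0, 1], [0, 1, 1]])
```

A matrix like this could then weight neighbouring cells when filling in suspected dropouts, although how CCI does that is not specified in the abstract.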
Affiliation(s)
- Wanlin Juan
- Division of Biostatistics, Data Science Institute, Medical College of Wisconsin (MCW), Milwaukee, WI 53226, USA
| | - Kwang Woo Ahn
- Division of Biostatistics, Data Science Institute, Medical College of Wisconsin (MCW), Milwaukee, WI 53226, USA
| | - Yi-Guang Chen
- Department of Pediatrics, Medical College of Wisconsin (MCW), Milwaukee, WI 53226, USA
| | - Chien-Wei Lin
- Division of Biostatistics, Data Science Institute, Medical College of Wisconsin (MCW), Milwaukee, WI 53226, USA
|
26
|
Schumann Y, Gocke A, Neumann JE. Computational Methods for Data Integration and Imputation of Missing Values in Omics Datasets. Proteomics 2025; 25:e202400100. [PMID: 39740174 DOI: 10.1002/pmic.202400100] [Received: 06/28/2024] [Revised: 11/08/2024] [Accepted: 11/26/2024] [Indexed: 01/02/2025]
Abstract
Molecular profiling of different omic-modalities (e.g., DNA methylomics, transcriptomics, proteomics) in biological systems represents the basis for research and clinical decision-making. Measurement-specific biases, so-called batch effects, often hinder the integration of independently acquired datasets, and missing values further hamper the applicability of typical data processing algorithms. In addition to careful experimental design, well-defined standards in data acquisition and data exchange, the alleviation of these phenomena particularly requires a dedicated data integration and preprocessing pipeline. This review aims to give a comprehensive overview of computational methods for data integration and missing value imputation for omic data analyses. We provide formal definitions for missing value mechanisms and propose a novel statistical taxonomy for batch effects, especially in the presence of missing data. Based on an automated document search and systematic literature review, we describe 32 distinct data integration methods from five main methodological categories, as well as 37 algorithms for missing value imputation from five separate categories. Additionally, this review highlights multiple quantitative evaluation methods to aid researchers in selecting a suitable set of methods for their work. Finally, this work provides an integrated discussion of the relevance of batch effects and missing values in omics with corresponding method recommendations. We then propose a comprehensive three-step workflow from the study conception to final data analysis and deduce perspectives for future research. Eventually, we present a comprehensive flow chart as well as exemplary decision trees to aid practitioners in the selection of specific approaches for imputation and data integration in their studies.
Affiliation(s)
- Yannis Schumann
- IT-Department, Deutsches Elektronen-Synchrotron DESY, Hamburg, Germany
| | - Antonia Gocke
- Center for Molecular Neurobiology (ZMNH), University Medical Center Hamburg-Eppendorf (UKE), Hamburg, Germany
- Core Facility Mass Spectrometric Proteomics, University Medical Center Hamburg-Eppendorf (UKE), Hamburg, Germany
| | - Julia E Neumann
- Center for Molecular Neurobiology (ZMNH), University Medical Center Hamburg-Eppendorf (UKE), Hamburg, Germany
- Institute of Neuropathology, University Medical Center Hamburg-Eppendorf (UKE), Hamburg, Germany
|
27
|
Golchin A, Shams F, Moradi F, Sadrabadi AE, Parviz S, Alipour S, Ranjbarvan P, Hemmati Y, Rahnama M, Rasmi Y, Aziz SGG. Single-cell Technology in Stem Cell Research. Curr Stem Cell Res Ther 2025; 20:9-32. [PMID: 38243989 DOI: 10.2174/011574888x265479231127065541] [Received: 07/11/2023] [Revised: 09/23/2023] [Accepted: 10/04/2023] [Indexed: 01/22/2024]
Abstract
Single-cell technology (SCT), which enables the examination of the fundamental units comprising biological organs, tissues, and cells, has emerged as a powerful tool, particularly in the field of biology, with a profound impact on stem cell research. This innovative technology opens new pathways for acquiring cell-specific data and gaining insights into the molecular pathways governing organ function and biology. SCT is not only frequently used to explore rare and diverse cell types, including stem cells, but it also unveils the intricacies of cellular diversity and dynamics. This perspective, crucial for advancing stem cell research, facilitates non-invasive analyses of molecular dynamics and cellular functions over time. Despite numerous investigations into potential stem cell therapies for genetic disorders, degenerative conditions, and severe injuries, the number of approved stem cell-based treatments remains limited. This limitation is attributed to the various heterogeneities present among stem cell sources, hindering their widespread clinical utilization. Furthermore, stem cell research is intimately connected with cutting-edge technologies, such as microfluidic organoids, CRISPR technology, and cell/tissue engineering. Each strategy developed to overcome the constraints of stem cell research has the potential to significantly impact advanced stem cell therapies. Drawing on the advantages and progress achieved through SCT-based approaches, this study aims to provide an overview of the advancements and concepts associated with the utilization of SCT in stem cell research and its related fields.
Affiliation(s)
- Ali Golchin
- Cellular and Molecular Research Center, Cellular and Molecular Medicine Institute, Urmia University of Medical Sciences, Urmia, Iran
- Department of Clinical Biochemistry and Applied Cell Sciences, School of Medicine, Urmia University of Medical Sciences, Urmia, Iran
| | - Forough Shams
- Department of Medical Biotechnology, School of Advanced Technologies in Medicine, Shahid Beheshti University of Medical Sciences, Tehran, Iran
| | - Faezeh Moradi
- Department of Tissue Engineering, School of Medicine, Tarbiat Modares University, Tehran, Iran
| | - Amin Ebrahimi Sadrabadi
- Department of Stem Cells and Developmental Biology, Cell Science Research Center, Royan Institute for Stem Cell Biology and Technology, ACECR, Tehran, Iran
| | - Shima Parviz
- Department of Tissue Engineering and Applied Cell Sciences, School of Advanced Medical Sciences and Technologies, Shiraz University of Medical Sciences, Shiraz, Iran
| | - Shahriar Alipour
- Cellular and Molecular Research Center, Cellular and Molecular Medicine Institute, Urmia University of Medical Sciences, Urmia, Iran
- Department of Clinical Biochemistry and Applied Cell Sciences, School of Medicine, Urmia University of Medical Sciences, Urmia, Iran
| | - Parviz Ranjbarvan
- Cellular and Molecular Research Center, Cellular and Molecular Medicine Institute, Urmia University of Medical Sciences, Urmia, Iran
- Department of Clinical Biochemistry and Applied Cell Sciences, School of Medicine, Urmia University of Medical Sciences, Urmia, Iran
| | - Yaser Hemmati
- Department of Prosthodontics, Dental Faculty, Urmia University of Medical Science, Urmia, Iran
| | - Maryam Rahnama
- Department of Clinical Biochemistry and Applied Cell Sciences, School of Medicine, Urmia University of Medical Sciences, Urmia, Iran
| | - Yousef Rasmi
- Department of Clinical Biochemistry and Applied Cell Sciences, School of Medicine, Urmia University of Medical Sciences, Urmia, Iran
| | - Shiva Gholizadeh-Ghaleh Aziz
- Department of Clinical Biochemistry and Applied Cell Sciences, School of Medicine, Urmia University of Medical Sciences, Urmia, Iran
|
28
|
Gulati GS, D'Silva JP, Liu Y, Wang L, Newman AM. Profiling cell identity and tissue architecture with single-cell and spatial transcriptomics. Nat Rev Mol Cell Biol 2025; 26:11-31. [PMID: 39169166 DOI: 10.1038/s41580-024-00768-2] [Accepted: 07/16/2024] [Indexed: 08/23/2024]
Abstract
Single-cell transcriptomics has broadened our understanding of cellular diversity and gene expression dynamics in healthy and diseased tissues. Recently, spatial transcriptomics has emerged as a tool to contextualize single cells in multicellular neighbourhoods and to identify spatially recurrent phenotypes, or ecotypes. These technologies have generated vast datasets with targeted-transcriptome and whole-transcriptome profiles of hundreds to millions of cells. Such data have provided new insights into developmental hierarchies, cellular plasticity and diverse tissue microenvironments, and spurred a burst of innovation in computational methods for single-cell analysis. In this Review, we discuss recent advancements, ongoing challenges and prospects in identifying and characterizing cell states and multicellular neighbourhoods. We discuss recent progress in sample processing, data integration, identification of subtle cell states, trajectory modelling, deconvolution and spatial analysis. Furthermore, we discuss the increasing application of deep learning, including foundation models, in analysing single-cell and spatial transcriptomics data. Finally, we discuss recent applications of these tools in the fields of stem cell biology, immunology, and tumour biology, and the future of single-cell and spatial transcriptomics in biological research and its translation to the clinic.
Affiliation(s)
- Gunsagar S Gulati
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
| | | | - Yunhe Liu
- Department of Genomic Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
| | - Linghua Wang
- Department of Genomic Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- The University of Texas MD Anderson Cancer Center UTHealth Houston Graduate School of Biomedical Sciences, Houston, TX, USA
| | - Aaron M Newman
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA.
- Institute for Stem Cell Biology and Regenerative Medicine, Stanford University, Stanford, CA, USA.
- Stanford Cancer Institute, Stanford University, Stanford, CA, USA.
- Chan Zuckerberg Biohub - San Francisco, San Francisco, CA, USA.
|
29
|
Rosoff DB, Wagner J, Bell AS, Mavromatis LA, Jung J, Lohoff FW. A multi-omics Mendelian randomization study identifies new therapeutic targets for alcohol use disorder and problem drinking. Nat Hum Behav 2025; 9:188-207. [PMID: 39528761 DOI: 10.1038/s41562-024-02040-1] [Received: 07/13/2022] [Accepted: 10/01/2024] [Indexed: 11/16/2024]
Abstract
Integrating proteomic and transcriptomic data with genetic architectures of problematic alcohol use and alcohol consumption behaviours can advance our understanding and help identify therapeutic targets. We conducted systematic screens using genome-wide association study data from ~3,500 cortical proteins (N = 722) and ~6,100 genes in 8 canonical brain cell types (N = 192) with 4 alcohol-related outcomes (N ≤ 537,349), identifying 217 cortical proteins and 255 cell-type genes associated with these behaviours, with 36 proteins and 37 cell-type genes being new. Although there was limited overlap between proteome and transcriptome targets, downstream neuroimaging revealed shared neurophysiological pathways. Colocalization with independent genome-wide association study data further prioritized 16 proteins, including CAB39L and NRBP1, and 12 cell-type genes, implicating mechanisms such as mTOR signalling. In addition, genes such as SAMHD1, VIPAS39, NUP160 and INO80E were identified as having favourable neuropsychiatric profiles. These findings provide insights into the genetic landscapes governing problematic alcohol use and alcohol consumption behaviours, highlighting promising therapeutic targets for future research.
Affiliation(s)
- Daniel B Rosoff
- Section on Clinical Genomics and Experimental Therapeutics, National Institute on Alcohol Abuse and Alcoholism, National Institutes of Health, Bethesda, MD, USA
- NIH Oxford-Cambridge Scholars Program, National Institutes of Health, Bethesda, MD, USA
- Radcliffe Department of Medicine, University of Oxford, Oxford, UK
| | - Josephin Wagner
- Section on Clinical Genomics and Experimental Therapeutics, National Institute on Alcohol Abuse and Alcoholism, National Institutes of Health, Bethesda, MD, USA
| | - Andrew S Bell
- Section on Clinical Genomics and Experimental Therapeutics, National Institute on Alcohol Abuse and Alcoholism, National Institutes of Health, Bethesda, MD, USA
| | - Lucas A Mavromatis
- Section on Clinical Genomics and Experimental Therapeutics, National Institute on Alcohol Abuse and Alcoholism, National Institutes of Health, Bethesda, MD, USA
| | - Jeesun Jung
- Section on Clinical Genomics and Experimental Therapeutics, National Institute on Alcohol Abuse and Alcoholism, National Institutes of Health, Bethesda, MD, USA
| | - Falk W Lohoff
- Section on Clinical Genomics and Experimental Therapeutics, National Institute on Alcohol Abuse and Alcoholism, National Institutes of Health, Bethesda, MD, USA.
|
30
|
Song WM, Ming C, Forst CV, Zhang B. Unsupervised multi-scale clustering of single-cell transcriptomes to identify hierarchical structures of cell subtypes. Research Square 2024:rs.3.rs-5671748. [PMID: 39764102 PMCID: PMC11703337 DOI: 10.21203/rs.3.rs-5671748/v1] [Indexed: 01/23/2025]
Abstract
Cell clustering is an essential step in uncovering cellular architectures in single-cell RNA-sequencing (scRNA-seq) data. However, the existing cell clustering approaches are not well designed to dissect complex structures of cellular landscapes at a finer resolution. Here, we develop a multi-scale clustering (MSC) approach to construct a sparse cell-cell correlation network for identifying de novo cell types and subtypes at multiscale resolution in an unsupervised manner. Based upon simulated, silver and gold standard data as well as real scRNA-seq data in diseases, MSC showed much improved performance in comparison to established benchmark methods, and identified a biologically meaningful cell hierarchy to facilitate the discovery of novel disease-associated cell subtypes and mechanisms.
Affiliation(s)
- Won-Min Song
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY 10029, USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY 10029, USA
| | - Chen Ming
- Faculty of Health Sciences, University of Macau, Avenida da Universidade, Taipa, Macau, China
| | - Christian V. Forst
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY 10029, USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY 10029, USA
- Department of Microbiology, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY 10029, USA
| | - Bin Zhang
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY 10029, USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY 10029, USA
|
31
|
Liu X, Wang H, Gao J. scIALM: A method for sparse scRNA-seq expression matrix imputation using the Inexact Augmented Lagrange Multiplier with low error. Comput Struct Biotechnol J 2024; 23:549-558. [PMID: 38274995 PMCID: PMC10809077 DOI: 10.1016/j.csbj.2023.12.027] [Received: 10/25/2023] [Revised: 12/21/2023] [Accepted: 12/22/2023] [Indexed: 01/27/2024] Open
Abstract
Single-cell RNA sequencing (scRNA-seq) is a high-throughput sequencing technology that quantifies gene expression profiles of specific cell populations at the single-cell level, providing a foundation for studying cellular heterogeneity and patient pathological characteristics. It is effective for developmental, fertility, and disease studies. However, the cell-gene expression matrix of single-cell sequencing data is often sparse and contains numerous zero values. Some of the zero values derive from noise, where dropout noise has a large impact on downstream analysis. In this paper, we propose a method named scIALM for imputation recovery of sparse single-cell RNA data expression matrices, which employs the Inexact Augmented Lagrange Multiplier method to use sparse but clean (accurate) data to recover unknown entries in the matrix. We perform experimental analysis on four datasets, calling the expression matrix after Quality Control (QC) as the original matrix, and comparing the performance of scIALM with six other methods using mean squared error (MSE), mean absolute error (MAE), Pearson correlation coefficient (PCC), and cosine similarity (CS). Our results demonstrate that scIALM accurately recovers the original data of the matrix with an error of 10e-4, and the mean value of the four metrics reaches 4.5072 (MSE), 0.765 (MAE), 0.8701 (PCC), 0.8896 (CS). In addition, at 10%-50% random masking noise, scIALM is the least sensitive to the masking ratio. For downstream analysis, this study uses adjusted rand index (ARI) and normalized mutual information (NMI) to evaluate the clustering effect, and the results are improved on three datasets containing real cluster labels.
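The four comparison metrics named in this abstract (MSE, MAE, PCC, and cosine similarity) can be sketched as follows. This is a minimal illustration using the standard definitions over all matrix entries, not the evaluation code of the scIALM paper; the function name `imputation_metrics` and the list-of-rows matrix format are assumptions made for the example.

```python
import math


def imputation_metrics(original, imputed):
    """Compare two equally shaped matrices (lists of rows) entry by entry."""
    x = [float(v) for row in original for v in row]
    y = [float(v) for row in imputed for v in row]
    if not x or len(x) != len(y):
        raise ValueError("matrices must be non-empty and have the same size")
    n = len(x)

    mse = sum((a - b) ** 2 for a, b in zip(x, y)) / n
    mae = sum(abs(a - b) for a, b in zip(x, y)) / n

    # Pearson correlation coefficient over the flattened entries.
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    pcc = cov / math.sqrt(var_x * var_y) if var_x > 0 and var_y > 0 else float("nan")

    # Cosine similarity between the two flattened matrices.
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    cs = sum(a * b for a, b in zip(x, y)) / norm if norm > 0 else float("nan")

    return {"mse": mse, "mae": mae, "pcc": pcc, "cs": cs}


scores = imputation_metrics([[1, 0], [0, 2]], [[1, 1], [1, 2]])
```

Lower MSE and MAE and higher PCC and cosine similarity indicate a closer match between the imputed matrix and the reference matrix.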
Affiliation(s)
- Xiaohong Liu
- College of Information Science and Technology, Beijing University of Chemical Technology, Beijing, 100029, China
| | - Han Wang
- College of Information Science and Technology, Beijing University of Chemical Technology, Beijing, 100029, China
| | - Jingyang Gao
- College of Information Science and Technology, Beijing University of Chemical Technology, Beijing, 100029, China
|
32
|
Bellavance J, David LS, Hildebrand ME. An Open-Source Tool for Investigation of Differential RNA Expression Between Spinal Cord Cells of Male and Female Mice. J Neurosci Res 2024; 102:e70008. [PMID: 39673257 PMCID: PMC11645520 DOI: 10.1002/jnr.70008] [Received: 09/23/2024] [Revised: 11/20/2024] [Accepted: 11/25/2024] [Indexed: 12/16/2024]
Abstract
Chronic pain is a highly debilitating condition that differs by type, prevalence, and severity between men and women. To uncover the molecular underpinnings of these differences, it is critical to analyze the transcriptomes of spinal cord pain-processing networks for both sexes. Despite several recently published single-nucleus RNA-sequencing (snRNA-seq) studies on the function and composition of the mouse spinal cord, a gene expression analysis investigating the differences between males and females has yet to be performed. Here, we combined data from three different large-scale snRNA-seq studies, which used sex-identified adult mice. Using SeqSeek, we classified more than 37,000 unique viable cells within predicted cell types with the use of machine learning. We then utilized DESeq2 to identify significant differentially expressed genes (DEGs) between males and females in a variety of cell populations, including superficial dorsal horn (SDH) neurons. We found a large number of DEGs between males and females in all cells, in neurons, and in SDH neurons of the mouse spinal cord, with a greater level of differential expression in inhibitory SDH neurons compared to excitatory SDH neurons. The results of these analyses are available on an open-source web-app: https://justinbellavance.shinyapps.io/snRNA_Visualization/. Lastly, we used gene set enrichment analysis to identify sex-enriched pathways from our previously identified DEGs. Through this, we have identified specific genetic players within the rodent spinal cord that diverge between males and females, which may underlie reported sex differences in spinal nociceptive mechanisms and pain processing.
Affiliation(s)
- Justin Bellavance
- Department of Neuroscience, Carleton University, Ottawa, Ontario, Canada
- Department of Medicine, Université de Montréal, Montreal, Quebec, Canada
| | | | - Michael E. Hildebrand
- Department of Neuroscience, Carleton University, Ottawa, Ontario, Canada
- Neuroscience Program, Ottawa Hospital Research Institute, Ottawa, Ontario, Canada
|
33
|
Zhang Z, Mathew D, Lim TL, Mason K, Martinez CM, Huang S, Wherry EJ, Susztak K, Minn AJ, Ma Z, Zhang NR. Recovery of biological signals lost in single-cell batch integration with CellANOVA. Nat Biotechnol 2024:10.1038/s41587-024-02463-1. [PMID: 39592777 DOI: 10.1038/s41587-024-02463-1] [Received: 10/02/2023] [Accepted: 10/02/2024] [Indexed: 11/28/2024]
Abstract
Data integration to align cells across batches has become a cornerstone of single-cell data analysis, critically affecting downstream results. Currently, there are no guidelines for when the biological differences between samples are separable from batch effects. Here we show that current paradigms for single-cell data integration remove biologically meaningful variation and introduce distortion. We present a statistical model and computationally scalable algorithm, CellANOVA (cell state space analysis of variance), that harnesses experimental design to explicitly recover biological signals that are erased during single-cell data integration. CellANOVA uses a 'pool-of-controls' design concept, applicable across diverse settings, to separate unwanted variation from biological variation of interest and allow the recovery of subtle biological signals. We apply CellANOVA to diverse contexts and validate the recovered biological signals by orthogonal assays. In particular, we show that CellANOVA is effective in the challenging case of single-cell and single-nucleus data integration, where it recovers subtle biological signals that can be validated and replicated by external data.
Affiliation(s)
- Zhaojun Zhang
- Department of Statistics and Data Science, The Wharton School, University of Pennsylvania, Philadelphia, PA, USA
| | - Divij Mathew
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Institute for Immunology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Parker Institute for Cancer Immunotherapy, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Tristan L Lim
- Department of Radiation Oncology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Kaishu Mason
- Department of Statistics and Data Science, The Wharton School, University of Pennsylvania, Philadelphia, PA, USA
| | - Clara Morral Martinez
- Department of Radiation Oncology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Mark Foundation Center for Immunotherapy, Immune Signaling and Radiation, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Sijia Huang
- Penn Institute of Biomedical Informatics, University of Pennsylvania, Philadelphia, PA, USA
| | - E John Wherry
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Institute for Immunology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Parker Institute for Cancer Immunotherapy, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Mark Foundation Center for Immunotherapy, Immune Signaling and Radiation, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Katalin Susztak
- Renal, Electrolyte and Hypertension Division, Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Institute for Diabetes, Obesity and Metabolism, University of Pennsylvania, Philadelphia, PA, USA
- Department of Genetics, University of Pennsylvania, Philadelphia, PA, USA
| | - Andy J Minn
- Institute for Immunology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Parker Institute for Cancer Immunotherapy, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Department of Radiation Oncology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Mark Foundation Center for Immunotherapy, Immune Signaling and Radiation, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Zongming Ma
- Department of Statistics and Data Science, Yale University, New Haven, CT, USA.
| | - Nancy R Zhang
- Department of Statistics and Data Science, The Wharton School, University of Pennsylvania, Philadelphia, PA, USA.
|
34
|
Sharifitabar M, Kazempour S, Razavian J, Sajedi S, Solhjoo S, Zare H. A deep neural network to de-noise single-cell RNA sequencing data. bioRxiv 2024:2024.11.20.624552. [PMID: 39605470 PMCID: PMC11601639 DOI: 10.1101/2024.11.20.624552] [Indexed: 11/29/2024]
Abstract
Single-cell RNA sequencing (scRNA-seq), a powerful technique for investigating the transcriptome of individual cells, enables the discovery of heterogeneous cell populations, rare cell types, and transcriptional dynamics in separate cells. Yet, scRNA-seq data analysis is limited by the problem of measurement dropouts, i.e., genes displaying zero expression levels. We introduce ZiPo, a deep artificial neural network for rate estimation and library size prediction in scRNA-seq data which incorporates adjustable zero inflation in the distribution to capture the dropouts. ZiPo builds upon established concepts, including using deep autoencoders and adopting the Poisson and negative binomial distributions, by taking advantage of novel strategies, including library size prediction and residual connections, to improve the overall performance. A significant innovation of ZiPo is the introduction of a scale-invariant loss term, making the weights sparse and, hence, the model biologically more interpretable. ZiPo quickly handles vast singular and mixed datasets, with the processing time directly proportional to the number of cells. In this paper, we demonstrate the power of ZiPo on three datasets and show its advantages over other current techniques. The code used to produce the results in this manuscript is available at https://bitbucket.org/habilzare/alzheimer/src/master/code/deep/ZiPo/.
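To make the idea of zero inflation concrete, the sketch below evaluates the log-probability of a count under a textbook zero-inflated Poisson distribution, in which an extra point mass at zero models dropouts. It is only the distribution, not ZiPo's network or loss; the function name `zip_log_prob` and its parameters `rate` and `zero_prob` are assumptions made for the example.

```python
import math


def zip_log_prob(count, rate, zero_prob):
    """Log-probability of `count` under a zero-inflated Poisson distribution.

    With probability `zero_prob` the observation is a structural zero (dropout);
    otherwise it is drawn from a Poisson distribution with mean `rate`.
    """
    if count < 0 or rate <= 0 or not 0 <= zero_prob < 1:
        raise ValueError("need count >= 0, rate > 0 and 0 <= zero_prob < 1")
    if count == 0:
        # A zero can come from the dropout component or from the Poisson itself.
        return math.log(zero_prob + (1 - zero_prob) * math.exp(-rate))
    log_poisson = -rate + count * math.log(rate) - math.lgamma(count + 1)
    return math.log1p(-zero_prob) + log_poisson


# Negative log-likelihood of a small vector of counts for one gene.
counts = [0, 0, 3, 1, 0]
nll = -sum(zip_log_prob(k, rate=2.0, zero_prob=0.3) for k in counts)
```

Setting `zero_prob` to 0 recovers the ordinary Poisson distribution, which is a quick way to sanity-check the function.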
|
35
|
Stockinger AW, Adelmann L, Fahrenberger M, Ruta C, Özpolat BD, Milivojev N, Balavoine G, Raible F. Molecular profiles, sources and lineage restrictions of stem cells in an annelid regeneration model. Nat Commun 2024; 15:9882. [PMID: 39557833 PMCID: PMC11574210 DOI: 10.1038/s41467-024-54041-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2023] [Accepted: 10/30/2024] [Indexed: 11/20/2024] Open
Abstract
Regeneration of missing body parts can be observed in diverse animal phyla, but it remains unclear to what extent these capacities rely on shared or divergent principles. Research into this question requires detailed knowledge about the involved molecular and cellular principles in suitable reference models. By combining single-cell RNA sequencing and mosaic transgenesis in the marine annelid Platynereis dumerilii, we map cellular profiles and lineage restrictions during posterior regeneration. Our data reveal cell-type-specific injury responses, re-expression of positional identity factors, and the re-emergence of stem cell signatures in multiple cell populations. Epidermis and mesodermal coelomic tissue produce distinct putative posterior stem cells (PSCs) in the emerging blastema. A novel mosaic transgenesis strategy reveals both developmental compartments and lineage restrictions during regenerative growth. Our work supports the notion that posterior regeneration involves dedifferentiation, and reveals molecular and mechanistic parallels between annelid and vertebrate regeneration.
Affiliation(s)
- Alexander W Stockinger
  - Max Perutz Labs, Vienna Biocenter Campus (VBC), Vienna, Austria
  - University of Vienna, Center for Molecular Biology, Department of Genetics and Microbiology, Vienna, Austria
  - Research Platform Single-Cell Regulation of Stem Cells (SinCeReSt), University of Vienna, Vienna, Austria
  - Vienna Biocenter PhD Program, a Doctoral School of the University of Vienna and the Medical University of Vienna, Vienna, Austria
  - PhD Programme Stem Cells, Tissues, Organoids - Dissecting Regulators of Potency and Pattern Formation (SCORPION), University of Vienna, Vienna, Austria
- Leonie Adelmann
  - Max Perutz Labs, Vienna Biocenter Campus (VBC), Vienna, Austria
  - University of Vienna, Center for Molecular Biology, Department of Genetics and Microbiology, Vienna, Austria
  - Research Platform Single-Cell Regulation of Stem Cells (SinCeReSt), University of Vienna, Vienna, Austria
  - Vienna Biocenter PhD Program, a Doctoral School of the University of Vienna and the Medical University of Vienna, Vienna, Austria
  - PhD Programme Stem Cells, Tissues, Organoids - Dissecting Regulators of Potency and Pattern Formation (SCORPION), University of Vienna, Vienna, Austria
- Martin Fahrenberger
  - Max Perutz Labs, Vienna Biocenter Campus (VBC), Vienna, Austria
  - Research Platform Single-Cell Regulation of Stem Cells (SinCeReSt), University of Vienna, Vienna, Austria
  - Vienna Biocenter PhD Program, a Doctoral School of the University of Vienna and the Medical University of Vienna, Vienna, Austria
  - Center for Integrative Bioinformatics Vienna (CIBIV), University of Vienna and Medical University of Vienna, Vienna, Austria
  - Medical University of Vienna, Max Perutz Labs, Vienna, Austria
- Christine Ruta
  - Institute of Biology, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
- B Duygu Özpolat
  - Université de Paris Cité, CNRS, Institut Jacques Monod, Paris, France
  - Department of Biology, Washington University in Saint Louis, St. Louis, MO, USA
- Nadja Milivojev
  - Max Perutz Labs, Vienna Biocenter Campus (VBC), Vienna, Austria
  - University of Vienna, Center for Molecular Biology, Department of Genetics and Microbiology, Vienna, Austria
  - Research Platform Single-Cell Regulation of Stem Cells (SinCeReSt), University of Vienna, Vienna, Austria
  - Vienna Biocenter PhD Program, a Doctoral School of the University of Vienna and the Medical University of Vienna, Vienna, Austria
  - PhD Programme Stem Cells, Tissues, Organoids - Dissecting Regulators of Potency and Pattern Formation (SCORPION), University of Vienna, Vienna, Austria
- Guillaume Balavoine
  - Université de Paris Cité, CNRS, Institut Jacques Monod, Paris, France
  - Institute of Neuroscience, CNRS, Université Paris-Saclay, Saclay, France
- Florian Raible
  - Max Perutz Labs, Vienna Biocenter Campus (VBC), Vienna, Austria
  - University of Vienna, Center for Molecular Biology, Department of Genetics and Microbiology, Vienna, Austria
  - Research Platform Single-Cell Regulation of Stem Cells (SinCeReSt), University of Vienna, Vienna, Austria
|
36
|
Choudhuri S, Ghosh B. Computational approach for decoding malaria drug targets from single-cell transcriptomics and finding potential drug molecule. Sci Rep 2024; 14:24064. [PMID: 39402081 PMCID: PMC11473826 DOI: 10.1038/s41598-024-72427-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2024] [Accepted: 09/06/2024] [Indexed: 10/17/2024] Open
Abstract
Malaria is a deadly disease caused by Plasmodium parasites. While potent drugs are available on the market for malaria treatment, over the years Plasmodium parasites have developed resistance against many, if not all, front-line drugs. This poses a serious threat to global malaria eradication efforts, and the continued discovery of new drugs is necessary to tackle this debilitating disease. With recent unprecedented progress in machine learning techniques, single-cell transcriptomics in Plasmodium offers a powerful tool for identifying crucial proteins as drug targets and for the subsequent computational prediction of potential drugs. In this study, we implemented a mutual-information-based feature reduction algorithm with a classification algorithm to select important proteins from transcriptomic datasets (sexual and asexual stages) for Plasmodium falciparum and then constructed the protein-protein interaction (PPI) networks of the proteins. The analysis of this PPI network revealed key proteins vital for the survival of Plasmodium falciparum. Based on the function and identification of a few strong binding sites on a couple of these key proteins, we computationally predicted a set of potential drug molecules using a deep learning-based technique. Finally, from the generated molecules, we report lead drug molecules that satisfy ADMET and drug-likeness properties. The study offers a general computational pipeline to identify crucial proteins using scRNA-seq datasets and to support further development of potential new drugs.
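The mutual-information-based feature reduction mentioned above can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names (`mutual_information`, `rank_genes_by_mi`), the toy input layout, and the choice to binarize each gene as detected or not detected before scoring are all assumptions made for illustration.

```python
import math
from collections import Counter


def mutual_information(feature, labels):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(feature)
    joint = Counter(zip(feature, labels))
    p_feature = Counter(feature)
    p_label = Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log(p_xy / ((p_feature[x] / n) * (p_label[y] / n)))
    return mi


def rank_genes_by_mi(expression, labels):
    """Rank genes by how informative their detection is about the labels.

    expression: dict mapping a gene name to a list of per-cell counts
                (hypothetical layout chosen for this sketch).
    labels:     per-cell class labels, e.g. parasite stage.
    Returns (gene, score) pairs, most informative first.
    """
    scores = {
        gene: mutual_information([count > 0 for count in counts], labels)
        for gene, counts in expression.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Genes with the highest scores would be kept as candidate features for a downstream classifier; a score near zero means that detection of the gene says little about the label.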
Affiliation(s)
- Soham Choudhuri
  - Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, India
- Bhaswar Ghosh
  - Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, India
|
37
|
Chafamo D, Shanmugam V, Tokcan N. C-ziptf: stable tensor factorization for zero-inflated multi-dimensional genomics data. BMC Bioinformatics 2024; 25:323. [PMID: 39369208 PMCID: PMC11456250 DOI: 10.1186/s12859-024-05886-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/27/2023] [Accepted: 07/30/2024] [Indexed: 10/07/2024] Open
Abstract
In the past two decades, genomics has advanced significantly, with single-cell RNA-sequencing (scRNA-seq) marking a pivotal milestone. ScRNA-seq provides unparalleled insights into cellular diversity and has spurred diverse studies across multiple conditions and samples, resulting in an influx of complex multidimensional genomics data. This highlights the need for robust methodologies capable of handling the complexity and multidimensionality of such genomics data. Furthermore, single-cell data grapples with sparsity due to issues like low capture efficiency and dropout effects. Tensor factorizations (TF) have emerged as powerful tools to unravel the complex patterns from multi-dimensional genomics data. Classic TF methods, based on maximum likelihood estimation, struggle with zero-inflated count data, while the inherent stochasticity in TFs further complicates result interpretation and reproducibility. Our paper introduces Zero Inflated Poisson Tensor Factorization (ZIPTF), a novel method for high-dimensional zero-inflated count data factorization. We also present Consensus-ZIPTF (C-ZIPTF), merging ZIPTF with a consensus-based approach to address stochasticity. We evaluate our proposed methods on synthetic zero-inflated count data, simulated scRNA-seq data, and real multi-sample multi-condition scRNA-seq datasets. ZIPTF consistently outperforms baseline matrix and tensor factorization methods, displaying enhanced reconstruction accuracy for zero-inflated data. When dealing with high probabilities of excess zeros, ZIPTF achieves up to 2.4× better accuracy. Moreover, C-ZIPTF notably enhances the factorization's consistency. When tested on synthetic and real scRNA-seq data, ZIPTF and C-ZIPTF consistently uncover known and biologically meaningful gene expression programs. Access our data and code at: https://github.com/klarman-cell-observatory/scBTF and https://github.com/klarman-cell-observatory/scbtf_experiments
Affiliation(s)
- Daniel Chafamo
  - Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Vignesh Shanmugam
  - Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
  - Department of Pathology, Brigham and Women's Hospital, Boston, MA, 02215, USA
- Neriman Tokcan
  - Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
  - Department of Mathematics, University of Massachusetts Boston, Boston, MA, 02125, USA
|
38
|
Cui S, Nassiri S, Zakeri I. Mcadet: A feature selection method for fine-resolution single-cell RNA-seq data based on multiple correspondence analysis and community detection. PLoS Comput Biol 2024; 20:e1012560. [PMID: 39466833 PMCID: PMC11542852 DOI: 10.1371/journal.pcbi.1012560] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2023] [Revised: 11/07/2024] [Accepted: 10/15/2024] [Indexed: 10/30/2024] Open
Abstract
Single-cell RNA sequencing (scRNA-seq) data analysis faces numerous challenges, including high sparsity, a high-dimensional feature space, and biological noise. These challenges hinder downstream analysis, necessitating the use of feature selection methods to identify informative genes, and reduce data dimensionality. However, existing methods for selecting highly variable genes (HVGs) exhibit limited overlap and inconsistent clustering performance across benchmark datasets. Moreover, these methods often struggle to accurately select HVGs from fine-resolution scRNA-seq datasets and minority cell types, which are more difficult to distinguish, raising concerns about the reliability of their results. To overcome these limitations, we propose a novel feature selection framework for scRNA-seq data called Mcadet. Mcadet integrates Multiple Correspondence Analysis (MCA), graph-based community detection, and a novel statistical testing approach. To assess the effectiveness of Mcadet, we conducted extensive evaluations using both simulated and real-world data, employing unbiased metrics for comparison. Our results demonstrate the superior performance of Mcadet in the selection of HVGs in scenarios involving fine-resolution scRNA-seq datasets and datasets containing minority cell populations. Overall, we demonstrate that Mcadet enhances the reliability of selected HVGs, although the impact of HVG selection on various downstream analyses varies and needs to be further investigated.
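As context for the highly variable gene (HVG) selection discussed above, here is a minimal sketch of one simple baseline idea: ranking genes by their variance-to-mean ratio. It is not Mcadet and does not involve multiple correspondence analysis or community detection; the function name `rank_by_dispersion` and the dictionary input layout are assumptions made for illustration.

```python
from statistics import mean, pvariance


def rank_by_dispersion(expression):
    """Rank genes by variance-to-mean ratio, highest first.

    expression: dict mapping a gene name to a list of per-cell counts
                (hypothetical layout chosen for this sketch).
    Genes with zero mean receive a score of 0.0.
    """
    scores = {}
    for gene, counts in expression.items():
        gene_mean = mean(counts)
        scores[gene] = pvariance(counts) / gene_mean if gene_mean > 0 else 0.0
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

A ranking like this depends only on per-gene summary statistics, which is one reason different HVG methods can disagree on fine-resolution datasets where informative genes vary only within small cell populations.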
Affiliation(s)
- Saishi Cui
  - Department of Epidemiology and Biostatistics, Dornsife School of Public Health, Drexel University, Philadelphia, Pennsylvania, United States of America
- Sina Nassiri
  - Roche Pharma Research and Early Development, Roche Innovation Center Basel, Basel, Switzerland
- Issa Zakeri
  - Department of Epidemiology and Biostatistics, Dornsife School of Public Health, Drexel University, Philadelphia, Pennsylvania, United States of America
|
39
|
Altman JE, Olex AL, Zboril EK, Walker CJ, Boyd DC, Myrick RK, Hairr NS, Koblinski JE, Puchalapalli M, Hu B, Dozmorov MG, Chen XS, Chen Y, Perou CM, Lehmann BD, Visvader JE, Harrell JC. Single-cell transcriptional atlas of human breast cancers and model systems. Clin Transl Med 2024; 14:e70044. [PMID: 39417215 PMCID: PMC11483560 DOI: 10.1002/ctm2.70044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/04/2024] [Revised: 09/12/2024] [Accepted: 09/21/2024] [Indexed: 10/19/2024] Open
Abstract
BACKGROUND: Breast cancer's complex transcriptional landscape requires an improved understanding of cellular diversity to identify effective treatments. The study of genetic variations among breast cancer subtypes at single-cell resolution has potential to deepen our insights into cancer progression. METHODS: In this study, we amalgamate single-cell RNA sequencing data from patient tumours and matched lymph metastasis, reduction mammoplasties, breast cancer patient-derived xenografts (PDXs), PDX-derived organoids (PDXOs), and cell lines, resulting in a diverse dataset of 117 samples with 506 719 total cells. These samples encompass hormone receptor positive (HR+), human epidermal growth factor receptor 2 positive (HER2+), and triple-negative breast cancer (TNBC) subtypes, including isogenic model pairs. Herein, we delineate similarities and distinctions across models and patient samples and explore therapeutic drug efficacy based on subtype proportions. RESULTS: PDX models more closely resemble patient samples in terms of tumour heterogeneity and cell cycle characteristics when compared with TNBC cell lines. Acquired drug resistance was associated with an increase in basal-like cell proportions within TNBC PDX tumours as defined with SCSubtype and TNBCtype cell typing predictors. All patient samples contained a mixture of subtypes; compared to primary tumours, HR+ lymph node metastases had lower proportions of HER2-Enriched cells. PDXOs exhibited differences in metabolic-related transcripts compared to PDX tumours. Correlative analyses of cytotoxic drugs on PDX cells identified that therapeutic efficacy was based on subtype proportion. CONCLUSIONS: We present a substantial multimodel dataset, a dynamic approach to cell-wise sample annotation, and a comprehensive interrogation of models within systems of human breast cancer. This analysis and reference will facilitate informed decision-making in preclinical research and therapeutic development through its elucidation of model limitations, subtype-specific insights and novel targetable pathways. KEY POINTS: Patient-derived xenograft models more closely resemble patient samples in tumour heterogeneity and cell cycle characteristics when compared with cell lines. 3D organoid models exhibit differences in metabolic profiles compared to their in vivo counterparts. A valuable multimodel reference dataset that can be useful in elucidating model differences and novel targetable pathways.
Affiliation(s)
- Julia E. Altman
  - Department of Human and Molecular Genetics, Virginia Commonwealth University, Richmond, Virginia, USA
  - Department of Pathology, Virginia Commonwealth University, Richmond, Virginia, USA
- Amy L. Olex
  - C. Kenneth and Diane Wright Center for Clinical and Translational Research, Virginia Commonwealth University, Richmond, Virginia, USA
- Emily K. Zboril
  - Department of Pathology, Virginia Commonwealth University, Richmond, Virginia, USA
  - Department of Biochemistry, Virginia Commonwealth University, Richmond, Virginia, USA
- Carson J. Walker
  - Department of Human and Molecular Genetics, Virginia Commonwealth University, Richmond, Virginia, USA
  - Department of Pathology, Virginia Commonwealth University, Richmond, Virginia, USA
- David C. Boyd
  - Department of Pathology, Virginia Commonwealth University, Richmond, Virginia, USA
- Rachel K. Myrick
  - Department of Pathology, Virginia Commonwealth University, Richmond, Virginia, USA
- Nicole S. Hairr
  - Department of Pathology, Virginia Commonwealth University, Richmond, Virginia, USA
- Jennifer E. Koblinski
  - Department of Pathology, Virginia Commonwealth University, Richmond, Virginia, USA
  - Massey Comprehensive Cancer Center, Virginia Commonwealth University, Richmond, Virginia, USA
- Madhavi Puchalapalli
  - Department of Pathology, Virginia Commonwealth University, Richmond, Virginia, USA
  - Massey Comprehensive Cancer Center, Virginia Commonwealth University, Richmond, Virginia, USA
- Bin Hu
  - Department of Pathology, Virginia Commonwealth University, Richmond, Virginia, USA
  - Massey Comprehensive Cancer Center, Virginia Commonwealth University, Richmond, Virginia, USA
- Mikhail G. Dozmorov
  - Department of Biostatistics, Virginia Commonwealth University, Richmond, Virginia, USA
- X. Steven Chen
  - Department of Public Health Sciences, University of Miami Miller School of Medicine, Miami, Florida, USA
  - Sylvester Comprehensive Cancer Center, University of Miami Miller School of Medicine, Miami, Florida, USA
- Yunshun Chen
  - Walter and Eliza Hall Institute of Medical Research, Melbourne, Victoria, Australia
  - Department of Medical Biology, University of Melbourne, Parkville, Victoria, Australia
- Charles M. Perou
  - Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, North Carolina, USA
- Brian D. Lehmann
  - Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Jane E. Visvader
  - Walter and Eliza Hall Institute of Medical Research, Melbourne, Victoria, Australia
  - Department of Medical Biology, University of Melbourne, Parkville, Victoria, Australia
- J. Chuck Harrell
  - Department of Pathology, Virginia Commonwealth University, Richmond, Virginia, USA
  - Massey Comprehensive Cancer Center, Virginia Commonwealth University, Richmond, Virginia, USA
  - Center for Pharmaceutical Engineering, Virginia Commonwealth University, Richmond, Virginia, USA
|
40
|
Liu W, Zhao Z. Scupa: Single-cell unified polarization assessment of immune cells using the single-cell foundation model. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.08.15.608093. [PMID: 39229048 PMCID: PMC11370394 DOI: 10.1101/2024.08.15.608093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/05/2024]
Abstract
Immune cells undergo cytokine-driven polarization in response to diverse stimuli. This process significantly modulates their transcriptional profiles and functional states. Although single-cell RNA sequencing (scRNA-seq) has advanced our understanding of immune responses across various diseases or conditions, there is currently no method to systematically examine cytokine effects and immune cell polarization. To address this gap, we developed Single-cell unified polarization assessment (Scupa), the first computational method for comprehensive immune cell polarization analysis. Scupa is trained on data from the Immune Dictionary, which characterizes 66 cytokine-driven polarization states across 14 immune cell types. By leveraging the cell embeddings from the Universal Cell Embeddings model, Scupa effectively identifies polarized cells in new datasets generated from different species and experimental conditions. Applications of Scupa in independent datasets demonstrated its accuracy in classifying polarized cells and further revealed distinct polarization profiles in tumor-infiltrating myeloid cells across cancers. Scupa complements conventional single-cell data analysis by providing new insights into immune cell polarization, and it holds promise for assessing molecular effects or identifying therapeutic targets in cytokine-based therapies.
Affiliation(s)
- Wendao Liu
  - The University of Texas MD Anderson Cancer Center UTHealth Houston Graduate School of Biomedical Sciences, Houston, TX, USA
  - Center for Precision Health, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Zhongming Zhao
  - The University of Texas MD Anderson Cancer Center UTHealth Houston Graduate School of Biomedical Sciences, Houston, TX, USA
  - Center for Precision Health, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
|
41
|
Piana D, Iavarone F, De Paolis E, Daniele G, Parisella F, Minucci A, Greco V, Urbani A. Phenotyping Tumor Heterogeneity through Proteogenomics: Study Models and Challenges. Int J Mol Sci 2024; 25:8830. [PMID: 39201516 PMCID: PMC11354793 DOI: 10.3390/ijms25168830] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2024] [Revised: 07/31/2024] [Accepted: 08/06/2024] [Indexed: 09/02/2024] Open
Abstract
Tumor heterogeneity refers to the diversity observed among tumor cells, both between different tumors (inter-tumor heterogeneity) and within a single tumor (intra-tumor heterogeneity). These cells can display distinct morphological and phenotypic characteristics, including variations in cellular morphology, metastatic potential, and treatment response among patients. Therefore, a comprehensive understanding of such heterogeneity is necessary for deciphering tumor-specific mechanisms that may be diagnostically and therapeutically valuable. Innovative and multidisciplinary approaches are needed to understand this complex feature. In this context, proteogenomics has been emerging as a significant resource for integrating omics fields such as genomics and proteomics. By combining data obtained from both Next-Generation Sequencing (NGS) technologies and mass spectrometry (MS) analyses, proteogenomics aims to provide a comprehensive view of tumor heterogeneity. This approach reveals molecular alterations and phenotypic features related to tumor subtypes, potentially identifying therapeutic biomarkers. Many achievements have been made; however, despite continuous advances in proteogenomics-based methodologies, several challenges remain, in particular the limitations in sensitivity and specificity and the lack of optimal study models. This review highlights the impact of proteogenomics on characterizing tumor phenotypes, focusing on the critical challenges and current limitations of its use in different clinical and preclinical models for tumor phenotypic characterization.
Affiliation(s)
- Diletta Piana
  - Department of Basic Biotechnological Sciences, Intensivological and Perioperative Clinics, Università Cattolica del Sacro Cuore, 00168 Rome, Italy
  - Departmental Unit of Chemistry, Biochemistry and Clinical Molecular Biology, Department of Diagnostic and Laboratory Medicine, Fondazione Policlinico Universitario A. Gemelli IRCCS, 00168 Rome, Italy
- Federica Iavarone
  - Department of Basic Biotechnological Sciences, Intensivological and Perioperative Clinics, Università Cattolica del Sacro Cuore, 00168 Rome, Italy
  - Departmental Unit of Chemistry, Biochemistry and Clinical Molecular Biology, Department of Diagnostic and Laboratory Medicine, Fondazione Policlinico Universitario A. Gemelli IRCCS, 00168 Rome, Italy
- Elisa De Paolis
  - Departmental Unit of Chemistry, Biochemistry and Clinical Molecular Biology, Department of Diagnostic and Laboratory Medicine, Fondazione Policlinico Universitario A. Gemelli IRCCS, 00168 Rome, Italy
  - Departmental Unit of Molecular and Genomic Diagnostics, Genomics Core Facility, Gemelli Science and Technology Park (G-STeP), Fondazione Policlinico Universitario A. Gemelli IRCCS, 00168 Rome, Italy
- Gennaro Daniele
  - Phase 1 Unit, Fondazione Policlinico Universitario A. Gemelli IRCCS, 00168 Rome, Italy
- Federico Parisella
  - Department of Basic Biotechnological Sciences, Intensivological and Perioperative Clinics, Università Cattolica del Sacro Cuore, 00168 Rome, Italy
- Angelo Minucci
  - Departmental Unit of Chemistry, Biochemistry and Clinical Molecular Biology, Department of Diagnostic and Laboratory Medicine, Fondazione Policlinico Universitario A. Gemelli IRCCS, 00168 Rome, Italy
  - Departmental Unit of Molecular and Genomic Diagnostics, Genomics Core Facility, Gemelli Science and Technology Park (G-STeP), Fondazione Policlinico Universitario A. Gemelli IRCCS, 00168 Rome, Italy
- Viviana Greco
  - Department of Basic Biotechnological Sciences, Intensivological and Perioperative Clinics, Università Cattolica del Sacro Cuore, 00168 Rome, Italy
  - Departmental Unit of Chemistry, Biochemistry and Clinical Molecular Biology, Department of Diagnostic and Laboratory Medicine, Fondazione Policlinico Universitario A. Gemelli IRCCS, 00168 Rome, Italy
- Andrea Urbani
  - Department of Basic Biotechnological Sciences, Intensivological and Perioperative Clinics, Università Cattolica del Sacro Cuore, 00168 Rome, Italy
  - Departmental Unit of Chemistry, Biochemistry and Clinical Molecular Biology, Department of Diagnostic and Laboratory Medicine, Fondazione Policlinico Universitario A. Gemelli IRCCS, 00168 Rome, Italy
|
42
|
Luo Q, Chen Y, Lan X. COMSE: analysis of single-cell RNA-seq data using community detection-based feature selection. BMC Biol 2024; 22:167. [PMID: 39113021 PMCID: PMC11304914 DOI: 10.1186/s12915-024-01963-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2023] [Accepted: 07/10/2024] [Indexed: 08/10/2024] Open
Abstract
BACKGROUND: Single-cell RNA sequencing enables studying cells individually, yet high gene dimensions and low cell numbers challenge analysis. Moreover, only a subset of the genes detected are involved in the biological processes underlying cell-type-specific functions. RESULTS: In this study, we present COMSE, an unsupervised feature selection framework using community detection to capture informative genes from scRNA-seq data. COMSE identified homogeneous cell substates with high resolution, as demonstrated by distinguishing different cell cycle stages. Evaluations based on real and simulated scRNA-seq datasets showed that COMSE outperformed other methods in cell clustering assignment, even with high dropout rates. We also demonstrate that by identifying communities of genes associated with batch effects, COMSE parses signals reflecting biological difference from noise arising due to differences in sequencing protocols, thereby enabling integrated analysis of scRNA-seq datasets from different sources. CONCLUSIONS: COMSE provides an efficient unsupervised framework that selects highly informative genes in scRNA-seq data, improving cell substate identification and cell clustering. It identifies gene subsets that reveal biological and technical heterogeneity, supporting applications like batch effect correction and pathway analysis. It also provides robust results for bulk RNA-seq data analysis.
Affiliation(s)
- Qinhuan Luo
  - Department of Basic Medical Science, School of Medicine, Tsinghua University, Beijing, 100084, China
  - MOE Key Laboratory of Bioinformatics, Tsinghua University, Beijing, 100084, China
- Yaozhu Chen
  - School of Artificial Intelligence, Beijing Normal University, Beijing, 100875, China
- Xun Lan
  - Department of Basic Medical Science, School of Medicine, Tsinghua University, Beijing, 100084, China
  - Tsinghua-Peking Joint Center for Life Sciences, Tsinghua University, Beijing, 100084, China
  - MOE Key Laboratory of Bioinformatics, Tsinghua University, Beijing, 100084, China
|
43
|
Wang H, He K, Zhang H, Zhang Q, Cao L, Li J, Zhong Z, Chen H, Zhou L, Lian C, Wang M, Chen K, Qian PY, Li C. Deciphering deep-sea chemosynthetic symbiosis by single-nucleus RNA-sequencing. eLife 2024; 12:RP88294. [PMID: 39102287 PMCID: PMC11299980 DOI: 10.7554/elife.88294] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/06/2024] Open
Abstract
Bathymodioline mussels dominate deep-sea methane seep and hydrothermal vent habitats and obtain nutrients and energy primarily through chemosynthetic endosymbiotic bacteria in the bacteriocytes of their gill. However, the molecular mechanisms that orchestrate mussel host-symbiont interactions remain unclear. Here, we constructed a comprehensive cell atlas of the gill in the mussel Gigantidas platifrons from the South China Sea methane seeps (1100 m depth) using single-nucleus RNA-sequencing (snRNA-seq) and whole-mount in situ hybridisation. We identified 13 types of cells, including three previously unknown ones, and uncovered unknown tissue heterogeneity. Every cell type has a designated function in supporting the gill's structure and function, creating an optimal environment for chemosynthesis, and effectively acquiring nutrients from the endosymbiotic bacteria. Analysis of snRNA-seq of in situ transplanted mussels clearly showed the shifts in cell state in response to environmental oscillations. Our findings provide insight into the principles of host-symbiont interaction and the bivalves' environmental adaptation mechanisms.
Affiliation(s)
- Hao Wang
- Center of Deep-Sea Research, Institute of Oceanology, Chinese Academy of Sciences, Qingdao, China
- Laboratory for Marine Biology and Biotechnology, Qingdao Marine Science and Technology Center, Laoshan Laboratory, Qingdao, China
- Southern Marine Science and Engineering Guangdong Laboratory (Guangzhou), Guangzhou, China
- Department of Ocean Science, Hong Kong University of Science and Technology, Hong Kong, China
| | - Kai He
- Key Laboratory of Conservation and Application in Biodiversity of South China, School of Life Sciences, Guangzhou University, Guangzhou, China
| | - Huan Zhang
- Center of Deep-Sea Research, Institute of Oceanology, Chinese Academy of Sciences, Qingdao, China
| | - Quanyong Zhang
- State Key Laboratory of Primate Biomedical Research, Institute of Primate Translational Medicine, Kunming University of Science and Technology, Kunming, China
| | - Lei Cao
- Center of Deep-Sea Research, Institute of Oceanology, Chinese Academy of Sciences, Qingdao, China
| | - Jing Li
- South China Sea Institute of Oceanology, Chinese Academy of Sciences, Guangzhou, China
| | - Zhaoshan Zhong
- Center of Deep-Sea Research, Institute of Oceanology, Chinese Academy of Sciences, Qingdao, China
| | - Hao Chen
- Center of Deep-Sea Research, Institute of Oceanology, Chinese Academy of Sciences, Qingdao, China
| | - Li Zhou
- Center of Deep-Sea Research, Institute of Oceanology, Chinese Academy of Sciences, Qingdao, China
| | - Chao Lian
- Center of Deep-Sea Research, Institute of Oceanology, Chinese Academy of Sciences, Qingdao, China
| | - Minxiao Wang
- Center of Deep-Sea Research, Institute of Oceanology, Chinese Academy of Sciences, Qingdao, China
| | - Kai Chen
- State Key Laboratory of Primate Biomedical Research, Institute of Primate Translational Medicine, Kunming University of Science and Technology, Kunming, China
| | - Pei-Yuan Qian
- Southern Marine Science and Engineering Guangdong Laboratory (Guangzhou), Guangzhou, China
- Department of Ocean Science, Hong Kong University of Science and Technology, Hong Kong, China
| | - Chaolun Li
- Center of Deep-Sea Research, Institute of Oceanology, Chinese Academy of Sciences, Qingdao, China
- South China Sea Institute of Oceanology, Chinese Academy of Sciences, Guangzhou, China
- University of Chinese Academy of Sciences, Beijing, China
| |
|
44
|
Jia H, Wang W, Zhou Z, Chen Z, Lan Z, Bo H, Fan L. Single-cell RNA sequencing technology in human spermatogenesis: Progresses and perspectives. Mol Cell Biochem 2024; 479:2017-2033. [PMID: 37659974 DOI: 10.1007/s11010-023-04840-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2023] [Accepted: 08/14/2023] [Indexed: 09/04/2023]
Abstract
Spermatogenesis, a key part of the spermiation process, is regulated by a combination of key cells, such as primordial germ cells, spermatogonial stem cells, and somatic cells, such as Sertoli cells. Abnormal spermatogenesis can lead to azoospermia, testicular tumors, and other diseases related to male infertility. The application of single-cell RNA sequencing (scRNA-seq) technology in male reproduction is gradually increasing with its unique insight into deep mining and analysis. The data cover different periods of neonatal, prepubertal, pubertal, and adult stages. Different types of male infertility diseases including obstructive and non-obstructive azoospermia (NOA), Klinefelter Syndrome (KS), Sertoli Cell Only Syndrome (SCOS), and testicular tumors are also covered. We briefly review the principles and application of scRNA-seq and summarize the research results and application directions in spermatogenesis in different periods and pathological states. Moreover, we discuss the challenges of applying this technology in male reproduction and the prospects of combining it with other technologies.
Affiliation(s)
- Hanbo Jia
- NHC Key Laboratory of Human Stem Cell and Reproductive Engineering, Institute of Reproductive and Stem Cell Engineering, School of Basic Medical Science, Central South University, Changsha, Hunan, China
| | - Wei Wang
- NHC Key Laboratory of Human Stem Cell and Reproductive Engineering, Institute of Reproductive and Stem Cell Engineering, School of Basic Medical Science, Central South University, Changsha, Hunan, China
| | - Zhaowen Zhou
- NHC Key Laboratory of Human Stem Cell and Reproductive Engineering, Institute of Reproductive and Stem Cell Engineering, School of Basic Medical Science, Central South University, Changsha, Hunan, China
| | - Zhiyi Chen
- NHC Key Laboratory of Human Stem Cell and Reproductive Engineering, Institute of Reproductive and Stem Cell Engineering, School of Basic Medical Science, Central South University, Changsha, Hunan, China
| | - Zijun Lan
- NHC Key Laboratory of Human Stem Cell and Reproductive Engineering, Institute of Reproductive and Stem Cell Engineering, School of Basic Medical Science, Central South University, Changsha, Hunan, China
| | - Hao Bo
- NHC Key Laboratory of Human Stem Cell and Reproductive Engineering, Institute of Reproductive and Stem Cell Engineering, School of Basic Medical Science, Central South University, Changsha, Hunan, China
- Clinical Research Center for Reproduction and Genetics in Hunan Province, Reproductive and Genetic Hospital of CITIC-Xiangya, Changsha, Hunan, China
| | - Liqing Fan
- NHC Key Laboratory of Human Stem Cell and Reproductive Engineering, Institute of Reproductive and Stem Cell Engineering, School of Basic Medical Science, Central South University, Changsha, Hunan, China
- Clinical Research Center for Reproduction and Genetics in Hunan Province, Reproductive and Genetic Hospital of CITIC-Xiangya, Changsha, Hunan, China
| |
|
45
|
Guo B, Ling W, Kwon SH, Panwar P, Ghazanfar S, Martinowich K, Hicks SC. Integrating spatially-resolved transcriptomics data across tissues and individuals: challenges and opportunities. ARXIV 2024:arXiv:2408.00367v1. [PMID: 39130195 PMCID: PMC11312629] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/13/2024]
Abstract
Advances in spatially-resolved transcriptomics (SRT) technologies have propelled the development of new computational analysis methods to unlock biological insights. As the cost of generating these data decreases, these technologies provide an exciting opportunity to create large-scale atlases that integrate SRT data across multiple tissues, individuals, species, or phenotypes to perform population-level analyses. Here, we describe unique challenges of varying spatial resolutions in SRT data, as well as highlight the opportunities for standardized preprocessing methods along with computational algorithms amenable to atlas-scale datasets leading to improved sensitivity and reproducibility in the future.
Affiliation(s)
- Boyi Guo
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
| | - Wodan Ling
- Division of Biostatistics, Department of Population Health Sciences, Weill Cornell Medicine, NY, USA
| | - Sang Ho Kwon
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, USA
- Solomon H. Snyder Department of Neuroscience, Johns Hopkins School of Medicine, Baltimore, MD, USA
- Biochemistry, Cellular, and Molecular Biology Graduate Program, Johns Hopkins School of Medicine, Baltimore, MD, USA
| | - Pratibha Panwar
- School of Mathematics and Statistics, The University of Sydney, NSW 2006, Australia
- Sydney Precision Data Science Centre, University of Sydney, NSW 2006, Australia
- Charles Perkins Centre, The University of Sydney, NSW 2006, Australia
| | - Shila Ghazanfar
- School of Mathematics and Statistics, The University of Sydney, NSW 2006, Australia
- Sydney Precision Data Science Centre, University of Sydney, NSW 2006, Australia
- Charles Perkins Centre, The University of Sydney, NSW 2006, Australia
| | - Keri Martinowich
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, USA
- Solomon H. Snyder Department of Neuroscience, Johns Hopkins School of Medicine, Baltimore, MD, USA
- Department of Psychiatry and Behavioral Sciences, Johns Hopkins School of Medicine, Baltimore, MD, USA
- Johns Hopkins Kavli Neuroscience Discovery Institute, Baltimore, MD, USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Stephanie C. Hicks
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
- Malone Center for Engineering in Healthcare, Johns Hopkins University, Baltimore, MD, USA
| |
|
46
|
Wang J, Fonseca GJ, Ding J. scSemiProfiler: Advancing large-scale single-cell studies through semi-profiling with deep generative models and active learning. Nat Commun 2024; 15:5989. [PMID: 39013867 PMCID: PMC11252419 DOI: 10.1038/s41467-024-50150-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2023] [Accepted: 06/28/2024] [Indexed: 07/18/2024]
Abstract
Single-cell sequencing is a crucial tool for dissecting the cellular intricacies of complex diseases. Its prohibitive cost, however, hampers its application in expansive biomedical studies. Traditional cellular deconvolution approaches can infer cell type proportions from more affordable bulk sequencing data, yet they fall short in providing the detailed resolution required for single-cell-level analyses. To overcome this challenge, we introduce "scSemiProfiler", an innovative computational framework that marries deep generative models with active learning strategies. This method adeptly infers single-cell profiles across large cohorts by fusing bulk sequencing data with targeted single-cell sequencing from a few rigorously chosen representatives. Extensive validation across heterogeneous datasets verifies the precision of our semi-profiling approach, aligning closely with true single-cell profiling data and empowering refined cellular analyses. Originally developed for extensive disease cohorts, "scSemiProfiler" is adaptable for broad applications. It provides a scalable, cost-effective solution for single-cell profiling, facilitating in-depth cellular investigation in various biological domains.
Affiliation(s)
- Jingtao Wang
- Meakins-Christie Laboratories, Research Institute of McGill University Health Centre, 1001 Decarie Blvd, Montreal, H4A 3J1, Quebec, Canada
- Department of Medicine, Division of Experimental Medicine, McGill University, 1001 Decarie Blvd, Montreal, H4A 3J1, Quebec, Canada
| | - Gregory J Fonseca
- Meakins-Christie Laboratories, Research Institute of McGill University Health Centre, 1001 Decarie Blvd, Montreal, H4A 3J1, Quebec, Canada
- Department of Medicine, Division of Experimental Medicine, McGill University, 1001 Decarie Blvd, Montreal, H4A 3J1, Quebec, Canada
- Quantitative Life Sciences, McGill University, 845 Rue Sherbrooke Ouest, Montreal, H3A 0G4, Quebec, Canada
| | - Jun Ding
- Meakins-Christie Laboratories, Research Institute of McGill University Health Centre, 1001 Decarie Blvd, Montreal, H4A 3J1, Quebec, Canada
- Department of Medicine, Division of Experimental Medicine, McGill University, 1001 Decarie Blvd, Montreal, H4A 3J1, Quebec, Canada
- Quantitative Life Sciences, McGill University, 845 Rue Sherbrooke Ouest, Montreal, H3A 0G4, Quebec, Canada
- School of Computer Science, McGill University, 3480 Rue University, Montreal, H3A 2A7, Quebec, Canada
- Mila-Quebec AI Institute, 6666 Rue Saint-Urbain, Montreal, H2S 3H1, Quebec, Canada
| |
|
47
|
Chen AA, Clark K, Dewey BE, DuVal A, Pellegrini N, Nair G, Jalkh Y, Khalil S, Zurawski J, Calabresi PA, Reich DS, Bakshi R, Shou H, Shinohara RT, Alzheimer’s Disease Neuroimaging Initiative, and North American Imaging in Multiple Sclerosis Cooperative. PARE: A framework for removal of confounding effects from any distance-based dimension reduction method. PLoS Comput Biol 2024; 20:e1012241. [PMID: 38985831 PMCID: PMC11262650 DOI: 10.1371/journal.pcbi.1012241] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2023] [Revised: 07/22/2024] [Accepted: 06/10/2024] [Indexed: 07/12/2024]
Abstract
Dimension reduction tools preserving similarity and graph structure such as t-SNE and UMAP can capture complex biological patterns in high-dimensional data. However, these tools typically are not designed to separate effects of interest from unwanted effects due to confounders. We introduce the partial embedding (PARE) framework, which enables removal of confounders from any distance-based dimension reduction method. We then develop partial t-SNE and partial UMAP and apply these methods to genomic and neuroimaging data. For lower-dimensional visualization, our results show that the PARE framework can remove batch effects in single-cell sequencing data as well as separate clinical and technical variability in neuroimaging measures. We demonstrate that the PARE framework extends dimension reduction methods to highlight biological patterns of interest while effectively removing confounding effects.
Affiliation(s)
- Andrew A. Chen
- Department of Public Health Sciences, Medical University of South Carolina, Charleston, South Carolina, United States of America
| | - Kelly Clark
- Penn Statistics in Imaging and Visualization Center, Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
| | - Blake E. Dewey
- Department of Neurology, Johns Hopkins University School of Medicine, Baltimore, Maryland, United States of America
| | - Anna DuVal
- Department of Neurology, Johns Hopkins University School of Medicine, Baltimore, Maryland, United States of America
| | - Nicole Pellegrini
- Department of Neurology, Johns Hopkins University School of Medicine, Baltimore, Maryland, United States of America
| | - Govind Nair
- Translational Neuroradiology Section, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, Maryland, United States of America
| | - Youmna Jalkh
- Department of Neurology, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
| | - Samar Khalil
- Department of Neurology, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
| | - Jon Zurawski
- Department of Neurology, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
| | - Peter A. Calabresi
- Department of Neurology, Johns Hopkins University School of Medicine, Baltimore, Maryland, United States of America
| | - Daniel S. Reich
- Translational Neuroradiology Section, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, Maryland, United States of America
| | - Rohit Bakshi
- Department of Neurology, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
- Department of Radiology, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
| | - Haochang Shou
- Penn Statistics in Imaging and Visualization Center, Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Center for Biomedical Image Computing and Analytics, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
| | - Russell T. Shinohara
- Penn Statistics in Imaging and Visualization Center, Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Center for Biomedical Image Computing and Analytics, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
| | | |
|
48
|
Papiez A, Pioch J, Mollenkopf HJ, Corleis B, Dorhoi A, Polanska J. Relative effect size-based profiles as an alternative to differentiation analysis in multi-species single-cell transcriptional studies. PLoS One 2024; 19:e0305874. [PMID: 38917129 PMCID: PMC11198858 DOI: 10.1371/journal.pone.0305874] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2024] [Accepted: 06/04/2024] [Indexed: 06/27/2024]
Abstract
Combining data from experiments on multispecies studies provides invaluable contributions to the understanding of basic disease mechanisms and pathophysiology of pathogens crossing species boundaries. The task of multispecies gene expression analysis, however, is often challenging given annotation inconsistencies and in cases of small sample sizes due to bias caused by batch effects. In this work we aim to demonstrate that an alternative approach to standard differential expression analysis in single cell RNA-sequencing (scRNA-seq) based on effect size profiles is suitable for the fusion of data from small samples and multiple organisms. The analysis pipeline is based on effect size metric profiles of samples in specific cell clusters. The effect size substitutes standard differentiation analyses based on p-values and profiles identified based on these effect size metrics serve as a tool to link cell type clusters between the studied organisms. The algorithms were tested on published scRNA-seq data sets derived from several species and subsequently validated on own data from human and bovine peripheral blood mononuclear cells stimulated with Mycobacterium tuberculosis. Correlation of the effect size profiles between clusters allowed for the linkage of human and bovine cell types. Moreover, effect size ratios were used to identify differentially regulated genes in control and stimulated samples. The genes identified through effect size profiling were confirmed experimentally using qPCR. We demonstrate that in situations where batch effects dominate cell type variation in single cell small sample size multispecies studies, effect size profiling is a valid alternative to traditional statistical inference techniques.
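The effect-size profiling idea in this abstract can be sketched roughly as follows. This is an illustration only, not the authors' pipeline: the abstract does not name the effect size metric or the correlation measure, so Cohen's d and Pearson correlation are assumed here, and the function names `cohens_d` and `link_clusters` are hypothetical.

```python
import numpy as np


def cohens_d(stimulated, control):
    """Per-gene Cohen's d between two (cells x genes) expression matrices.

    Cohen's d is an assumed choice of effect size; the abstract does not
    specify which metric the authors used.
    """
    n1, n2 = stimulated.shape[0], control.shape[0]
    mean_diff = stimulated.mean(axis=0) - control.mean(axis=0)
    pooled_var = (
        (n1 - 1) * stimulated.var(axis=0, ddof=1)
        + (n2 - 1) * control.var(axis=0, ddof=1)
    ) / (n1 + n2 - 2)
    pooled_sd = np.sqrt(pooled_var)
    # Genes with zero pooled variance get an effect size of 0 instead of NaN/inf.
    return np.divide(
        mean_diff, pooled_sd, out=np.zeros_like(mean_diff), where=pooled_sd > 0
    )


def link_clusters(profiles_a, profiles_b):
    """Link each cluster of species A to the best-correlated cluster of species B.

    Both arguments map cluster name -> effect size profile over the same
    ordered set of shared (orthologous) genes. Pearson correlation is an
    assumed similarity measure.
    """
    links = {}
    for name_a, prof_a in profiles_a.items():
        corr = {
            name_b: np.corrcoef(prof_a, prof_b)[0, 1]
            for name_b, prof_b in profiles_b.items()
        }
        links[name_a] = max(corr, key=corr.get)
    return links
```

In use, one would compute `cohens_d` per cluster (stimulated versus control cells) in each species, restrict the profiles to shared genes, and pass the two dictionaries of profiles to `link_clusters`. Because the profiles are built from within-sample contrasts rather than raw expression, they are plausibly less sensitive to batch offsets between experiments, which is the motivation the abstract gives.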
Affiliation(s)
- Anna Papiez
- Department of Data Science and Engineering, Silesian University of Technology, Gliwice, Poland
| | - Jonathan Pioch
- Institute of Immunology, Friedrich Loeffler Institute, Greifswald, Germany
| | | | - Björn Corleis
- Institute of Immunology, Friedrich Loeffler Institute, Greifswald, Germany
| | - Anca Dorhoi
- Institute of Immunology, Friedrich Loeffler Institute, Greifswald, Germany
| | - Joanna Polanska
- Department of Data Science and Engineering, Silesian University of Technology, Gliwice, Poland
| |
|
49
|
Abu Nahia K, Sulej A, Migdał M, Ochocka N, Ho R, Kamińska B, Zagorski M, Winata CL. scRNA-seq reveals the diversity of the developing cardiac cell lineage and molecular players in heart rhythm regulation. iScience 2024; 27:110083. [PMID: 38872974 PMCID: PMC11170199 DOI: 10.1016/j.isci.2024.110083] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2023] [Revised: 01/26/2024] [Accepted: 05/20/2024] [Indexed: 06/15/2024]
Abstract
We utilized scRNA-seq to delineate the diversity of cell types in the zebrafish heart. Transcriptome profiling of over 50,000 cells at 48 and 72 hpf defined at least 18 discrete cell lineages of the developing heart. Utilizing well-established gene signatures, we identified a population of cells likely to be the primary pacemaker and characterized the transcriptome profile defining this critical cell type. Two previously uncharacterized genes, atp1b3b and colec10, were found to be enriched in the sinoatrial cardiomyocytes. CRISPR/Cas9-mediated knockout of these two genes significantly reduced heart rate, implicating their role in cardiac development and conduction. Additionally, we describe other cardiac cell lineages, including the endothelial and neural cells, providing their expression profiles as a resource. Our results established a detailed atlas of the developing heart, providing valuable insights into cellular and molecular mechanisms, and pinpointed potential new players in heart rhythm regulation.
Affiliation(s)
- Karim Abu Nahia
- International Institute of Molecular and Cell Biology, Warsaw, Poland
| | - Agata Sulej
- International Institute of Molecular and Cell Biology, Warsaw, Poland
| | - Maciej Migdał
- International Institute of Molecular and Cell Biology, Warsaw, Poland
| | - Natalia Ochocka
- Laboratory of Molecular Neurobiology, Nencki Institute of Experimental Biology, Warsaw, Poland
| | - Richard Ho
- Institute of Theoretical Physics and Mark Kac Center for Complex Systems Research, Jagiellonian University, Cracow, Poland
- The Njord Centre, Department of Physics, University of Oslo, Oslo, Norway
| | - Bożena Kamińska
- Laboratory of Molecular Neurobiology, Nencki Institute of Experimental Biology, Warsaw, Poland
| | - Marcin Zagorski
- Institute of Theoretical Physics and Mark Kac Center for Complex Systems Research, Jagiellonian University, Cracow, Poland
| | | |
|
50
|
Xiong J, Gong F, Ma L, Wan L. scVIC: deep generative modeling of heterogeneity for scRNA-seq data. BIOINFORMATICS ADVANCES 2024; 4:vbae086. [PMID: 39027640 PMCID: PMC11256938 DOI: 10.1093/bioadv/vbae086] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2024] [Revised: 05/15/2024] [Accepted: 06/12/2024] [Indexed: 07/20/2024]
Abstract
Motivation: Single-cell RNA sequencing (scRNA-seq) has become a valuable tool for studying cellular heterogeneity. However, the analysis of scRNA-seq data is challenging because of inherent noise and technical variability. Existing methods often struggle to simultaneously explore heterogeneity across cells, handle dropout events, and account for batch effects. These drawbacks call for a robust and comprehensive method that can address these challenges and provide accurate insights into heterogeneity at the single-cell level. Results: In this study, we introduce scVIC, an algorithm designed to account for variational inference, while simultaneously handling biological heterogeneity and batch effects at the single-cell level. scVIC explicitly models both biological heterogeneity and technical variability to learn cellular heterogeneity in a manner free from dropout events and the bias of batch effects. By leveraging variational inference, we provide a robust framework for inferring the parameters of scVIC. To test the performance of scVIC, we employed both simulated and biological scRNA-seq datasets, either including, or not, batch effects. scVIC was found to outperform other approaches because of its superior clustering ability and circumvention of the batch effects problem. Availability and implementation: The code of scVIC and replication for this study are available at https://github.com/HiBearME/scVIC/tree/v1.0.
Affiliation(s)
- Jiankang Xiong
- National Center for Mathematics and Interdisciplinary Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
- School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
| | - Fuzhou Gong
- National Center for Mathematics and Interdisciplinary Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
- School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
| | - Liang Ma
- School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
| | - Lin Wan
- National Center for Mathematics and Interdisciplinary Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
- School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
| |
|