1
Wang G, Zhao J, Lin Y, Liu T, Zhao Y, Zhao H. scMODAL: a general deep learning framework for comprehensive single-cell multi-omics data alignment with feature links. Nat Commun 2025; 16:4994. [PMID: 40442129 PMCID: PMC12122792 DOI: 10.1038/s41467-025-60333-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2025] [Accepted: 05/16/2025] [Indexed: 06/02/2025] Open
Abstract
Recent advancements in single-cell technologies have enabled comprehensive characterization of cellular states through transcriptomic, epigenomic, and proteomic profiling at single-cell resolution. These technologies have significantly deepened our understanding of cell functions and disease mechanisms from various omics perspectives. As these technologies evolve rapidly and data resources expand, there is a growing need for computational methods that can integrate information from different modalities to facilitate joint analysis of single-cell multi-omics data. However, integrating single-cell omics datasets presents unique challenges due to varied feature correlations and technology-specific limitations. To address these challenges, we introduce scMODAL, a deep learning framework tailored for single-cell multi-omics data alignment using feature links. scMODAL integrates datasets with limited known positively correlated features, leveraging neural networks and generative adversarial networks to align cell embeddings and preserve feature topology. Our experiments demonstrate scMODAL's effectiveness in removing unwanted variation, preserving biological information, and accurately identifying cell subpopulations across diverse datasets. scMODAL not only advances integration tasks but also supports downstream analyses such as feature imputation and feature relationship inference, offering a robust solution for advancing single-cell multi-omics research.
Affiliation(s)
- Gefei Wang
  - Department of Biostatistics, Yale University, New Haven, CT, USA
- Jia Zhao
  - Department of Biostatistics, Yale University, New Haven, CT, USA
- Yingxin Lin
  - Department of Biostatistics, Yale University, New Haven, CT, USA
- Tianyu Liu
  - Department of Biostatistics, Yale University, New Haven, CT, USA
  - Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA
- Yize Zhao
  - Department of Biostatistics, Yale University, New Haven, CT, USA
- Hongyu Zhao
  - Department of Biostatistics, Yale University, New Haven, CT, USA
  - Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA
2
Pandey AC, Bezney J, DeAscanis D, Kirsch EB, Ahmed F, Crinklaw A, Choudhary KS, Mandala T, Deason J, Hamidi JS, Siddique A, Ranganathan S, Brown K, Armstrong J, Head S, Ordoukhanian P, Steinmetz LM, Topol EJ. A CRISPR/Cas9-based enhancement of high-throughput single-cell transcriptomics. Nat Commun 2025; 16:4664. [PMID: 40389438 PMCID: PMC12089397 DOI: 10.1038/s41467-025-59880-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2024] [Accepted: 05/03/2025] [Indexed: 05/21/2025] Open
Abstract
Single-cell RNA-seq (scRNAseq) struggles to capture the cellular heterogeneity of transcripts within individual cells due to the prevalence of highly abundant and ubiquitous transcripts, which can obscure the detection of biologically distinct transcripts expressed at levels up to several orders of magnitude lower. To address this challenge, here we introduce single-cell CRISPRclean (scCLEAN), a molecular method that globally recomposes scRNAseq libraries, providing a benefit that cannot be recapitulated with deeper sequencing. scCLEAN utilizes the programmability of CRISPR/Cas9 to target and remove less than 1% of the transcriptome while redistributing approximately half of reads, shifting the focus toward less abundant transcripts. We experimentally apply scCLEAN to both heterogeneous immune cells and homogeneous vascular smooth muscle cells to demonstrate its ability to uncover biological signatures in different biological contexts. We further emphasize scCLEAN's versatility by applying it to a third-generation sequencing method, single-cell MAS-Seq, to increase transcript-level detection and discovery. Here we show the possible utility of scCLEAN across a wide array of human tissues and cell types, indicating in which contexts this technology proves beneficial and those in which its application is not advisable.
Affiliation(s)
- Amitabh C Pandey
  - Section of Cardiology, Tulane Heart and Vascular Institute, Department of Medicine, Tulane University School of Medicine, New Orleans, LA, USA
  - Southeast Louisiana Veterans Health Care System, New Orleans, LA, USA
  - Department of Molecular Medicine, Scripps Research Translational Institute, The Scripps Research Institute, La Jolla, CA, USA
- Jon Bezney
  - Genomics Core Facility, The Scripps Research Institute, La Jolla, CA, USA
  - Jumpcode Genomics, San Diego, CA, USA
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
- Ethan B Kirsch
  - Department of Molecular Medicine, Scripps Research Translational Institute, The Scripps Research Institute, La Jolla, CA, USA
- Farin Ahmed
  - Genomics Core Facility, The Scripps Research Institute, La Jolla, CA, USA
- Tony Mandala
  - Genomics Core Facility, The Scripps Research Institute, La Jolla, CA, USA
- Jasmin S Hamidi
  - Department of Molecular Medicine, Scripps Research Translational Institute, The Scripps Research Institute, La Jolla, CA, USA
- Steven Head
  - Genomics Core Facility, The Scripps Research Institute, La Jolla, CA, USA
- Lars M Steinmetz
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
  - Stanford Genome Technology Center, Palo Alto, CA, USA
  - European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany
- Eric J Topol
  - Department of Molecular Medicine, Scripps Research Translational Institute, The Scripps Research Institute, La Jolla, CA, USA
3
Qin Y, Liu Z, Gao S, Martínez-Vasallo C, Long Y, Zhu X, Liu B, Gao Y, Xu X, Nohales MA, Xie Q, Zhai J. 48-Hour and 24-Hour Time-lapse Single-nucleus Transcriptomics Reveal Cell-type specific Circadian Rhythms in Arabidopsis. Nat Commun 2025; 16:4171. [PMID: 40324996 PMCID: PMC12052988 DOI: 10.1038/s41467-025-59424-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2023] [Accepted: 04/21/2025] [Indexed: 05/07/2025] Open
Abstract
A functional circadian clock is critical to the adaptation and survival of organisms. In land plants, the comprehensive profiling of circadian gene expression at the single-cell level is largely unknown, partly due to the challenges in obtaining precisely timed single cells embedded within cell walls. To bridge this gap, we employ time-lapse single-nucleus RNA sequencing (snRNA-seq) on Arabidopsis seedlings collected over a 48-hour window at 4-hour intervals, as well as over a 24-hour day at 2-hour intervals, yielding a total of over 77,142 and 130,000 nuclei. Here, we find that four cell clusters in the shoot share a coherent rhythm, while around 3000 genes display cell-type specific rhythmic expression. Our analysis indicates that genes encoding circadian regulators oscillate in multiple cell types, and the majority of them are well-documented core clock genes, suggesting the snRNA-seq circadian data could be used to identify more clock components oscillating in a cell-autonomous way. We identify ABF1 as a circadian regulator, whose overexpression shortens the circadian period. Our data provide a comprehensive resource for plant circadian rhythmicity at the single-cell level (hosted at https://zhailab.bio.sustech.edu.cn/sc_circadian).
Affiliation(s)
- Yuwei Qin
  - Shenzhen Key Laboratory of Plant Genetic Engineering and Molecular Design, Institute of Plant and Food Science, Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China
- Zhijian Liu
  - Key Laboratory of Molecular Epigenetics of Ministry of Education, Northeast Normal University, Changchun, China
- Shiqi Gao
  - State Key Laboratory of Crop Stress Adaptation and Improvement, School of Life Sciences, Henan University, Kaifeng, China
- Carlos Martínez-Vasallo
  - Instituto de Biología Molecular y Celular de Plantas (IBMCP), Consejo Superior de Investigaciones Científicas-Universidad Politécnica de Valencia, Valencia, Spain
- Yanping Long
  - Shenzhen Key Laboratory of Plant Genetic Engineering and Molecular Design, Institute of Plant and Food Science, Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China
- Xinlong Zhu
  - Shenzhen Key Laboratory of Plant Genetic Engineering and Molecular Design, Institute of Plant and Food Science, Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China
- Bin Liu
  - State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Ya Gao
  - State Key Laboratory of Crop Stress Adaptation and Improvement, School of Life Sciences, Henan University, Kaifeng, China
- Xiaodong Xu
  - State Key Laboratory of Crop Stress Adaptation and Improvement, School of Life Sciences, Henan University, Kaifeng, China
- Maria A Nohales
  - Instituto de Biología Molecular y Celular de Plantas (IBMCP), Consejo Superior de Investigaciones Científicas-Universidad Politécnica de Valencia, Valencia, Spain
- Qiguang Xie
  - State Key Laboratory of Crop Stress Adaptation and Improvement, School of Life Sciences, Henan University, Kaifeng, China
- Jixian Zhai
  - Shenzhen Key Laboratory of Plant Genetic Engineering and Molecular Design, Institute of Plant and Food Science, Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China
4
Xu Q, Halle L, Hediyeh-Zadeh S, Kuijs M, Riedweg R, Kilik U, Recaldin T, Yu Q, Rall I, Frum T, Adam L, Parikh S, Kfuri-Rubens R, Gander M, Klein D, Curion F, He Z, Fleck JS, Oost K, Kahnwald M, Barbiero S, Mitrofanova O, Maciag GJ, Jensen KB, Lutolf M, Liberali P, Spence JR, Gjorevski N, Beumer J, Treutlein B, Theis FJ, Camp JG. An integrated transcriptomic cell atlas of human endoderm-derived organoids. Nat Genet 2025; 57:1201-1212. [PMID: 40355592 DOI: 10.1038/s41588-025-02182-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2023] [Accepted: 03/27/2025] [Indexed: 05/14/2025]
Abstract
Human pluripotent stem cells and tissue-resident fetal and adult stem cells can generate epithelial tissues of endodermal origin in vitro that recapitulate aspects of developing and adult human physiology. Here, we integrate single-cell transcriptomes from 218 samples covering organoids and other models of diverse endoderm-derived tissues to establish an initial version of a human endoderm-derived organoid cell atlas. The integration includes nearly one million cells across diverse conditions, data sources and protocols. We compare cell types and states between organoid models and harmonize cell annotations through mapping to primary tissue counterparts. Focusing on the intestine and lung, we provide examples of mapping data from new protocols and show how the atlas can be used as a diverse cohort to assess perturbations and disease models. The human endoderm-derived organoid cell atlas makes diverse datasets centrally available and will be valuable to assess fidelity, characterize perturbed and diseased states, and streamline protocol development.
Affiliation(s)
- Quan Xu
  - Institute of Human Biology (IHB), Roche Pharma Research and Early Development, Roche Innovation Center, Basel, Switzerland
- Lennard Halle
  - Department of Computational Health, Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
- Soroor Hediyeh-Zadeh
  - Department of Computational Health, Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
  - School of Life Sciences, Technical University of Munich, Munich, Germany
- Merel Kuijs
  - Department of Computational Health, Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
- Rya Riedweg
  - Institute of Human Biology (IHB), Roche Pharma Research and Early Development, Roche Innovation Center, Basel, Switzerland
- Umut Kilik
  - Institute of Human Biology (IHB), Roche Pharma Research and Early Development, Roche Innovation Center, Basel, Switzerland
  - Biozentrum, University of Basel, Basel, Switzerland
- Timothy Recaldin
  - Roche Innovation Center Basel, Roche Pharma Research and Early Development, Basel, Switzerland
- Qianhui Yu
  - Institute of Human Biology (IHB), Roche Pharma Research and Early Development, Roche Innovation Center, Basel, Switzerland
- Isabell Rall
  - Institute of Human Biology (IHB), Roche Pharma Research and Early Development, Roche Innovation Center, Basel, Switzerland
- Tristan Frum
  - Department of Internal Medicine, Division of Gastroenterology and Hepatology, University of Michigan Medical School, Ann Arbor, MI, USA
- Lukas Adam
  - Institute of Human Biology (IHB), Roche Pharma Research and Early Development, Roche Innovation Center, Basel, Switzerland
- Shrey Parikh
  - Department of Computational Health, Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
  - School of Life Sciences, Technical University of Munich, Munich, Germany
- Raphael Kfuri-Rubens
  - Department of Computational Health, Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
  - IIIrd Medical Department, Klinikum rechts der Isar, Munich, Germany
  - School of Medicine, Technical University of Munich, Munich, Germany
- Manuel Gander
  - Department of Computational Health, Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
- Dominik Klein
  - Department of Computational Health, Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
- Fabiola Curion
  - Department of Computational Health, Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
  - School of Computation, Information and Technology, Technical University of Munich, Munich, Germany
- Zhisong He
  - Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
- Jonas Simon Fleck
  - Institute of Human Biology (IHB), Roche Pharma Research and Early Development, Roche Innovation Center, Basel, Switzerland
- Koen Oost
  - Friedrich Miescher Institute for Biomedical Research (FMI), Basel, Switzerland
- Maurice Kahnwald
  - Friedrich Miescher Institute for Biomedical Research (FMI), Basel, Switzerland
- Silvia Barbiero
  - Friedrich Miescher Institute for Biomedical Research (FMI), Basel, Switzerland
- Olga Mitrofanova
  - Institute of Human Biology (IHB), Roche Pharma Research and Early Development, Roche Innovation Center, Basel, Switzerland
- Grzegorz Jerzy Maciag
  - Novo Nordisk Foundation Center for Stem Cell Medicine, reNEW, University of Copenhagen, Copenhagen, Denmark
- Kim B Jensen
  - Novo Nordisk Foundation Center for Stem Cell Medicine, reNEW, University of Copenhagen, Copenhagen, Denmark
- Matthias Lutolf
  - Institute of Human Biology (IHB), Roche Pharma Research and Early Development, Roche Innovation Center, Basel, Switzerland
  - Laboratory of Stem Cell Bioengineering, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Prisca Liberali
  - Biozentrum, University of Basel, Basel, Switzerland
  - Friedrich Miescher Institute for Biomedical Research (FMI), Basel, Switzerland
- Jason R Spence
  - Department of Internal Medicine, Division of Gastroenterology and Hepatology, University of Michigan Medical School, Ann Arbor, MI, USA
  - Department of Cell and Developmental Biology, University of Michigan Medical School, Ann Arbor, MI, USA
  - Department of Biomedical Engineering, University of Michigan College of Engineering, Ann Arbor, MI, USA
- Nikolche Gjorevski
  - Institute of Human Biology (IHB), Roche Pharma Research and Early Development, Roche Innovation Center, Basel, Switzerland
- Joep Beumer
  - Institute of Human Biology (IHB), Roche Pharma Research and Early Development, Roche Innovation Center, Basel, Switzerland
- Barbara Treutlein
  - Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
- Fabian J Theis
  - Department of Computational Health, Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
  - School of Life Sciences, Technical University of Munich, Munich, Germany
  - School of Computation, Information and Technology, Technical University of Munich, Munich, Germany
- J Gray Camp
  - Institute of Human Biology (IHB), Roche Pharma Research and Early Development, Roche Innovation Center, Basel, Switzerland
  - Biozentrum, University of Basel, Basel, Switzerland
5
Wang J, Ye F, Chai H, Jiang Y, Wang T, Ran X, Xia Q, Xu Z, Fu Y, Zhang G, Wu H, Guo G, Guo H, Ruan Y, Wang Y, Xing D, Xu X, Zhang Z. Advances and applications in single-cell and spatial genomics. Sci China Life Sci 2025; 68:1226-1282. [PMID: 39792333 DOI: 10.1007/s11427-024-2770-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/20/2024] [Accepted: 10/10/2024] [Indexed: 01/12/2025]
Abstract
The applications of single-cell and spatial technologies in recent times have revolutionized the present understanding of cellular states and the cellular heterogeneity inherent in complex biological systems. These advancements offer unprecedented resolution in the examination of the functional genomics of individual cells and their spatial context within tissues. In this review, we have comprehensively discussed the historical development and recent progress in the field of single-cell and spatial genomics. We have reviewed the breakthroughs in single-cell multi-omics technologies, spatial genomics methods, and the computational strategies employed toward the analyses of single-cell atlas data. Furthermore, we have highlighted the advances made in constructing cellular atlases and their clinical applications, particularly in the context of disease. Finally, we have discussed the emerging trends, challenges, and opportunities in this rapidly evolving field.
Affiliation(s)
- Jingjing Wang
  - Bone Marrow Transplantation Center of the First Affiliated Hospital & Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, 310058, China
- Fang Ye
  - Bone Marrow Transplantation Center of the First Affiliated Hospital & Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, 310058, China
- Haoxi Chai
  - Life Sciences Institute and The Second Affiliated Hospital, Zhejiang University, Hangzhou, 310058, China
- Yujia Jiang
  - BGI Research, Shenzhen, 518083, China
  - BGI Research, Hangzhou, 310030, China
- Teng Wang
  - Biomedical Pioneering Innovation Center (BIOPIC) and School of Life Sciences, Peking University, Beijing, 100871, China
  - Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, 100871, China
- Xia Ran
  - Bone Marrow Transplantation Center of the First Affiliated Hospital & Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, 310058, China
  - Institute of Hematology, Zhejiang University, Hangzhou, 310000, China
- Qimin Xia
  - Biomedical Pioneering Innovation Center (BIOPIC) and School of Life Sciences, Peking University, Beijing, 100871, China
- Ziye Xu
  - Department of Laboratory Medicine of The First Affiliated Hospital & Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, 310058, China
- Yuting Fu
  - Bone Marrow Transplantation Center of the First Affiliated Hospital & Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, 310058, China
  - Center for Stem Cell and Regenerative Medicine, Zhejiang University School of Medicine, Hangzhou, 310058, China
- Guodong Zhang
  - Bone Marrow Transplantation Center of the First Affiliated Hospital & Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, 310058, China
  - Center for Stem Cell and Regenerative Medicine, Zhejiang University School of Medicine, Hangzhou, 310058, China
- Hanyu Wu
  - Bone Marrow Transplantation Center of the First Affiliated Hospital & Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, 310058, China
  - Center for Stem Cell and Regenerative Medicine, Zhejiang University School of Medicine, Hangzhou, 310058, China
- Guoji Guo
  - Bone Marrow Transplantation Center of the First Affiliated Hospital & Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, 310058, China
  - Center for Stem Cell and Regenerative Medicine, Zhejiang University School of Medicine, Hangzhou, 310058, China
  - Zhejiang Provincial Key Lab for Tissue Engineering and Regenerative Medicine, Dr. Li Dak Sum & Yip Yio Chin Center for Stem Cell and Regenerative Medicine, Hangzhou, 310058, China
  - Institute of Hematology, Zhejiang University, Hangzhou, 310000, China
- Hongshan Guo
  - Bone Marrow Transplantation Center of the First Affiliated Hospital & Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, 310058, China
  - Institute of Hematology, Zhejiang University, Hangzhou, 310000, China
- Yijun Ruan
  - Life Sciences Institute and The Second Affiliated Hospital, Zhejiang University, Hangzhou, 310058, China
- Yongcheng Wang
  - Department of Laboratory Medicine of The First Affiliated Hospital & Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, 310058, China
- Dong Xing
  - Biomedical Pioneering Innovation Center (BIOPIC) and School of Life Sciences, Peking University, Beijing, 100871, China
  - Beijing Advanced Innovation Center for Genomics (ICG), Peking University, Beijing, 100871, China
- Xun Xu
  - BGI Research, Shenzhen, 518083, China
  - BGI Research, Hangzhou, 310030, China
  - Guangdong Provincial Key Laboratory of Genome Read and Write, BGI Research, Shenzhen, 518083, China
- Zemin Zhang
  - Biomedical Pioneering Innovation Center (BIOPIC) and School of Life Sciences, Peking University, Beijing, 100871, China
6
Kalfon J, Samaran J, Peyré G, Cantini L. scPRINT: pre-training on 50 million cells allows robust gene network predictions. Nat Commun 2025; 16:3607. [PMID: 40240364 PMCID: PMC12003772 DOI: 10.1038/s41467-025-58699-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/20/2024] [Accepted: 03/24/2025] [Indexed: 04/18/2025] Open
Abstract
A cell is governed by the interaction of myriads of macromolecules. Inferring such a network of interactions has remained an elusive milestone in cellular biology. Building on recent advances in large foundation models and their ability to learn without supervision, we present scPRINT, a large cell model for the inference of gene networks pre-trained on more than 50 million cells from the cellxgene database. Using innovative pretraining tasks and model architecture, scPRINT pushes large transformer models towards more interpretability and usability when uncovering the complex biology of the cell. Based on our atlas-level benchmarks, scPRINT demonstrates superior performance in gene network inference to the state of the art, as well as competitive zero-shot abilities in denoising, batch effect correction, and cell label prediction. On an atlas of benign prostatic hyperplasia, scPRINT highlights the profound connections between ion exchange, senescence, and chronic inflammation.
Affiliation(s)
- Jérémie Kalfon
  - Institut Pasteur, Université Paris Cité, CNRS UMR 3738, Machine Learning for Integrative Genomics group, F-75015, Paris, France
- Jules Samaran
  - Institut Pasteur, Université Paris Cité, CNRS UMR 3738, Machine Learning for Integrative Genomics group, F-75015, Paris, France
- Gabriel Peyré
  - CNRS and DMA de l'Ecole Normale Supérieure, CNRS, Ecole Normale Supérieure, Université PSL, 75005, Paris, France
- Laura Cantini
  - Institut Pasteur, Université Paris Cité, CNRS UMR 3738, Machine Learning for Integrative Genomics group, F-75015, Paris, France
7
Zhu Q, Jiang Z, Thomson M, Gartner Z. Revealing a coherent cell state landscape across single cell datasets with CONCORD. bioRxiv 2025:2025.03.13.643146. [PMID: 40161827 PMCID: PMC11952503 DOI: 10.1101/2025.03.13.643146] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/02/2025]
Abstract
Batch integration, denoising, and dimensionality reduction remain fundamental challenges in single-cell data analysis. While many machine learning tools aim to overcome these challenges by engineering model architectures, we use a different strategy, building on the insight that optimized mini-batch sampling during training can profoundly influence learning outcomes. We present CONCORD, a self-supervised learning approach that implements a unified, probabilistic data sampling scheme combining neighborhood-aware and dataset-aware sampling: the former enhances resolution while the latter removes batch effects. Using only a minimalist one-hidden-layer neural network and contrastive learning, CONCORD achieves state-of-the-art performance without relying on deep architectures, auxiliary losses, or supervision. It generates high-resolution cell atlases that seamlessly integrate data across batches, technologies, and species, without relying on prior assumptions about data structure. The resulting latent representations are denoised, interpretable, and biologically meaningful, capturing gene co-expression programs, resolving subtle cellular states, and preserving both local geometric relationships and global topological organization. We demonstrate CONCORD's broad applicability across diverse datasets, establishing it as a general-purpose framework for learning unified, high-fidelity representations of cellular identity and dynamics.
Affiliation(s)
- Qin Zhu
  - Department of Pharmaceutical Chemistry, University of California San Francisco; San Francisco, CA 94158, USA
- Zuzhi Jiang
  - Tetrad Graduate Program, University of California San Francisco; San Francisco, CA 94158, USA
- Matt Thomson
  - Division of Biology and Biological Engineering, California Institute of Technology; Pasadena, CA 91125, USA
- Zev Gartner
  - Department of Pharmaceutical Chemistry, University of California San Francisco; San Francisco, CA 94158, USA
  - Chan Zuckerberg Biohub; San Francisco, CA 94158, USA
  - Center for Cellular Construction, University of California San Francisco; San Francisco, CA 94158, USA
8
Zappia L, Richter S, Ramírez-Suástegui C, Kfuri-Rubens R, Vornholz L, Wang W, Dietrich O, Frishberg A, Luecken MD, Theis FJ. Feature selection methods affect the performance of scRNA-seq data integration and querying. Nat Methods 2025; 22:834-844. [PMID: 40082610 PMCID: PMC11978513 DOI: 10.1038/s41592-025-02624-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2023] [Accepted: 02/08/2025] [Indexed: 03/16/2025]
Abstract
The availability of single-cell transcriptomics has allowed the construction of reference cell atlases, but their usefulness depends on the quality of dataset integration and the ability to map new samples. Previous benchmarks have compared integration methods and suggest that feature selection improves performance but have not explored how best to select features. Here, we benchmark feature selection methods for single-cell RNA sequencing integration using metrics beyond batch correction and preservation of biological variation to assess query mapping, label transfer and the detection of unseen populations. We reinforce common practice by showing that highly variable feature selection is effective for producing high-quality integrations and provide further guidance on the effect of the number of features selected, batch-aware feature selection, lineage-specific feature selection and integration and the interaction between feature selection and integration models. These results are informative for analysts working on large-scale tissue atlases, using atlases or integrating their own data to tackle specific biological questions.
Affiliation(s)
- Luke Zappia
  - Institute of Computational Biology, Computational Health Center, Helmholtz Munich, Neuherberg, Germany
  - School of Computing, Information and Technology, Technical University of Munich, Munich, Germany
- Sabrina Richter
  - Institute of Computational Biology, Computational Health Center, Helmholtz Munich, Neuherberg, Germany
- Ciro Ramírez-Suástegui
  - Institute of Computational Biology, Computational Health Center, Helmholtz Munich, Neuherberg, Germany
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Raphael Kfuri-Rubens
  - Institute of Computational Biology, Computational Health Center, Helmholtz Munich, Neuherberg, Germany
  - School of Medicine, Technical University of Munich, Munich, Germany
  - Klinikum rechts der Isar, IIIrd Medical Department, Munich, Germany
- Larsen Vornholz
  - Institute of Computational Biology, Computational Health Center, Helmholtz Munich, Neuherberg, Germany
- Weixu Wang
  - Institute of Computational Biology, Computational Health Center, Helmholtz Munich, Neuherberg, Germany
- Oliver Dietrich
  - Institute of Computational Biology, Computational Health Center, Helmholtz Munich, Neuherberg, Germany
  - Helmholtz Institute for RNA-based Infection Research, Helmholtz Centre for Infection Research, Würzburg, Germany
- Amit Frishberg
  - Institute of Computational Biology, Computational Health Center, Helmholtz Munich, Neuherberg, Germany
- Malte D Luecken
  - Institute of Computational Biology, Computational Health Center, Helmholtz Munich, Neuherberg, Germany
  - Institute of Lung Health & Immunity, Helmholtz Munich; Member of the German Center for Lung Research (DZL), Munich, Germany
- Fabian J Theis
  - Institute of Computational Biology, Computational Health Center, Helmholtz Munich, Neuherberg, Germany
  - School of Computing, Information and Technology, Technical University of Munich, Munich, Germany
  - School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany
9
Hu X, Li H, Chen M, Qian J, Jiang H. Reference-informed evaluation of batch correction for single-cell omics data with overcorrection awareness. Commun Biol 2025; 8:521. [PMID: 40158033 PMCID: PMC11954866 DOI: 10.1038/s42003-025-07947-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2024] [Accepted: 03/18/2025] [Indexed: 04/01/2025] Open
Abstract
Batch effect correction (BEC) is fundamental to integrate multiple single-cell RNA sequencing datasets, and its success is critical to empower in-depth interrogation for biological insights. However, no simple metric is available to evaluate BEC performance with sensitivity to data overcorrection, which erases true biological variations and leads to false biological discoveries. Here, we propose RBET, a reference-informed statistical framework for evaluating the success of BEC. Using extensive simulations and six real data examples including scRNA-seq and scATAC-seq datasets with different numbers of batches, batch effect sizes and numbers of cell types, we demonstrate that RBET evaluates the performance of BEC methods more fairly with biologically meaningful insights from data, while other methods may lead to false results. Moreover, RBET is computationally efficient, sensitive to overcorrection and robust to large batch effect sizes. Thus, RBET provides a robust guideline on selecting case-specific BEC method, and the concept of RBET is extendable to other modalities.
Affiliation(s)
- Xiaoyue Hu
- Center for Data Science, Zhejiang University, Hangzhou, China
- School of Mathematical Sciences, Zhejiang University, Hangzhou, China
| | - He Li
- Center for Data Science, Zhejiang University, Hangzhou, China
| | - Ming Chen
- College of Life Sciences, Zhejiang University, Hangzhou, China
| | - Junbin Qian
- Zhejiang Key Laboratory of Precision Diagnosis and Therapy for Major Gynecological Diseases, Women's Hospital, Zhejiang University School of Medicine, Hangzhou, China.
- Institute of Genetics, Zhejiang University School of Medicine, Hangzhou, China.
- Cancer Center, Zhejiang University, Hangzhou, China.
- Zhejiang Provincial Clinical Research Center for Child Health, Hangzhou, China.
| | - Hangjin Jiang
- Center for Data Science, Zhejiang University, Hangzhou, China.
| |
|
10
|
Houdjedj A, Marouf Y, Myradov M, Doğan SO, Erten BO, Tastan O, Erten C, Kazan H. SCITUNA: single-cell data integration tool using network alignment. BMC Bioinformatics 2025; 26:92. [PMID: 40148808 PMCID: PMC11951583 DOI: 10.1186/s12859-025-06087-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2024] [Accepted: 02/17/2025] [Indexed: 03/29/2025] Open
Abstract
BACKGROUND As single-cell genomics experiments increase in complexity and scale, the need to integrate multiple datasets has grown. Such integration enhances cellular feature identification by leveraging larger data volumes. However, batch effects (technical variations arising from differences in labs, times, or protocols) pose a significant challenge. Despite numerous proposed batch correction methods, many still have limitations, such as outputting only dimension-reduced data, relying on computationally intensive models, or resulting in overcorrection for batches with diverse cell type composition. RESULTS We introduce a novel method for batch effect correction named SCITUNA, a Single-Cell data Integration Tool Using Network Alignment. We perform evaluations on 39 individual batches from four real datasets and a simulated dataset, which include both scRNA-seq and scATAC-seq datasets, spanning multiple organisms and tissues. A thorough comparison of existing batch correction methods using 13 metrics reveals that SCITUNA outperforms current approaches and is successful at preserving biological signals present in the original data. In particular, SCITUNA shows a better performance than the current methods in all the comparisons except for the multiple batch integration of the lung dataset where the difference is 0.004. CONCLUSION SCITUNA effectively removes batch effects while retaining the biological signals present in the data. Our extensive experiments reveal that SCITUNA will be a valuable tool for diverse integration tasks.
Affiliation(s)
- Aissa Houdjedj
- Antalya Bilim University, 07190, Antalya, Turkey
- Akdeniz University, 07058, Antalya, Turkey
| | | | | | | | | | | | | | - Hilal Kazan
- Antalya Bilim University, 07190, Antalya, Turkey.
| |
|
11
|
Zhao PA, Li R, Adewunmi T, Garber J, Gustafson C, Kim J, Malone J, Savage A, Skene P, Li XJ. SPARROW reveals microenvironment-zone-specific cell states in healthy and diseased tissues. Cell Syst 2025; 16:101235. [PMID: 40112778 DOI: 10.1016/j.cels.2025.101235] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2024] [Revised: 10/23/2024] [Accepted: 02/19/2025] [Indexed: 03/22/2025]
Abstract
Spatially resolved transcriptomics technologies have advanced our understanding of cellular characteristics within tissue contexts. However, current analytical tools often treat cell-type inference and cellular neighborhood identification as separate and hard clustering processes, limiting comparability across scales and samples. SPARROW addresses these challenges by jointly learning latent embeddings and soft clusterings of cell types and cellular organization. It outperformed state-of-the-art methods in cell-type inference and microenvironment zone delineation and uncovered zone-specific cell states in human and mouse tissues that competing methods missed. By integrating spatially resolved transcriptomics and single-cell RNA sequencing (scRNA-seq) data in a shared latent space, SPARROW achieves single-cell spatial resolution and whole-transcriptome coverage, enabling the discovery of both established and unknown microenvironment zone-specific ligand-receptor interactions in the human tonsil. Overall, SPARROW is a computational framework that provides a comprehensive characterization of tissue features across scales, samples, and conditions.
Affiliation(s)
- Peiyao A Zhao
- Allen Institute for Immunology, Seattle, WA 98109, USA.
| | - Ruoxin Li
- Allen Institute for Immunology, Seattle, WA 98109, USA
| | - Temi Adewunmi
- Allen Institute for Immunology, Seattle, WA 98109, USA
| | | | | | - June Kim
- Allen Institute for Immunology, Seattle, WA 98109, USA
| | | | - Adam Savage
- Allen Institute for Immunology, Seattle, WA 98109, USA
| | - Peter Skene
- Allen Institute for Immunology, Seattle, WA 98109, USA
| | - Xiao-Jun Li
- Allen Institute for Immunology, Seattle, WA 98109, USA.
| |
|
12
|
Liu Z, Zhang X, Ben T, Li M, Jin Y, Wang T, Song Y. Focal adhesion in the tumour metastasis: from molecular mechanisms to therapeutic targets. Biomark Res 2025; 13:38. [PMID: 40045379 PMCID: PMC11884212 DOI: 10.1186/s40364-025-00745-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2024] [Accepted: 02/11/2025] [Indexed: 03/09/2025] Open
Abstract
The tumour microenvironment is the "hotbed" of tumour cells, providing abundant extracellular support for growth and metastasis. However, the tumour microenvironment is not static and is constantly remodelled by a variety of cellular components, including tumour cells, through mechanical, biological and chemical means to promote metastasis. Focal adhesion plays an important role in cell-extracellular matrix adhesion. An in-depth exploration of the role of focal adhesion in tumour metastasis, especially their contribution at the biomechanical level, is an important direction of current research. In this review, we first summarize the assembly of focal adhesions and explore their kinetics in tumour cells. Then, we describe in detail the role of focal adhesion in various stages of tumour metastasis, especially its key functions in cell migration, invasion, and matrix remodelling. Finally, we describe the anti-tumour strategies targeting focal adhesion and the current progress in the development of some inhibitors against focal adhesion proteins. In this paper, we summarize for the first time that focal adhesion play a positive feedback role in pro-tumour metastatic matrix remodelling by summarizing the five processes of focal adhesion assembly in a multidimensional way. It is beneficial for researchers to have a deeper understanding of the role of focal adhesion in the biological behaviour of tumour metastasis and the potential of focal adhesion as a therapeutic target, providing new ideas for the prevention and treatment of metastases.
Affiliation(s)
- Zonghao Liu
- Department of Radiotherapy, Cancer Hospital of China Medical University, No.44 Xiaoheyan Road, Dadong District, Shenyang, Liaoning Province, 110042, P. R. China
- The First Clinical College, China Medical University, Shenyang, Liaoning Province, 110122, P. R. China
| | - Xiaofang Zhang
- Department of Medical Oncology, The First Hospital of China Medical University, Shenyang, Liaoning, 110001, China
| | - Tianru Ben
- The First Clinical College, China Medical University, Shenyang, Liaoning Province, 110122, P. R. China
| | - Mo Li
- Department of Breast Surgery, Liaoning Cancer Hospital and Institute, No.44 Xiaoheyan Road, Dadong District, Shenyang, Liaoning Province, 110042, P. R. China
| | - Yi Jin
- Department of Breast Surgery, Liaoning Cancer Hospital and Institute, No.44 Xiaoheyan Road, Dadong District, Shenyang, Liaoning Province, 110042, P. R. China
| | - Tianlu Wang
- Department of Radiotherapy, Cancer Hospital of China Medical University, No.44 Xiaoheyan Road, Dadong District, Shenyang, Liaoning Province, 110042, P. R. China.
- Department of Radiotherapy, Cancer Hospital of Dalian University of Technology, Shenyang, Liaoning Province, 110042, People's Republic of China.
- Faculty of Medicine, Dalian University of Technology, Dalian, Liaoning Province, 116024, P. R. China.
| | - Yingqiu Song
- Department of Radiotherapy, Cancer Hospital of China Medical University, No.44 Xiaoheyan Road, Dadong District, Shenyang, Liaoning Province, 110042, P. R. China.
- Department of Radiotherapy, Liaoning Cancer Hospital & Institute, No.44 Xiaoheyan Road, Dadong District, Shenyang, Liaoning Province, 110042, P. R. China.
| |
|
13
|
Liu C, Wang K, Mei J, Zhao R, Shen J, Zhang W, Li L, Roy B, Fang X. Integrative single-cell and spatial transcriptome analysis reveals heterogeneity of human liver progenitor cells. Hepatol Commun 2025; 9:e0662. [PMID: 40008906 PMCID: PMC11868439 DOI: 10.1097/hc9.0000000000000662] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2024] [Accepted: 01/02/2025] [Indexed: 02/27/2025] Open
Abstract
BACKGROUND Liver progenitor cells (LPCs) with bipotential differentiation capacities are essential for restoring liver homeostasis and hepatocyte population after damage. However, the low proportion and shared markers with epithelial cells make studying LPC heterogeneity difficult, especially in humans. To address this gap, we explored over 259,400 human liver single cells across 4 conditions (fetal, healthy, cirrhotic, and HCC-affected livers). METHODS Human liver tissue samples were analyzed using spatial transcriptomics sequencing technologies to describe the heterogeneity of LPCs. Liver tissue was characterized by LPC heterogeneity at single-cell resolution by employing cellular modules, differentiation trajectories, and gene co-expression patterns. RESULTS We annotated and identified 1 LPC cluster, 3 LPC subpopulations, and 4 distinct cellular modules, indicating the heterogeneity within LPC and the diversity between LPCs and epithelial cells. LPCs showed spatial colocalization with cholangiocytes and comprised a small proportion (2.95±1.91%) within the merged epithelial cells and LPC populations, exhibiting marked differences in marker expression patterns compared to those in mice. LPCs exhibited distinct cellular states in functional restoration, activation, proliferation, and cell transition. Additionally, the gene co-expression network of LPCs exhibited 3 unique modules, reflecting the distinct connectivity of genes encoding apolipoproteins and heat shock proteins in the gene co-expression network modules. CONCLUSIONS Our study provides valuable insights into the multifaceted heterogeneity of human LPCs. Future studies focusing on spatial gene expression dynamics will contribute to our understanding of the spatial arrangement of liver regeneration.
Affiliation(s)
- Chuanjun Liu
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
- BGI Research, Shenzhen, China
| | - Kai Wang
- Department of Hepatobiliary and Pancreatic Surgery and Minimally Invasive Surgery, Zhejiang Provincial People’s Hospital (Affiliated People’s Hospital), Hangzhou Medical College, Hangzhou, Zhejiang, China
| | | | - Ruizhen Zhao
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
- BGI Research, Shenzhen, China
| | | | - Wei Zhang
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
| | | | - Bhaskar Roy
- Hangzhou Institute of Medicine (HIM), Chinese Academy of Sciences, Hangzhou, Zhejiang, China
| | | |
|
14
|
Lundgren S, Huuhtanen J, Keränen M, Feng X, Patel BA, Ryland GL, Fox LC, Bravo-Perez C, Clemente M, Kerr C, Walldin G, Dufva O, Zaimoku Y, Tuononen T, Myllymäki M, Ebeling F, Jokinen E, Heinonen M, Kasanen T, Klievink J, Lähteenmäki H, Jaatinen T, Kytölä S, Siitonen S, Dulau-Florea A, Braylan R, Heinäniemi M, Nakao S, Hellström-Lindberg E, Maciejewski JP, Blombery P, Young NS, Lähdesmäki H, Mustjoki S. Single-cell analysis of aplastic anemia reveals a convergence of NK and NK-like CD8+ T cells with a disease-associated TCR signature. Sci Transl Med 2025; 17:eadl6758. [PMID: 40009697 DOI: 10.1126/scitranslmed.adl6758] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Revised: 08/16/2024] [Accepted: 11/15/2024] [Indexed: 02/28/2025]
Abstract
Immune aplastic anemia (AA) is a life-threatening bone marrow failure disorder driven by an autoimmune T cell attack against hematopoietic stem and progenitor cells (HSPCs). However, the exact autoantigen targets and role of other immune cells in the pathogenesis of AA are unknown. Here, we analyzed a cohort of 218 patients with AA using single-cell RNA and T cell receptor (TCR) αβ sequencing, TCRβ sequencing, flow cytometry, and plasma cytokine profiling. We identified natural killer (NK) cells and CD8+ terminally differentiated effector T (TEMRA) cells expressing NK receptors with AA-associated TCRβ motifs as the most dysregulated immune cell populations in AA bone marrow. Functional coculture experiments using primary HSPCs and immune cells showed that NK cells cannot kill HSPCs alone but may sensitize HSPCs to CD8+ T cell-mediated killing through production of interferons. Furthermore, HSPCs induced activation of T cell clones with CD8+ TEMRA NK-like phenotype in coculture. Our results reveal a convergent phenotype of innate and adaptive immune cells that may drive AA.
Affiliation(s)
- Sofie Lundgren
- Hematology Research Unit Helsinki, Department of Hematology, University of Helsinki and Helsinki University Hospital Comprehensive Cancer Center, Helsinki 00290, Finland
- Translational Immunology Research Program, University of Helsinki, Helsinki 00290, Finland
- ICAN Digital Precision Cancer Medicine Flagship, University of Helsinki and Helsinki University Hospital Comprehensive Cancer Center, Helsinki 00290, Finland
| | - Jani Huuhtanen
- Hematology Research Unit Helsinki, Department of Hematology, University of Helsinki and Helsinki University Hospital Comprehensive Cancer Center, Helsinki 00290, Finland
- Translational Immunology Research Program, University of Helsinki, Helsinki 00290, Finland
- ICAN Digital Precision Cancer Medicine Flagship, University of Helsinki and Helsinki University Hospital Comprehensive Cancer Center, Helsinki 00290, Finland
- Department of Computer Science, Aalto University School of Science, Espoo 02150, Finland
| | - Mikko Keränen
- Hematology Research Unit Helsinki, Department of Hematology, University of Helsinki and Helsinki University Hospital Comprehensive Cancer Center, Helsinki 00290, Finland
- Translational Immunology Research Program, University of Helsinki, Helsinki 00290, Finland
- ICAN Digital Precision Cancer Medicine Flagship, University of Helsinki and Helsinki University Hospital Comprehensive Cancer Center, Helsinki 00290, Finland
- Department of Hematology, Helsinki University Hospital Comprehensive Cancer Center, Helsinki 00290, Finland
| | - Xingmin Feng
- National Heart Lung and Blood Institute (NHLBI), National Institutes of Health (NIH), Bethesda, MD 20892, USA
| | - Bhavisha A Patel
- National Heart Lung and Blood Institute (NHLBI), National Institutes of Health (NIH), Bethesda, MD 20892, USA
| | - Georgina L Ryland
- Sir Peter MacCallum Department of Oncology, University of Melbourne, Melbourne, VIC 3052, Australia
| | - Lucy C Fox
- Sir Peter MacCallum Department of Oncology, University of Melbourne, Melbourne, VIC 3052, Australia
| | - Carlos Bravo-Perez
- Department of Translational Hematology and Oncology Research, Taussig Cancer Institute, Cleveland Clinic, Cleveland, OH 44106, USA
- Department of Hematology and Medical Oncology, Hospital Universitario Morales Meseguer, University of Murcia, IMIB-Pascual Parrilla, CIBERER-Instituto de Salud Carlos III, Murcia 30008, Spain
| | - Michael Clemente
- Department of Translational Hematology and Oncology Research, Taussig Cancer Institute, Cleveland Clinic, Cleveland, OH 44106, USA
| | - Cassandra Kerr
- Department of Translational Hematology and Oncology Research, Taussig Cancer Institute, Cleveland Clinic, Cleveland, OH 44106, USA
| | - Gunilla Walldin
- Center for Hematology and Regenerative Medicine, Department of Medicine, Karolinska Institutet, Karolinska University Hospital Huddinge, Huddinge 14157, Sweden
| | - Olli Dufva
- Hematology Research Unit Helsinki, Department of Hematology, University of Helsinki and Helsinki University Hospital Comprehensive Cancer Center, Helsinki 00290, Finland
- Translational Immunology Research Program, University of Helsinki, Helsinki 00290, Finland
- ICAN Digital Precision Cancer Medicine Flagship, University of Helsinki and Helsinki University Hospital Comprehensive Cancer Center, Helsinki 00290, Finland
| | - Yoshitaka Zaimoku
- Department of Hematology, Faculty of Medicine, Institute of Medical Pharmaceutical and Health Sciences, Kanazawa University, Ishikawa 920-1192, Japan
| | - Tiina Tuononen
- School of Medicine, University of Eastern Finland, Kuopio 70211, Finland
| | - Mikko Myllymäki
- Hematology Research Unit Helsinki, Department of Hematology, University of Helsinki and Helsinki University Hospital Comprehensive Cancer Center, Helsinki 00290, Finland
- Translational Immunology Research Program, University of Helsinki, Helsinki 00290, Finland
- ICAN Digital Precision Cancer Medicine Flagship, University of Helsinki and Helsinki University Hospital Comprehensive Cancer Center, Helsinki 00290, Finland
| | - Freja Ebeling
- Department of Hematology, Helsinki University Hospital Comprehensive Cancer Center, Helsinki 00290, Finland
| | - Emmi Jokinen
- Hematology Research Unit Helsinki, Department of Hematology, University of Helsinki and Helsinki University Hospital Comprehensive Cancer Center, Helsinki 00290, Finland
- Translational Immunology Research Program, University of Helsinki, Helsinki 00290, Finland
- ICAN Digital Precision Cancer Medicine Flagship, University of Helsinki and Helsinki University Hospital Comprehensive Cancer Center, Helsinki 00290, Finland
- Department of Computer Science, Aalto University School of Science, Espoo 02150, Finland
| | - Markus Heinonen
- Department of Computer Science, Aalto University School of Science, Espoo 02150, Finland
- Helsinki Institute for Information Technology HIIT, Espoo 02150, Finland
| | - Tiina Kasanen
- Hematology Research Unit Helsinki, Department of Hematology, University of Helsinki and Helsinki University Hospital Comprehensive Cancer Center, Helsinki 00290, Finland
- Translational Immunology Research Program, University of Helsinki, Helsinki 00290, Finland
- ICAN Digital Precision Cancer Medicine Flagship, University of Helsinki and Helsinki University Hospital Comprehensive Cancer Center, Helsinki 00290, Finland
| | - Jay Klievink
- Hematology Research Unit Helsinki, Department of Hematology, University of Helsinki and Helsinki University Hospital Comprehensive Cancer Center, Helsinki 00290, Finland
- Translational Immunology Research Program, University of Helsinki, Helsinki 00290, Finland
- ICAN Digital Precision Cancer Medicine Flagship, University of Helsinki and Helsinki University Hospital Comprehensive Cancer Center, Helsinki 00290, Finland
| | - Hanna Lähteenmäki
- Hematology Research Unit Helsinki, Department of Hematology, University of Helsinki and Helsinki University Hospital Comprehensive Cancer Center, Helsinki 00290, Finland
- Translational Immunology Research Program, University of Helsinki, Helsinki 00290, Finland
- ICAN Digital Precision Cancer Medicine Flagship, University of Helsinki and Helsinki University Hospital Comprehensive Cancer Center, Helsinki 00290, Finland
| | - Taina Jaatinen
- Histocompatibility Testing Laboratory, Finnish Red Cross Blood Service, Vantaa 01730, Finland
| | - Sari Kytölä
- Department of Hematology, Helsinki University Hospital Comprehensive Cancer Center, Helsinki 00290, Finland
| | - Sanna Siitonen
- Department of Clinical Chemistry, HUS Diagnostic Centre, Helsinki University Hospital and University of Helsinki, Helsinki 00290, Finland
| | - Alina Dulau-Florea
- Hematology Laboratory, Department of Laboratory Medicine/Clinical Center, National Institutes of Health (NIH), Bethesda, MD 20892, USA
| | - Raul Braylan
- Hematology Laboratory, Department of Laboratory Medicine/Clinical Center, National Institutes of Health (NIH), Bethesda, MD 20892, USA
| | - Merja Heinäniemi
- School of Medicine, University of Eastern Finland, Kuopio 70211, Finland
| | - Shinji Nakao
- Department of Hematology, Faculty of Medicine, Institute of Medical Pharmaceutical and Health Sciences, Kanazawa University, Ishikawa 920-1192, Japan
| | - Eva Hellström-Lindberg
- Center for Hematology and Regenerative Medicine, Department of Medicine, Karolinska Institutet, Karolinska University Hospital Huddinge, Huddinge 14157, Sweden
| | - Jaroslaw P Maciejewski
- Department of Translational Hematology and Oncology Research, Taussig Cancer Institute, Cleveland Clinic, Cleveland, OH 44106, USA
| | - Piers Blombery
- Sir Peter MacCallum Department of Oncology, University of Melbourne, Melbourne, VIC 3052, Australia
| | - Neal S Young
- National Heart Lung and Blood Institute (NHLBI), National Institutes of Health (NIH), Bethesda, MD 20892, USA
| | - Harri Lähdesmäki
- Department of Computer Science, Aalto University School of Science, Espoo 02150, Finland
| | - Satu Mustjoki
- Hematology Research Unit Helsinki, Department of Hematology, University of Helsinki and Helsinki University Hospital Comprehensive Cancer Center, Helsinki 00290, Finland
- Translational Immunology Research Program, University of Helsinki, Helsinki 00290, Finland
- ICAN Digital Precision Cancer Medicine Flagship, University of Helsinki and Helsinki University Hospital Comprehensive Cancer Center, Helsinki 00290, Finland
- Department of Clinical Chemistry and Hematology, University of Helsinki, Helsinki 00290, Finland
| |
|
15
|
Zaslavsky ME, Craig E, Michuda JK, Sehgal N, Ram-Mohan N, Lee JY, Nguyen KD, Hoh RA, Pham TD, Röltgen K, Lam B, Parsons ES, Macwana SR, DeJager W, Drapeau EM, Roskin KM, Cunningham-Rundles C, Moody MA, Haynes BF, Goldman JD, Heath JR, Chinthrajah RS, Nadeau KC, Pinsky BA, Blish CA, Hensley SE, Jensen K, Meyer E, Balboni I, Utz PJ, Merrill JT, Guthridge JM, James JA, Yang S, Tibshirani R, Kundaje A, Boyd SD. Disease diagnostics using machine learning of B cell and T cell receptor sequences. Science 2025; 387:eadp2407. [PMID: 39977494 PMCID: PMC12061481 DOI: 10.1126/science.adp2407] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2024] [Accepted: 11/29/2024] [Indexed: 02/22/2025]
Abstract
Clinical diagnosis typically incorporates physical examination, patient history, various laboratory tests, and imaging studies but makes limited use of the human immune system's own record of antigen exposures encoded by receptors on B cells and T cells. We analyzed immune receptor datasets from 593 individuals to develop MAchine Learning for Immunological Diagnosis, an interpretive framework to screen for multiple illnesses simultaneously or precisely test for one condition. This approach detects specific infections, autoimmune disorders, vaccine responses, and disease severity differences. Human-interpretable features of the model recapitulate known immune responses to severe acute respiratory syndrome coronavirus 2, influenza, and human immunodeficiency virus, highlight antigen-specific receptors, and reveal distinct characteristics of systemic lupus erythematosus and type-1 diabetes autoreactivity. This analysis framework has broad potential for scientific and clinical interpretation of immune responses.
MESH Headings
- Humans
- Autoimmune Diseases/diagnosis
- Autoimmune Diseases/immunology
- B-Lymphocytes/immunology
- COVID-19/diagnosis
- COVID-19/immunology
- Diabetes Mellitus, Type 1/diagnosis
- Diabetes Mellitus, Type 1/immunology
- HIV Infections/diagnosis
- HIV Infections/immunology
- Influenza, Human/diagnosis
- Influenza, Human/immunology
- Lupus Erythematosus, Systemic/diagnosis
- Lupus Erythematosus, Systemic/immunology
- Machine Learning
- Receptors, Antigen, B-Cell/genetics
- Receptors, Antigen, B-Cell/immunology
- Receptors, Antigen, T-Cell/genetics
- Receptors, Antigen, T-Cell/immunology
- SARS-CoV-2/immunology
- Infections/diagnosis
- Infections/immunology
Affiliation(s)
| | - Erin Craig
- Department of Biomedical Data Science, Stanford University; Stanford, CA, USA
| | - Jackson K. Michuda
- Department of Biomedical Data Science, Stanford University; Stanford, CA, USA
| | - Nidhi Sehgal
- Department of Genetics, Stanford University; Stanford, CA, USA
- Department of Pathology, Stanford University; Stanford, CA, USA
| | - Nikhil Ram-Mohan
- Department of Emergency Medicine, Stanford University; Stanford, CA, USA
| | - Ji-Yeun Lee
- Department of Pathology, Stanford University; Stanford, CA, USA
| | - Khoa D. Nguyen
- Department of Pathology, Stanford University; Stanford, CA, USA
| | - Ramona A. Hoh
- Department of Pathology, Stanford University; Stanford, CA, USA
| | - Tho D. Pham
- Department of Pathology, Stanford University; Stanford, CA, USA
- Stanford Blood Center; Stanford, CA, USA
| | - Katharina Röltgen
- Department of Medical Parasitology and Infection Biology, Swiss Tropical and Public Health Institute; Allschwil, Switzerland
- University of Basel; Basel, Switzerland
| | - Brandon Lam
- Department of Pathology, Stanford University; Stanford, CA, USA
| | - Ella S. Parsons
- Sean N. Parker Center for Allergy and Asthma Research, Stanford University; Stanford, CA, USA
| | - Susan R. Macwana
- Department of Arthritis and Clinical Immunology, Oklahoma Medical Research Foundation; Oklahoma City, OK, USA
| | - Wade DeJager
- Department of Arthritis and Clinical Immunology, Oklahoma Medical Research Foundation; Oklahoma City, OK, USA
| | - Elizabeth M. Drapeau
- Department of Microbiology, Perelman School of Medicine, University of Pennsylvania; Philadelphia, PA, USA
| | - Krishna M. Roskin
- Department of Pediatrics, University of Cincinnati, College of Medicine; Cincinnati, OH, USA
- Divisions of Biomedical Informatics and Immunobiology, Cincinnati Children’s Hospital Medical Center; Cincinnati, OH, USA
| | | | - M. Anthony Moody
- Department of Pediatrics, Duke University; Durham, NC, USA
- Duke Human Vaccine Institute, Duke University; Durham, NC, USA
- Department of Immunology, Duke University; Durham, NC, USA
| | - Barton F. Haynes
- Duke Human Vaccine Institute, Duke University; Durham, NC, USA
- Department of Immunology, Duke University; Durham, NC, USA
- Department of Medicine, Duke University; Durham, NC, USA
| | - Jason D. Goldman
- Swedish Center for Research and Innovation, Swedish Medical Center; Seattle, WA, USA
- Division of Allergy and Infectious Diseases, University of Washington; Seattle, WA, USA
| | - James R. Heath
- Institute for Systems Biology; Seattle, WA, USA
- Department of Bioengineering, University of Washington; Seattle, WA, USA
| | - R. Sharon Chinthrajah
- Sean N. Parker Center for Allergy and Asthma Research, Stanford University; Stanford, CA, USA
| | - Kari C. Nadeau
- Department of Environmental Health, Harvard T.H. Chan School of Public Health; Boston, MA, USA
- Division of Allergy and Inflammation, Beth Israel Deaconess Medical Center; Boston, MA, USA
| | - Benjamin A. Pinsky
- Department of Pathology, Stanford University; Stanford, CA, USA
- Department of Medicine, Stanford University; Stanford, CA, USA
| | | | - Scott E. Hensley
- Department of Microbiology, Perelman School of Medicine, University of Pennsylvania; Philadelphia, PA, USA
| | - Kent Jensen
- Department of Medicine, Stanford University; Stanford, CA, USA
| | - Everett Meyer
- Department of Medicine, Stanford University; Stanford, CA, USA
| | - Imelda Balboni
- Department of Pediatrics, Stanford University; Stanford, CA, USA
| | - Paul J Utz
- Department of Medicine, Stanford University; Stanford, CA, USA
| | - Joan T. Merrill
- Department of Arthritis and Clinical Immunology, Oklahoma Medical Research Foundation; Oklahoma City, OK, USA
- Department of Medicine, Grossman School of Medicine, New York University; New York, NY, USA
- Lupus Foundation of America; Washington, DC, USA
| | - Joel M. Guthridge
- Department of Arthritis and Clinical Immunology, Oklahoma Medical Research Foundation; Oklahoma City, OK, USA
| | - Judith A. James
- Department of Arthritis and Clinical Immunology, Oklahoma Medical Research Foundation; Oklahoma City, OK, USA
| | - Samuel Yang
- Department of Emergency Medicine, Stanford University; Stanford, CA, USA
| | - Robert Tibshirani
- Department of Biomedical Data Science, Stanford University; Stanford, CA, USA
- Department of Statistics, Stanford University; Stanford, CA, USA
| | - Anshul Kundaje
- Department of Computer Science, Stanford University; Stanford, CA, USA
- Department of Genetics, Stanford University; Stanford, CA, USA
| | - Scott D. Boyd
- Department of Pathology, Stanford University; Stanford, CA, USA
- Sean N. Parker Center for Allergy and Asthma Research, Stanford University; Stanford, CA, USA
| |
|
16
|
Micoli E, Ferrero Restelli F, Barbiera G, Moors R, Nouboers E, Du JX, Bertels H, Liu M, Konstantopoulos D, Takeoka A, Lippi G, Lim L. A single-cell transcriptomic atlas of developing inhibitory neurons reveals expanding and contracting modes of diversification. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2025:2025.02.19.636192. [PMID: 40027755 PMCID: PMC11870569 DOI: 10.1101/2025.02.19.636192] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/05/2025]
Abstract
The cerebral cortex relies on vastly different types of inhibitory neurons to compute. How this diversity emerges during development remains an open question. The rarity of individual inhibitory neuron types often leads to their underrepresentation in single-cell RNA sequencing (scRNAseq) datasets, limiting insights into their developmental trajectories. To address this problem, we developed a computational pipeline to enrich and integrate rare cell types across multiple datasets. Applying this approach to somatostatin-expressing (SST+) inhibitory neurons-the most diverse inhibitory cell class in the cortex-we constructed the Dev-SST-Atlas, a comprehensive resource containing mouse transcriptomic data of over 51,000 SST+ neurons. We identify three principal groups-Martinotti cells (MCs), non-Martinotti cells (nMCs), and long-range projecting neurons (LRPs)-each following distinct diversification trajectories. MCs commit early, with distinct embryonic and neonatal clusters that map directly to adult counterparts. In contrast, nMCs diversify gradually, with each developmental cluster giving rise to multiple adult cell types. LRPs follow a unique 'contracting' mode. Initially, two clusters are present until postnatal day 5 (P5), but by P7, one type is eliminated through programmed cell death, leaving a single surviving population. This transient LRP type is also found in the fetal human cortex, revealing an evolutionarily conserved feature of cortical development. Together, these findings highlight three distinct modes of SST+ neuron diversification-invariant, expanding, and contracting-offering a new framework to understand how the large repertoire of inhibitory neurons emerges during development.
Affiliation(s)
- Elia Micoli
- VIB Center for Brain and Disease, 3000, Leuven, Belgium
- Department of Neurosciences, Katholieke Universiteit (KU) Leuven, 3000, Leuven, Belgium
- These authors contributed equally
| | - Facundo Ferrero Restelli
- VIB Center for Brain and Disease, 3000, Leuven, Belgium
- Department of Neurosciences, Katholieke Universiteit (KU) Leuven, 3000, Leuven, Belgium
- These authors contributed equally
| | | | - Rani Moors
- VIB Center for Brain and Disease, 3000, Leuven, Belgium
- Department of Neurosciences, Katholieke Universiteit (KU) Leuven, 3000, Leuven, Belgium
| | - Evelien Nouboers
- VIB Center for Brain and Disease, 3000, Leuven, Belgium
- Department of Neurosciences, Katholieke Universiteit (KU) Leuven, 3000, Leuven, Belgium
| | - Jessica Xinyun Du
- Department of Neuroscience, Scripps Research Institute, La Jolla, United States of America
| | - Hannah Bertels
- Department of Neurosciences, Katholieke Universiteit (KU) Leuven, 3000, Leuven, Belgium
| | - Minhui Liu
- VIB Center for Brain and Disease, 3000, Leuven, Belgium
- Department of Neurosciences, Katholieke Universiteit (KU) Leuven, 3000, Leuven, Belgium
| | | | - Aya Takeoka
- RIKEN Center for Brain Science, Saitama, 351-0198, Japan
| | - Giordano Lippi
- Department of Neuroscience, Scripps Research Institute, La Jolla, United States of America
| | - Lynette Lim
- VIB Center for Brain and Disease, 3000, Leuven, Belgium
- Department of Neurosciences, Katholieke Universiteit (KU) Leuven, 3000, Leuven, Belgium
- Lead contact
|
17
|
Zhao B, Song K, Wei DQ, Xiong Y, Ding J. scCobra allows contrastive cell embedding learning with domain adaptation for single cell data integration and harmonization. Commun Biol 2025; 8:233. [PMID: 39948393 PMCID: PMC11825689 DOI: 10.1038/s42003-025-07692-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2024] [Accepted: 02/06/2025] [Indexed: 02/16/2025] Open
Abstract
The rapid advancement of single-cell technologies has created an urgent need for effective methods to integrate and harmonize single-cell data. Technical and biological variations across studies complicate data integration, while conventional tools often struggle with reliance on gene expression distribution assumptions and over-correction. Here, we present scCobra, a deep generative neural network designed to overcome these challenges through contrastive learning with domain adaptation. scCobra effectively mitigates batch effects, minimizes over-correction, and ensures biologically meaningful data integration without assuming specific gene expression distributions. It enables online label transfer across datasets with batch effects, allowing continuous integration of new data without retraining. Additionally, scCobra supports batch effect simulation, advanced multi-omic integration, and scalable processing of large datasets. By integrating and harmonizing datasets from similar studies, scCobra expands the available data for investigating specific biological problems, improving cross-study comparability, and revealing insights that may be obscured in isolated datasets.
Affiliation(s)
- Bowen Zhao
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Meakins-Christie Laboratories, Department of Medicine, McGill University Health Centre, Montreal, QC, Canada
- Division of Experimental Medicine, Department of Medicine, McGill University, Montreal, QC, Canada
| | - Kailu Song
- Meakins-Christie Laboratories, Department of Medicine, McGill University Health Centre, Montreal, QC, Canada
- Quantitative Life Sciences, McGill University, Montreal, QC, Canada
| | - Dong-Qing Wei
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
| | - Yi Xiong
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China.
| | - Jun Ding
- Meakins-Christie Laboratories, Department of Medicine, McGill University Health Centre, Montreal, QC, Canada.
- Division of Experimental Medicine, Department of Medicine, McGill University, Montreal, QC, Canada.
- Quantitative Life Sciences, McGill University, Montreal, QC, Canada.
- School of Computer Science, McGill University, Montreal, QC, Canada.
- Mila-Quebec AI Institute, Montreal, QC, Canada.
|
18
|
Ma L, Liu J, Sun W, Zhao C, Yu L. scMFG: a single-cell multi-omics integration method based on feature grouping. BMC Genomics 2025; 26:132. [PMID: 39934664 PMCID: PMC11817349 DOI: 10.1186/s12864-025-11319-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/20/2024] [Accepted: 02/03/2025] [Indexed: 02/13/2025] Open
Abstract
BACKGROUND Recent advancements in methodologies and technologies have enabled the simultaneous measurement of multiple omics data, which provides a comprehensive understanding of cellular heterogeneity. However, existing methods have limitations in accurately identifying cell types while maintaining model interpretability, especially in the presence of noise. METHODS We propose a novel method called scMFG, which leverages feature grouping and group integration techniques for the integration of single-cell multi-omics data. It first organizes features with similar characteristics within each omics layer through feature grouping. Furthermore, scMFG ensures a consistent feature grouping approach across different omics layers, promoting comparability of diverse data types. Additionally, scMFG incorporates a matrix factorization-based approach so that the integrated results remain interpretable. RESULTS We comprehensively evaluated scMFG's performance on four complex real-world datasets generated using diverse sequencing technologies, highlighting its robustness in accurately identifying cell types. Notably, scMFG exhibited superior performance in deciphering cellular heterogeneity at a finer resolution compared to existing methods when applied to simulated datasets. Furthermore, our method proved highly effective in identifying rare cell types, showcasing its robust performance and suitability for detecting low-abundance cellular populations. The interpretability of scMFG was successfully validated through its specific association of outputs with specific cell types or states observed in the neonatal mouse cerebral cortices dataset. Moreover, we demonstrated that scMFG is capable of identifying cell developmental trajectories even in datasets with batch effects. CONCLUSIONS Our work presents a robust framework for the analysis of single-cell multi-omics data, advancing our understanding of cellular heterogeneity in a comprehensive and interpretable manner.
Affiliation(s)
- Litian Ma
- School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi, 710071, China
| | - Jingtao Liu
- School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi, 710071, China
| | - Wei Sun
- Department of Rehabilitation Medicine, Xijing Hospital, Fourth Military Medical University, Xi'an, 710032, China
| | - Chenguang Zhao
- Department of Rehabilitation Medicine, Xijing Hospital, Fourth Military Medical University, Xi'an, 710032, China.
| | - Liang Yu
- School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi, 710071, China.
|
19
|
Wu J, Wan C, Ji Z, Zhou Y, Hou W. EpiFoundation: A Foundation Model for Single-Cell ATAC-seq via Peak-to-Gene Alignment. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2025:2025.02.05.636688. [PMID: 39975086 PMCID: PMC11839112 DOI: 10.1101/2025.02.05.636688] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/21/2025]
Abstract
Foundation models exhibit strong capabilities for downstream tasks by learning generalized representations through self-supervised pre-training on large datasets. While several foundation models have been developed for single-cell RNA-seq (scRNA-seq) data, there is still a lack of models specifically tailored for single-cell ATAC-seq (scATAC-seq), which measures epigenetic information in individual cells. The principal challenge in developing such a model lies in the vast number of scATAC peaks and the significant sparsity of the data, which complicates the formulation of peak-to-peak correlations. To address this challenge, we introduce EpiFoundation, a foundation model for learning cell representations from the high-dimensional and sparse space of peaks. EpiFoundation relies on an innovative cross-modality pre-training procedure with two key technical innovations. First, EpiFoundation exclusively processes the non-zero peak set, thereby enhancing the density of cell-specific information within the input data. Second, EpiFoundation utilizes dense gene expression information to supervise the pre-training process, aligning peak-to-gene correlations. EpiFoundation can handle various types of downstream tasks, including cell-type annotation, batch correction, and gene expression prediction. To train and validate EpiFoundation, we curated MiniAtlas, a dataset of 100,000+ single cells with paired scRNA-seq and scATAC-seq data, along with diverse test sets spanning various tissues and cell types for robust evaluation. EpiFoundation demonstrates state-of-the-art performance across multiple tissues and diverse downstream tasks.
Affiliation(s)
- Juncheng Wu
- Department of Computer Science and Engineering, UC Santa Cruz
| | - Changxin Wan
- Department of Biostatistics and Bioinformatics, Duke University
| | - Zhicheng Ji
- Department of Biostatistics and Bioinformatics, Duke University
| | - Yuyin Zhou
- Department of Computer Science and Engineering, UC Santa Cruz
| | - Wenpin Hou
- Department of Biostatistics, Mailman School of Public Health, Columbia University
|
20
|
Li H, Côté P, Kuoch M, Ezike J, Frenis K, Afanassiev A, Greenstreet L, Tanaka-Yano M, Tarantino G, Zhang S, Whangbo J, Butty VL, Moiso E, Falchetti M, Lu K, Connelly GG, Morris V, Wang D, Chen AF, Bianchi G, Daley GQ, Garg S, Liu D, Chou ST, Regev A, Lummertz da Rocha E, Schiebinger G, Rowe RG. The dynamics of hematopoiesis over the human lifespan. Nat Methods 2025; 22:422-434. [PMID: 39639169 PMCID: PMC11908799 DOI: 10.1038/s41592-024-02495-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2023] [Accepted: 09/19/2024] [Indexed: 12/07/2024]
Abstract
Over a lifetime, hematopoietic stem cells (HSCs) adjust their lineage output to support age-aligned physiology. In model organisms, stereotypic waves of hematopoiesis have been observed corresponding to defined age-biased HSC hallmarks. However, how the properties of hematopoietic stem and progenitor cells change over the human lifespan remains unclear. To address this gap, we profiled individual transcriptome states of human hematopoietic stem and progenitor cells spanning gestation, maturation and aging. Here we define the gene expression networks dictating age-specific differentiation of HSCs and the dynamics of fate decisions and lineage priming throughout life. We additionally identify and functionally validate a fetal-specific HSC state with robust engraftment and multilineage capacity. Furthermore, we observe that classification of acute myeloid leukemia against defined transcriptional age states demonstrates that utilization of early life transcriptional programs associates with poor prognosis. Overall, we provide a disease-relevant framework for heterochronic orientation of stem cell ontogeny along the real time axis of the human lifespan.
Affiliation(s)
- Hojun Li
- Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA, USA.
- Division of Hematology/Oncology, Boston Children's Hospital, Boston, MA, USA.
- Harvard Medical School, Boston, MA, USA.
- Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA, USA.
- Department of Pediatrics, University of California, San Diego, CA, USA.
- Division of Hematology/Oncology, Rady Children's Hospital, San Diego, CA, USA.
| | - Parker Côté
- Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Department of Pediatrics, University of California, San Diego, CA, USA
| | - Michael Kuoch
- Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Jideofor Ezike
- Computational and Systems Biology Program, Massachusetts Institute of Technology, Cambridge, MA, USA
- Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Katie Frenis
- Division of Hematology/Oncology, Boston Children's Hospital, Boston, MA, USA
| | - Anton Afanassiev
- Department of Mathematics, University of British Columbia, Vancouver, British Columbia, Canada
| | - Laura Greenstreet
- Department of Mathematics, University of British Columbia, Vancouver, British Columbia, Canada
| | - Mayuri Tanaka-Yano
- Division of Hematology/Oncology, Boston Children's Hospital, Boston, MA, USA
- Harvard Medical School, Boston, MA, USA
| | - Giuseppe Tarantino
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Stephen Zhang
- Department of Mathematics, University of British Columbia, Vancouver, British Columbia, Canada
| | - Jennifer Whangbo
- Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Division of Hematology/Oncology, Boston Children's Hospital, Boston, MA, USA
- Harvard Medical School, Boston, MA, USA
- Vor Biopharma, Cambridge, MA, USA
| | - Vincent L Butty
- Barbara K. Ostrom Bioinformatics Facility, Integrated Genomics and Bioinformatics Core of the Koch Institute, Cambridge, MA, USA
| | - Enrico Moiso
- Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Marcelo Falchetti
- Departments of Microbiology, Immunology and Parasitology, Federal University of Santa Catarina, Florianopolis, Brazil
| | - Kate Lu
- Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Guinevere G Connelly
- Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Vivian Morris
- Division of Hematology/Oncology, Boston Children's Hospital, Boston, MA, USA
| | - Dahai Wang
- Division of Hematology/Oncology, Boston Children's Hospital, Boston, MA, USA
| | - Antonia F Chen
- Harvard Medical School, Boston, MA, USA
- Department of Orthopedic Surgery, Brigham and Women's Hospital, Boston, MA, USA
| | - Giada Bianchi
- Harvard Medical School, Boston, MA, USA
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
| | - George Q Daley
- Division of Hematology/Oncology, Boston Children's Hospital, Boston, MA, USA
- Harvard Medical School, Boston, MA, USA
| | - Salil Garg
- Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Department of Pathology, Massachusetts General Hospital, Boston, MA, USA
| | - David Liu
- Harvard Medical School, Boston, MA, USA
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Stella T Chou
- Division of Hematology, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Department of Pediatrics, University of Pennsylvania School of Medicine, Philadelphia, PA, USA
| | - Aviv Regev
- Division of Hematology/Oncology, Rady Children's Hospital, San Diego, CA, USA
- Genentech, South San Francisco, CA, USA
| | - Edroaldo Lummertz da Rocha
- Departments of Microbiology, Immunology and Parasitology, Federal University of Santa Catarina, Florianopolis, Brazil
| | - Geoffrey Schiebinger
- Department of Mathematics, University of British Columbia, Vancouver, British Columbia, Canada
| | - R Grant Rowe
- Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA, USA.
- Division of Hematology/Oncology, Boston Children's Hospital, Boston, MA, USA.
- Harvard Medical School, Boston, MA, USA.
|
21
|
Liu Y, Li Z, Chen X, Cui X, Gao Z, Jiang R. INSTINCT: Multi-sample integration of spatial chromatin accessibility sequencing data via stochastic domain translation. Nat Commun 2025; 16:1247. [PMID: 39893190 PMCID: PMC11787322 DOI: 10.1038/s41467-025-56535-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2024] [Accepted: 01/13/2025] [Indexed: 02/04/2025] Open
Abstract
Recent advances in spatial epigenomic techniques have given rise to spatial assay for transposase-accessible chromatin using sequencing (spATAC-seq) data, enabling the characterization of epigenomic heterogeneity and spatial information simultaneously. Integrative analysis of multiple spATAC-seq samples, for which no method has been developed, allows for effective identification and elimination of unwanted non-biological factors within the data, enabling comprehensive exploration of tissue structures and providing a holistic epigenomic landscape, thereby facilitating the discovery of biological implications and the study of regulatory processes. In this article, we present INSTINCT, a method for multi-sample INtegration of Spatial chromaTIN accessibility sequencing data via stochastiC domain Translation. INSTINCT can efficiently handle the high dimensionality of spATAC-seq data and eliminate the complex noise and batch effects of samples through a stochastic domain translation procedure. We demonstrate the superiority and robustness of INSTINCT in integrating spATAC-seq data across multiple simulated scenarios and real datasets. Additionally, we highlight the advantages of INSTINCT in spatial domain identification, visualization, spot-type annotation, and various downstream analyses, including motif enrichment analysis, expression enrichment analysis, and partitioned heritability analysis.
Affiliation(s)
- Yuyao Liu
- Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing, 100084, China
| | - Zhen Li
- Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing, 100084, China
| | - Xiaoyang Chen
- Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing, 100084, China
| | - Xuejian Cui
- Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing, 100084, China
| | - Zijing Gao
- Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing, 100084, China
| | - Rui Jiang
- Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing, 100084, China.
|
22
|
Gong Y, Dai Y, Wu Q, Guo L, Yao X, Yang Q. Benchmark of Data Integration in Single-Cell Proteomics. Anal Chem 2025; 97:1254-1263. [PMID: 39761355 DOI: 10.1021/acs.analchem.4c04933] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/22/2025]
Abstract
Single-cell proteomics (SCP) detected based on different technologies always involves batch-specific variations because of differences in sample processing and other potential biases. How to integrate SCP data effectively has become a great challenge. Integration of SCP data requires not only the conservation of true biological variances but also the removal of unwanted batch effects. In this study, benchmarking analysis of popular data integration methods was conducted to determine the most suitable method for SCP data. To comprehensively evaluate the performance of these integration methods, a novel evaluation system was proposed for integrating SCP data. This evaluation system consists of three objective measures from different perspectives: category (a), the efficacy of correcting batch effects; category (b), the power of conserving biological variances; and category (c), the ability to identify consistent markers. For this comprehensive evaluation, five benchmark data sets under different scenarios (containing substantial proteins, substantial cells, multiple batches, multiple cell types, and unbalanced data) were utilized for selecting the most suitable data integration method. As a result, three methods, ComBat, Scanorama, and Seurat version 3 CCA, were identified as the most recommended methods for integrating SCP data. Overall, this systematic evaluation might provide valuable guidance in choosing the appropriate method for data integration in SCP.
Affiliation(s)
- Yaguo Gong
- School of Pharmacy, State Key Laboratory of Quality Research in Chinese Medicine, Macau University of Science and Technology, Macao 999078, China
| | - Yangbo Dai
- State Key Laboratory for Organic Electronics and Information Displays, Institute of Advanced Materials (IAM), Nanjing University of Posts and Telecommunications, Nanjing 210023, China
| | - Qibiao Wu
- School of Pharmacy, State Key Laboratory of Quality Research in Chinese Medicine, Macau University of Science and Technology, Macao 999078, China
| | - Li Guo
- State Key Laboratory for Organic Electronics and Information Displays, Institute of Advanced Materials (IAM), Nanjing University of Posts and Telecommunications, Nanjing 210023, China
| | - Xiaojun Yao
- Centre for Artificial Intelligence Driven Drug Discovery, Faculty of Applied Sciences, Macao Polytechnic University, Macao 999078, China
| | - Qingxia Yang
- Zhejiang Provincial Key Laboratory of Precision Diagnosis and Therapy for Major Gynecological Diseases, Women's Hospital, Zhejiang University School of Medicine, Hangzhou 310058, China
|
23
|
Chicco D, Fabris A, Jurman G. The Venus score for the assessment of the quality and trustworthiness of biomedical datasets. BioData Min 2025; 18:1. [PMID: 39780220 PMCID: PMC11716409 DOI: 10.1186/s13040-024-00412-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2024] [Accepted: 12/02/2024] [Indexed: 01/11/2025] Open
Abstract
Biomedical datasets are the mainstays of computational biology and health informatics projects, and can be found on multiple data platforms online or obtained from wet-lab biologists and physicians. The quality and the trustworthiness of these datasets, however, can sometimes be poor, producing bad results in turn, which can harm patients and data subjects. To address this problem, policy-makers, researchers, and consortia have proposed diverse regulations, guidelines, and scores to assess the quality and increase the reliability of datasets. Although generally useful, they are often incomplete and impractical. The guidelines of Datasheets for Datasets, in particular, are too numerous; the requirements of the Kaggle Dataset Usability Score focus on non-scientific requisites (for example, including a cover image); and the European Union Artificial Intelligence Act (EU AI Act) sets forth sparse and general data governance requirements, which we tailored to datasets for biomedical AI. Against this backdrop, we introduce our new Venus score to assess the data quality and trustworthiness of biomedical datasets. Our score ranges from 0 to 10 and consists of ten questions that anyone developing a bioinformatics, medical informatics, or cheminformatics dataset should answer before the release. In this study, we first describe the EU AI Act, Datasheets for Datasets, and the Kaggle Dataset Usability Score, presenting their requirements and their drawbacks. To do so, we reverse-engineer the weights of the influential Kaggle Score for the first time and report them in this study. We distill the most important data governance requirements into ten questions tailored to the biomedical domain, comprising the Venus score. We apply the Venus score to twelve datasets from multiple subdomains, including electronic health records, medical imaging, microarray and bulk RNA-seq gene expression, cheminformatics, physiologic electrogram signals, and medical text.
Analyzing the results, we surface fine-grained strengths and weaknesses of popular datasets, as well as aggregate trends. Most notably, we find a widespread tendency to gloss over sources of data inaccuracy and noise, which may hinder the reliable exploitation of data and, consequently, research results. Overall, our results confirm the applicability and utility of the Venus score to assess the trustworthiness of biomedical data.
Affiliation(s)
- Davide Chicco
- Università di Milano-Bicocca & University of Toronto, Toronto, Canada.
|
24
|
Zhong H, Han W, Gomez-Cabrero D, Tegner J, Gao X, Cui G, Aranda M. Benchmarking cross-species single-cell RNA-seq data integration methods: towards a cell type tree of life. Nucleic Acids Res 2025; 53:gkae1316. [PMID: 39778870 PMCID: PMC11707536 DOI: 10.1093/nar/gkae1316] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2024] [Revised: 11/23/2024] [Accepted: 12/27/2024] [Indexed: 01/11/2025] Open
Abstract
Cross-species single-cell RNA-seq data hold immense potential for unraveling cell type evolution and transferring knowledge between well-explored and less-studied species. However, challenges arise from interspecific genetic variation, batch effects stemming from experimental discrepancies and inherent individual biological differences. Here, we benchmarked nine data-integration methods across 20 species, encompassing 4.7 million cells, spanning eight phyla and the entire animal taxonomic hierarchy. Our evaluation reveals notable differences between the methods in removing batch effects and preserving biological variance across taxonomic distances. Methods that effectively leverage gene sequence information capture underlying biological variances, while generative model-based approaches excel in batch effect removal. SATURN demonstrates robust performance across diverse taxonomic levels, from cross-genus to cross-phylum, emphasizing its versatility. SAMap excels in integrating species beyond the cross-family level, especially for atlas-level cross-species integration, while scGen shines within or below the cross-class hierarchy. As a result, our analysis offers recommendations and guidelines for selecting suitable integration methods, enhancing cross-species single-cell RNA-seq analyses and advancing algorithm development.
Affiliation(s)
- Huawen Zhong
- BioEngineering Program, Biological and Environmental Science and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
| | - Wenkai Han
- Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
- Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - David Gomez-Cabrero
- BioEngineering Program, Biological and Environmental Science and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
- Unit of Translational Bioinformatics, Navarrabiomed—Fundación Miguel Servet, Universidad Pública de Navarra (UPNA), IdiSNA, Pamplona, Spain
| | - Jesper Tegner
- BioEngineering Program, Biological and Environmental Science and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
- Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
- Unit of Computational Medicine, Department of Medicine, Center for Molecular Medicine, Karolinska Institutet, Karolinska University Hospital, L8:05, SE-171 76 Stockholm, Sweden
- Science for Life Laboratory, Tomtebodavagen 23A, SE-17165 Solna, Sweden
| | - Xin Gao
- Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
- Center of Excellence on Smart Health, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
- Center of Excellence for Generative AI, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
- Guoxin Cui
- BioEngineering Program, Biological and Environmental Science and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
- Marine Science Program, Biological and Environmental Science and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
- Manuel Aranda
- BioEngineering Program, Biological and Environmental Science and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
- Marine Science Program, Biological and Environmental Science and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
|
25
|
Golchin A, Shams F, Moradi F, Sadrabadi AE, Parviz S, Alipour S, Ranjbarvan P, Hemmati Y, Rahnama M, Rasmi Y, Aziz SGG. Single-cell Technology in Stem Cell Research. Curr Stem Cell Res Ther 2025; 20:9-32. [PMID: 38243989 DOI: 10.2174/011574888x265479231127065541] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2023] [Revised: 09/23/2023] [Accepted: 10/04/2023] [Indexed: 01/22/2024]
Abstract
Single-cell technology (SCT), which enables the examination of the fundamental units comprising biological organs, tissues, and cells, has emerged as a powerful tool, particularly in the field of biology, with a profound impact on stem cell research. This innovative technology opens new pathways for acquiring cell-specific data and gaining insights into the molecular pathways governing organ function and biology. SCT is not only frequently used to explore rare and diverse cell types, including stem cells, but it also unveils the intricacies of cellular diversity and dynamics. This perspective, crucial for advancing stem cell research, facilitates non-invasive analyses of molecular dynamics and cellular functions over time. Despite numerous investigations into potential stem cell therapies for genetic disorders, degenerative conditions, and severe injuries, the number of approved stem cell-based treatments remains limited. This limitation is attributed to the various heterogeneities present among stem cell sources, hindering their widespread clinical utilization. Furthermore, stem cell research is intimately connected with cutting-edge technologies, such as microfluidic organoids, CRISPR technology, and cell/tissue engineering. Each strategy developed to overcome the constraints of stem cell research has the potential to significantly impact advanced stem cell therapies. Drawing on the advantages and progress achieved through SCT-based approaches, this study aims to provide an overview of the advancements and concepts associated with the utilization of SCT in stem cell research and its related fields.
Affiliation(s)
- Ali Golchin
- Cellular and Molecular Research Center, Cellular and Molecular Medicine Institute, Urmia University of Medical Sciences, Urmia, Iran
- Department of Clinical Biochemistry and Applied Cell Sciences, School of Medicine, Urmia University of Medical Sciences, Urmia, Iran
- Forough Shams
- Department of Medical Biotechnology, School of Advanced Technologies in Medicine, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Faezeh Moradi
- Department of Tissue Engineering, School of Medicine, Tarbiat Modares University, Tehran, Iran
- Amin Ebrahimi Sadrabadi
- Department of Stem Cells and Developmental Biology, Cell Science Research Center, Royan Institute for Stem Cell Biology and Technology, ACECR, Tehran, Iran
- Shima Parviz
- Department of Tissue Engineering and Applied Cell Sciences, School of Advanced Medical Sciences and Technologies, Shiraz University of Medical Sciences, Shiraz, Iran
- Shahriar Alipour
- Cellular and Molecular Research Center, Cellular and Molecular Medicine Institute, Urmia University of Medical Sciences, Urmia, Iran
- Department of Clinical Biochemistry and Applied Cell Sciences, School of Medicine, Urmia University of Medical Sciences, Urmia, Iran
- Parviz Ranjbarvan
- Cellular and Molecular Research Center, Cellular and Molecular Medicine Institute, Urmia University of Medical Sciences, Urmia, Iran
- Department of Clinical Biochemistry and Applied Cell Sciences, School of Medicine, Urmia University of Medical Sciences, Urmia, Iran
- Yaser Hemmati
- Department of Prosthodontics, Dental Faculty, Urmia University of Medical Sciences, Urmia, Iran
- Maryam Rahnama
- Department of Clinical Biochemistry and Applied Cell Sciences, School of Medicine, Urmia University of Medical Sciences, Urmia, Iran
- Yousef Rasmi
- Department of Clinical Biochemistry and Applied Cell Sciences, School of Medicine, Urmia University of Medical Sciences, Urmia, Iran
- Shiva Gholizadeh-Ghaleh Aziz
- Department of Clinical Biochemistry and Applied Cell Sciences, School of Medicine, Urmia University of Medical Sciences, Urmia, Iran
|
26
|
Gao C, Welch JD. Integrating single-cell multimodal epigenomic data using 1D convolutional neural networks. Bioinformatics 2024; 41:btae705. [PMID: 39820306 PMCID: PMC11751632 DOI: 10.1093/bioinformatics/btae705] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2024] [Revised: 09/30/2024] [Accepted: 01/14/2025] [Indexed: 01/19/2025] Open
Abstract
MOTIVATION: Recent experimental developments enable single-cell multimodal epigenomic profiling, which measures multiple histone modifications and chromatin accessibility within the same cell. Such parallel measurements provide exciting new opportunities to investigate how epigenomic modalities vary together across cell types and states. A pivotal step in using these types of data is integrating the epigenomic modalities to learn a unified representation of each cell, but existing approaches are not designed to model the unique nature of this data type. Our key insight is to model single-cell multimodal epigenome data as a multichannel sequential signal.
RESULTS: We developed ConvNet-VAEs, a novel framework that uses one-dimensional (1D) convolutional variational autoencoders (VAEs) for single-cell multimodal epigenomic data integration. We evaluated ConvNet-VAEs on nano-CUT&Tag and single-cell nanobody-tethered transposition followed by sequencing data generated from juvenile mouse brain and human bone marrow. We found that ConvNet-VAEs can perform dimension reduction and batch correction better than previous architectures while using significantly fewer parameters. Furthermore, the performance gap between convolutional and fully connected architectures increases with the number of modalities, and deeper convolutional architectures can increase the performance, while the performance degrades for deeper fully connected architectures. Our results indicate that convolutional autoencoders are a promising method for integrating current and future single-cell multimodal epigenomic datasets.
AVAILABILITY AND IMPLEMENTATION: The source code of VAE models and a demo in Jupyter notebook are available at https://github.com/welch-lab/ConvNetVAE.
Affiliation(s)
- Chao Gao
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, United States
- Joshua D Welch
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, United States
- Department of Computer Science and Engineering, University of Michigan, Ann Arbor, MI 48109, United States
|
27
|
Dong M, Agrawal K, Fan R, Sefik E, Flavell RA, Kluger Y. Scaling deep identifiable models enables zero-shot characterization of single-cell biological states. bioRxiv 2024:2023.11.11.566161. [PMID: 38014345 PMCID: PMC10680588 DOI: 10.1101/2023.11.11.566161] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2023]
Abstract
How to identify true biological differences across samples while overcoming batch effects has been a persistent challenge in single-cell RNA-seq data analysis, hindering analyses across datasets for transferable biological findings. In this work, we show that scaling up deep identifiable models leads to a surprisingly effective solution for this challenging task. We developed scShift, a deep variational inference framework with theoretical support in disentangling batch-dependent and independent variations. By training the model with compendiums of scRNA-seq atlases, scShift shows remarkable zero-shot capabilities in revealing representations of cell types and biological states in single-cell data while overcoming batch effects. We employed scShift to systematically compare lung fibrosis states across different datasets, tissues and experimental systems. scShift uniquely extrapolates lung fibrosis states to previously unseen post-COVID-19 fibrosis, characterizing universal myeloid-fibrosis signatures, potential repurposing drug targets and fibrosis-associated cell interactions. Evaluations of over 200 trained scShift models demonstrate emergent zero-shot capabilities and a scaling law beyond a transition threshold, with respect to dataset diversity. With its scaling performance on massive single-cell compendiums and exceptional zero-shot capabilities, scShift represents an important advance toward next-generation computational models for single-cell analysis.
Affiliation(s)
- Mingze Dong
- Interdepartmental Program in Computational Biology & Bioinformatics, Yale University, New Haven, CT, USA
- Department of Pathology, Yale School of Medicine, New Haven, CT, USA
- Department of Biomedical Engineering, Yale University, New Haven, CT, USA
- Kriti Agrawal
- Interdepartmental Program in Computational Biology & Bioinformatics, Yale University, New Haven, CT, USA
- Department of Pathology, Yale School of Medicine, New Haven, CT, USA
- Department of Immunobiology, Yale University, New Haven, CT, USA
- Rong Fan
- Department of Pathology, Yale School of Medicine, New Haven, CT, USA
- Department of Biomedical Engineering, Yale University, New Haven, CT, USA
- Yale Stem Cell Center and Yale Cancer Center, Yale University, New Haven, CT, USA
- Human and Translational Immunology, Yale University, New Haven, CT, USA
- Esen Sefik
- Department of Immunobiology, Yale University, New Haven, CT, USA
- Richard A. Flavell
- Department of Immunobiology, Yale University, New Haven, CT, USA
- Howard Hughes Medical Institute, Yale University, New Haven, CT, USA
- Yuval Kluger
- Interdepartmental Program in Computational Biology & Bioinformatics, Yale University, New Haven, CT, USA
- Department of Pathology, Yale School of Medicine, New Haven, CT, USA
- Applied Mathematics Program, Yale University, New Haven, CT, USA
|
28
|
Maurer K, Grabski IN, Houot R, Gohil SH, Miura S, Redd R, Lyu H, Lu W, Arihara Y, Budka J, McDonough M, Ansuinelli M, Reynolds C, Jacene H, Li S, Livak KJ, Ritz J, Miles B, Mattie M, Neuberg DS, Irizarry RA, Armand P, Wu CJ, Jacobson C. Baseline immune state and T-cell clonal kinetics are associated with durable response to CAR-T therapy in large B-cell lymphoma. Blood 2024; 144:2490-2502. [PMID: 39241199 PMCID: PMC11952007 DOI: 10.1182/blood.2024024381] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2024] [Revised: 07/30/2024] [Accepted: 08/12/2024] [Indexed: 09/08/2024] Open
Abstract
Engineered cellular therapy with CD19-targeting chimeric antigen receptor T cells (CAR-Ts) has revolutionized outcomes for patients with relapsed/refractory large B-cell lymphoma (LBCL), but the cellular and molecular features associated with response remain largely unresolved. We analyzed serial peripheral blood samples ranging from the day of apheresis (day -28/baseline) to 28 days after CAR-T infusion from 50 patients with LBCL treated with axicabtagene ciloleucel by integrating single-cell RNA and T-cell receptor sequencing, flow cytometry, and mass cytometry to characterize features associated with response to CAR-T. Pretreatment patient characteristics associated with response included the presence of B cells and increased absolute lymphocyte count to absolute monocyte count ratio (ALC/AMC). Infusion products from responders were enriched for clonally expanded, highly activated CD8+ T cells. We expanded these observations to 99 patients from the ZUMA-1 cohort and identified a subset of patients with elevated baseline B cells, 80% of whom were complete responders. We integrated B-cell proportion ≥0.5% and ALC/AMC ≥1.2 into a 2-factor predictive model and applied this model to the ZUMA-1 cohort. Estimated progression-free survival at 1 year in patients meeting 1 or both criteria was 65% vs 31% for patients meeting neither criterion. Our results suggest that patients' immunologic state at baseline affects the likelihood of response to CAR-T through both modulation of the T-cell apheresis product composition and promoting a more favorable circulating immune compartment before therapy. These baseline immunologic features, measured readily in the clinical setting before CAR-T, can be applied to predict response to therapy.
MESH Headings
- Humans
- Immunotherapy, Adoptive/methods
- Lymphoma, Large B-Cell, Diffuse/therapy
- Lymphoma, Large B-Cell, Diffuse/immunology
- Male
- Female
- Middle Aged
- Aged
- Adult
- Receptors, Chimeric Antigen/immunology
- Biological Products/therapeutic use
- Antigens, CD19/immunology
- Receptors, Antigen, T-Cell/immunology
- Receptors, Antigen, T-Cell/genetics
- T-Lymphocytes/immunology
- Treatment Outcome
Affiliation(s)
- Katie Maurer
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA
- Harvard Medical School, Boston, MA
- Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA
- Roch Houot
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA
- Department of Hematology, University Hospital of Rennes, UMR U1236, INSERM, University of Rennes, Rennes, France
- Satyen H. Gohil
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA
- Department of Haematology, University College London, London, United Kingdom
- Department of Haematology, University College London Hospitals National Health Service Foundation Trust, London, United Kingdom
- Shogo Miura
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA
- Robert Redd
- Department of Biostatistics, Harvard University, Boston, MA
- Haoxiang Lyu
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA
- Translational Immunogenomics Laboratory, Dana-Farber Cancer Institute, Boston, MA
- Wesley Lu
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA
- Translational Immunogenomics Laboratory, Dana-Farber Cancer Institute, Boston, MA
- Yohei Arihara
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA
- Mikaela McDonough
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA
- Michela Ansuinelli
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA
- Carol Reynolds
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA
- Heather Jacene
- Harvard Medical School, Boston, MA
- Department of Imaging, Dana-Farber Cancer Institute, Boston, MA
- Shuqiang Li
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA
- Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA
- Translational Immunogenomics Laboratory, Dana-Farber Cancer Institute, Boston, MA
- Kenneth J. Livak
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA
- Translational Immunogenomics Laboratory, Dana-Farber Cancer Institute, Boston, MA
- Jerome Ritz
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA
- Harvard Medical School, Boston, MA
- Donna S. Neuberg
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA
- Rafael A. Irizarry
- Department of Biostatistics, Harvard University, Boston, MA
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA
- Philippe Armand
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA
- Harvard Medical School, Boston, MA
- Catherine J. Wu
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA
- Harvard Medical School, Boston, MA
- Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA
- Caron Jacobson
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA
- Harvard Medical School, Boston, MA
|
29
|
Cui Y, Zhang W, Zeng X, Yang Y, Park SJ, Nakai K. Computational analysis of the functional impact of MHC-II-expressing triple-negative breast cancer. Front Immunol 2024; 15:1497251. [PMID: 39664386 PMCID: PMC11631845 DOI: 10.3389/fimmu.2024.1497251] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2024] [Accepted: 11/08/2024] [Indexed: 12/13/2024] Open
Abstract
The tumor microenvironment (TME) plays a crucial role in tumor progression and immunoregulation. Major histocompatibility complex class II (MHC-II) is essential for immune surveillance within the TME. While MHC-II genes are typically expressed by professional antigen-presenting cells, they are also expressed in tumor cells, potentially facilitating antitumor immune responses. To understand the role of MHC-II-expressing tumor cells, we analyzed triple-negative breast cancer (TNBC), an aggressive subtype with poor prognosis and limited treatment options, using public bulk RNA-seq, single-cell RNA-seq, and spatial transcriptomics datasets. Our analysis revealed a distinct tumor subpopulation that upregulates MHC-II genes and actively interacts with immune cells. We inferred that this subpopulation is preferentially located near regions of immune infiltration in TNBC patient cohorts with a better prognosis, suggesting the functional importance of MHC-II-expressing tumor cells in modulating the immune landscape and influencing patient survival outcomes. Remarkably, we identified a prognostic signature comprising 40 significant genes in the MHC-II-expressing tumors; machine learning models using the signature successfully predicted patient survival outcomes and the degree of immune infiltration. This study advances our understanding of the immunological basis of cancer progression and suggests promising new directions for therapeutic strategies.
Affiliation(s)
- Yang Cui
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, University of Tokyo, Tokyo, Japan
- Weihang Zhang
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, University of Tokyo, Tokyo, Japan
- Xin Zeng
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, University of Tokyo, Tokyo, Japan
- Yitao Yang
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, University of Tokyo, Tokyo, Japan
- Sung-Joon Park
- Human Genome Center, Institute of Medical Science, University of Tokyo, Tokyo, Japan
- Kenta Nakai
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, University of Tokyo, Tokyo, Japan
- Human Genome Center, Institute of Medical Science, University of Tokyo, Tokyo, Japan
|
30
|
Zhang Z, Mathew D, Lim TL, Mason K, Martinez CM, Huang S, Wherry EJ, Susztak K, Minn AJ, Ma Z, Zhang NR. Recovery of biological signals lost in single-cell batch integration with CellANOVA. Nat Biotechnol 2024:10.1038/s41587-024-02463-1. [PMID: 39592777 DOI: 10.1038/s41587-024-02463-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/02/2023] [Accepted: 10/02/2024] [Indexed: 11/28/2024]
Abstract
Data integration to align cells across batches has become a cornerstone of single-cell data analysis, critically affecting downstream results. Currently, there are no guidelines for when the biological differences between samples are separable from batch effects. Here we show that current paradigms for single-cell data integration remove biologically meaningful variation and introduce distortion. We present a statistical model and computationally scalable algorithm, CellANOVA (cell state space analysis of variance), that harnesses experimental design to explicitly recover biological signals that are erased during single-cell data integration. CellANOVA uses a 'pool-of-controls' design concept, applicable across diverse settings, to separate unwanted variation from biological variation of interest and allow the recovery of subtle biological signals. We apply CellANOVA to diverse contexts and validate the recovered biological signals by orthogonal assays. In particular, we show that CellANOVA is effective in the challenging case of single-cell and single-nucleus data integration, where it recovers subtle biological signals that can be validated and replicated by external data.
Affiliation(s)
- Zhaojun Zhang
- Department of Statistics and Data Science, The Wharton School, University of Pennsylvania, Philadelphia, PA, USA
- Divij Mathew
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Institute for Immunology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Parker Institute for Cancer Immunotherapy, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Tristan L Lim
- Department of Radiation Oncology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Kaishu Mason
- Department of Statistics and Data Science, The Wharton School, University of Pennsylvania, Philadelphia, PA, USA
- Clara Morral Martinez
- Department of Radiation Oncology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Mark Foundation Center for Immunotherapy, Immune Signaling and Radiation, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Sijia Huang
- Penn Institute of Biomedical Informatics, University of Pennsylvania, Philadelphia, PA, USA
- E John Wherry
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Institute for Immunology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Parker Institute for Cancer Immunotherapy, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Mark Foundation Center for Immunotherapy, Immune Signaling and Radiation, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Katalin Susztak
- Renal, Electrolyte and Hypertension Division, Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Institute for Diabetes, Obesity and Metabolism, University of Pennsylvania, Philadelphia, PA, USA
- Department of Genetics, University of Pennsylvania, Philadelphia, PA, USA
- Andy J Minn
- Institute for Immunology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Parker Institute for Cancer Immunotherapy, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Department of Radiation Oncology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Mark Foundation Center for Immunotherapy, Immune Signaling and Radiation, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Zongming Ma
- Department of Statistics and Data Science, Yale University, New Haven, CT, USA
- Nancy R Zhang
- Department of Statistics and Data Science, The Wharton School, University of Pennsylvania, Philadelphia, PA, USA
|
31
|
Forni MF, Pizzurro GA, Krause W, Alexander AF, Bridges K, Xu Y, Justynski O, Gabry A, Camara NOS, Miller-Jensen K, Horsley V. Multiomics reveals age-dependent metabolic reprogramming of macrophages by wound bed niche secreted signals. bioRxiv 2024:2024.10.30.621159. [PMID: 39553941 PMCID: PMC11565841 DOI: 10.1101/2024.10.30.621159] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2024]
Abstract
The cellular metabolism of macrophages depends on tissue niches and can control macrophage inflammatory or resolving phenotypes. Yet, the identity of signals within tissue niches that control macrophage metabolism is not well understood. Here, using single-cell RNA sequencing of macrophages in early mouse wounds, we find that, rather than gene expression of canonical inflammatory or resolving polarization markers, metabolic gene expression defines distinct populations of early wound macrophages. Single-cell secretomics and transcriptomics identify inflammatory and resolving cytokines expressed by early wound macrophages, and we show that these signals drive metabolic inputs and mitochondrial metabolism in an age-dependent manner. We show that aging alters the metabolome of early wound macrophages and rewires their metabolism from mitochondria to glycolysis. We further show that macrophage-derived Chi3l3 and IGF-1 can induce metabolic inputs and mitochondrial mass/metabolism in aged and bone marrow-derived macrophages. Together, these findings reveal that macrophage-derived signals drive the mitochondrial metabolism of macrophages within early wounds in an age-dependent manner and have implications for inflammatory diseases, chronic injuries, and age-related inflammatory diseases.
In Brief: This study reveals that macrophage subsets in early inflammatory stages of skin wound healing are defined by their metabolic profiles rather than polarization phenotype. Using single-cell secretomics, we establish key macrophage cytokines that comprise the in vivo wound niche and drive mitochondrial-based metabolism. Aging significantly alters macrophage heterogeneity and increases glycolytic metabolism, which can be restored to OxPHOS-based metabolism with young niche cytokines. These findings highlight the importance of the tissue niche in driving macrophage phenotypes, with implications for aging-related impairments in wound healing.
Highlights:
- Single-cell transcriptional analysis reveals that metabolic gene expression identifies distinct macrophage populations in early skin wounds.
- Single-cell secretomic data show that young macrophages contribute to the wound bed niche by secreting molecules such as IGF-1 and Chi3l3.
- Old wound macrophages display altered metabolomics, elevated glycolytic metabolism and glucose uptake, and reduced lipid uptake and mitochondrial mass/metabolism.
- Chi3l3 but not IGF-1 secretion is altered in macrophages in an age-dependent manner.
- Chi3l3 can restore mitochondrial mass/metabolism in aged macrophages.
|
32
|
Hu Y, Wan S, Luo Y, Li Y, Wu T, Deng W, Jiang C, Jiang S, Zhang Y, Liu N, Yang Z, Chen F, Li B, Qu K. Benchmarking algorithms for single-cell multi-omics prediction and integration. Nat Methods 2024; 21:2182-2194. [PMID: 39322753 DOI: 10.1038/s41592-024-02429-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2023] [Accepted: 08/19/2024] [Indexed: 09/27/2024]
Abstract
The development of single-cell multi-omics technology has greatly enhanced our understanding of biology, and in parallel, numerous algorithms have been proposed to predict the protein abundance and/or chromatin accessibility of cells from single-cell transcriptomic information and to integrate various types of single-cell multi-omics data. However, few studies have systematically compared and evaluated the performance of these algorithms. Here, we present a benchmark study of 14 protein abundance/chromatin accessibility prediction algorithms and 18 single-cell multi-omics integration algorithms using 47 single-cell multi-omics datasets. Our benchmark study showed that, overall, totalVI and scArches outperformed the other algorithms for predicting protein abundance, and LS_Lab was the top-performing algorithm for the prediction of chromatin accessibility in most cases. Seurat, MOJITOO and scAI emerge as leading algorithms for vertical integration, whereas totalVI and UINMF outperform their counterparts in both horizontal and mosaic integration scenarios. Additionally, we provide a pipeline to assist researchers in selecting the optimal multi-omics prediction and integration algorithm.
Affiliation(s)
- Yinlei Hu
- Department of Oncology, The First Affiliated Hospital of USTC, School of Basic Medical Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, China
- Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, China
- School of Mathematical Science, University of Science and Technology of China, Hefei, China
- Siyuan Wan
- Department of Oncology, The First Affiliated Hospital of USTC, School of Basic Medical Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, China
- Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, China
- School of Artificial Intelligence and Data Science, University of Science and Technology of China, Hefei, China
- Yuanhanyu Luo
- Tsinghua Institute of Multidisciplinary Biomedical Research, Tsinghua University, Beijing, China
- National Institute of Biological Sciences, Beijing, China
- Yuanzhe Li
- Department of Oncology, The First Affiliated Hospital of USTC, School of Basic Medical Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, China
- Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, China
- School of Artificial Intelligence and Data Science, University of Science and Technology of China, Hefei, China
- Tong Wu
- National Institute of Biological Sciences, Beijing, China
- College of Life Sciences, Beijing Normal University, Beijing, China
- Wentao Deng
- Department of Oncology, The First Affiliated Hospital of USTC, School of Basic Medical Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, China
- Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, China
- Chen Jiang
- Department of Oncology, The First Affiliated Hospital of USTC, School of Basic Medical Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, China
- Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, China
| | - Shan Jiang
- National Institute of Biological Sciences, Beijing, China
| | - Yueping Zhang
- School of Artificial Intelligence and Data Science, University of Science and Technology of China, Hefei, China
| | - Nianping Liu
- School of Biomedical Engineering, Suzhou Institute for Advanced Research, University of Science and Technology of China, Suzhou, China
| | - Zongcheng Yang
- Department of Oncology, The First Affiliated Hospital of USTC, School of Basic Medical Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, China
| | - Falai Chen
- School of Mathematical Science, University of Science and Technology of China, Hefei, China.
- School of Artificial Intelligence and Data Science, University of Science and Technology of China, Hefei, China.
| | - Bin Li
- Tsinghua Institute of Multidisciplinary Biomedical Research, Tsinghua University, Beijing, China.
- National Institute of Biological Sciences, Beijing, China.
| | - Kun Qu
- Department of Oncology, The First Affiliated Hospital of USTC, School of Basic Medical Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, China.
- Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, China.
- School of Artificial Intelligence and Data Science, University of Science and Technology of China, Hefei, China.
- School of Biomedical Engineering, Suzhou Institute for Advanced Research, University of Science and Technology of China, Suzhou, China.
|
33
|
Haussmann AJ, McMahan ZH, Volkmann ER. Understanding the gastrointestinal microbiome in systemic sclerosis: methodological advancements and emerging research. Curr Opin Rheumatol 2024; 36:401-409. [PMID: 39189041 PMCID: PMC11588518 DOI: 10.1097/bor.0000000000001048] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/28/2024]
Abstract
PURPOSE OF REVIEW: This review highlights the role of the gastrointestinal (GI) microbiome in systemic sclerosis (SSc). We describe techniques for evaluating the GI microbiome in humans, and emerging research linking GI microbiome alterations (i.e., dysbiosis) and distinct SSc clinical manifestations. We also address the evolving treatment landscape targeting dysbiosis in SSc. RECENT FINDINGS: Recent literature brings into focus the complex relationship between the GI microbiome and SSc pathogenesis. Advanced techniques (e.g., shotgun metagenomics, meta-transcriptomics) provide deeper insights into microbial taxonomy and active gene expression, exposing dysbiosis as a potential driver of SSc. New studies demonstrate that SSc patients who possess specific clinical features (e.g., interstitial lung disease) have unique GI microbiome profiles. SUMMARY: Dysbiosis is associated with specific clinical features in patients with SSc. New tools for studying the GI microbiome have furthered our understanding of the relationship between dysbiosis and SSc complications. Therapeutic avenues such as dietary adjustments, probiotics, antibiotics, mindfulness practices, and fecal transplants offer potential for managing SSc and preventing its progression through GI microbiome modulation. By clarifying what is known about the relationship between GI dysbiosis, GI dysfunction, and SSc, this review enhances our understanding of SSc pathogenesis and proposes targeted interventions.
Affiliation(s)
- Alana J. Haussmann
- Department of Medicine, University of California, Los Angeles, David Geffen School of Medicine; USA
| | - Zsuzsanna H. McMahan
- Department of Medicine, The University of Texas Health Science Center at Houston; USA
| | - Elizabeth R. Volkmann
- Department of Medicine, University of California, Los Angeles, David Geffen School of Medicine; USA
|
34
|
Polimeni B, Marasca F, Ranzani V, Bodega B. IRescue: uncertainty-aware quantification of transposable elements expression at single cell level. Nucleic Acids Res 2024; 52:e93. [PMID: 39271103 PMCID: PMC11514465 DOI: 10.1093/nar/gkae793] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2022] [Revised: 08/22/2024] [Accepted: 09/02/2024] [Indexed: 09/15/2024] Open
Abstract
Transposable elements (TEs) are mobile DNA repeats known to shape the evolution of eukaryotic genomes. In complex organisms, they exhibit tissue-specific transcription. However, understanding their role in cellular diversity across most tissues remains a challenge when employing single-cell RNA sequencing (scRNA-seq), due to their widespread presence and genetic similarity. To address this, we present IRescue (Interspersed Repeats single-cell quantifier), software capable of estimating the expression of TE subfamilies at the single-cell level. IRescue incorporates a unique UMI deduplication algorithm to rectify sequencing errors and employs an Expectation-Maximization procedure to effectively redistribute the counts of multi-mapping reads. Our study showcases the precision of IRescue through analysis of both simulated and real single-cell and single-nucleus RNA-seq data from human colorectal cancer, brain, skin aging, and PBMCs during SARS-CoV-2 infection and recovery. By linking the expression patterns of TE signatures to specific conditions and biological contexts, we unveil insights into their potential roles in cellular heterogeneity and disease progression.
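The Expectation-Maximization redistribution of multi-mapping reads mentioned in this abstract can be illustrated with a minimal Python sketch. This is not IRescue's implementation: the function name `em_redistribute`, the input layout (one set of compatible TE subfamilies per UMI), the uniform starting abundances, and the fixed iteration count are all assumptions made for illustration, and the input is a toy example rather than real data.

```python
from collections import defaultdict


def em_redistribute(umi_hits, n_iter=50):
    """Estimate per-subfamily counts from UMIs that may match several subfamilies.

    umi_hits: list of sets (hypothetical layout); each set holds the TE
    subfamilies that one UMI is compatible with.
    Returns a dict mapping subfamily -> estimated (fractional) count.
    """
    families = sorted(set().union(*umi_hits))
    # Start from uniform relative abundances (an assumption of this sketch).
    abundance = {f: 1.0 / len(families) for f in families}
    counts = {}
    for _ in range(n_iter):
        counts = defaultdict(float)
        # E-step: split each UMI across its candidates in proportion to abundance.
        for hits in umi_hits:
            total = sum(abundance[f] for f in hits)
            for f in hits:
                counts[f] += abundance[f] / total
        # M-step: re-estimate relative abundances from the fractional counts.
        n = sum(counts.values())
        abundance = {f: counts.get(f, 0.0) / n for f in families}
    return {f: counts.get(f, 0.0) for f in families}


# Toy input: two UMIs unique to L1HS, one shared by L1HS and L1PA2, one unique to AluY.
hits = [{"L1HS"}, {"L1HS"}, {"L1HS", "L1PA2"}, {"AluY"}]
estimates = em_redistribute(hits)
```

On this toy input the shared UMI is assigned almost entirely to L1HS, because L1HS is the only one of its two candidates that also has uniquely assigned UMIs; the estimated counts still sum to the number of UMIs.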
Affiliation(s)
- Benedetto Polimeni
- INGM, Istituto Nazionale di Genetica Molecolare ‘Romeo ed Enrica Invernizzi’, Milan, Italy
- Department of Biosciences, University of Milan, Milan, Italy
| | - Federica Marasca
- INGM, Istituto Nazionale di Genetica Molecolare ‘Romeo ed Enrica Invernizzi’, Milan, Italy
| | - Valeria Ranzani
- INGM, Istituto Nazionale di Genetica Molecolare ‘Romeo ed Enrica Invernizzi’, Milan, Italy
| | - Beatrice Bodega
- INGM, Istituto Nazionale di Genetica Molecolare ‘Romeo ed Enrica Invernizzi’, Milan, Italy
- Department of Biosciences, University of Milan, Milan, Italy
|
35
|
Yu Y, Mai Y, Zheng Y, Shi L. Assessing and mitigating batch effects in large-scale omics studies. Genome Biol 2024; 25:254. [PMID: 39363244 PMCID: PMC11447944 DOI: 10.1186/s13059-024-03401-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2023] [Accepted: 09/23/2024] [Indexed: 10/05/2024] Open
Abstract
Batch effects in omics data are notoriously common technical variations unrelated to study objectives, and may result in misleading outcomes if uncorrected, or hinder biomedical discovery if over-corrected. Assessing and mitigating batch effects is crucial for ensuring the reliability and reproducibility of omics data and minimizing the impact of technical variations on biological interpretation. In this review, we highlight the profound negative impact of batch effects and the urgent need to address this challenging problem in large-scale omics studies. We summarize potential sources of batch effects, current progress in evaluating and correcting them, and consortium efforts aiming to tackle them.
Affiliation(s)
- Ying Yu
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China.
| | - Yuanbang Mai
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
| | - Yuanting Zheng
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China.
| | - Leming Shi
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China.
- Cancer Institute, Shanghai Cancer Center, Fudan University, Shanghai, China.
- International Human Phenome Institutes (Shanghai), Shanghai, China.
|
36
|
He Z, Hu S, Chen Y, An S, Zhou J, Liu R, Shi J, Wang J, Dong G, Shi J, Zhao J, Ou-Yang L, Zhu Y, Bo X, Ying X. Mosaic integration and knowledge transfer of single-cell multimodal data with MIDAS. Nat Biotechnol 2024; 42:1594-1605. [PMID: 38263515 PMCID: PMC11471558 DOI: 10.1038/s41587-023-02040-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2022] [Accepted: 10/23/2023] [Indexed: 01/25/2024]
Abstract
Integrating single-cell datasets produced by multiple omics technologies is essential for defining cellular heterogeneity. Mosaic integration, in which different datasets share only some of the measured modalities, poses major challenges, particularly regarding modality alignment and batch effect removal. Here, we present a deep probabilistic framework for the mosaic integration and knowledge transfer (MIDAS) of single-cell multimodal data. MIDAS simultaneously achieves dimensionality reduction, imputation and batch correction of mosaic data by using self-supervised modality alignment and information-theoretic latent disentanglement. We demonstrate its superiority to 19 other methods and reliability by evaluating its performance in trimodal and mosaic integration tasks. We also constructed a single-cell trimodal atlas of human peripheral blood mononuclear cells and tailored transfer learning and reciprocal reference mapping schemes to enable flexible and accurate knowledge transfer from the atlas to new data. Applications in mosaic integration, pseudotime analysis and cross-tissue knowledge transfer on bone marrow mosaic datasets demonstrate the versatility and superiority of MIDAS. MIDAS is available at https://github.com/labomics/midas .
Affiliation(s)
- Zhen He
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
| | - Shuofeng Hu
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
| | - Yaowen Chen
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
| | - Sijing An
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
| | - Jiahao Zhou
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
- College of Electronics and Information Engineering, Shenzhen University, Shenzhen, China
| | - Runyan Liu
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
| | - Junfeng Shi
- School of Automation, China University of Geosciences, Wuhan, China
| | - Jing Wang
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
| | - Guohua Dong
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
| | - Jinhui Shi
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
| | - Jiaxin Zhao
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
| | - Le Ou-Yang
- College of Electronics and Information Engineering, Shenzhen University, Shenzhen, China
| | - Yuan Zhu
- School of Automation, China University of Geosciences, Wuhan, China
| | - Xiaochen Bo
- Institute of Health Service and Transfusion Medicine, Beijing, China.
| | - Xiaomin Ying
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China.
|
37
|
Han L, Ji Y, Yu Y, Ni Y, Zeng H, Zhang X, Liu H, Zhang Y. Trajectory-centric framework TrajAtlas reveals multi-scale differentiation heterogeneity among cells, genes, and gene modules in osteogenesis. PLoS Genet 2024; 20:e1011319. [PMID: 39436962 PMCID: PMC11530032 DOI: 10.1371/journal.pgen.1011319] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/28/2024] [Revised: 11/01/2024] [Accepted: 10/07/2024] [Indexed: 10/25/2024] Open
Abstract
Osteoblasts, the key cells responsible for bone formation and the maintenance of skeletal integrity, originate from a diverse array of progenitor cells. However, the mechanisms underlying osteoblast differentiation from these multiple osteoprogenitors remain poorly understood. To address this knowledge gap, we developed a comprehensive framework to investigate osteoblast differentiation at multiple scales, encompassing cells, genes, and gene modules. We constructed a reference atlas focused on differentiation, which incorporates various osteoprogenitors and provides a seven-level cellular taxonomy. To reconstruct the differentiation process, we developed a model that identifies the transcription factors and pathways involved in differentiation from different osteoprogenitors. Acknowledging that covariates such as age and tissue type can influence differentiation, we created an algorithm to detect differentially expressed genes throughout the differentiation process. Additionally, we implemented methods to identify conserved pseudotemporal gene modules across multiple samples. Overall, our framework systematically addresses the heterogeneity observed during osteoblast differentiation from diverse sources, offering novel insights into the complexities of bone formation and serving as a valuable resource for understanding osteogenesis.
Affiliation(s)
- Litian Han
- State Key Laboratory of Oral & Maxillofacial Reconstruction and Regeneration, Key Laboratory of Oral Biomedicine Ministry of Education, Hubei Key Laboratory of Stomatology, School & Hospital of Stomatology, Wuhan University, Wuhan, Hubei Province, China
| | - Yaoting Ji
- State Key Laboratory of Oral & Maxillofacial Reconstruction and Regeneration, Key Laboratory of Oral Biomedicine Ministry of Education, Hubei Key Laboratory of Stomatology, School & Hospital of Stomatology, Wuhan University, Wuhan, Hubei Province, China
| | - Yiqian Yu
- State Key Laboratory of Oral & Maxillofacial Reconstruction and Regeneration, Key Laboratory of Oral Biomedicine Ministry of Education, Hubei Key Laboratory of Stomatology, School & Hospital of Stomatology, Wuhan University, Wuhan, Hubei Province, China
| | - Yueqi Ni
- State Key Laboratory of Oral & Maxillofacial Reconstruction and Regeneration, Key Laboratory of Oral Biomedicine Ministry of Education, Hubei Key Laboratory of Stomatology, School & Hospital of Stomatology, Wuhan University, Wuhan, Hubei Province, China
| | - Hao Zeng
- State Key Laboratory of Oral & Maxillofacial Reconstruction and Regeneration, Key Laboratory of Oral Biomedicine Ministry of Education, Hubei Key Laboratory of Stomatology, School & Hospital of Stomatology, Wuhan University, Wuhan, Hubei Province, China
| | - Xiaoxin Zhang
- State Key Laboratory of Oral & Maxillofacial Reconstruction and Regeneration, Key Laboratory of Oral Biomedicine Ministry of Education, Hubei Key Laboratory of Stomatology, School & Hospital of Stomatology, Wuhan University, Wuhan, Hubei Province, China
| | - Huan Liu
- State Key Laboratory of Oral & Maxillofacial Reconstruction and Regeneration, Key Laboratory of Oral Biomedicine Ministry of Education, Hubei Key Laboratory of Stomatology, School & Hospital of Stomatology, Wuhan University, Wuhan, Hubei Province, China
- Frontier Science Center for Immunology and Metabolism, Wuhan University, Wuhan, Hubei Province, China
- TaiKang Center for Life and Medical Sciences, Wuhan University, Wuhan, Hubei Province, China
| | - Yufeng Zhang
- State Key Laboratory of Oral & Maxillofacial Reconstruction and Regeneration, Key Laboratory of Oral Biomedicine Ministry of Education, Hubei Key Laboratory of Stomatology, School & Hospital of Stomatology, Wuhan University, Wuhan, Hubei Province, China
- Frontier Science Center for Immunology and Metabolism, Wuhan University, Wuhan, Hubei Province, China
- TaiKang Center for Life and Medical Sciences, Wuhan University, Wuhan, Hubei Province, China
|
38
|
Wu X, Yang X, Dai Y, Zhao Z, Zhu J, Guo H, Yang R. Single-cell sequencing to multi-omics: technologies and applications. Biomark Res 2024; 12:110. [PMID: 39334490 PMCID: PMC11438019 DOI: 10.1186/s40364-024-00643-4] [Citation(s) in RCA: 20] [Impact Index Per Article: 20.0] [Received: 04/29/2024] [Accepted: 08/17/2024] [Indexed: 09/30/2024] Open
Abstract
Cells, as the fundamental units of life, contain multidimensional spatiotemporal information. Single-cell RNA sequencing (scRNA-seq) is revolutionizing biomedical science by analyzing cellular state and intercellular heterogeneity. Undoubtedly, single-cell transcriptomics has emerged as one of the most vibrant research fields today. With the optimization and innovation of single-cell sequencing technologies, the intricate multidimensional details concealed within cells are gradually unveiled. The combination of scRNA-seq and other multi-omics is at the forefront of the single-cell field. This involves simultaneously measuring various omics data within individual cells, expanding our understanding across a broader spectrum of dimensions. Single-cell multi-omics precisely captures the multidimensional aspects of single-cell transcriptomes, immune repertoire, spatial information, temporal information, epitopes, and other omics in diverse spatiotemporal contexts. In addition to depicting the cell atlas of normal or diseased tissues, it also provides a cornerstone for studying cell differentiation and development patterns, disease heterogeneity, drug resistance mechanisms, and treatment strategies. Herein, we review traditional single-cell sequencing technologies and outline the latest advancements in single-cell multi-omics. We summarize the current status and challenges of applying single-cell multi-omics technologies to biological research and clinical applications. Finally, we discuss the limitations and challenges of single-cell multi-omics and potential strategies to address them.
Affiliation(s)
- Xiangyu Wu
- Department of Urology, Nanjing Drum Tower Hospital, Affiliated Hospital of Medical School, Nanjing University, 321 Zhongshan Road, Nanjing, 210008, Jiangsu, China
| | - Xin Yang
- Department of Urology, Nanjing Drum Tower Hospital, Affiliated Hospital of Medical School, Nanjing University, 321 Zhongshan Road, Nanjing, 210008, Jiangsu, China
| | - Yunhan Dai
- Medical School, Nanjing University, Nanjing, China
| | - Zihan Zhao
- Department of Urology, Nanjing Drum Tower Hospital, Affiliated Hospital of Medical School, Nanjing University, 321 Zhongshan Road, Nanjing, 210008, Jiangsu, China
| | - Junmeng Zhu
- Department of Oncology, Nanjing Drum Tower Hospital, Affiliated Hospital of Medical School, Nanjing University, Nanjing, China
| | - Hongqian Guo
- Department of Urology, Nanjing Drum Tower Hospital, Affiliated Hospital of Medical School, Nanjing University, 321 Zhongshan Road, Nanjing, 210008, Jiangsu, China.
| | - Rong Yang
- Department of Urology, Nanjing Drum Tower Hospital, Affiliated Hospital of Medical School, Nanjing University, 321 Zhongshan Road, Nanjing, 210008, Jiangsu, China.
|
39
|
Hui HWH, Kong W, Goh WWB. Thinking points for effective batch correction on biomedical data. Brief Bioinform 2024; 25:bbae515. [PMID: 39397427 PMCID: PMC11471903 DOI: 10.1093/bib/bbae515] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/19/2024] [Revised: 09/11/2024] [Accepted: 10/01/2024] [Indexed: 10/15/2024] Open
Abstract
Batch effects introduce significant variability into high-dimensional data, complicating accurate analysis and leading to potentially misleading conclusions if not adequately addressed. Despite technological and algorithmic advancements in biomedical research, effectively managing batch effects remains a complex challenge requiring comprehensive considerations. This paper underscores the necessity of a flexible and holistic approach for selecting batch effect correction algorithms (BECAs), advocating for proper BECA evaluations and consideration of artificial intelligence-based strategies. We also discuss key challenges in batch effect correction, including the importance of uncovering hidden batch factors and understanding the impact of design imbalance, missing values, and aggressive correction. Our aim is to provide researchers with a robust framework for managing batch effects effectively and for enhancing the reliability of high-dimensional data analyses.
Affiliation(s)
- Harvard Wai Hann Hui
- Lee Kong Chian School of Medicine, Nanyang Technological University, 59 Nanyang Drive, Singapore 636921, Singapore
| | - Weijia Kong
- Lee Kong Chian School of Medicine, Nanyang Technological University, 59 Nanyang Drive, Singapore 636921, Singapore
- School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore 637551, Singapore
| | - Wilson Wen Bin Goh
- Lee Kong Chian School of Medicine, Nanyang Technological University, 59 Nanyang Drive, Singapore 636921, Singapore
- School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore 637551, Singapore
- Center for Biomedical Informatics, Nanyang Technological University, 59 Nanyang Dr, Singapore 636921, Singapore
- Center of AI in Medicine, Nanyang Technological University, 59 Nanyang Dr, Singapore 636921, Singapore
- Division of Neurology, Department of Brain Sciences, Faculty of Medicine, Imperial College London, Burlington Danes, The Hammersmith Hospital, Du Cane Road, London W12 0NN, United Kingdom
|
40
|
Hastings J, Lee D, O’Connell MJ. Batch-effect correction in single-cell RNA sequencing data using JIVE. Bioinformatics Advances 2024; 4:vbae134. [PMID: 39387061 PMCID: PMC11461915 DOI: 10.1093/bioadv/vbae134] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2023] [Revised: 07/17/2024] [Accepted: 09/11/2024] [Indexed: 10/12/2024]
Abstract
Motivation: In single-cell RNA sequencing analysis, addressing batch effects (technical artifacts stemming from factors such as varying sequencing technologies, equipment, and capture times) is crucial. These factors can cause unwanted variation and obfuscate the underlying biological signal of interest. The joint and individual variation explained (JIVE) method can be used to extract shared biological patterns from multi-source sequencing data while adjusting for individual non-biological variations (i.e. batch effect). However, its current implementation was originally designed for bulk sequencing data, making it computationally infeasible for large-scale single-cell sequencing datasets. Results: In this study, we enhance JIVE for large-scale single-cell data by boosting its computational efficiency. Additionally, we introduce a novel application of JIVE for batch-effect correction on multiple single-cell sequencing datasets. Our enhanced method aims to decompose single-cell sequencing datasets into a joint structure capturing the true biological variability and individual structures, which capture technical variability within each batch. This joint structure is then suitable for use in downstream analyses. We benchmarked the results against four popular tools, Seurat v5, Harmony, LIGER, and Combat-seq, which were developed for this purpose. JIVE performed best in terms of preserving cell-type effects and in scenarios in which the batch sizes are balanced. Availability and implementation: The JIVE implementation used for this analysis can be found at https://github.com/oconnell-statistics-lab/scJIVE.
Affiliation(s)
- Joseph Hastings
- Department of Statistics, Miami University, Oxford, OH 45056, United States
| | - Donghyung Lee
- Department of Statistics, Miami University, Oxford, OH 45056, United States
|
41
|
Safina K, van Galen P. New frameworks for hematopoiesis derived from single-cell genomics. Blood 2024; 144:1039-1047. [PMID: 38985829 PMCID: PMC11561540 DOI: 10.1182/blood.2024024006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2024] [Revised: 06/21/2024] [Accepted: 06/22/2024] [Indexed: 07/12/2024] Open
Abstract
Recent advancements in single-cell genomics have enriched our understanding of hematopoiesis, providing intricate details about hematopoietic stem cell biology, differentiation, and lineage commitment. Technological advancements have highlighted extensive heterogeneity of cell populations and continuity of differentiation routes. Nevertheless, intermediate "attractor" states signify structure in stem and progenitor populations that link state transition dynamics to fate potential. We discuss how innovative model systems quantify lineage bias and how stress accelerates differentiation, thereby reducing fate plasticity compared with native hematopoiesis. We conclude by offering our perspective on the current model of hematopoiesis and discuss how a more precise understanding can translate to strategies that extend healthy hematopoiesis and prevent disease.
Affiliation(s)
- Ksenia Safina
- Division of Hematology, Brigham and Women’s Hospital, Boston, MA
- Department of Medicine, Harvard Medical School, Boston, MA
- Broad Institute of MIT and Harvard, Cambridge, MA
- Ludwig Center at Harvard, Boston, MA
| | - Peter van Galen
- Division of Hematology, Brigham and Women’s Hospital, Boston, MA
- Department of Medicine, Harvard Medical School, Boston, MA
- Broad Institute of MIT and Harvard, Cambridge, MA
- Ludwig Center at Harvard, Boston, MA
|
42
|
Saager ES, van Stigt AH, Lerkvaleekul B, Lutter L, Hellinga AH, van der Wal MM, Bont LJ, Leusen JH, van’t Land B, van Wijk F, the Protection against Respiratory tract infections through human Milk Analysis (PRIMA) group. Human breastmilk memory T cells throughout lactation manifest activated tissue-oriented profile with prominent regulation. JCI Insight 2024; 9:e181788. [PMID: 39435660 PMCID: PMC11530127 DOI: 10.1172/jci.insight.181788] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2024] Open
Abstract
Breastfeeding provides important immunological benefits to the neonate, but how the different immunoactive components in breastmilk contribute to immunity remains poorly understood. Here, we characterized human breastmilk T cells using single-cell RNA-Seq and flow cytometry. Breastmilk contained predominantly memory T cells, with expression of immune signaling genes, high proliferation, and an effector Th1/cytotoxic profile with high cytokine production capacities. Elevated activation was balanced by an enriched Treg population and immune regulatory markers in conventional memory T cells. Gene and surface expression of tissue-residency markers indicate that breastmilk T cells represented tissue-adapted rather than circulatory T cells. In addition, breastmilk T cells had a broad homing profile and higher activation markers in these migratory subsets. The partly overlapping transcriptome profile between breastmilk and breast tissue T cells, particularly cytotoxic T cells, might support a role in local immune defense in the mammary gland. However, unique features of breastmilk, such as Tregs, might imply an additional role in neonatal immune support. We found some correlations between the breastmilk T cell profile and clinical parameters, most notably with maternal and household factors. Together, our data suggest that breastmilk contains an adapted T cell population that exerts its function in specific tissue sites.
Affiliation(s)
- Elise S. Saager
- Center for Translational Immunology, University Medical Centre Utrecht, Utrecht, Netherlands
| | - Arthur H. van Stigt
- Center for Translational Immunology, University Medical Centre Utrecht, Utrecht, Netherlands
| | - Butstabong Lerkvaleekul
- Division of Rheumatology, Department of Pediatrics, Faculty of Medicine Ramathibodi Hospital, Mahidol University, Bangkok, Thailand
| | - Lisanne Lutter
- Center for Translational Immunology, University Medical Centre Utrecht, Utrecht, Netherlands
- Department of Pathology, Amsterdam University Medical Centre, Amsterdam, Netherlands
| | - Anneke H. Hellinga
- Center for Translational Immunology, University Medical Centre Utrecht, Utrecht, Netherlands
| | - M. Marlot van der Wal
- Center for Translational Immunology, University Medical Centre Utrecht, Utrecht, Netherlands
| | - Louis J. Bont
- Center for Translational Immunology, University Medical Centre Utrecht, Utrecht, Netherlands
- Department of Paediatric Immunology and Infectious Diseases, Wilhelmina Children’s Hospital/University Medical Center Utrecht, Utrecht, Netherlands
- ReSViNET foundation, Zeist, Netherlands
| | - Jeanette H.W. Leusen
- Center for Translational Immunology, University Medical Centre Utrecht, Utrecht, Netherlands
| | - Belinda van’t Land
- Center for Translational Immunology, University Medical Centre Utrecht, Utrecht, Netherlands
- CoE Immunology, Danone Global Research & Innovation Center, Utrecht, Netherlands
| | - Femke van Wijk
- Center for Translational Immunology, University Medical Centre Utrecht, Utrecht, Netherlands
|
43
|
Bizzarri D, Reinders MJT, Kuiper L, Beekman M, Deelen J, van Meurs JBJ, van Dongen J, Pool R, Boomsma DI, Ghanbari M, Franke L, Slagboom PE, van den Akker EB. NMR metabolomics-guided DNA methylation mortality predictors. EBioMedicine 2024; 107:105279. [PMID: 39154540 PMCID: PMC11378104 DOI: 10.1016/j.ebiom.2024.105279] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2024] [Revised: 07/25/2024] [Accepted: 07/29/2024] [Indexed: 08/20/2024] Open
Abstract
BACKGROUND: 1H-NMR metabolomics and DNA methylation in blood are widely known biomarkers predicting age-related physiological decline and mortality, yet exert mutually independent mortality and frailty signals. METHODS: Leveraging multi-omics data in four Dutch population studies (N = 5238, ∼40% of whom were male), we investigated whether the mortality signal captured by 1H-NMR metabolomics could guide the construction of DNA methylation-based mortality predictors. FINDINGS: We trained DNA methylation-based surrogates for 64 metabolomic analytes and found that analytes marking inflammation, fluid balance, or HDL/VLDL metabolism could be accurately reconstructed using DNA-methylation assays. Interestingly, a previously reported multi-analyte score indicating mortality risk (MetaboHealth) could also be accurately reconstructed. Sixteen of our derived surrogates, including the MetaboHealth surrogate, showed significant associations with mortality, independent of relevant covariates. INTERPRETATION: The addition of our metabolic analyte-derived surrogates to the well-established epigenetic clock GrimAge demonstrates that our surrogates potentially represent a valuable mortality signal. FUNDING: BBMRI-NL, X-omics, VOILA, Medical Delta, NWO, ERC.
Collapse
Affiliation(s)
- Daniele Bizzarri
- Molecular Epidemiology, Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, the Netherlands; Leiden Computational Biology Center, Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, the Netherlands; Delft Bioinformatics Lab, TU Delft, Delft, the Netherlands
| | - Marcel J T Reinders
- Leiden Computational Biology Center, Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, the Netherlands; Delft Bioinformatics Lab, TU Delft, Delft, the Netherlands
| | - Lieke Kuiper
- Department of Internal Medicine, Erasmus MC, Rotterdam, the Netherlands; Center for Nutrition, Prevention and Health Services, National Institute for Public Health and Environment (RIVM), Bilthoven, the Netherlands
| | - Marian Beekman
- Molecular Epidemiology, Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, the Netherlands
| | - Joris Deelen
- Molecular Epidemiology, Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, the Netherlands; Max Planck Institute for the Biology of Ageing, Cologne, Germany; Cologne Excellence Cluster on Cellular Stress Responses in Aging Associated Diseases, University of Cologne, Cologne, Germany
| | - Joyce B J van Meurs
- Department of Internal Medicine, Erasmus MC, Rotterdam, the Netherlands; Department of Orthopaedics & Sports, Erasmus Medical Center, Rotterdam, the Netherlands
| | - Jenny van Dongen
- Department of Biological Psychology, Vrije Universiteit Amsterdam, Amsterdam, the Netherlands; Amsterdam Reproduction and Development (AR&D) Research Institute, Amsterdam, the Netherlands; Amsterdam Public Health Research Institute, Amsterdam, the Netherlands
| | - René Pool
- Department of Biological Psychology, Vrije Universiteit Amsterdam, Amsterdam, the Netherlands; Amsterdam Public Health Research Institute, Amsterdam, the Netherlands
| | - Dorret I Boomsma
- Department of Biological Psychology, Vrije Universiteit Amsterdam, Amsterdam, the Netherlands; Amsterdam Reproduction and Development (AR&D) Research Institute, Amsterdam, the Netherlands; Amsterdam Public Health Research Institute, Amsterdam, the Netherlands
| | - Mohsen Ghanbari
- Department of Epidemiology, Erasmus MC, Rotterdam, the Netherlands
| | - Lude Franke
- Department of Genetics, University Medical Center Groningen, Groningen, the Netherlands
| | - Pieternella E Slagboom
- Molecular Epidemiology, Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, the Netherlands; Max Planck Institute for the Biology of Ageing, Cologne, Germany
| | - Erik B van den Akker
- Molecular Epidemiology, Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, the Netherlands; Leiden Computational Biology Center, Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, the Netherlands; Delft Bioinformatics Lab, TU Delft, Delft, the Netherlands.
|
44
|
Chen X, Huang Y, Huang L, Huang Z, Hao ZZ, Xu L, Xu N, Li Z, Mou Y, Ye M, You R, Zhang X, Liu S, Miao Z. A brain cell atlas integrating single-cell transcriptomes across human brain regions. Nat Med 2024; 30:2679-2691. [PMID: 39095595 PMCID: PMC11405287 DOI: 10.1038/s41591-024-03150-z] [Received: 07/31/2023] [Accepted: 06/24/2024] [Indexed: 08/04/2024]
Abstract
While single-cell technologies have greatly advanced our comprehension of human brain cell types and functions, studies including large numbers of donors and multiple brain regions are needed to extend our understanding of brain cell heterogeneity. Integrating atlas-level single-cell data presents a chance to reveal rare cell types and cellular heterogeneity across brain regions. Here we present the Brain Cell Atlas, a comprehensive reference atlas of brain cells, by assembling single-cell data from 70 human and 103 mouse studies of the brain throughout major developmental stages across brain regions, covering over 26.3 million cells or nuclei from both healthy and diseased tissues. Using machine-learning based algorithms, the Brain Cell Atlas provides a consensus cell type annotation, and it showcases the identification of putative neural progenitor cells and a cell subpopulation of PCDH9high microglia in the human brain. We demonstrate the gene regulatory difference of PCDH9high microglia between hippocampus and prefrontal cortex and elucidate the cell-cell communication network. The Brain Cell Atlas presents an atlas-level integrative resource for comparing brain cells in different environments and conditions within the Human Cell Atlas.
Affiliation(s)
- Xinyue Chen
- Guangzhou National Laboratory, Guangzhou International Bio Island, Guangzhou, China
| | - Yin Huang
- Guangzhou National Laboratory, Guangzhou International Bio Island, Guangzhou, China
| | - Liangfeng Huang
- Guangzhou National Laboratory, Guangzhou International Bio Island, Guangzhou, China
| | - Ziliang Huang
- Guangzhou National Laboratory, Guangzhou International Bio Island, Guangzhou, China
| | - Zhao-Zhe Hao
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangzhou, China
| | - Lahong Xu
- Guangzhou National Laboratory, Guangzhou International Bio Island, Guangzhou, China
| | - Nana Xu
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangzhou, China
| | - Zhi Li
- Department of Neurosurgery/Neuro-oncology, State Key Laboratory of Oncology in South China, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University Cancer Center, Guangzhou, China
| | - Yonggao Mou
- Department of Neurosurgery/Neuro-oncology, State Key Laboratory of Oncology in South China, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University Cancer Center, Guangzhou, China
| | - Mingli Ye
- Tsinghua Fuzhou Institute for Data Technology, Fuzhou, China
| | - Renke You
- Tsinghua Fuzhou Institute for Data Technology, Fuzhou, China
| | - Xuegong Zhang
- MOE Key Lab of Bioinformatics, Bioinformatics Division of BNRIST and Department of Automation, Tsinghua University, Beijing, China
- School of Medicine, Tsinghua University, Beijing, China
- School of Life Sciences, Center for Synthetic and Systems Biology, Tsinghua University, Beijing, China
| | - Sheng Liu
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangzhou, China.
- Guangdong Province Key Laboratory of Brain Function and Disease, Guangzhou, China.
| | - Zhichao Miao
- Shanghai Key Laboratory of Anesthesiology and Brain Functional Modulation, Clinical Research Center for Anesthesiology and Perioperative Medicine, Translational Research Institute of Brain and Brain-Like Intelligence, Shanghai Fourth People's Hospital, School of Medicine, Tongji University, Shanghai, China.
- GMU-GIBH Joint School of Life Sciences, The Guangdong-Hong Kong-Macau Joint Laboratory for Cell Fate Regulation and Diseases, Guangzhou National Laboratory, Guangzhou Medical University, Guangzhou International Bio Island, Guangzhou, China.
- Bioland Laboratory (Guangzhou Regenerative Medicine and Health Guangdong Laboratory), Guangzhou International Bio Island, Guangzhou, China.
|
45
|
Chang Y, Liu J, Jiang Y, Ma A, Yeo YY, Guo Q, McNutt M, Krull JE, Rodig SJ, Barouch DH, Nolan GP, Xu D, Jiang S, Li Z, Liu B, Ma Q. Graph Fourier transform for spatial omics representation and analyses of complex organs. Nat Commun 2024; 15:7467. [PMID: 39209833 PMCID: PMC11362340 DOI: 10.1038/s41467-024-51590-5] [Received: 02/12/2024] [Accepted: 08/08/2024] [Indexed: 09/04/2024] Open
Abstract
Spatial omics technologies decipher functional components of complex organs at cellular and subcellular resolutions. We introduce Spatial Graph Fourier Transform (SpaGFT) and apply graph signal processing to a wide range of spatial omics profiling platforms to generate their interpretable representations. This representation supports spatially variable gene identification and improves gene expression imputation, outperforming existing tools in analyzing human and mouse spatial transcriptomics data. SpaGFT can identify immunological regions for B cell maturation in human lymph nodes Visium data and characterize variations in secondary follicles using in-house human tonsil CODEX data. Furthermore, it can be integrated seamlessly into other machine learning frameworks, enhancing accuracy in spatial domain identification, cell type annotation, and subcellular feature inference by up to 40%. Notably, SpaGFT detects rare subcellular organelles, such as Cajal bodies and Set1/COMPASS complexes, in high-resolution spatial proteomics data. This approach provides an explainable graph representation method for exploring tissue biology and function.
Affiliation(s)
- Yuzhou Chang
- Department of Biomedical Informatics, College of Medicine, Ohio State University, Columbus, OH, 43210, USA
- Pelotonia Institute for Immuno-Oncology, The James Comprehensive Cancer Center, The Ohio State University, Columbus, OH, 43210, USA
| | - Jixin Liu
- School of Mathematics, Shandong University, 250100, Jinan, China
| | - Yi Jiang
- Department of Biomedical Informatics, College of Medicine, Ohio State University, Columbus, OH, 43210, USA
| | - Anjun Ma
- Department of Biomedical Informatics, College of Medicine, Ohio State University, Columbus, OH, 43210, USA
- Pelotonia Institute for Immuno-Oncology, The James Comprehensive Cancer Center, The Ohio State University, Columbus, OH, 43210, USA
| | - Yao Yu Yeo
- Center for Virology and Vaccine Research, Beth Israel Deaconess Medical Center, Boston, MA, 02115, USA
- Program in Virology, Division of Medical Sciences, Harvard Medical School, Boston, MA, 20115, USA
| | - Qi Guo
- Department of Biomedical Informatics, College of Medicine, Ohio State University, Columbus, OH, 43210, USA
| | - Megan McNutt
- Department of Biomedical Informatics, College of Medicine, Ohio State University, Columbus, OH, 43210, USA
| | - Jordan E Krull
- Department of Biomedical Informatics, College of Medicine, Ohio State University, Columbus, OH, 43210, USA
- Pelotonia Institute for Immuno-Oncology, The James Comprehensive Cancer Center, The Ohio State University, Columbus, OH, 43210, USA
| | - Scott J Rodig
- Department of Pathology, Dana Farber Cancer Institute, Boston, MA, 02115, USA
- Department of Pathology, Brigham & Women's Hospital, Boston, MA, 02115, USA
| | - Dan H Barouch
- Center for Virology and Vaccine Research, Beth Israel Deaconess Medical Center, Boston, MA, 02115, USA
- Ragon Institute of MGH, MIT, and Harvard, Cambridge, MA, 02139, USA
| | - Garry P Nolan
- Department of Pathology, Stanford University School of Medicine, Stanford, CA, 94305, USA
| | - Dong Xu
- Department of Electrical Engineering and Computer Science and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO, 65211, USA
| | - Sizun Jiang
- Center for Virology and Vaccine Research, Beth Israel Deaconess Medical Center, Boston, MA, 02115, USA
- Program in Virology, Division of Medical Sciences, Harvard Medical School, Boston, MA, 20115, USA
- Department of Pathology, Dana Farber Cancer Institute, Boston, MA, 02115, USA
| | - Zihai Li
- Pelotonia Institute for Immuno-Oncology, The James Comprehensive Cancer Center, The Ohio State University, Columbus, OH, 43210, USA
| | - Bingqiang Liu
- School of Mathematics, Shandong University, 250100, Jinan, China.
| | - Qin Ma
- Department of Biomedical Informatics, College of Medicine, Ohio State University, Columbus, OH, 43210, USA.
- Pelotonia Institute for Immuno-Oncology, The James Comprehensive Cancer Center, The Ohio State University, Columbus, OH, 43210, USA.
|
46
|
Deng Y, Yao Y, Wang Y, Yu T, Cai W, Zhou D, Yin F, Liu W, Liu Y, Xie C, Guan J, Hu Y, Huang P, Li W. An end-to-end deep learning method for mass spectrometry data analysis to reveal disease-specific metabolic profiles. Nat Commun 2024; 15:7136. [PMID: 39164279 PMCID: PMC11335749 DOI: 10.1038/s41467-024-51433-3] [Received: 02/16/2024] [Accepted: 08/07/2024] [Indexed: 08/22/2024] Open
Abstract
Untargeted metabolomic analysis using mass spectrometry provides comprehensive metabolic profiling, but its medical application faces challenges of complex data processing, high inter-batch variability, and unidentified metabolites. Here, we present DeepMSProfiler, an explainable deep-learning-based method, enabling end-to-end analysis on raw metabolic signals with output of high accuracy and reliability. Using cross-hospital 859 human serum samples from lung adenocarcinoma, benign lung nodules, and healthy individuals, DeepMSProfiler successfully differentiates the metabolomic profiles of different groups (AUC 0.99) and detects early-stage lung adenocarcinoma (accuracy 0.961). Model flow and ablation experiments demonstrate that DeepMSProfiler overcomes inter-hospital variability and effects of unknown metabolites signals. Our ensemble strategy removes background-category phenomena in multi-classification deep-learning models, and the novel interpretability enables direct access to disease-related metabolite-protein networks. Further applying to lipid metabolomic data unveils correlations of important metabolites and proteins. Overall, DeepMSProfiler offers a straightforward and reliable method for disease diagnosis and mechanism discovery, enhancing its broad applicability.
Affiliation(s)
- Yongjie Deng
- Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
| | - Yao Yao
- State Key Laboratory of Oncology in South China, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University Cancer Center, Guangzhou, China
- Metabolic Innovation Platform, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
| | - Yanni Wang
- Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
| | - Tiantian Yu
- Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
- State Key Laboratory of Oncology in South China, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University Cancer Center, Guangzhou, China
- Metabolic Innovation Platform, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
| | - Wenhao Cai
- Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
| | - Dingli Zhou
- Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
| | - Feng Yin
- State Key Laboratory of Oncology in South China, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University Cancer Center, Guangzhou, China
| | - Wanli Liu
- State Key Laboratory of Oncology in South China, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University Cancer Center, Guangzhou, China
| | - Yuying Liu
- State Key Laboratory of Oncology in South China, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University Cancer Center, Guangzhou, China
| | - Chuanbo Xie
- State Key Laboratory of Oncology in South China, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University Cancer Center, Guangzhou, China
| | - Jian Guan
- Department of Radiology, The First Affiliated Hospital of Sun Yat-sen University, Guangzhou, China
| | - Yumin Hu
- State Key Laboratory of Oncology in South China, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University Cancer Center, Guangzhou, China.
- Metabolic Innovation Platform, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China.
| | - Peng Huang
- State Key Laboratory of Oncology in South China, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University Cancer Center, Guangzhou, China.
- Metabolic Innovation Platform, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China.
| | - Weizhong Li
- Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China.
- Sun Yat-Sen University School of Medicine, Sun Yat-Sen University, Shenzhen, China.
- Key Laboratory of Tropical Disease Control of Ministry of Education, Sun Yat-sen University, Guangzhou, China.
|
47
|
Dee W, Sequeira I, Lobley A, Slabaugh G. Cell-vision fusion: A Swin transformer-based approach for predicting kinase inhibitor mechanism of action from Cell Painting data. iScience 2024; 27:110511. [PMID: 39175778 PMCID: PMC11340608 DOI: 10.1016/j.isci.2024.110511] [Received: 12/22/2023] [Revised: 04/08/2024] [Accepted: 07/11/2024] [Indexed: 08/24/2024] Open
Abstract
Image-based profiling of the cellular response to drug compounds has proven effective at characterizing the morphological changes resulting from perturbation experiments. As data availability increases, however, there are growing demands for novel deep-learning methods. We applied the SwinV2 computer vision architecture to predict the mechanism of action of 10 kinase inhibitor compounds directly from Cell Painting images. This method outperforms the standard approach of using image-based profiles (IBPs), which are multidimensional feature set representations generated by bioimaging software. Furthermore, our fusion approach, cell-vision fusion, which combines three different data modalities (images, IBPs, and chemical structures), achieved 69.79% accuracy and 70.56% F1 score, 4.20% and 5.49% higher, respectively, than the best-performing IBP method. We provide three techniques, specific to Cell Painting images, which enable deep-learning architectures to train effectively and demonstrate approaches to combat the significant batch effects present in large Cell Painting datasets.
Affiliation(s)
- William Dee
- Digital Environment Research Institute (DERI), Queen Mary University of London, London E1 1HH, UK
- Centre for Oral Immunobiology and Regenerative Medicine, Barts Centre for Squamous Cancer, Institute of Dentistry, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London E1 2AD, UK
- Exscientia Plc, The Schrödinger Building Oxford Science Park, Oxford OX4 4GE, UK
| | - Ines Sequeira
- Centre for Oral Immunobiology and Regenerative Medicine, Barts Centre for Squamous Cancer, Institute of Dentistry, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London E1 2AD, UK
| | - Anna Lobley
- Exscientia Plc, The Schrödinger Building Oxford Science Park, Oxford OX4 4GE, UK
| | - Gregory Slabaugh
- Digital Environment Research Institute (DERI), Queen Mary University of London, London E1 1HH, UK
|
48
|
Liu J, Ma J, Wen J, Zhou X. A Cell Cycle-Aware Network for Data Integration and Label Transferring of Single-Cell RNA-Seq and ATAC-Seq. Adv Sci (Weinh) 2024; 11:e2401815. [PMID: 38887194 PMCID: PMC11336957 DOI: 10.1002/advs.202401815] [Received: 02/20/2024] [Revised: 04/22/2024] [Indexed: 06/20/2024]
Abstract
In recent years, the integration of single-cell multi-omics data has provided a more comprehensive understanding of cell functions and internal regulatory mechanisms from a non-single omics perspective, but it still suffers many challenges, such as omics-variance, sparsity, cell heterogeneity, and confounding factors. As it is known, the cell cycle is regarded as a confounder when analyzing other factors in single-cell RNA-seq data, but it is not clear how it will work on the integrated single-cell multi-omics data. Here, a cell cycle-aware network (CCAN) is developed to remove cell cycle effects from the integrated single-cell multi-omics data while keeping the cell type-specific variations. This is the first computational model to study the cell-cycle effects in the integration of single-cell multi-omics data. Validations on several benchmark datasets show the outstanding performance of CCAN in a variety of downstream analyses and applications, including removing cell cycle effects and batch effects of scRNA-seq datasets from different protocols, integrating paired and unpaired scRNA-seq and scATAC-seq data, accurately transferring cell type labels from scRNA-seq to scATAC-seq data, and characterizing the differentiation process from hematopoietic stem cells to different lineages in the integration of differentiation data.
Affiliation(s)
- Jiajia Liu
- Center for Computational Systems Medicine, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
| | - Jian Ma
- Department of Electronic Information and Computer Engineering, The Engineering & Technical College of Chengdu University of Technology, Leshan, Sichuan 614000, China
| | - Jianguo Wen
- Center for Computational Systems Medicine, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
| | - Xiaobo Zhou
- Center for Computational Systems Medicine, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- School of Dentistry, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
|
49
|
Jia R, Li Z, Hu S, Chang H, Zeng M, Liu P, Lu L, Xu M, Zhai X, Qian M, Xu J. Immunological characterization and comparison of children with COVID-19 from their adult counterparts at single-cell resolution. Front Immunol 2024; 15:1358725. [PMID: 39148728 PMCID: PMC11325098 DOI: 10.3389/fimmu.2024.1358725] [Received: 12/20/2023] [Accepted: 07/17/2024] [Indexed: 08/17/2024] Open
Abstract
Introduction: The immunological characteristics that could protect children with coronavirus disease 2019 (COVID-19) from severe or fatal illnesses have not been fully understood yet. Methods: Here, we performed single-cell RNA sequencing (scRNA-seq) analysis on peripheral blood samples of 15 children (8 with COVID-19) and compared them to 18 adults (13 with COVID-19). Results: The child-adult integrated single cell data indicated that children with the disease presented a restrained response to type I interferon in most of the major immune cell types, along with suppression of upstream interferon regulatory factor and toll-like receptor expression in monocytes, which was confirmed by in vitro interferon stimulation assays. Unlike adult patients, children with COVID-19 showed lower frequencies of activated proinflammatory CD14+ monocytes, possibly explaining the rareness of cytokine storm in them. Notably, natural killer (NK) cells in pediatric patients displayed potent cytotoxicity with a rich expression of cytotoxic molecules and upregulated cytotoxic pathways, whereas cellular senescence, along with the Notch signaling pathway, was significantly downregulated in NK cells, all suggesting more robust cytotoxicity in NK cells of children than of adult patients, which was further confirmed by CD107a degranulation assays. Lastly, a modest adaptive immune response was evident, with more naïve T cells but fewer activated and proliferated T cells, and fewer naïve B cells but more activated B cells, in children than in adult patients. Conclusion: This preliminary study revealed distinct cell frequency and activation status of major immune cell types, particularly more robust NK cell cytotoxicity in PBMC that might help protect children from severe COVID-19.
Affiliation(s)
- Ran Jia
- Department of Clinical Laboratory, Children's Hospital of Fudan University, National Children's Medical Center, Shanghai, China
| | - Zifeng Li
- Department of Hematology and Oncology, Children's Hospital of Fudan University, National Children's Medical Center, Shanghai, China
| | - Shiwen Hu
- Department of Hematology and Oncology, Children's Hospital of Fudan University, National Children's Medical Center, Shanghai, China
| | - Hailing Chang
- Department of Infectious Diseases, Children's Hospital of Fudan University, National Children's Medical Center, Shanghai, China
| | - Mei Zeng
- Department of Infectious Diseases, Children's Hospital of Fudan University, National Children's Medical Center, Shanghai, China
| | - Pengcheng Liu
- Department of Clinical Laboratory, Children's Hospital of Fudan University, National Children's Medical Center, Shanghai, China
| | - Lijuan Lu
- Department of Clinical Laboratory, Children's Hospital of Fudan University, National Children's Medical Center, Shanghai, China
| | - Menghua Xu
- Department of Clinical Laboratory, Children's Hospital of Fudan University, National Children's Medical Center, Shanghai, China
| | - Xiaowen Zhai
- Department of Hematology and Oncology, Children's Hospital of Fudan University, National Children's Medical Center, Shanghai, China
| | - Maoxiang Qian
- Institute of Pediatrics and Department of Hematology and Oncology, Children's Hospital of Fudan University, National Children's Medical Center, and the Shanghai Key Laboratory of Medical Epigenetics, International Co-laboratory of Medical Epigenetics and Metabolism (Ministry of Science and Technology), Institutes of Biomedical Sciences, Fudan University, Shanghai, China
| | - Jin Xu
- Department of Clinical Laboratory, Children's Hospital of Fudan University, National Children's Medical Center, Shanghai, China
- Shanghai Institute of Infectious Disease and Biosecurity, Fudan University, Shanghai, China
|
50
|
Hu F, Lucas A, Chen AA, Coleman K, Horng H, Ng RWS, Tustison NJ, Davis KA, Shou H, Li M, Shinohara RT, The Alzheimer's Disease Neuroimaging Initiative. DeepComBat: A statistically motivated, hyperparameter-robust, deep learning approach to harmonization of neuroimaging data. Hum Brain Mapp 2024; 45:e26708. [PMID: 39056477 PMCID: PMC11273293 DOI: 10.1002/hbm.26708] [Received: 02/26/2024] [Revised: 04/19/2024] [Accepted: 04/25/2024] [Indexed: 07/28/2024] Open
Abstract
Neuroimaging data acquired using multiple scanners or protocols are increasingly available. However, such data exhibit technical artifacts across batches which introduce confounding and decrease reproducibility. This is especially true when multi-batch data are analyzed using complex downstream models which are more likely to pick up on and implicitly incorporate batch-related information. Previously proposed image harmonization methods have sought to remove these batch effects; however, batch effects remain detectable in the data after applying these methods. We present DeepComBat, a deep learning harmonization method based on a conditional variational autoencoder and the ComBat method. DeepComBat combines the strengths of statistical and deep learning methods in order to account for the multivariate relationships between features while simultaneously relaxing strong assumptions made by previous deep learning harmonization methods. As a result, DeepComBat can perform multivariate harmonization while preserving data structure and avoiding the introduction of synthetic artifacts. We apply this method to cortical thickness measurements from a cognitive-aging cohort and show DeepComBat qualitatively and quantitatively outperforms existing methods in removing batch effects while preserving biological heterogeneity. Additionally, DeepComBat provides a new perspective for statistically motivated deep learning harmonization methods.
Affiliation(s)
- Fengling Hu
- Penn Statistics in Imaging and Visualization Endeavor (PennSIVE), Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | - Alfredo Lucas
- Center for Neuroengineering and Therapeutics, Department of Engineering, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | - Andrew A. Chen
- Penn Statistics in Imaging and Visualization Endeavor (PennSIVE), Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | - Kyle Coleman
- Statistical Center for Single-Cell and Spatial Genomics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | - Hannah Horng
- Penn Statistics in Imaging and Visualization Endeavor (PennSIVE), Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | - Raymond W. S. Ng
- Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | - Nicholas J. Tustison
- Department of Radiology and Medical Imaging, University of Virginia, Charlottesville, Virginia, USA
| | - Kathryn A. Davis
- Center for Neuroengineering and Therapeutics, Department of Engineering, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Department of Neurology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | - Haochang Shou
- Penn Statistics in Imaging and Visualization Endeavor (PennSIVE), Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Center for Biomedical Image Computing and Analytics (CBICA), Perelman School of Medicine, Philadelphia, Pennsylvania, USA
| | - Mingyao Li
- Statistical Center for Single-Cell and Spatial Genomics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | - Russell T. Shinohara
- Penn Statistics in Imaging and Visualization Endeavor (PennSIVE), Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Center for Biomedical Image Computing and Analytics (CBICA), Perelman School of Medicine, Philadelphia, Pennsylvania, USA
|