1. Zhou Y, Sheng Q, Jin S. Integrating single-cell data with biological variables. Proc Natl Acad Sci U S A 2025; 122:e2416516122. [PMID: 40294274 PMCID: PMC12067276 DOI: 10.1073/pnas.2416516122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2024] [Accepted: 03/30/2025] [Indexed: 04/30/2025] Open
Abstract
Constructing single-cell atlases requires preserving differences attributable to biological variables, such as cell types, tissue origins, and disease states, while eliminating batch effects. However, existing methods are inadequate in explicitly modeling these biological variables. Here, we introduce SIGNAL, a general framework that leverages biological variables to disentangle biological and technical effects, thereby linking these metadata to data integration. SIGNAL employs a variant of principal component analysis to align multiple batches, enabling the integration of 1 million cells in approximately 2 min. SIGNAL, despite its computational simplicity, surpasses state-of-the-art methods across multiple integration scenarios: 1) heterogeneous datasets, 2) cross-species datasets, 3) simulated datasets, 4) integration on low-quality cell annotations, and 5) reference-based integration. Furthermore, we demonstrate that SIGNAL accurately transfers knowledge from reference to query datasets. Notably, we propose a self-adjustment strategy to restore annotated cell labels potentially distorted during integration. Finally, we apply SIGNAL to multiple large-scale atlases, including a human heart cell atlas containing 2.7 million cells, identifying tissue- and developmental stage-specific subtypes, as well as condition-specific cell states. This underscores SIGNAL's exceptional capability in multiscale analysis.
Affiliation(s)
- Yang Zhou
  - School of Mathematics, Harbin Institute of Technology, Harbin 150001, China
  - Zhengzhou Research Institute, Harbin Institute of Technology, Zhengzhou 450000, China
- Qiongyu Sheng
  - School of Mathematics, Harbin Institute of Technology, Harbin 150001, China
  - Zhengzhou Research Institute, Harbin Institute of Technology, Zhengzhou 450000, China
- Shuilin Jin
  - School of Mathematics, Harbin Institute of Technology, Harbin 150001, China
  - Zhengzhou Research Institute, Harbin Institute of Technology, Zhengzhou 450000, China
2. Gao L, Liu Y, Zou J, Deng F, Liu Z, Zhang Z, Zhao X, Chen L, Tong HHY, Ji Y, Le H, Zou X, Hao J. Deep scSTAR: leveraging deep learning for the extraction and enhancement of phenotype-associated features from single-cell RNA sequencing and spatial transcriptomics data. Brief Bioinform 2025; 26:bbaf160. [PMID: 40315434 PMCID: PMC12047704 DOI: 10.1093/bib/bbaf160] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2024] [Revised: 02/28/2025] [Accepted: 03/19/2025] [Indexed: 05/04/2025] Open
Abstract
Single-cell sequencing has advanced our understanding of cellular heterogeneity and disease pathology, offering insights into cellular behavior and immune mechanisms. However, extracting meaningful phenotype-related features is challenging due to noise, batch effects, and irrelevant biological signals. To address this, we introduce Deep scSTAR (DscSTAR), a deep learning-based tool designed to enhance phenotype-associated features. DscSTAR identified HSP+ FKBP4+ T cells in CD8+ T cells, which are linked to immune dysfunction and resistance to immune checkpoint blockade in non-small cell lung cancer. It has also enhanced spatial transcriptomics analysis of renal cell carcinoma, revealing interactions between cancer cells, CD8+ T cells, and tumor-associated macrophages that may promote immune suppression and affect outcomes. In hepatocellular carcinoma, it highlighted the role of S100A12+ neutrophils and cancer-associated fibroblasts in forming tumor immune barriers and potentially contributing to immunotherapy resistance. These findings demonstrate DscSTAR's capacity to model and extract phenotype-specific information, advancing our understanding of disease mechanisms and therapy resistance.
Affiliation(s)
- Lianchong Gao
  - Shanghai Center for Systems Biomedicine, Key Laboratory of Systems Biomedicine (Ministry of Education), Shanghai Jiao Tong University, 800# Dong Chuan Road, Minhang District, Shanghai 200240, China
- Yujun Liu
  - Department of Radiation Oncology, Fudan University Shanghai Cancer Center, Fudan University, Shanghai 200433, China
- Jiawei Zou
  - Shanghai Institute of Biochemistry and Cell Biology, Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai 200031, China
- Fulan Deng
  - Centre for Artificial Intelligence Driven Drug Discovery, Faculty of Applied Sciences, Macao Polytechnic University, Macao SAR, China
- Zheqi Liu
  - Department of Oral and Maxillofacial Surgery, Zhongshan Hospital, Fudan University, Shanghai 200032, China
- Zhen Zhang
  - Department of Oral and Maxillofacial-Head and Neck Oncology, Ninth People’s Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai 200011, China
- Xinran Zhao
  - Department of Oral and Maxillofacial-Head and Neck Oncology, Ninth People’s Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai 200011, China
- Lei Chen
  - Renji Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai 200127, China
- Henry H Y Tong
  - Centre for Artificial Intelligence Driven Drug Discovery, Faculty of Applied Sciences, Macao Polytechnic University, Macao SAR, China
- Yuan Ji
  - Molecular Pathology Center, Dept. Pathology, Zhongshan Hospital, Fudan University, Shanghai 200032, China
- Huangying Le
  - Shanghai Center for Systems Biomedicine, Key Laboratory of Systems Biomedicine (Ministry of Education), Shanghai Jiao Tong University, 800# Dong Chuan Road, Minhang District, Shanghai 200240, China
- Xin Zou
  - Digital Diagnosis and Treatment Innovation Center for Cancer, Institute of Translational Medicine, Shanghai Jiao Tong University, 800# Dong Chuan Road, Shanghai 200240, China
- Jie Hao
  - Shanghai Key Laboratory of Plant Functional Genomics and Resources, Shanghai Chenshan Botanical Garden, Chen Hua Road, Songjiang District, Shanghai 201602, China
  - Institute of Clinical Science, Zhongshan Hospital, Fudan University, No.180 Fenglin Road, Xuhui District, Shanghai 200032, China
3. Hu X, Li H, Chen M, Qian J, Jiang H. Reference-informed evaluation of batch correction for single-cell omics data with overcorrection awareness. Commun Biol 2025; 8:521. [PMID: 40158033 PMCID: PMC11954866 DOI: 10.1038/s42003-025-07947-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2024] [Accepted: 03/18/2025] [Indexed: 04/01/2025] Open
Abstract
Batch effect correction (BEC) is fundamental to integrating multiple single-cell RNA sequencing datasets, and its success is critical for in-depth interrogation of the data for biological insights. However, no simple metric is available to evaluate BEC performance with sensitivity to data overcorrection, which erases true biological variations and leads to false biological discoveries. Here, we propose RBET, a reference-informed statistical framework for evaluating the success of BEC. Using extensive simulations and six real data examples including scRNA-seq and scATAC-seq datasets with different numbers of batches, batch effect sizes and numbers of cell types, we demonstrate that RBET evaluates the performance of BEC methods more fairly with biologically meaningful insights from data, while other methods may lead to false results. Moreover, RBET is computationally efficient, sensitive to overcorrection and robust to large batch effect sizes. Thus, RBET provides a robust guideline for selecting a case-specific BEC method, and the concept of RBET is extendable to other modalities.
Affiliation(s)
- Xiaoyue Hu
  - Center for Data Science, Zhejiang University, Hangzhou, China
  - School of Mathematical Sciences, Zhejiang University, Hangzhou, China
- He Li
  - Center for Data Science, Zhejiang University, Hangzhou, China
- Ming Chen
  - College of Life Sciences, Zhejiang University, Hangzhou, China
- Junbin Qian
  - Zhejiang Key Laboratory of Precision Diagnosis and Therapy for Major Gynecological Diseases, Women's Hospital, Zhejiang University School of Medicine, Hangzhou, China
  - Institute of Genetics, Zhejiang University School of Medicine, Hangzhou, China
  - Cancer Center, Zhejiang University, Hangzhou, China
  - Zhejiang Provincial Clinical Research Center for Child Health, Hangzhou, China
- Hangjin Jiang
  - Center for Data Science, Zhejiang University, Hangzhou, China
4. Li F, Chen M, Zhang M, Chen S, Qu M, He S, Wang L, Wu X, Xiao G. Targeting Piezo1 channel to alleviate intervertebral disc degeneration. J Orthop Translat 2025; 51:145-158. [PMID: 40129609 PMCID: PMC11930658 DOI: 10.1016/j.jot.2025.01.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/30/2024] [Revised: 01/07/2025] [Accepted: 01/10/2025] [Indexed: 03/26/2025] Open
Abstract
Background: Low back pain impacts over 600 million people worldwide, predominantly due to intervertebral disc degeneration. This study focuses on the role of Piezo1, a crucial mechanosensitive ion channel protein, in the pathology and potential treatment of disc degeneration.
Materials and methods: To investigate the effects of disc-specific Piezo1 deletion, we generated AggrecanCreERT2; Piezo1fl/fl mice and examined both lumbar spine instability (LSI)- and aging-induced disc degeneration. Additionally, the effect of pharmacological inhibition of Piezo1 was evaluated using GsMTx4, a potent Piezo1 antagonist, in an ex vivo model stimulated with IL-1β to induce disc degeneration. Assessments included histological examinations, immunofluorescence, and western blot analyses to thoroughly characterize the alterations in the intervertebral discs.
Results: Elevated expression of Piezo1 was detected in the nucleus pulposus (NP) of intervertebral discs with advanced disc degeneration in both aged mice and human patients. Inducible deletion of Piezo1 expression in aggrecan-expressing disc cells significantly reduced lumbar disc degeneration, decreased extracellular matrix (ECM) degradation, and lowered apoptosis in NP cells, observed in both aged mice and those undergoing LSI surgery. Excessive compression loading (CL) upregulated Piezo1 expression, induced ECM disruption, and increased apoptosis in NP cells, whereas inhibition of Piezo1 with GsMTx4 effectively mitigated these pathological changes. Furthermore, in ex vivo cultured mouse discs, GsMTx4 treatment significantly alleviated IL-1β-induced degenerative damages, restored ECM anabolism, and reduced apoptosis.
Conclusions: The findings suggest that Piezo1 plays a critical role in the development of disc degeneration and highlight its potential as a therapeutic target. Inhibiting Piezo1 could offer a novel strategy for treating or preventing this critical disease.
Translational potential of this article: This research highlights the involvement of Piezo1 in the development of intervertebral disc degeneration and emphasizes the potential for targeting Piezo1 as a therapeutic strategy to delay or reverse this condition.
Affiliation(s)
- Feiyun Li
  - Department of Biochemistry, School of Medicine, Shenzhen Key Laboratory of Cell Microenvironment, Guangdong Provincial Key Laboratory of Cell Microenvironment and Disease Research, Southern University of Science and Technology, Shenzhen, 518055, China
- Mingjue Chen
  - Department of Biochemistry, School of Medicine, Shenzhen Key Laboratory of Cell Microenvironment, Guangdong Provincial Key Laboratory of Cell Microenvironment and Disease Research, Southern University of Science and Technology, Shenzhen, 518055, China
- Mengrui Zhang
  - Quantitative Sciences Unit, Department of Medicine, Stanford University, Stanford, CA, USA
- Sheng Chen
  - Department of Orthopaedics, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Minghao Qu
  - School of Medicine, Southern University of Science and Technology, Shenzhen, China
  - Southern University of Science and Technology Hospital, Shenzhen, China
- Shuangshuang He
  - Department of Biochemistry, School of Medicine, Shenzhen Key Laboratory of Cell Microenvironment, Guangdong Provincial Key Laboratory of Cell Microenvironment and Disease Research, Southern University of Science and Technology, Shenzhen, 518055, China
- Lin Wang
  - School of Medicine, Southern University of Science and Technology, Shenzhen, China
  - Southern University of Science and Technology Hospital, Shenzhen, China
- Xiaohao Wu
  - Division of Immunology and Rheumatology, Stanford University, Stanford, CA, 94305, USA
  - VA Palo Alto Health Care System, Palo Alto, CA, 94304, USA
- Guozhi Xiao
  - Department of Biochemistry, School of Medicine, Shenzhen Key Laboratory of Cell Microenvironment, Guangdong Provincial Key Laboratory of Cell Microenvironment and Disease Research, Southern University of Science and Technology, Shenzhen, 518055, China
5. Ortega-Batista A, Jaén-Alvarado Y, Moreno-Labrador D, Gómez N, García G, Guerrero EN. Single-Cell Sequencing: Genomic and Transcriptomic Approaches in Cancer Cell Biology. Int J Mol Sci 2025; 26:2074. [PMID: 40076700 PMCID: PMC11901077 DOI: 10.3390/ijms26052074] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/30/2024] [Revised: 02/18/2025] [Accepted: 02/24/2025] [Indexed: 03/14/2025] Open
Abstract
This article reviews the impact of single-cell sequencing (SCS) on cancer biology research. SCS has revolutionized our understanding of cancer and tumor heterogeneity, clonal evolution, and the complex interplay between cancer cells and tumor microenvironment. SCS provides high-resolution profiling of individual cells in genomic, transcriptomic, and epigenomic landscapes, facilitating the detection of rare mutations, the characterization of cellular diversity, and the integration of molecular data with phenotypic traits. The integration of SCS with multi-omics has provided a multidimensional view of cellular states and regulatory mechanisms in cancer, uncovering novel regulatory mechanisms and therapeutic targets. Advances in computational tools, artificial intelligence (AI), and machine learning have been crucial in interpreting the vast amounts of data generated, leading to the identification of new biomarkers and the development of predictive models for patient stratification. Furthermore, there have been emerging technologies such as spatial transcriptomics and in situ sequencing, which promise to further enhance our understanding of tumor microenvironment organization and cellular interactions. As SCS and its related technologies continue to advance, they are expected to drive significant advances in personalized cancer diagnostics, prognosis, and therapy, ultimately improving patient outcomes in the era of precision oncology.
Affiliation(s)
- Ana Ortega-Batista
  - Faculty of Science and Technology, Technological University of Panama, Ave Justo Arosemena, Entre Calle 35 y 36, Corregimiento de Calidonia, Panama City, Panama
- Yanelys Jaén-Alvarado
  - Faculty of Science and Technology, Technological University of Panama, Ave Justo Arosemena, Entre Calle 35 y 36, Corregimiento de Calidonia, Panama City, Panama
  - Gorgas Memorial Institute for Health Studies, Ave Justo Arosemena, Entre Calle 35 y 36, Corregimiento de Calidonia, Panama City, Panama
- Dilan Moreno-Labrador
  - Faculty of Science and Technology, Technological University of Panama, Ave Justo Arosemena, Entre Calle 35 y 36, Corregimiento de Calidonia, Panama City, Panama
- Natasha Gómez
  - Faculty of Science and Technology, Technological University of Panama, Ave Justo Arosemena, Entre Calle 35 y 36, Corregimiento de Calidonia, Panama City, Panama
- Gabriela García
  - Faculty of Science and Technology, Technological University of Panama, Ave Justo Arosemena, Entre Calle 35 y 36, Corregimiento de Calidonia, Panama City, Panama
- Erika N. Guerrero
  - Gorgas Memorial Institute for Health Studies, Ave Justo Arosemena, Entre Calle 35 y 36, Corregimiento de Calidonia, Panama City, Panama
  - Sistema Nacional de Investigación, Secretaria Nacional de Ciencia y Tecnología, Edificio 205, Ciudad del Saber, Panama City, Panama
6. Oomen ME, Rodriguez-Terrones D, Kurome M, Zakhartchenko V, Mottes L, Simmet K, Noll C, Nakatani T, Mourra-Diaz CM, Aksoy I, Savatier P, Göke J, Wolf E, Kaessmann H, Torres-Padilla ME. An atlas of transcription initiation reveals regulatory principles of gene and transposable element expression in early mammalian development. Cell 2025; 188:1156-1174.e20. [PMID: 39837330 DOI: 10.1016/j.cell.2024.12.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/11/2023] [Revised: 10/26/2024] [Accepted: 12/10/2024] [Indexed: 01/23/2025]
Abstract
Transcriptional activation of the embryonic genome (EGA) is a major developmental landmark enabling the embryo to become independent from maternal control. The magnitude and control of transcriptional reprogramming during this event across mammals remain poorly understood. Here, we developed Smart-seq+5' for high sensitivity, full-length transcript coverage and simultaneous capture of 5' transcript information from single cells and single embryos. Using Smart-seq+5', we profiled 34 developmental stages in 5 mammalian species and provide an extensive characterization of the transcriptional repertoire of early development before, during, and after EGA. We demonstrate widespread transposable element (TE)-driven transcription across species, including, remarkably, of DNA transposons. We identify 19,657 TE-driven genic transcripts, suggesting extensive TE co-option in early development over evolutionary timescales. TEs display similar expression dynamics across species and species-specific patterns, suggesting shared and divergent regulation. Our work provides a powerful resource for understanding transcriptional regulation of mammalian development.
Affiliation(s)
- Marlies E Oomen
  - Institute of Epigenetics and Stem Cells, Helmholtz Munich, Munich, Germany
- Mayuko Kurome
  - Genzentrum, Ludwig-Maximilians-Universität, Munich, Germany
- Lorenza Mottes
  - Institute of Epigenetics and Stem Cells, Helmholtz Munich, Munich, Germany
- Kilian Simmet
  - Genzentrum, Ludwig-Maximilians-Universität, Munich, Germany
- Camille Noll
  - Institute of Epigenetics and Stem Cells, Helmholtz Munich, Munich, Germany
- Irene Aksoy
  - Université Lyon 1, INSERM U1208, INRAE USC 1361, 69500 Bron, France
- Pierre Savatier
  - Université Lyon 1, INSERM U1208, INRAE USC 1361, 69500 Bron, France
  - Platform PrimaStem, INSERM U1208, INRAE USC 1361, 69500 Bron, France
- Jonathan Göke
  - Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore, Singapore
  - Department of Statistics and Data Science, National University of Singapore, Singapore, Singapore
- Eckhard Wolf
  - Genzentrum, Ludwig-Maximilians-Universität, Munich, Germany
- Henrik Kaessmann
  - Center for Molecular Biology of Heidelberg University (ZMBH), DKFZ-ZMBH Alliance, Heidelberg, Germany
- Maria-Elena Torres-Padilla
  - Institute of Epigenetics and Stem Cells, Helmholtz Munich, Munich, Germany
  - Faculty of Biology, Ludwig-Maximilians Universität, Munich, Germany
7. Zhu W, Meng J, Li Y, Gu L, Liu W, Li Z, Shen Y, Shen X, Wang Z, Wu Y, Wang G, Zhang J, Zhang H, Yang H, Dong X, Wang H, Huang X, Sun Y, Li C, Mu L, Liu Z. Comparative proteomic landscapes elucidate human preimplantation development and failure. Cell 2025; 188:814-831.e21. [PMID: 39855199 DOI: 10.1016/j.cell.2024.12.028] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2023] [Revised: 11/21/2024] [Accepted: 12/19/2024] [Indexed: 01/27/2025]
Abstract
Understanding mammalian preimplantation development, particularly in humans, at the proteomic level remains limited. Here, we applied our comprehensive solution of ultrasensitive proteomic technology to measure the proteomic profiles of oocytes and early embryos and identified nearly 8,000 proteins in humans and over 6,300 proteins in mice. We observed distinct proteomic dynamics before and around zygotic genome activation (ZGA) between the two species. Integrative analysis with translatomic data revealed extensive divergence between translation activation and protein accumulation. Multi-omic analysis indicated that ZGA transcripts often contribute to protein accumulation in blastocysts. Using mouse embryos, we identified several transcriptional regulators critical for early development, thereby linking ZGA to the first lineage specification. Furthermore, single-embryo proteomics of poor-quality embryos from over 100 patient couples provided insights into preimplantation development failure. Our study may contribute to reshaping the framework of mammalian preimplantation development and opening avenues for addressing human infertility.
Affiliation(s)
- Wencheng Zhu
  - Institute of Neuroscience, CAS Center for Excellence in Brain Science and Intelligence Technology, CAS Key Laboratory of Primate Neurobiology, State Key Laboratory of Neuroscience, Chinese Academy of Sciences, Shanghai 200031, China
  - Shanghai Center for Brain Science and Brain-Inspired Intelligence Technology, Shanghai 200031, China
- Juan Meng
  - Institute of Neuroscience, CAS Center for Excellence in Brain Science and Intelligence Technology, CAS Key Laboratory of Primate Neurobiology, State Key Laboratory of Neuroscience, Chinese Academy of Sciences, Shanghai 200031, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Yan Li
  - Reproductive Medicine Center, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou 325000, China
- Lei Gu
  - State Key Laboratory of Systems Medicine for Cancer, Center for Single-Cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
- Wenjun Liu
  - Institute of Neuroscience, CAS Center for Excellence in Brain Science and Intelligence Technology, CAS Key Laboratory of Primate Neurobiology, State Key Laboratory of Neuroscience, Chinese Academy of Sciences, Shanghai 200031, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Ziyi Li
  - Shanghai Applied Protein Technology Co., Ltd., Shanghai 201100, China
- Yi Shen
  - Shanghai Applied Protein Technology Co., Ltd., Shanghai 201100, China
- Xiaoyu Shen
  - Institute of Neuroscience, CAS Center for Excellence in Brain Science and Intelligence Technology, CAS Key Laboratory of Primate Neurobiology, State Key Laboratory of Neuroscience, Chinese Academy of Sciences, Shanghai 200031, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Zihong Wang
  - Institute of Neuroscience, CAS Center for Excellence in Brain Science and Intelligence Technology, CAS Key Laboratory of Primate Neurobiology, State Key Laboratory of Neuroscience, Chinese Academy of Sciences, Shanghai 200031, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Yonggen Wu
  - Reproductive Medicine Center, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou 325000, China
- Guiquan Wang
  - Center for Reproductive Medicine, Women and Children's Hospital, School of Medicine, Xiamen University, Xiamen 361102, China
- Junfeng Zhang
  - Shanghai Applied Protein Technology Co., Ltd., Shanghai 201100, China
- Huiping Zhang
  - Shanghai Applied Protein Technology Co., Ltd., Shanghai 201100, China
- Haiyan Yang
  - Reproductive Medicine Center, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou 325000, China
- Xi Dong
  - Reproductive Medicine Center, Zhongshan Hospital, Fudan University, Shanghai 200032, China
- Hui Wang
  - State Key Laboratory of Systems Medicine for Cancer, Center for Single-Cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
- Xuefeng Huang
  - Reproductive Medicine Center, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou 325000, China
- Yidi Sun
  - Institute of Neuroscience, CAS Center for Excellence in Brain Science and Intelligence Technology, CAS Key Laboratory of Primate Neurobiology, State Key Laboratory of Neuroscience, Chinese Academy of Sciences, Shanghai 200031, China
  - State Key Laboratory of Genetic Evolution & Animal Models, Chinese Academy of Sciences, Shanghai, China
- Chen Li
  - State Key Laboratory of Systems Medicine for Cancer, Center for Single-Cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
- Liangshan Mu
  - Reproductive Medicine Center, Zhongshan Hospital, Fudan University, Shanghai 200032, China
- Zhen Liu
  - Institute of Neuroscience, CAS Center for Excellence in Brain Science and Intelligence Technology, CAS Key Laboratory of Primate Neurobiology, State Key Laboratory of Neuroscience, Chinese Academy of Sciences, Shanghai 200031, China
  - Shanghai Center for Brain Science and Brain-Inspired Intelligence Technology, Shanghai 200031, China
8. Du JH, Shen M, Mathys H, Roeder K. Causal differential expression analysis under unmeasured confounders with causarray. bioRxiv 2025:2025.01.30.635593. [PMID: 39975097 PMCID: PMC11838442 DOI: 10.1101/2025.01.30.635593] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/21/2025]
Abstract
Advances in single-cell sequencing and CRISPR technologies have enabled detailed case-control comparisons and experimental perturbations at single-cell resolution. However, uncovering causal relationships in observational genomic data remains challenging due to selection bias and inadequate adjustment for unmeasured confounders, particularly in heterogeneous datasets. To address these challenges, we introduce causarray, a doubly robust causal inference framework for analyzing array-based genomic data at both bulk-cell and single-cell levels. causarray integrates a generalized confounder adjustment method to account for unmeasured confounders and employs semiparametric inference with flexible machine learning techniques to ensure robust statistical estimation of treatment effects. Benchmarking results show that causarray robustly separates treatment effects from confounders while preserving biological signals across diverse settings. We also apply causarray to two single-cell genomic studies: (1) an in vivo Perturb-seq study of autism risk genes in developing mouse brains and (2) a case-control study of Alzheimer's disease using three human brain transcriptomic datasets. In these applications, causarray identifies clustered causal effects of multiple autism risk genes and consistent causally affected genes across Alzheimer's disease datasets, uncovering biologically relevant pathways directly linked to neuronal development and synaptic functions that are critical for understanding disease pathology.
Affiliation(s)
- Jin-Hong Du
  - Department of Statistics and Data Science, Carnegie Mellon University
  - Machine Learning Department, Carnegie Mellon University
- Maya Shen
  - Department of Statistics and Data Science, Carnegie Mellon University
- Kathryn Roeder
  - Department of Statistics and Data Science, Carnegie Mellon University
  - Computational Biology Department, Carnegie Mellon University
9. Wang J, Ye F, Chai H, Jiang Y, Wang T, Ran X, Xia Q, Xu Z, Fu Y, Zhang G, Wu H, Guo G, Guo H, Ruan Y, Wang Y, Xing D, Xu X, Zhang Z. Advances and applications in single-cell and spatial genomics. Sci China Life Sci 2024:10.1007/s11427-024-2770-x. [PMID: 39792333 DOI: 10.1007/s11427-024-2770-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/20/2024] [Accepted: 10/10/2024] [Indexed: 01/12/2025]
Abstract
The applications of single-cell and spatial technologies in recent times have revolutionized the present understanding of cellular states and the cellular heterogeneity inherent in complex biological systems. These advancements offer unprecedented resolution in the examination of the functional genomics of individual cells and their spatial context within tissues. In this review, we have comprehensively discussed the historical development and recent progress in the field of single-cell and spatial genomics. We have reviewed the breakthroughs in single-cell multi-omics technologies, spatial genomics methods, and the computational strategies employed toward the analyses of single-cell atlas data. Furthermore, we have highlighted the advances made in constructing cellular atlases and their clinical applications, particularly in the context of disease. Finally, we have discussed the emerging trends, challenges, and opportunities in this rapidly evolving field.
Affiliation(s)
- Jingjing Wang
- Bone Marrow Transplantation Center of the First Affiliated Hospital & Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, 310058, China
| | - Fang Ye
- Bone Marrow Transplantation Center of the First Affiliated Hospital & Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, 310058, China
| | - Haoxi Chai
- Life Sciences Institute and The Second Affiliated Hospital, Zhejiang University, Hangzhou, 310058, China
| | - Yujia Jiang
- BGI Research, Shenzhen, 518083, China
- BGI Research, Hangzhou, 310030, China
| | - Teng Wang
- Biomedical Pioneering Innovation Center (BIOPIC) and School of Life Sciences, Peking University, Beijing, 100871, China
- Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, 100871, China
| | - Xia Ran
- Bone Marrow Transplantation Center of the First Affiliated Hospital & Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, 310058, China
- Institute of Hematology, Zhejiang University, Hangzhou, 310000, China
| | - Qimin Xia
- Biomedical Pioneering Innovation Center (BIOPIC) and School of Life Sciences, Peking University, Beijing, 100871, China
| | - Ziye Xu
- Department of Laboratory Medicine of The First Affiliated Hospital & Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, 310058, China
| | - Yuting Fu
- Bone Marrow Transplantation Center of the First Affiliated Hospital & Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, 310058, China
- Center for Stem Cell and Regenerative Medicine, Zhejiang University School of Medicine, Hangzhou, 310058, China
- Guodong Zhang
- Bone Marrow Transplantation Center of the First Affiliated Hospital & Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, 310058, China
- Center for Stem Cell and Regenerative Medicine, Zhejiang University School of Medicine, Hangzhou, 310058, China
- Hanyu Wu
- Bone Marrow Transplantation Center of the First Affiliated Hospital & Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, 310058, China
- Center for Stem Cell and Regenerative Medicine, Zhejiang University School of Medicine, Hangzhou, 310058, China
- Guoji Guo
- Bone Marrow Transplantation Center of the First Affiliated Hospital & Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, 310058, China.
- Center for Stem Cell and Regenerative Medicine, Zhejiang University School of Medicine, Hangzhou, 310058, China.
- Zhejiang Provincial Key Lab for Tissue Engineering and Regenerative Medicine, Dr. Li Dak Sum & Yip Yio Chin Center for Stem Cell and Regenerative Medicine, Hangzhou, 310058, China.
- Institute of Hematology, Zhejiang University, Hangzhou, 310000, China.
- Hongshan Guo
- Bone Marrow Transplantation Center of the First Affiliated Hospital & Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, 310058, China.
- Institute of Hematology, Zhejiang University, Hangzhou, 310000, China.
- Yijun Ruan
- Life Sciences Institute and The Second Affiliated Hospital, Zhejiang University, Hangzhou, 310058, China.
- Yongcheng Wang
- Department of Laboratory Medicine of The First Affiliated Hospital & Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, 310058, China.
- Dong Xing
- Biomedical Pioneering Innovation Center (BIOPIC) and School of Life Sciences, Peking University, Beijing, 100871, China.
- Beijing Advanced Innovation Center for Genomics (ICG), Peking University, Beijing, 100871, China.
- Xun Xu
- BGI Research, Shenzhen, 518083, China.
- BGI Research, Hangzhou, 310030, China.
- Guangdong Provincial Key Laboratory of Genome Read and Write, BGI Research, Shenzhen, 518083, China.
- Zemin Zhang
- Biomedical Pioneering Innovation Center (BIOPIC) and School of Life Sciences, Peking University, Beijing, 100871, China.
10
Zhang Z, Mathew D, Lim TL, Mason K, Martinez CM, Huang S, Wherry EJ, Susztak K, Minn AJ, Ma Z, Zhang NR. Recovery of biological signals lost in single-cell batch integration with CellANOVA. Nat Biotechnol 2024:10.1038/s41587-024-02463-1. [PMID: 39592777 DOI: 10.1038/s41587-024-02463-1] [Received: 10/02/2023] [Accepted: 10/02/2024] [Indexed: 11/28/2024]
Abstract
Data integration to align cells across batches has become a cornerstone of single-cell data analysis, critically affecting downstream results. Currently, there are no guidelines for when the biological differences between samples are separable from batch effects. Here we show that current paradigms for single-cell data integration remove biologically meaningful variation and introduce distortion. We present a statistical model and computationally scalable algorithm, CellANOVA (cell state space analysis of variance), that harnesses experimental design to explicitly recover biological signals that are erased during single-cell data integration. CellANOVA uses a 'pool-of-controls' design concept, applicable across diverse settings, to separate unwanted variation from biological variation of interest and allow the recovery of subtle biological signals. We apply CellANOVA to diverse contexts and validate the recovered biological signals by orthogonal assays. In particular, we show that CellANOVA is effective in the challenging case of single-cell and single-nucleus data integration, where it recovers subtle biological signals that can be validated and replicated by external data.
Affiliation(s)
- Zhaojun Zhang
- Department of Statistics and Data Science, The Wharton School, University of Pennsylvania, Philadelphia, PA, USA
- Divij Mathew
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Institute for Immunology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Parker Institute for Cancer Immunotherapy, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Tristan L Lim
- Department of Radiation Oncology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Kaishu Mason
- Department of Statistics and Data Science, The Wharton School, University of Pennsylvania, Philadelphia, PA, USA
- Clara Morral Martinez
- Department of Radiation Oncology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Mark Foundation Center for Immunotherapy, Immune Signaling and Radiation, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Sijia Huang
- Penn Institute of Biomedical Informatics, University of Pennsylvania, Philadelphia, PA, USA
- E John Wherry
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Institute for Immunology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Parker Institute for Cancer Immunotherapy, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Mark Foundation Center for Immunotherapy, Immune Signaling and Radiation, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Katalin Susztak
- Renal, Electrolyte and Hypertension Division, Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Institute for Diabetes, Obesity and Metabolism, University of Pennsylvania, Philadelphia, PA, USA
- Department of Genetics, University of Pennsylvania, Philadelphia, PA, USA
- Andy J Minn
- Institute for Immunology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Parker Institute for Cancer Immunotherapy, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Department of Radiation Oncology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Mark Foundation Center for Immunotherapy, Immune Signaling and Radiation, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Zongming Ma
- Department of Statistics and Data Science, Yale University, New Haven, CT, USA.
- Nancy R Zhang
- Department of Statistics and Data Science, The Wharton School, University of Pennsylvania, Philadelphia, PA, USA.
11
Astaburuaga-García R, Sell T, Mutlu S, Sieber A, Lauber K, Blüthgen N. RUCova: Removal of Unwanted Covariance in mass cytometry data. Bioinformatics 2024; 40:btae669. [PMID: 39579088 PMCID: PMC11601163 DOI: 10.1093/bioinformatics/btae669] [Received: 06/10/2024] [Revised: 10/23/2024] [Accepted: 11/11/2024] [Indexed: 11/25/2024] [Open Access]
Abstract
MOTIVATION: High dimensional single-cell mass cytometry data are confounded by unwanted covariance due to variations in cell size and staining efficiency, making analysis and interpretation challenging. RESULTS: We present RUCova, a novel method designed to address confounding factors in mass cytometry data. RUCova removes unwanted covariance from measured markers applying multivariate linear regression based on surrogates of sources of unwanted covariance (SUCs) and principal component analysis (PCA). We exemplify the use of RUCova and show that it effectively removes unwanted covariance while preserving genuine biological signals. Our results demonstrate the efficacy of RUCova in elucidating complex data patterns, facilitating the identification of activated signalling pathways, and improving the classification of important cell populations such as apoptotic cells. By providing a robust framework for data normalization and interpretation, RUCova enhances the accuracy and reliability of mass cytometry analyses, contributing to advances in our understanding of cellular biology and disease mechanisms. AVAILABILITY AND IMPLEMENTATION: The R package is available on https://github.com/molsysbio/RUCova. Detailed documentation, data, and the code required to reproduce the results are available on https://doi.org/10.5281/zenodo.10913464.
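The regression idea named in this abstract can be illustrated with a minimal sketch. This is not the RUCova package or its API (RUCova is distributed as an R package); the function name `remove_unwanted_covariance`, the simulated data, and the use of ordinary least squares with an intercept are assumptions made purely for illustration, and the PCA step mentioned in the abstract is omitted.

```python
import numpy as np

def remove_unwanted_covariance(markers, surrogates):
    """Hypothetical helper: regress each marker on the surrogate covariates
    and return the residuals shifted back to the original marker means.

    markers:    array of shape (n_cells, n_markers)
    surrogates: array of shape (n_cells, n_surrogates)
    """
    markers = np.asarray(markers, dtype=float)
    surrogates = np.asarray(surrogates, dtype=float)
    # Design matrix: intercept column plus the surrogate covariates.
    design = np.column_stack([np.ones(len(surrogates)), surrogates])
    # One least-squares fit per marker column (solved jointly).
    coef, *_ = np.linalg.lstsq(design, markers, rcond=None)
    fitted = design @ coef
    # Residuals have zero mean, so add the marker means back.
    return markers - fitted + markers.mean(axis=0)

# Simulated example (assumed data): three markers that all scale with cell size.
rng = np.random.default_rng(0)
n_cells = 500
cell_size = rng.normal(size=n_cells)
signal = rng.normal(size=(n_cells, 3))
markers = signal + 2.0 * cell_size[:, None]

corrected = remove_unwanted_covariance(markers, cell_size[:, None])
```

After the fit, the corrected markers are linearly uncorrelated with the surrogate by construction. Whether genuine biological signal is preserved depends on how well the surrogates isolate the unwanted sources, which this toy example does not address.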
Affiliation(s)
- Rosario Astaburuaga-García
- Institute of Pathology, Charité-Universitätsmedizin Berlin, Berlin, 10117, Germany
- Institute of Biology, Humboldt Universität zu Berlin, Berlin, 10117, Germany
- Thomas Sell
- Institute of Pathology, Charité-Universitätsmedizin Berlin, Berlin, 10117, Germany
- Institute of Biology, Humboldt Universität zu Berlin, Berlin, 10117, Germany
- Samet Mutlu
- Department of Radiation Oncology, University Hospital, LMU München, Munich, 81377, Germany
- German Cancer Consortium (DKTK), Munich, 81377, Germany
- German Cancer Research Center (DKFZ), Heidelberg, 69120, Germany
- Anja Sieber
- Institute of Pathology, Charité-Universitätsmedizin Berlin, Berlin, 10117, Germany
- Institute of Biology, Humboldt Universität zu Berlin, Berlin, 10117, Germany
- Kirsten Lauber
- Department of Radiation Oncology, University Hospital, LMU München, Munich, 81377, Germany
- German Cancer Consortium (DKTK), Munich, 81377, Germany
- Clinical Cooperation Group ‘Personalized Radiotherapy in Head and Neck Cancer’ Helmholtz Center Munich, German Research Center for Environmental Health GmbH, Neuherberg, 85764, Germany
- Nils Blüthgen
- Institute of Pathology, Charité-Universitätsmedizin Berlin, Berlin, 10117, Germany
- Institute of Biology, Humboldt Universität zu Berlin, Berlin, 10117, Germany
- German Cancer Consortium (DKTK), Berlin, 10117, Germany
12
Chen L, Guo Z, Deng T, Wu H. scCTS: identifying the cell type-specific marker genes from population-level single-cell RNA-seq. Genome Biol 2024; 25:269. [PMID: 39402623 PMCID: PMC11472465 DOI: 10.1186/s13059-024-03410-8] [Received: 11/10/2023] [Accepted: 09/30/2024] [Indexed: 10/19/2024] [Open Access]
Abstract
Single-cell RNA-sequencing (scRNA-seq) provides gene expression profiles of individual cells from complex samples, facilitating the detection of cell type-specific marker genes. In scRNA-seq experiments with multiple donors, the population level variation brings an extra layer of complexity in cell type-specific gene detection, for example, they may not appear in all donors. Motivated by this observation, we develop a statistical model named scCTS to identify cell type-specific genes from population-level scRNA-seq data. Extensive data analyses demonstrate that the proposed method identifies more biologically meaningful cell type-specific genes compared to traditional methods.
Affiliation(s)
- Luxiao Chen
- Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA, 30322, USA
- Zhenxing Guo
- School of Data Science, The Chinese University of Hong Kong, Shenzhen (CUHK-SZ), Shenzhen, 518172, Guangdong, China
- Tao Deng
- School of Data Science, The Chinese University of Hong Kong, Shenzhen (CUHK-SZ), Shenzhen, 518172, Guangdong, China
- Shenzhen Research Institute of Big Data, Shenzhen, 518172, China
- Hao Wu
- Faculty of Computer Science and Control Engineering, Shenzhen University of Advanced Technology, Shenzhen, 518055, Guangdong, China.
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, Guangdong, China.
13
Lim SY, Lin Y, Lee JH, Pedersen B, Stewart A, Scolyer RA, Long GV, Yang JYH, Rizos H. Single-cell RNA sequencing reveals melanoma cell state-dependent heterogeneity of response to MAPK inhibitors. EBioMedicine 2024; 107:105308. [PMID: 39216232 PMCID: PMC11402938 DOI: 10.1016/j.ebiom.2024.105308] [Received: 04/23/2024] [Revised: 08/11/2024] [Accepted: 08/12/2024] [Indexed: 09/04/2024] [Open Access]
Abstract
BACKGROUND: Melanoma is a heterogeneous cancer influenced by the plasticity of melanoma cells and their dynamic adaptations to microenvironmental cues. Melanoma cells transition between well-defined transcriptional cell states that impact treatment response and resistance. METHODS: In this study, we applied single-cell RNA sequencing to interrogate the molecular features of immunotherapy-naive and immunotherapy-resistant melanoma tumours in response to ex vivo BRAF/MEK inhibitor treatment. FINDINGS: We confirm the presence of four distinct melanoma cell states - melanocytic, transitory, neural-crest like and undifferentiated, and identify enrichment of neural crest-like and undifferentiated melanoma cells in immunotherapy-resistant tumours. Furthermore, we introduce an integrated computational approach to identify subsets of responding and nonresponding melanoma cells within the transcriptional cell states. INTERPRETATION: Nonresponding melanoma cells are identified in all transcriptional cell states and are predisposed to BRAF/MEK inhibitor resistance due to pro-inflammatory IL6 and TNFα signalling. Our study provides a framework to study treatment response within distinct melanoma cell states and indicate that tumour-intrinsic pro-inflammatory signalling contributes to BRAF/MEK inhibitor resistance. FUNDING: This work was supported by Macquarie University, Melanoma Institute Australia, and the National Health and Medical Research Council of Australia (NHMRC; grant 2012860, 2028055).
Affiliation(s)
- Su Yin Lim
- Macquarie Medical School, Faculty of Medicine, Health and Human Sciences, Macquarie University, Australia; Melanoma Institute Australia, Australia.
- Yingxin Lin
- School of Mathematics and Statistics, The University of Sydney, Australia; Charles Perkins Centre, The University of Sydney, Australia
- Jenny H Lee
- Macquarie Medical School, Faculty of Medicine, Health and Human Sciences, Macquarie University, Australia; Melanoma Institute Australia, Australia; Department of Neurosurgery, Chris O'Brien Lifehouse, Sydney, NSW, Australia
- Bernadette Pedersen
- Macquarie Medical School, Faculty of Medicine, Health and Human Sciences, Macquarie University, Australia; Melanoma Institute Australia, Australia
- Ashleigh Stewart
- Macquarie Medical School, Faculty of Medicine, Health and Human Sciences, Macquarie University, Australia; Melanoma Institute Australia, Australia
- Richard A Scolyer
- Melanoma Institute Australia, Australia; Charles Perkins Centre, The University of Sydney, Australia; Tissue Pathology and Diagnostic Oncology, Royal Prince Alfred Hospital and NSW Health Pathology, Sydney, Australia; Faculty of Medicine and Health, The University of Sydney, Australia
- Georgina V Long
- Melanoma Institute Australia, Australia; Charles Perkins Centre, The University of Sydney, Australia; Royal North Shore and Mater Hospitals, Sydney, Australia; Faculty of Medicine and Health, The University of Sydney, Australia
- Jean Y H Yang
- School of Mathematics and Statistics, The University of Sydney, Australia; Charles Perkins Centre, The University of Sydney, Australia
- Helen Rizos
- Macquarie Medical School, Faculty of Medicine, Health and Human Sciences, Macquarie University, Australia; Melanoma Institute Australia, Australia
14
Jeong Y, Ronen J, Kopp W, Lutsik P, Akalin A. scMaui: a widely applicable deep learning framework for single-cell multiomics integration in the presence of batch effects and missing data. BMC Bioinformatics 2024; 25:257. [PMID: 39107690 PMCID: PMC11304929 DOI: 10.1186/s12859-024-05880-w] [Received: 08/25/2023] [Accepted: 07/23/2024] [Indexed: 08/10/2024] [Open Access]
Abstract
The recent advances in high-throughput single-cell sequencing have created an urgent demand for computational models which can address the high complexity of single-cell multiomics data. Meticulous single-cell multiomics integration models are required to avoid biases towards a specific modality and overcome sparsity. Batch effects obfuscating biological signals must also be taken into account. Here, we introduce a new single-cell multiomics integration model, Single-cell Multiomics Autoencoder Integration (scMaui) based on variational product-of-experts autoencoders and adversarial learning. scMaui calculates a joint representation of multiple marginal distributions based on a product-of-experts approach which is especially effective for missing values in the modalities. Furthermore, it overcomes limitations seen in previous VAE-based integration methods with regard to batch effect correction and restricted applicable assays. It handles multiple batch effects independently accepting both discrete and continuous values, as well as provides varied reconstruction loss functions to cover all possible assays and preprocessing pipelines. We demonstrate that scMaui achieves superior performance in many tasks compared to other methods. Further downstream analyses also demonstrate its potential in identifying relations between assays and discovering hidden subpopulations.
Affiliation(s)
- Yunhee Jeong
- Division of Cancer Epigenomics, German Cancer Research Center (DKFZ), Im Neuenheimer Feld 280, Heidelberg, Germany
- Faculty of Mathematics and Informatics, Heidelberg University, Im Neuenheimer Feld 205, Heidelberg, Germany
- Jonathan Ronen
- Bioinformatics and Omics Data Science Platform, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Berlin Institute for Medical Systems Biology, Berlin, Germany
- Inceptive Nucleics, Inc., Palo Alto, CA, USA
- Wolfgang Kopp
- Bioinformatics and Omics Data Science Platform, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Berlin Institute for Medical Systems Biology, Berlin, Germany
- Roche Diagnostics GmbH, Penzberg, Germany
- Pavlo Lutsik
- Division of Cancer Epigenomics, German Cancer Research Center (DKFZ), Im Neuenheimer Feld 280, Heidelberg, Germany.
- Department of Oncology, Catholic University (KU) Leuven, Leuven, Belgium.
- Altuna Akalin
- Bioinformatics and Omics Data Science Platform, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Berlin Institute for Medical Systems Biology, Berlin, Germany.
15
Li Y, Lin Y, Hu P, Peng D, Luo H, Peng X. Single-Cell RNA-Seq Debiased Clustering via Batch Effect Disentanglement. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:11371-11381. [PMID: 37030864 DOI: 10.1109/tnnls.2023.3260003] [Indexed: 06/19/2023]
Abstract
A variety of single-cell RNA-seq (scRNA-seq) clustering methods has achieved great success in discovering cellular phenotypes. However, it remains challenging when the data confounds with batch effects brought by different experimental conditions or technologies. Namely, the data partitions would be biased toward these nonbiological factors. Meanwhile, the batch differences are not always much smaller than true biological variations, hindering the cooperation of batch integration and clustering methods. To overcome this challenge, we propose single-cell RNA-seq debiased clustering (SCDC), an end-to-end clustering method that is debiased toward batch effects by disentangling the biological and nonbiological information from scRNA-seq data during data partitioning. In six analyses, SCDC qualitatively and quantitatively outperforms both the state-of-the-art clustering and batch integration methods in handling scRNA-seq data with batch effects. Furthermore, SCDC clusters data with a linearly increasing running time with respect to cell numbers and a fixed graphics processing unit (GPU) memory consumption, making it scalable to large datasets. The code will be released on Github.
16
Hu H, Wang X, Feng S, Xu Z, Liu J, Heidrich-O'Hare E, Chen Y, Yue M, Zeng L, Rong Z, Chen T, Billiar T, Ding Y, Huang H, Duerr RH, Chen W. A unified model-based framework for doublet or multiplet detection in single-cell multiomics data. Nat Commun 2024; 15:5562. [PMID: 38956023 PMCID: PMC11220103 DOI: 10.1038/s41467-024-49448-x] [Received: 11/30/2023] [Accepted: 06/03/2024] [Indexed: 07/04/2024] [Open Access]
Abstract
Droplet-based single-cell sequencing techniques rely on the fundamental assumption that each droplet encapsulates a single cell, enabling individual cell omics profiling. However, the inevitable issue of multiplets, where two or more cells are encapsulated within a single droplet, can lead to spurious cell type annotations and obscure true biological findings. The issue of multiplets is exacerbated in single-cell multiomics settings, where integrating cross-modality information for clustering can inadvertently promote the aggregation of multiplet clusters and increase the risk of erroneous cell type annotations. Here, we propose a compound Poisson model-based framework for multiplet detection in single-cell multiomics data. Leveraging experimental cell hashing results as the ground truth for multiplet status, we conducted trimodal DOGMA-seq experiments and generated 17 benchmarking datasets from two tissues, involving a total of 280,123 droplets. We demonstrated that the proposed method is an essential tool for integrating cross-modality multiplet signals, effectively eliminating multiplet clusters in single-cell multiomics data, a task at which the benchmarked single-omics methods proved inadequate.
Affiliation(s)
- Haoran Hu
- Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA, 15213, USA
- Xinjun Wang
- Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY, 10065, USA
- Site Feng
- Department of Medicine, University of Pittsburgh, Pittsburgh, PA, 15261, USA
- School of Medicine, Tsinghua University, 100084, Beijing, China
- Zhongli Xu
- School of Medicine, Tsinghua University, 100084, Beijing, China
- Department of Pediatrics, University of Pittsburgh, Pittsburgh, PA, 15224, USA
- Jing Liu
- Department of Pediatrics, University of Pittsburgh, Pittsburgh, PA, 15224, USA
- Yanshuo Chen
- Department of Computer Science, University of Maryland, College Park, MD, 20742, USA
- Center of Bioinformatics and Computational Biology, College Park, MD, 20740, USA
- Molin Yue
- Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA, 15213, USA
- Lang Zeng
- Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA, 15213, USA
- Ziqi Rong
- School of Information, University of Michigan, Ann Arbor, MI, 48109, USA
- Tianmeng Chen
- Department of Surgery, University of Pittsburgh, Pittsburgh, PA, 15213, USA
- Timothy Billiar
- Department of Surgery, University of Pittsburgh, Pittsburgh, PA, 15213, USA
- Ying Ding
- Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA, 15213, USA
- Heng Huang
- Department of Computer Science, University of Maryland, College Park, MD, 20742, USA
- Center of Bioinformatics and Computational Biology, College Park, MD, 20740, USA
- Richard H Duerr
- Department of Medicine, University of Pittsburgh, Pittsburgh, PA, 15261, USA.
- Department of Human Genetics, University of Pittsburgh, Pittsburgh, PA, 15261, USA.
- Wei Chen
- Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA, 15213, USA.
- Department of Pediatrics, University of Pittsburgh, Pittsburgh, PA, 15224, USA.
- Department of Human Genetics, University of Pittsburgh, Pittsburgh, PA, 15261, USA.
17
Papiez A, Pioch J, Mollenkopf HJ, Corleis B, Dorhoi A, Polanska J. Relative effect size-based profiles as an alternative to differentiation analysis in multi-species single-cell transcriptional studies. PLoS One 2024; 19:e0305874. [PMID: 38917129 PMCID: PMC11198858 DOI: 10.1371/journal.pone.0305874] [Received: 02/01/2024] [Accepted: 06/04/2024] [Indexed: 06/27/2024] [Open Access]
Abstract
Combining data from experiments on multispecies studies provides invaluable contributions to the understanding of basic disease mechanisms and pathophysiology of pathogens crossing species boundaries. The task of multispecies gene expression analysis, however, is often challenging given annotation inconsistencies and in cases of small sample sizes due to bias caused by batch effects. In this work we aim to demonstrate that an alternative approach to standard differential expression analysis in single cell RNA-sequencing (scRNA-seq) based on effect size profiles is suitable for the fusion of data from small samples and multiple organisms. The analysis pipeline is based on effect size metric profiles of samples in specific cell clusters. The effect size substitutes standard differentiation analyses based on p-values and profiles identified based on these effect size metrics serve as a tool to link cell type clusters between the studied organisms. The algorithms were tested on published scRNA-seq data sets derived from several species and subsequently validated on own data from human and bovine peripheral blood mononuclear cells stimulated with Mycobacterium tuberculosis. Correlation of the effect size profiles between clusters allowed for the linkage of human and bovine cell types. Moreover, effect size ratios were used to identify differentially regulated genes in control and stimulated samples. The genes identified through effect size profiling were confirmed experimentally using qPCR. We demonstrate that in situations where batch effects dominate cell type variation in single cell small sample size multispecies studies, effect size profiling is a valid alternative to traditional statistical inference techniques.
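The effect-size profiling idea in this abstract can be sketched as follows. The abstract does not name the effect size metric, so Cohen's d is used here as an assumption; the function names `cohens_d` and `profile_correlation` and the simulated "human" and "bovine" clusters are likewise hypothetical and are not taken from the authors' pipeline.

```python
import numpy as np

def cohens_d(treated, control):
    """Per-gene Cohen's d between two groups of cells.

    treated, control: arrays of shape (n_cells, n_genes)
    """
    treated = np.asarray(treated, dtype=float)
    control = np.asarray(control, dtype=float)
    n1, n2 = len(treated), len(control)
    var1 = treated.var(axis=0, ddof=1)
    var2 = control.var(axis=0, ddof=1)
    # Pooled standard deviation across the two groups.
    pooled_sd = np.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    return (treated.mean(axis=0) - control.mean(axis=0)) / pooled_sd

def profile_correlation(profile_a, profile_b):
    """Pearson correlation between two effect size profiles."""
    return float(np.corrcoef(profile_a, profile_b)[0, 1])

# Simulated clusters (assumed data): both species share one stimulation
# response, while the second species carries a constant batch-like offset.
rng = np.random.default_rng(1)
n_genes = 50
shift = rng.normal(size=n_genes)
human_ctrl = rng.normal(size=(40, n_genes))
human_stim = rng.normal(size=(40, n_genes)) + shift
bovine_ctrl = rng.normal(size=(30, n_genes)) + 5.0
bovine_stim = rng.normal(size=(30, n_genes)) + 5.0 + shift

human_profile = cohens_d(human_stim, human_ctrl)
bovine_profile = cohens_d(bovine_stim, bovine_ctrl)
similarity = profile_correlation(human_profile, bovine_profile)
```

Because each profile is computed within one species, a constant offset that affects both the stimulated and control cells of that species cancels out, which is one reason such profiles might be compared across batches or organisms.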
Affiliation(s)
- Anna Papiez
- Department of Data Science and Engineering, Silesian University of Technology, Gliwice, Poland
- Jonathan Pioch
- Institute of Immunology, Friedrich Loeffler Institute, Greifswald, Germany
- Björn Corleis
- Institute of Immunology, Friedrich Loeffler Institute, Greifswald, Germany
- Anca Dorhoi
- Institute of Immunology, Friedrich Loeffler Institute, Greifswald, Germany
- Joanna Polanska
- Department of Data Science and Engineering, Silesian University of Technology, Gliwice, Poland
18
McLean AK, Reynolds G, Pratt AG. Leveraging Multi-Tissue, Single-Cell Atlases as Tools to Elucidate Shared Mechanisms of Immune-Mediated Inflammatory Diseases. Biomedicines 2024; 12:1297. [PMID: 38927506 PMCID: PMC11201400 DOI: 10.3390/biomedicines12061297] [Received: 05/13/2024] [Revised: 06/05/2024] [Accepted: 06/08/2024] [Indexed: 06/28/2024] [Open Access]
Abstract
The observation that certain therapeutic strategies for targeting inflammation benefit patients with distinct immune-mediated inflammatory diseases (IMIDs) is exemplified by the success of TNF blockade in conditions including rheumatoid arthritis, ulcerative colitis, and skin psoriasis, albeit only for subsets of individuals with each condition. This suggests intersecting "nodes" in inflammatory networks at a molecular and cellular level may drive and/or maintain IMIDs, being "shared" between traditionally distinct diagnoses without mapping neatly to a single clinical phenotype. In line with this proposition, integrative tumour tissue analyses in oncology have highlighted novel cell states acting across diverse cancers, with important implications for precision medicine. Drawing upon advances in the oncology field, this narrative review will first summarise learnings from the Human Cell Atlas in health as a platform for interrogating IMID tissues. It will then review cross-disease studies to date that inform this endeavour before considering future directions in the field.
Affiliation(s)
- Anthony K. McLean
- Translational and Clinical Research Institute, Newcastle University, Newcastle upon Tyne NE2 4HH, UK
- Gary Reynolds
- Center for Immunology and Inflammatory Diseases, Massachusetts General Hospital, Boston, MA 02114, USA
- Arthur G. Pratt
- Translational and Clinical Research Institute, Newcastle University, Newcastle upon Tyne NE2 4HH, UK
- Musculoskeletal Unit, Newcastle upon Tyne Hospitals NHS Foundation Trust, Newcastle upon Tyne NE7 7DN, UK
19
Ma F, Zheng C. Single-cell phylotranscriptomics of developmental and cell type evolution. Trends Genet 2024; 40:495-510. [PMID: 38490933 DOI: 10.1016/j.tig.2024.02.005] [Received: 11/23/2023] [Revised: 02/16/2024] [Accepted: 02/16/2024] [Indexed: 03/17/2024]
Abstract
Single-cell phylotranscriptomics is an emerging tool to reveal the molecular and cellular mechanisms of evolution. We summarize its utility in studying the hourglass pattern of ontogenetic evolution and for understanding the evolutionary history of cell types. The developmental hourglass model suggests that the mid-embryonic stage is the most conserved period of development across species, which is supported by morphological and molecular studies. Single-cell phylotranscriptomic analysis has revealed previously underappreciated heterogeneity in transcriptome ages among lineages and cell types throughout development, and has identified the lineages and tissues that drive the whole-organism hourglass pattern. Single-cell transcriptome age analyses also provide important insights into the origin of germ layers, the different selective forces on tissues during adaptation, and the evolutionary relationships between cell types.
Affiliation(s)
- Fuqiang Ma
- School of Biological Sciences, The University of Hong Kong, Hong Kong SAR, China
- Chaogu Zheng
- School of Biological Sciences, The University of Hong Kong, Hong Kong SAR, China.
20
Chen C, Lee S, Zyner KG, Fernando M, Nemeruck V, Wong E, Marshall LL, Wark JR, Aryamanesh N, Tam PPL, Graham ME, Gonzalez-Cordero A, Yang P. Trans-omic profiling uncovers molecular controls of early human cerebral organoid formation. Cell Rep 2024; 43:114219. [PMID: 38748874 DOI: 10.1016/j.celrep.2024.114219] [Received: 01/03/2024] [Revised: 04/01/2024] [Accepted: 04/25/2024] [Indexed: 06/01/2024] [Open Access]
Abstract
Defining the molecular networks orchestrating human brain formation is crucial for understanding neurodevelopment and neurological disorders. Challenges in acquiring early brain tissue have incentivized the use of three-dimensional human pluripotent stem cell (hPSC)-derived neural organoids to recapitulate neurodevelopment. To elucidate the molecular programs that drive this highly dynamic process, here, we generate a comprehensive trans-omic map of the phosphoproteome, proteome, and transcriptome of the exit of pluripotency and neural differentiation toward human cerebral organoids (hCOs). These data reveal key phospho-signaling events and their convergence on transcriptional factors to regulate hCO formation. Comparative analysis with developing human and mouse embryos demonstrates the fidelity of our hCOs in modeling embryonic brain development. Finally, we demonstrate that biochemical modulation of AKT signaling can control hCO differentiation. Together, our data provide a comprehensive resource to study molecular controls in human embryonic brain development and provide a guide for the future development of hCO differentiation protocols.
Affiliation(s)
- Carissa Chen
  - Computational Systems Biology Unit, Children's Medical Research Institute, University of Sydney, Westmead, NSW 2145, Australia; Embryology Unit, Children's Medical Research Institute, University of Sydney, Westmead, NSW 2145, Australia; School of Medical Sciences, Faculty of Medicine and Health, University of Sydney, Sydney, NSW 2006, Australia
- Scott Lee
  - Stem Cell and Organoid Facility, Children's Medical Research Institute, University of Sydney, Westmead, NSW 2145, Australia
- Katherine G Zyner
  - Computational Systems Biology Unit, Children's Medical Research Institute, University of Sydney, Westmead, NSW 2145, Australia; School of Medical Sciences, Faculty of Medicine and Health, University of Sydney, Sydney, NSW 2006, Australia
- Milan Fernando
  - Stem Cell and Organoid Facility, Children's Medical Research Institute, University of Sydney, Westmead, NSW 2145, Australia
- Victoria Nemeruck
  - Stem Cell Medicine Group, Children's Medical Research Institute, University of Sydney, Westmead, NSW 2145, Australia
- Emilie Wong
  - Stem Cell Medicine Group, Children's Medical Research Institute, University of Sydney, Westmead, NSW 2145, Australia; School of Medical Sciences, Faculty of Medicine and Health, University of Sydney, Sydney, NSW 2006, Australia
- Lee L Marshall
  - Bioinformatics Group, Children's Medical Research Institute, University of Sydney, Westmead, NSW 2145, Australia
- Jesse R Wark
  - Synapse Proteomics, Children's Medical Research Institute, University of Sydney, Westmead, NSW 2145, Australia
- Nader Aryamanesh
  - Bioinformatics Group, Children's Medical Research Institute, University of Sydney, Westmead, NSW 2145, Australia; School of Medical Sciences, Faculty of Medicine and Health, University of Sydney, Sydney, NSW 2006, Australia
- Patrick P L Tam
  - Embryology Unit, Children's Medical Research Institute, University of Sydney, Westmead, NSW 2145, Australia; School of Medical Sciences, Faculty of Medicine and Health, University of Sydney, Sydney, NSW 2006, Australia
- Mark E Graham
  - Synapse Proteomics, Children's Medical Research Institute, University of Sydney, Westmead, NSW 2145, Australia; School of Medical Sciences, Faculty of Medicine and Health, University of Sydney, Sydney, NSW 2006, Australia
- Anai Gonzalez-Cordero
  - Stem Cell and Organoid Facility, Children's Medical Research Institute, University of Sydney, Westmead, NSW 2145, Australia; Stem Cell Medicine Group, Children's Medical Research Institute, University of Sydney, Westmead, NSW 2145, Australia; School of Medical Sciences, Faculty of Medicine and Health, University of Sydney, Sydney, NSW 2006, Australia
- Pengyi Yang
  - Computational Systems Biology Unit, Children's Medical Research Institute, University of Sydney, Westmead, NSW 2145, Australia; School of Medical Sciences, Faculty of Medicine and Health, University of Sydney, Sydney, NSW 2006, Australia; Charles Perkins Centre, School of Mathematics and Statistics, University of Sydney, Sydney, NSW 2006, Australia
|
21
|
Park Y, Hauschild AC. The effect of data transformation on low-dimensional integration of single-cell RNA-seq. BMC Bioinformatics 2024; 25:171. [PMID: 38689234 PMCID: PMC11059821 DOI: 10.1186/s12859-024-05788-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2023] [Accepted: 04/16/2024] [Indexed: 05/02/2024] Open
Abstract
BACKGROUND: Recent developments in single-cell RNA sequencing have opened up a multitude of possibilities to study tissues at the level of cellular populations. However, the heterogeneity in single-cell sequencing data necessitates appropriate procedures to adjust for technological limitations and various sources of noise when integrating datasets from different studies. While many analysis procedures employ various preprocessing steps, they often overlook the importance of selecting and optimizing the employed data transformation methods.
RESULTS: This work investigates data transformation approaches used in single-cell clustering analysis tools and their effects on batch integration analysis. In particular, we compare 16 transformations and their impact on the low-dimensional representations, aiming to reduce the batch effect and integrate multiple single-cell sequencing data. Our results show that data transformations strongly influence the results of single-cell clustering on low-dimensional data space, such as those generated by UMAP or PCA. Moreover, these changes in low-dimensional space significantly affect trajectory analysis using multiple datasets, as well. However, the performance of the data transformations greatly varies across datasets, and the optimal method was different for each dataset. Additionally, we explored how data transformation impacts the analysis of deep feature encodings using deep neural network-based models, including autoencoder-based models and proto-typical networks. Data transformation also strongly affects the outcome of deep neural network models.
CONCLUSIONS: Our findings suggest that the batch effect and noise in integrative analysis are highly influenced by data transformation. Low-dimensional features can integrate different batches well when proper data transformation is applied. Furthermore, we found that the batch mixing score on low-dimensional space can guide the selection of the optimal data transformation. In conclusion, data preprocessing is one of the most crucial analysis steps and needs to be cautiously considered in the integrative analysis of multiple scRNA-seq datasets.
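To make "data transformation" concrete, the following is a minimal sketch of three transformations commonly applied to single-cell count vectors before dimensionality reduction. The choice of transformations, the function names, the scale factor, and the toy counts are all assumptions made for illustration; they are not taken from the 16 transformations compared in the paper.

```python
import math

def log_normalize(cell, scale=10_000):
    """Scale one cell's counts to a common library size, then apply log(1 + x).
    Assumes the cell has at least one non-zero count."""
    total = sum(cell)
    return [math.log1p(x / total * scale) for x in cell]

def sqrt_transform(cell):
    """Square-root transform, a simple variance-stabilizing choice for counts."""
    return [math.sqrt(x) for x in cell]

def clr_transform(cell, pseudocount=1.0):
    """Centered log-ratio: log counts minus their per-cell mean (hypothetical pseudocount)."""
    logged = [math.log(x + pseudocount) for x in cell]
    mean_log = sum(logged) / len(logged)
    return [value - mean_log for value in logged]

# Toy data: two cells measured over three genes.
cells = [[0, 3, 7], [10, 0, 30]]
for cell in cells:
    print(log_normalize(cell), sqrt_transform(cell), clr_transform(cell))
```

Each function maps the same raw counts to a differently scaled vector, which is why downstream embeddings such as PCA or UMAP can differ depending on the transformation chosen.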
Affiliation(s)
- Youngjun Park
  - Department of Medical Informatics, University Medical Center Göttingen, Göttingen, Germany
  - International Max Planck Research Schools for Genome Science, Georg-August-Universität Göttingen, Göttingen, Germany
- Anne-Christin Hauschild
  - Department of Medical Informatics, University Medical Center Göttingen, Göttingen, Germany
  - Campus-Institute Data Science (CIDAS), Georg-August-Universität Göttingen, Göttingen, Germany
|
22
|
Fan Y, Li L, Sun S. Powerful and accurate detection of temporal gene expression patterns from multi-sample multi-stage single-cell transcriptomics data with TDEseq. Genome Biol 2024; 25:96. [PMID: 38622747 PMCID: PMC11020788 DOI: 10.1186/s13059-024-03237-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2023] [Accepted: 04/03/2024] [Indexed: 04/17/2024] Open
Abstract
We present a non-parametric statistical method called TDEseq that takes full advantage of smoothing splines basis functions to account for the dependence of multiple time points in scRNA-seq studies, and uses hierarchical structure linear additive mixed models to model the correlated cells within an individual. As a result, TDEseq demonstrates powerful performance in identifying four potential temporal expression patterns within a specific cell type. Extensive simulation studies and the analysis of four published scRNA-seq datasets show that TDEseq can produce well-calibrated p-values and up to 20% power gain over the existing methods for detecting temporal gene expression patterns.
Affiliation(s)
- Yue Fan
  - Center for Single-Cell Omics and Health, School of Public Health, Xi'an Jiaotong University, Xi'an, Shaanxi, 710061, People's Republic of China
  - Collaborative Innovation Center of Endemic Diseases and Health Promotion in Silk Road Region; NHC Key Laboratory of Environment and Endemic Diseases, Xi'an Jiaotong University, Xi'an, Shaanxi, 710061, People's Republic of China
  - Key Laboratory of Environment and Genes Related to Diseases (Xi'an Jiaotong University), Ministry of Education, Xi'an, Shaanxi, 710061, People's Republic of China
- Lei Li
  - Center for Single-Cell Omics and Health, School of Public Health, Xi'an Jiaotong University, Xi'an, Shaanxi, 710061, People's Republic of China
  - Collaborative Innovation Center of Endemic Diseases and Health Promotion in Silk Road Region; NHC Key Laboratory of Environment and Endemic Diseases, Xi'an Jiaotong University, Xi'an, Shaanxi, 710061, People's Republic of China
- Shiquan Sun
  - Center for Single-Cell Omics and Health, School of Public Health, Xi'an Jiaotong University, Xi'an, Shaanxi, 710061, People's Republic of China
  - Collaborative Innovation Center of Endemic Diseases and Health Promotion in Silk Road Region; NHC Key Laboratory of Environment and Endemic Diseases, Xi'an Jiaotong University, Xi'an, Shaanxi, 710061, People's Republic of China
  - Key Laboratory of Environment and Genes Related to Diseases (Xi'an Jiaotong University), Ministry of Education, Xi'an, Shaanxi, 710061, People's Republic of China
  - Key Laboratory for Disease Prevention and Control and Health Promotion of Shaanxi Province, Xi'an, Shaanxi, 710061, People's Republic of China
|
23
|
Koca MB, Sevilgen FE. Integration of single-cell proteomic datasets through distinctive proteins in cell clusters. Proteomics 2024; 24:e2300282. [PMID: 38135888 DOI: 10.1002/pmic.202300282] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2023] [Revised: 11/01/2023] [Accepted: 12/04/2023] [Indexed: 12/24/2023]
Abstract
The use of mass spectrometry and antibody-based sequencing technologies at the single-cell level has led to an increase in single-cell proteomic datasets. Integrating these datasets is crucial to eliminate the batch effect that often arises due to their limited sequencing molecules. Although methods for horizontally integrating high-dimensional single-cell transcriptomic datasets can also be applied to single-cell proteomic datasets, a specialized approach explicitly tailored for low-dimensional proteomic datasets may enhance the integration process. Here, we introduce SCPRO-HI, an algorithm for the horizontal integration of antibody-based single-cell proteomic datasets. It utilizes a hierarchical cell anchoring technique to match cells based on the similarity of distinctive proteins for constituting cell clusters. A novel variational auto-encoder model is employed for correcting batch effects on the protein abundances, eliminating the need for mapping them into a new domain. Moreover, we propose a technique for extending the algorithm to high-dimensional datasets. The performance of the SCPRO-HI algorithm is evaluated using simulated and real-world single-cell proteomic datasets. The findings demonstrate our algorithm outperforms state-of-the-art methods, achieving a 75% higher silhouette score while preserving HVPs 13% better. Furthermore, the algorithm shows competitive performance in transcriptomic datasets, suggesting potential for integrating high-dimensional mass-spectrometry-based proteomic datasets.
Affiliation(s)
- Mehmet Burak Koca
  - Computer Engineering Department, Gebze Technical University, Kocaeli, Türkiye
- Fatih Erdoğan Sevilgen
  - Institute for Data Science and Artificial Intelligence, Boğaziçi University, İstanbul, Türkiye
|
24
|
Ghazanfar S, Guibentif C, Marioni JC. Stabilized mosaic single-cell data integration using unshared features. Nat Biotechnol 2024; 42:284-292. [PMID: 37231260 PMCID: PMC10869270 DOI: 10.1038/s41587-023-01766-z] [Citation(s) in RCA: 28] [Impact Index Per Article: 28.0] [Received: 02/10/2022] [Accepted: 03/28/2023] [Indexed: 05/27/2023]
Abstract
Currently available single-cell omics technologies capture many unique features with different biological information content. Data integration aims to place cells, captured with different technologies, onto a common embedding to facilitate downstream analytical tasks. Current horizontal data integration techniques use a set of common features, thereby ignoring non-overlapping features and losing information. Here we introduce StabMap, a mosaic data integration technique that stabilizes mapping of single-cell data by exploiting the non-overlapping features. StabMap first infers a mosaic data topology based on shared features, then projects all cells onto supervised or unsupervised reference coordinates by traversing shortest paths along the topology. We show that StabMap performs well in various simulation contexts, facilitates 'multi-hop' mosaic data integration where some datasets do not share any features and enables the use of spatial gene expression features for mapping dissociated single-cell data onto a spatial transcriptomic reference.
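The two steps named in the abstract (inferring a topology from shared features, then traversing shortest paths toward a reference) can be sketched as a small graph problem. This is only an illustration of the idea, not the StabMap implementation: the function names, the `min_shared` threshold, and the toy feature sets are hypothetical, and the actual projection of cells along the path is not shown.

```python
from collections import deque

def build_topology(features, min_shared=1):
    """Connect two datasets when they share at least `min_shared` features.
    `min_shared` is an assumed parameter for this sketch."""
    names = list(features)
    graph = {name: set() for name in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if len(features[a] & features[b]) >= min_shared:
                graph[a].add(b)
                graph[b].add(a)
    return graph

def shortest_path(graph, start, goal):
    """Breadth-first search; returns the list of datasets from start to goal, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in sorted(graph[node]):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None

# Toy mosaic: "protein" and "rna" share no features, but both overlap with "multiome".
features = {
    "rna": {"g1", "g2", "g3"},
    "multiome": {"g3", "p1"},
    "protein": {"p1", "p2"},
}
graph = build_topology(features)
path = shortest_path(graph, "protein", "rna")
print(path)
```

In this toy case the query dataset reaches the reference only through an intermediate dataset, which corresponds to the "multi-hop" setting the abstract describes.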
Affiliation(s)
- Shila Ghazanfar
  - Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, UK
  - School of Mathematics and Statistics, The University of Sydney, Camperdown, New South Wales, Australia
  - Charles Perkins Centre, The University of Sydney, Camperdown, New South Wales, Australia
- Carolina Guibentif
  - Sahlgrenska Center for Cancer Research, Inst. Biomedicine, Dept. Microbiology and Immunology, University of Gothenburg, Gothenburg, Sweden
- John C Marioni
  - Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, UK
  - Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
|
25
|
Tian J, Lei J, Roeder K. From local to global gene co-expression estimation using single-cell RNA-seq data. Biometrics 2024; 80:ujae001. [PMID: 38465983 PMCID: PMC10926266 DOI: 10.1093/biomtc/ujae001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2023] [Revised: 10/01/2023] [Accepted: 01/15/2024] [Indexed: 03/12/2024]
Abstract
In genomics studies, the investigation of gene relationships often brings important biological insights. Currently, large heterogeneous datasets impose new challenges for statisticians because gene relationships are often local. They change from one sample point to another, may only exist in a subset of the sample, and can be nonlinear or even nonmonotone. Most previous dependence measures do not specifically target local dependence relationships, and the ones that do are computationally costly. In this paper, we explore a state-of-the-art network estimation technique that characterizes gene relationships at the single cell level, under the name of cell-specific gene networks. We first show that averaging the cell-specific gene relationship over a population gives a novel univariate dependence measure, the averaged Local Density Gap (aLDG), that accumulates local dependence and can detect any nonlinear, nonmonotone relationship. Together with a consistent nonparametric estimator, we establish its robustness on both the population and empirical levels. Then, we show that averaging the cell-specific gene relationship over mini-batches determined by some external structure information (e.g., spatial or temporal factor) better highlights meaningful local structure change points. We explore the application of aLDG and its minibatch variant in many scenarios, including pairwise gene relationship estimation, bifurcating point detection in cell trajectory, and spatial transcriptomics structure visualization. Both simulations and real data analysis show that aLDG outperforms existing dependence measures.
Affiliation(s)
- Jinjin Tian
  - Department of Statistics and Data Science, Carnegie Mellon University, 15213, Pittsburgh, PA, United States
- Jing Lei
  - Department of Statistics and Data Science, Carnegie Mellon University, 15213, Pittsburgh, PA, United States
- Kathryn Roeder
  - Department of Statistics and Data Science, Carnegie Mellon University, 15213, Pittsburgh, PA, United States
|
26
|
Huan C, Li J, Li Y, Zhao S, Yang Q, Zhang Z, Li C, Li S, Guo Z, Yao J, Zhang W, Zhou L. Spatially Resolved Multiomics: Data Analysis from Monoomics to Multiomics. BME Frontiers 2024; 6:0084. [PMID: 39810754 PMCID: PMC11725630 DOI: 10.34133/bmef.0084] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/03/2024] [Revised: 11/05/2024] [Accepted: 12/02/2024] [Indexed: 01/16/2025] Open
Abstract
Spatial monoomics has been recognized as a powerful tool for exploring life sciences. Recently, spatial multiomics has advanced considerably, which could contribute to clarifying many biological issues. Spatial monoomics techniques in epigenomics, genomics, transcriptomics, proteomics, and metabolomics can enhance our understanding of biological functions and cellular identities by simultaneously measuring tissue structures and biomolecule levels. Spatial omics technology has evolved from spatial monoomics to spatial multiomics. Moreover, the spatial resolution, high-throughput detection capability, capture efficiency, and compatibility with various sample types of omics technology have considerably advanced. Despite the technological advances in this field, data analysis frameworks have stagnated. Current challenges include incomplete spatial monoomics data analysis pipelines, overly complex data analysis tasks, and few established spatial multiomics data analysis strategies. In this review, we systematically summarize recent developments of various spatial monoomics techniques and improvements in related data analysis pipelines. On the basis of the spatial multiomics technology, we propose a data integration strategy that covers cross-platform, cross-slice, and cross-modality integration. We summarize the potential applications of spatial monoomics technology, aiming to provide researchers and clinicians with a better understanding of how such applications have advanced. Spatial multiomics technology is expected to substantially impact biology and precision medicine through measurements of cellular tissue structures and the extraction of biomolecular features.
Affiliation(s)
- Changxiang Huan
  - CAS Key Lab of Bio-Medical Diagnostics, Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences, Suzhou 215163, China
  - School of Biomedical Engineering (Suzhou), Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei 230026, China
- Jinze Li
  - CAS Key Lab of Bio-Medical Diagnostics, Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences, Suzhou 215163, China
- Yingxue Li
  - CAS Key Lab of Bio-Medical Diagnostics, Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences, Suzhou 215163, China
- Shasha Zhao
  - CAS Key Lab of Bio-Medical Diagnostics, Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences, Suzhou 215163, China
- Qi Yang
  - CAS Key Lab of Bio-Medical Diagnostics, Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences, Suzhou 215163, China
- Zhiqi Zhang
  - CAS Key Lab of Bio-Medical Diagnostics, Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences, Suzhou 215163, China
- Chuanyu Li
  - CAS Key Lab of Bio-Medical Diagnostics, Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences, Suzhou 215163, China
  - School of Biomedical Engineering (Suzhou), Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei 230026, China
- Shuli Li
  - CAS Key Lab of Bio-Medical Diagnostics, Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences, Suzhou 215163, China
- Zhen Guo
  - CAS Key Lab of Bio-Medical Diagnostics, Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences, Suzhou 215163, China
  - School of Biomedical Engineering (Suzhou), Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei 230026, China
- Jia Yao
  - CAS Key Lab of Bio-Medical Diagnostics, Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences, Suzhou 215163, China
  - School of Biomedical Engineering (Suzhou), Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei 230026, China
- Wei Zhang
  - CAS Key Lab of Bio-Medical Diagnostics, Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences, Suzhou 215163, China
  - School of Biomedical Engineering (Suzhou), Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei 230026, China
- Lianqun Zhou
  - CAS Key Lab of Bio-Medical Diagnostics, Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences, Suzhou 215163, China
|
27
|
Hachey SJ, Hatch CJ, Gaebler D, Mocherla A, Nee K, Kessenbrock K, Hughes CCW. Targeting tumor-stromal interactions in triple-negative breast cancer using a human vascularized micro-tumor model. Breast Cancer Res 2024; 26:5. [PMID: 38183074 PMCID: PMC10768273 DOI: 10.1186/s13058-023-01760-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 07/30/2023] [Accepted: 12/21/2023] [Indexed: 01/07/2024] Open
Abstract
Triple-negative breast cancer (TNBC) is highly aggressive with limited available treatments. Stromal cells in the tumor microenvironment (TME) are crucial in TNBC progression; however, understanding the molecular basis of stromal cell activation and tumor-stromal crosstalk in TNBC is limited. To investigate therapeutic targets in the TNBC stromal niche, we used an advanced human in vitro microphysiological system called the vascularized micro-tumor (VMT). Using single-cell RNA sequencing, we revealed that normal breast tissue stromal cells activate neoplastic signaling pathways in the TNBC TME. By comparing interactions in VMTs with clinical data, we identified therapeutic targets at the tumor-stromal interface with potential clinical significance. Combining treatments targeting Tie2 signaling with paclitaxel resulted in vessel normalization and increased efficacy of paclitaxel in the TNBC VMT. Dual inhibition of HER3 and Akt also showed efficacy against TNBC. These data demonstrate the potential of inducing a favorable TME as a targeted therapeutic approach in TNBC.
Affiliation(s)
- Stephanie J Hachey
  - Molecular Biology and Biochemistry, University of California, Irvine, Irvine, CA, USA
- Daniela Gaebler
  - Molecular Biology and Biochemistry, University of California, Irvine, Irvine, CA, USA
- Aneela Mocherla
  - Molecular Biology and Biochemistry, University of California, Irvine, Irvine, CA, USA
- Kevin Nee
  - Biological Chemistry, University of California, Irvine, Irvine, CA, USA
- Kai Kessenbrock
  - Biological Chemistry, University of California, Irvine, Irvine, CA, USA
- Christopher C W Hughes
  - Molecular Biology and Biochemistry, University of California, Irvine, Irvine, CA, USA
  - Biomedical Engineering, University of California, Irvine, Irvine, CA, USA
|
28
|
Cao Y, Tran A, Kim H, Robertson N, Lin Y, Torkel M, Yang P, Patrick E, Ghazanfar S, Yang J. Thinking process templates for constructing data stories with SCDNEY. F1000Res 2023; 12:261. [PMID: 38434622 PMCID: PMC10905113 DOI: 10.12688/f1000research.130623.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 12/08/2023] [Indexed: 03/05/2024] Open
Abstract
Background: Globally, scientists now have the ability to generate a vast amount of high throughput biomedical data that carry critical information for important clinical and public health applications. This data revolution in biology is now creating a plethora of new single-cell datasets. Concurrently, there have been significant methodological advances in single-cell research. Integrating these two resources, creating tailor-made, efficient, and purpose-specific data analysis approaches can assist in accelerating scientific discovery.
Methods: We developed a series of living workshops for building data stories, using Single-cell data integrative analysis (scdney). scdney is a wrapper package with a collection of single-cell analysis R packages incorporating data integration, cell type annotation, higher order testing and more.
Results: Here, we illustrate two specific workshops. The first workshop examines how to characterise the identity and/or state of cells and the relationship between them, known as phenotyping. The second workshop focuses on extracting higher-order features from cells to predict disease progression.
Conclusions: Through these workshops, we not only showcase current solutions, but also highlight critical thinking points. In particular, we highlight the Thinking Process Template that provides a structured framework for the decision-making process behind such single-cell analyses. Furthermore, our workshop will incorporate dynamic contributions from the community in a collaborative learning approach, thus the term 'living'.
Affiliation(s)
- Yue Cao
  - Laboratory of Data Discovery for Health Limited (D24H), Science Park, Hong Kong SAR, China
  - Sydney Precision Data Science Centre, The University of Sydney, Sydney, NSW, 2006, Australia
  - Charles Perkins Centre, The University of Sydney, Sydney, NSW, 2006, Australia
  - School of Mathematics and Statistics, The University of Sydney, Sydney, NSW, 2006, Australia
- Andy Tran
  - Laboratory of Data Discovery for Health Limited (D24H), Science Park, Hong Kong SAR, China
  - Sydney Precision Data Science Centre, The University of Sydney, Sydney, NSW, 2006, Australia
  - Charles Perkins Centre, The University of Sydney, Sydney, NSW, 2006, Australia
  - School of Mathematics and Statistics, The University of Sydney, Sydney, NSW, 2006, Australia
- Hani Kim
  - Sydney Precision Data Science Centre, The University of Sydney, Sydney, NSW, 2006, Australia
  - School of Mathematics and Statistics, The University of Sydney, Sydney, NSW, 2006, Australia
  - Children's Medical Research Institute, The University of Sydney, Westmead, NSW, 2145, Australia
- Nick Robertson
  - Laboratory of Data Discovery for Health Limited (D24H), Science Park, Hong Kong SAR, China
  - Sydney Precision Data Science Centre, The University of Sydney, Sydney, NSW, 2006, Australia
  - Charles Perkins Centre, The University of Sydney, Sydney, NSW, 2006, Australia
  - School of Mathematics and Statistics, The University of Sydney, Sydney, NSW, 2006, Australia
- Yingxin Lin
  - Laboratory of Data Discovery for Health Limited (D24H), Science Park, Hong Kong SAR, China
  - Sydney Precision Data Science Centre, The University of Sydney, Sydney, NSW, 2006, Australia
  - Charles Perkins Centre, The University of Sydney, Sydney, NSW, 2006, Australia
  - School of Mathematics and Statistics, The University of Sydney, Sydney, NSW, 2006, Australia
- Marni Torkel
  - Laboratory of Data Discovery for Health Limited (D24H), Science Park, Hong Kong SAR, China
  - Sydney Precision Data Science Centre, The University of Sydney, Sydney, NSW, 2006, Australia
  - Charles Perkins Centre, The University of Sydney, Sydney, NSW, 2006, Australia
  - School of Mathematics and Statistics, The University of Sydney, Sydney, NSW, 2006, Australia
- Pengyi Yang
  - Laboratory of Data Discovery for Health Limited (D24H), Science Park, Hong Kong SAR, China
  - Sydney Precision Data Science Centre, The University of Sydney, Sydney, NSW, 2006, Australia
  - School of Mathematics and Statistics, The University of Sydney, Sydney, NSW, 2006, Australia
  - Children's Medical Research Institute, The University of Sydney, Westmead, NSW, 2145, Australia
- Ellis Patrick
  - Laboratory of Data Discovery for Health Limited (D24H), Science Park, Hong Kong SAR, China
  - Sydney Precision Data Science Centre, The University of Sydney, Sydney, NSW, 2006, Australia
  - Charles Perkins Centre, The University of Sydney, Sydney, NSW, 2006, Australia
  - School of Mathematics and Statistics, The University of Sydney, Sydney, NSW, 2006, Australia
- Shila Ghazanfar
  - Sydney Precision Data Science Centre, The University of Sydney, Sydney, NSW, 2006, Australia
  - Charles Perkins Centre, The University of Sydney, Sydney, NSW, 2006, Australia
  - School of Mathematics and Statistics, The University of Sydney, Sydney, NSW, 2006, Australia
- Jean Yang
  - Laboratory of Data Discovery for Health Limited (D24H), Science Park, Hong Kong SAR, China
  - Sydney Precision Data Science Centre, The University of Sydney, Sydney, NSW, 2006, Australia
  - Charles Perkins Centre, The University of Sydney, Sydney, NSW, 2006, Australia
  - School of Mathematics and Statistics, The University of Sydney, Sydney, NSW, 2006, Australia
|
29
|
Liang S, Li Y, Chen Y, Huang H, Zhou R, Ma T. Application and prospects of single-cell and spatial omics technologies in woody plants. Forestry Research 2023; 3:27. [PMID: 39526269 PMCID: PMC11524316 DOI: 10.48130/fr-2023-0027] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/21/2023] [Accepted: 11/02/2023] [Indexed: 11/16/2024]
Abstract
Over the past decade, high-throughput sequencing and high-resolution single-cell transcriptome sequencing technologies have undergone rapid development, leading to significant breakthroughs. Traditional molecular biology methods are limited in their ability to unravel cellular-level heterogeneity within woody plant tissues. Consequently, techniques such as single-cell transcriptomics, single-cell epigenetics, and spatial transcriptomics are rapidly gaining popularity in the study of woody plants. In this review, we provide a comprehensive overview of the development of these technologies, with a focus on their applications and the challenges they present in single-cell transcriptome research in woody plants. In particular, we delve into the similarities and differences among the results of current studies and analyze the reasons behind these differences. Furthermore, we put forth potential solutions to overcome the challenges encountered in single-cell transcriptome applications in woody plants. Finally, we discuss the application directions of these techniques to address key challenges in woody plant research in the future.
Affiliation(s)
- Shaoming Liang
- Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, Sichuan Zoige Alpine Wetland Ecosystem National Observation and Research Station, College of Life Sciences, Sichuan University, Chengdu, China
| | - Yiling Li
- Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, Sichuan Zoige Alpine Wetland Ecosystem National Observation and Research Station, College of Life Sciences, Sichuan University, Chengdu, China
| | - Yang Chen
- Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, Sichuan Zoige Alpine Wetland Ecosystem National Observation and Research Station, College of Life Sciences, Sichuan University, Chengdu, China
| | - Heng Huang
- Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, Sichuan Zoige Alpine Wetland Ecosystem National Observation and Research Station, College of Life Sciences, Sichuan University, Chengdu, China
| | - Ran Zhou
- School of Forestry and Natural Resources, University of Georgia, Athens, GA, USA
| | - Tao Ma
- Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, Sichuan Zoige Alpine Wetland Ecosystem National Observation and Research Station, College of Life Sciences, Sichuan University, Chengdu, China
|
30
|
Paas-Oliveros E, Hernández-Lemus E, de Anda-Jáuregui G. Computational single cell oncology: state of the art. Front Genet 2023; 14:1256991. [PMID: 38028624 PMCID: PMC10663273 DOI: 10.3389/fgene.2023.1256991] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/11/2023] [Accepted: 10/24/2023] [Indexed: 12/01/2023] Open
Abstract
Single cell computational analysis has emerged as a powerful tool in the field of oncology, enabling researchers to decipher the complex cellular heterogeneity that characterizes cancer. By leveraging computational algorithms and bioinformatics approaches, this methodology provides insights into the underlying genetic, epigenetic and transcriptomic variations among individual cancer cells. In this paper, we present a comprehensive overview of single cell computational analysis in oncology, discussing the key computational techniques employed for data processing, analysis, and interpretation. We explore the challenges associated with single cell data, including data quality control, normalization, dimensionality reduction, clustering, and trajectory inference. Furthermore, we highlight the applications of single cell computational analysis, including the identification of novel cell states, the characterization of tumor subtypes, the discovery of biomarkers, and the prediction of therapy response. Finally, we address the future directions and potential advancements in the field, including the development of machine learning and deep learning approaches for single cell analysis. Overall, this paper aims to provide a roadmap for researchers interested in leveraging computational methods to unlock the full potential of single cell analysis in understanding cancer biology with the goal of advancing precision oncology. For this purpose, we also include a notebook that instructs on how to apply the recommended tools in the Preprocessing and Quality Control section.
Affiliation(s)
- Ernesto Paas-Oliveros
- Computational Genomics Division, National Institute of Genomic Medicine, Mexico City, Mexico
| | - Enrique Hernández-Lemus
- Computational Genomics Division, National Institute of Genomic Medicine, Mexico City, Mexico
- Center for Complexity Sciences, Universidad Nacional Autónoma de México, Mexico City, Mexico
| | - Guillermo de Anda-Jáuregui
- Computational Genomics Division, National Institute of Genomic Medicine, Mexico City, Mexico
- Center for Complexity Sciences, Universidad Nacional Autónoma de México, Mexico City, Mexico
- Investigadores por Mexico, Conahcyt, Mexico City, Mexico
|
31
|
Ruan X, Huang Y, Geng L, Tian M, Liu Y, Tao M, Zheng X, Li P, Zhao M. Consistent analysis of differentially expressed genes across 7 cell types in papillary thyroid carcinoma. Comput Struct Biotechnol J 2023; 21:5337-5349. [PMID: 37954148 PMCID: PMC10637855 DOI: 10.1016/j.csbj.2023.10.045] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2023] [Revised: 10/22/2023] [Accepted: 10/23/2023] [Indexed: 11/14/2023] Open
Abstract
Single-cell transcriptome sequencing (scRNA-seq) provides a higher resolution of cellular differences than bulk RNA-seq, enabling the dissection of cell-type-specific responses to perturbations in papillary thyroid carcinoma (PTC). However, cellular genomic features are highly heterogeneous and have a large number of genes without any expression signals, which hinders the statistical power to identify differentially expressed genes and may generate many false-positive results. To overcome this challenge, we conducted an integrative analysis on two PTC scRNA-seq datasets and cross-validated consistent differential expression. By combining results from 32 common cell types in the two studies, we identified 31 consistently differentially expressed genes (DEGs) across seven cell types, including B cells, endothelial cells, epithelial cells, monocytes, NK cells, smooth muscle cells, and T cells. Functional enrichment analysis revealed that these genes are important for the adaptive immune response and autoimmune thyroid diseases. An additional disease-free survival analysis also confirmed that these 31 genes significantly affected patient survival time in a large-scale thyroid cancer cohort. Furthermore, we experimentally validated one of the top consistent DEGs, KRT7, as a potential biomarker gene of PTC epithelial cells; KRT7 may be an upstream gene for the NF-κB signaling pathway. The results show that KRT7 may promote thyroid cancer metastasis through the epithelial-mesenchymal transition and the NF-κB signaling pathway. In summary, our single-cell transcriptome integration-based approach may provide insights into the important role of NF-κB in the underlying biology of PTC.
Affiliation(s)
- Xianhui Ruan
- Department of Thyroid and Neck Tumor, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy, Tianjin’s Clinical Research Center for Cancer, Tianjin 300060, China
| | - Yue Huang
- Department of Thyroid and Neck Tumor, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy, Tianjin’s Clinical Research Center for Cancer, Tianjin 300060, China
| | - Lin Geng
- Department of Thyroid and Neck Tumor, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy, Tianjin’s Clinical Research Center for Cancer, Tianjin 300060, China
| | - Mengran Tian
- Department of Thyroid and Neck Tumor, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy, Tianjin’s Clinical Research Center for Cancer, Tianjin 300060, China
- School of Medicine, Nankai University, Tianjin, China
- Department of Thyroid and Breast Surgery, Tianjin Key Laboratory of General Surgery in Construction, Tianjin Union Medical Center, Tianjin, China
| | - Yu Liu
- Department of Thyroid and Neck Tumor, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy, Tianjin’s Clinical Research Center for Cancer, Tianjin 300060, China
| | - Mei Tao
- Department of Thyroid and Neck Tumor, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy, Tianjin’s Clinical Research Center for Cancer, Tianjin 300060, China
| | - Xiangqian Zheng
- Department of Thyroid and Neck Tumor, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy, Tianjin’s Clinical Research Center for Cancer, Tianjin 300060, China
| | - Peng Li
- State Key Laboratory of Medicinal Chemical Biology, College of Life Sciences, Nankai University, 300071 Tianjin, China
| | - Min Zhao
- School of Science, Technology and Engineering, University of the Sunshine Coast, Maroochydore DC, Queensland 4558, Australia
|
32
|
Li Y, Zhang D, Yang M, Peng D, Yu J, Liu Y, Lv J, Chen L, Peng X. scBridge embraces cell heterogeneity in single-cell RNA-seq and ATAC-seq data integration. Nat Commun 2023; 14:6045. [PMID: 37770437 PMCID: PMC10539354 DOI: 10.1038/s41467-023-41795-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2023] [Accepted: 09/08/2023] [Indexed: 09/30/2023] Open
Abstract
Single-cell multi-omics data integration aims to reduce the omics difference while keeping the cell type difference. However, it is daunting to model and distinguish the two differences due to cell heterogeneity. Namely, even cells of the same omics and type would have various features, making the two differences less significant. In this work, we reveal that instead of being an interference, cell heterogeneity could be exploited to improve data integration. Specifically, we observe that the omics difference varies across cells, and cells with smaller omics differences are easier to integrate. Hence, unlike most existing works that homogeneously treat and integrate all cells, we propose a multi-omics data integration method (dubbed scBridge) that integrates cells in a heterogeneous manner. In brief, scBridge iterates between i) identifying reliable scATAC-seq cells that have smaller omics differences, and ii) integrating reliable scATAC-seq cells with scRNA-seq data to narrow the omics gap, thus benefiting the integration of the remaining cells. Extensive experiments on seven multi-omics datasets demonstrate the superiority of scBridge compared with six representative baselines.
Affiliation(s)
- Yunfan Li
- School of Computer Science, Sichuan University, Chengdu, Sichuan, China
| | - Dan Zhang
- Key Laboratory of Birth Defects and Related Diseases of Women and Children of MOE, Department of Laboratory Medicine, State Key Laboratory of Biotherapy, West China Second University Hospital, Sichuan University, Chengdu, China
| | - Mouxing Yang
- School of Computer Science, Sichuan University, Chengdu, Sichuan, China
| | - Dezhong Peng
- School of Computer Science, Sichuan University, Chengdu, Sichuan, China
| | - Jun Yu
- School of Computer Science, Hangzhou Dianzi University, Hangzhou, Zhejiang, China
| | - Yu Liu
- School of Electronic and Information Engineering, Naval Aviation University, Yantai, Shandong, China
| | - Jiancheng Lv
- School of Computer Science, Sichuan University, Chengdu, Sichuan, China
| | - Lu Chen
- Key Laboratory of Birth Defects and Related Diseases of Women and Children of MOE, Department of Laboratory Medicine, State Key Laboratory of Biotherapy, West China Second University Hospital, Sichuan University, Chengdu, China
| | - Xi Peng
- School of Computer Science, Sichuan University, Chengdu, Sichuan, China.
|
33
|
Zhang Z, Mathew D, Lim T, Mason K, Martinez CM, Huang S, Wherry EJ, Susztak K, Minn AJ, Ma Z, Zhang NR. Signal recovery in single cell batch integration. bioRxiv 2023:2023.05.05.539614. [PMID: 37215021 PMCID: PMC10197537 DOI: 10.1101/2023.05.05.539614] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/24/2023]
Abstract
Data integration to align cells across batches has become a cornerstone of single cell data analysis, critically affecting downstream results. Yet, how much biological signal is erased during integration? Currently, there are no guidelines for when the biological differences between samples are separable from batch effects, and thus, data integration usually involves a lot of guesswork: Cells across batches should be aligned to be "appropriately" mixed, while preserving "main cell type clusters". We show evidence that current paradigms for single cell data integration are unnecessarily aggressive, removing biologically meaningful variation. To remedy this, we present a novel statistical model and computationally scalable algorithm, CellANOVA, to recover biological signal that is lost during single cell data integration. CellANOVA utilizes a "pool-of-controls" design concept, applicable across diverse settings, to separate unwanted variation from biological variation of interest. When applied with existing integration methods, CellANOVA allows the recovery of subtle biological signals and corrects, to a large extent, the data distortion introduced by integration. Further, CellANOVA explicitly estimates cell- and gene-specific batch effect terms which can be used to identify the cell types and pathways exhibiting the largest batch variations, providing clarity as to which biological signals can be recovered. These concepts are illustrated on studies of diverse designs, where the biological signals that are recovered by CellANOVA are shown to be validated by orthogonal assays. In particular, we show that CellANOVA is effective in the challenging case of single-cell and single-nuclei data integration, where the recovered biological signals are replicated in an independent study.
Affiliation(s)
- Zhaojun Zhang
- Department of Statistics and Data Science, The Wharton School, University of Pennsylvania, PA, United States
| | - Divij Mathew
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, PA, United States
- Institute for Immunology, Perelman School of Medicine, University of Pennsylvania, PA, United States
- Parker Institute for Cancer Immunotherapy, Perelman School of Medicine, University of Pennsylvania, PA, United States
| | - Tristan Lim
- Department of Radiation Oncology, Perelman School of Medicine, University of Pennsylvania, PA, United States
| | - Kaishu Mason
- Department of Statistics and Data Science, The Wharton School, University of Pennsylvania, PA, United States
| | - Clara Morral Martinez
- Department of Radiation Oncology, Perelman School of Medicine, University of Pennsylvania, PA, United States
- Mark Foundation Center for Immunotherapy, Immune Signaling, and Radiation, Perelman School of Medicine, University of Pennsylvania, PA, United States
| | - Sijia Huang
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, PA, United States
| | - E John Wherry
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, PA, United States
- Institute for Immunology, Perelman School of Medicine, University of Pennsylvania, PA, United States
- Parker Institute for Cancer Immunotherapy, Perelman School of Medicine, University of Pennsylvania, PA, United States
- Mark Foundation Center for Immunotherapy, Immune Signaling, and Radiation, Perelman School of Medicine, University of Pennsylvania, PA, United States
| | - Katalin Susztak
- Renal, Electrolyte, and Hypertension Division, Department of Medicine, University of Pennsylvania, Perelman School of Medicine, PA, United States
- Institute for Diabetes, Obesity, and Metabolism, University of Pennsylvania, PA, United States
- Department of Genetics, University of Pennsylvania, PA, United States
| | - Andy J Minn
- Institute for Immunology, Perelman School of Medicine, University of Pennsylvania, PA, United States
- Department of Radiation Oncology, Perelman School of Medicine, University of Pennsylvania, PA, United States
- Parker Institute for Cancer Immunotherapy, Perelman School of Medicine, University of Pennsylvania, PA, United States
- Mark Foundation Center for Immunotherapy, Immune Signaling, and Radiation, Perelman School of Medicine, University of Pennsylvania, PA, United States
| | - Zongming Ma
- Department of Statistics and Data Science, Yale University, CT, United States
| | - Nancy R Zhang
- Department of Statistics and Data Science, The Wharton School, University of Pennsylvania, PA, United States
|
34
|
Wang Y, Sarfraz I, Pervaiz N, Hong R, Koga Y, Akavoor V, Cao X, Alabdullatif S, Zaib SA, Wang Z, Jansen F, Yajima M, Johnson WE, Campbell JD. Interactive analysis of single-cell data using flexible workflows with SCTK2. Patterns (New York, N.Y.) 2023; 4:100814. [PMID: 37602214 PMCID: PMC10436054 DOI: 10.1016/j.patter.2023.100814] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/25/2022] [Revised: 03/27/2023] [Accepted: 07/10/2023] [Indexed: 08/22/2023]
Abstract
Analysis of single-cell RNA sequencing (scRNA-seq) data can reveal novel insights into the heterogeneity of complex biological systems. Many tools and workflows have been developed to perform different types of analyses. However, these tools are spread across different packages or programming environments, rely on different underlying data structures, and can only be utilized by people with knowledge of programming languages. In the Single-Cell Toolkit 2 (SCTK2), we have integrated a variety of popular tools and workflows to perform various aspects of scRNA-seq analysis. All tools and workflows can be run in the R console or using an intuitive graphical user interface built with R/Shiny. HTML reports generated with Rmarkdown can be used to document and recapitulate individual steps or entire analysis workflows. We show that the toolkit offers more features when compared with existing tools and allows for a seamless analysis of scRNA-seq data for non-computational users.
Affiliation(s)
- Yichen Wang
- Section of Computational Biomedicine, Boston University School of Medicine, Boston, MA, USA
| | - Irzam Sarfraz
- Bioinformatics Program, Boston University, Boston, MA, USA
- Section of Computational Biomedicine, Boston University School of Medicine, Boston, MA, USA
| | - Nida Pervaiz
- Section of Computational Biomedicine, Boston University School of Medicine, Boston, MA, USA
| | - Rui Hong
- Bioinformatics Program, Boston University, Boston, MA, USA
- Section of Computational Biomedicine, Boston University School of Medicine, Boston, MA, USA
| | - Yusuke Koga
- Bioinformatics Program, Boston University, Boston, MA, USA
- Section of Computational Biomedicine, Boston University School of Medicine, Boston, MA, USA
| | - Vidya Akavoor
- Software & Application Innovation Lab, Rafik B. Hariri Institute for Computing and Computational Science and Engineering, Boston, MA, USA
| | - Xinyun Cao
- Software & Application Innovation Lab, Rafik B. Hariri Institute for Computing and Computational Science and Engineering, Boston, MA, USA
| | - Salam Alabdullatif
- Section of Computational Biomedicine, Boston University School of Medicine, Boston, MA, USA
| | - Syed Ali Zaib
- Section of Computational Biomedicine, Boston University School of Medicine, Boston, MA, USA
| | - Zhe Wang
- Bioinformatics Program, Boston University, Boston, MA, USA
- Section of Computational Biomedicine, Boston University School of Medicine, Boston, MA, USA
| | - Frederick Jansen
- Software & Application Innovation Lab, Rafik B. Hariri Institute for Computing and Computational Science and Engineering, Boston, MA, USA
| | - Masanao Yajima
- Department of Mathematics and Statistics, Boston University, Boston, MA, USA
| | - W. Evan Johnson
- Bioinformatics Program, Boston University, Boston, MA, USA
- Section of Computational Biomedicine, Boston University School of Medicine, Boston, MA, USA
| | - Joshua D. Campbell
- Bioinformatics Program, Boston University, Boston, MA, USA
- Section of Computational Biomedicine, Boston University School of Medicine, Boston, MA, USA
|
35
|
Lin Y, Cao Y, Willie E, Patrick E, Yang JYH. Atlas-scale single-cell multi-sample multi-condition data integration using scMerge2. Nat Commun 2023; 14:4272. [PMID: 37460600 DOI: 10.1038/s41467-023-39923-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 11/09/2022] [Accepted: 07/04/2023] [Indexed: 07/20/2023] Open
Abstract
The recent emergence of multi-sample multi-condition single-cell multi-cohort studies allows researchers to investigate different cell states. The effective integration of multiple large-cohort studies promises biological insights into cells under different conditions that individual studies cannot provide. Here, we present scMerge2, a scalable algorithm that allows data integration of atlas-scale multi-sample multi-condition single-cell studies. We have generalized scMerge2 to enable the merging of millions of cells from single-cell studies generated by various single-cell technologies. Using a large COVID-19 data collection with over five million cells from 1000+ individuals, we demonstrate that scMerge2 enables multi-sample multi-condition scRNA-seq data integration from multiple cohorts and reveals signatures derived from cell-type expression that are more accurate in discriminating disease progression. Further, we demonstrate that scMerge2 can remove dataset variability in CyTOF, imaging mass cytometry and CITE-seq experiments, demonstrating its applicability to a broad spectrum of single-cell profiling technologies.
Affiliation(s)
- Yingxin Lin
- Sydney Precision Data Science Centre, The University of Sydney, Sydney, NSW, Australia
- Charles Perkins Centre, The University of Sydney, Sydney, NSW, Australia
- School of Mathematics and Statistics, The University of Sydney, Sydney, NSW, Australia
- Laboratory of Data Discovery for Health Limited (D24H), Science Park, Hong Kong SAR, China
| | - Yue Cao
- Sydney Precision Data Science Centre, The University of Sydney, Sydney, NSW, Australia
- Charles Perkins Centre, The University of Sydney, Sydney, NSW, Australia
- School of Mathematics and Statistics, The University of Sydney, Sydney, NSW, Australia
- Laboratory of Data Discovery for Health Limited (D24H), Science Park, Hong Kong SAR, China
| | - Elijah Willie
- Sydney Precision Data Science Centre, The University of Sydney, Sydney, NSW, Australia
| | - Ellis Patrick
- Sydney Precision Data Science Centre, The University of Sydney, Sydney, NSW, Australia
- School of Mathematics and Statistics, The University of Sydney, Sydney, NSW, Australia
- Laboratory of Data Discovery for Health Limited (D24H), Science Park, Hong Kong SAR, China
- The Westmead Institute for Medical Research, The University of Sydney, Sydney, NSW, 2006, Australia
| | - Jean Y H Yang
- Sydney Precision Data Science Centre, The University of Sydney, Sydney, NSW, Australia.
- Charles Perkins Centre, The University of Sydney, Sydney, NSW, Australia.
- School of Mathematics and Statistics, The University of Sydney, Sydney, NSW, Australia.
- Laboratory of Data Discovery for Health Limited (D24H), Science Park, Hong Kong SAR, China.
|
36
|
Talwar JV, Laub D, Pagadala MS, Castro A, Lewis M, Luebeck GE, Gorman BR, Pan C, Dong FN, Markianos K, Teerlink CC, Lynch J, Hauger R, Pyarajan S, Tsao PS, Morris GP, Salem RM, Thompson WK, Curtius K, Zanetti M, Carter H. Autoimmune alleles at the major histocompatibility locus modify melanoma susceptibility. Am J Hum Genet 2023; 110:1138-1161. [PMID: 37339630 PMCID: PMC10357503 DOI: 10.1016/j.ajhg.2023.05.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2022] [Revised: 05/20/2023] [Accepted: 05/22/2023] [Indexed: 06/22/2023] Open
Abstract
Autoimmunity and cancer represent two different aspects of immune dysfunction. Autoimmunity is characterized by breakdowns in immune self-tolerance, while impaired immune surveillance can allow for tumorigenesis. The class I major histocompatibility complex (MHC-I), which displays derivatives of the cellular peptidome for immune surveillance by CD8+ T cells, serves as a common genetic link between these conditions. As melanoma-specific CD8+ T cells have been shown to target melanocyte-specific peptide antigens more often than melanoma-specific antigens, we investigated whether vitiligo- and psoriasis-predisposing MHC-I alleles conferred a melanoma-protective effect. In individuals with cutaneous melanoma from both The Cancer Genome Atlas (n = 451) and an independent validation set (n = 586), MHC-I autoimmune-allele carrier status was significantly associated with a later age of melanoma diagnosis. Furthermore, MHC-I autoimmune-allele carriers were significantly associated with decreased risk of developing melanoma in the Million Veteran Program (OR = 0.962, p = 0.024). Existing melanoma polygenic risk scores (PRSs) did not predict autoimmune-allele carrier status, suggesting these alleles provide orthogonal risk-relevant information. Mechanisms of autoimmune protection were neither associated with improved melanoma-driver mutation association nor improved gene-level conserved antigen presentation relative to common alleles. However, autoimmune alleles showed higher affinity relative to common alleles for particular windows of melanocyte-conserved antigens and loss of heterozygosity of autoimmune alleles caused the greatest reduction in presentation for several conserved antigens across individuals with loss of HLA alleles. Overall, this study presents evidence that MHC-I autoimmune-risk alleles modulate melanoma risk unaccounted for by current PRSs.
Affiliation(s)
- James V Talwar
- Department of Medicine, Division of Medical Genetics, University of California San Diego, La Jolla, CA 92093, USA; Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA 92093, USA
| | - David Laub
- Department of Medicine, Division of Medical Genetics, University of California San Diego, La Jolla, CA 92093, USA; Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA 92093, USA
| | - Meghana S Pagadala
- Biomedical Science Program, University of California San Diego, La Jolla, CA 92093, USA
| | - Andrea Castro
- Department of Medicine, Division of Medical Genetics, University of California San Diego, La Jolla, CA 92093, USA; Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA 92093, USA
| | - McKenna Lewis
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA 92093, USA
| | - Georg E Luebeck
- Public Health Sciences Division, Herbold Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
| | - Bryan R Gorman
- Center for Data and Computational Sciences (C-DACS), VA Boston Healthcare System, Boston, MA 02130, USA; Booz Allen Hamilton, Inc., McLean, VA 22102, USA
| | - Cuiping Pan
- Palo Alto Epidemiology Research and Information Center for Genomics, VA Palo Alto, CA, USA
| | - Frederick N Dong
- Center for Data and Computational Sciences (C-DACS), VA Boston Healthcare System, Boston, MA 02130, USA; Booz Allen Hamilton, Inc., McLean, VA 22102, USA
| | - Kyriacos Markianos
- Center for Data and Computational Sciences (C-DACS), VA Boston Healthcare System, Boston, MA 02130, USA; Division of Genetics and Genomics, Department of Pediatrics, Boston Children's Hospital, Boston, MA 02115, USA; Department of Pediatrics, Harvard Medical School, Boston, MA 02115, USA; Broad Institute of Harvard and MIT, Cambridge, MA 02115, USA
| | - Craig C Teerlink
- Department of Veterans Affairs Informatics and Computing Infrastructure (VINCI), VA Salt Lake City Healthcare System, Salt Lake City, UT, USA; Department of Internal Medicine, Division of Epidemiology, University of Utah School of Medicine, Salt Lake City, UT, USA
| | - Julie Lynch
- Department of Veterans Affairs Informatics and Computing Infrastructure (VINCI), VA Salt Lake City Healthcare System, Salt Lake City, UT, USA; Department of Internal Medicine, Division of Epidemiology, University of Utah School of Medicine, Salt Lake City, UT, USA
| | - Richard Hauger
- VA San Diego Healthcare System, La Jolla, CA, USA; Center for Behavioral Genetics of Aging, University of California San Diego, La Jolla, CA, USA; Center of Excellence for Stress and Mental Health (CESAMH), VA San Diego Healthcare System, San Diego, CA, USA
| | - Saiju Pyarajan
- Center for Data and Computational Sciences (C-DACS), VA Boston Healthcare System, Boston, MA 02130, USA; Department of Medicine, Brigham Women's Hospital, Boston, MA, USA; Department of Medicine, Harvard Medical School, Boston, MA, USA
| | - Philip S Tsao
- Palo Alto Epidemiology Research and Information Center for Genomics, VA Palo Alto, CA, USA; Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
| | - Gerald P Morris
- Department of Pathology, University of California San Diego, La Jolla, CA 92093, USA
| | - Rany M Salem
- Division of Epidemiology, Herbert Wertheim School of Public Health and Human Longevity Science, University of California San Diego, La Jolla, CA 92093, USA
| | - Wesley K Thompson
- Center for Population Neuroscience and Genetics, Laureate Institute for Brain Research, Tulsa, OK 74136, USA
| | - Kit Curtius
- Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA 92093, USA; Moores Cancer Center, University of California San Diego, La Jolla, CA 92093, USA; Division of Biomedical Informatics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
| | - Maurizio Zanetti
- Moores Cancer Center, University of California San Diego, La Jolla, CA 92093, USA; The Laboratory of Immunology, University of California San Diego, La Jolla, CA 92093, USA; Department of Medicine, Division of Hematology and Oncology, University of California San Diego, La Jolla, CA 92093, USA
| | - Hannah Carter
- Department of Medicine, Division of Medical Genetics, University of California San Diego, La Jolla, CA 92093, USA; Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA 92093, USA; Moores Cancer Center, University of California San Diego, La Jolla, CA 92093, USA.
|
37
|
Zhu D, Vernon ST, D'Agostino Z, Wu J, Giles C, Chan AS, Kott KA, Gray MP, Gholipour A, Tang O, Beyene HB, Patrick E, Grieve SM, Meikle PJ, Figtree GA, Yang JYH. Lipidomics Profiling and Risk of Coronary Artery Disease in the BioHEART-CT Discovery Cohort. Biomolecules 2023; 13:917. [PMID: 37371497 DOI: 10.3390/biom13060917] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 04/26/2023] [Revised: 05/25/2023] [Accepted: 05/29/2023] [Indexed: 06/29/2023] Open
Abstract
The current coronary artery disease (CAD) risk scores for predicting future cardiovascular events rely on well-recognized traditional cardiovascular risk factors derived from a population level but often fail individuals, with up to 25% of first-time heart attack patients having no risk factors. Non-invasive imaging technology can directly measure coronary artery plaque burden. With an advanced lipidomic measurement methodology, for the first time, we aim to identify lipidomic biomarkers to enable intervention before cardiovascular events. With 994 participants from the BioHEART-CT Discovery Cohort, we collected clinical data and performed high-performance liquid chromatography with mass spectrometry to determine concentrations of 683 plasma lipid species. Statin-naive participants were selected based on subclinical CAD (sCAD) categories as the analytical cohort (n = 580), with sCAD+ (n = 243) compared to sCAD- (n = 337). Through a machine learning approach, we built a lipid risk score (LRS) and compared its performance with that of the existing Framingham Risk Score (FRS) in predicting sCAD+. We obtained individual classifiability scores and determined Body Mass Index (BMI) as the modifying variable. FRS and LRS models achieved similar areas under the receiver operating characteristic curve (AUC) in predicting the validation cohort. LRS enhanced the prediction of sCAD+ in the healthy-weight group (BMI < 25 kg/m²), where FRS performed poorly, and identified individuals at risk that FRS missed. Lipid features have strong potential as biomarkers to predict CAD plaque burden and can identify residual risk not captured by traditional risk factors/scores. LRS complements FRS in prediction and has the most significant benefit in healthy-weight individuals.
Affiliation(s)
- Dantong Zhu
  - School of Mathematics and Statistics, The University of Sydney, Sydney, NSW 2006, Australia
  - Kolling Institute of Medical Research, The University of Sydney, Sydney, NSW 2065, Australia
- Stephen T Vernon
  - Kolling Institute of Medical Research, The University of Sydney, Sydney, NSW 2065, Australia
  - Department of Cardiology, Royal North Shore Hospital, Sydney, NSW 2065, Australia
- Zac D'Agostino
  - School of Mathematics and Statistics, The University of Sydney, Sydney, NSW 2006, Australia
- Jingqin Wu
  - Baker Heart and Diabetes Institute, Melbourne, VIC 3004, Australia
- Corey Giles
  - Baker Heart and Diabetes Institute, Melbourne, VIC 3004, Australia
- Adam S Chan
  - School of Mathematics and Statistics, The University of Sydney, Sydney, NSW 2006, Australia
- Katharine A Kott
  - Kolling Institute of Medical Research, The University of Sydney, Sydney, NSW 2065, Australia
  - Department of Cardiology, Royal North Shore Hospital, Sydney, NSW 2065, Australia
- Michael P Gray
  - Kolling Institute of Medical Research, The University of Sydney, Sydney, NSW 2065, Australia
- Alireza Gholipour
  - Charles Perkins Centre, The University of Sydney, Sydney, NSW 2006, Australia
- Owen Tang
  - Kolling Institute of Medical Research, The University of Sydney, Sydney, NSW 2065, Australia
  - Charles Perkins Centre, The University of Sydney, Sydney, NSW 2006, Australia
- Habtamu B Beyene
  - Baker Heart and Diabetes Institute, Melbourne, VIC 3004, Australia
- Ellis Patrick
  - School of Mathematics and Statistics, The University of Sydney, Sydney, NSW 2006, Australia
- Stuart M Grieve
  - Charles Perkins Centre, The University of Sydney, Sydney, NSW 2006, Australia
- Peter J Meikle
  - Baker Heart and Diabetes Institute, Melbourne, VIC 3004, Australia
  - Department of Cardiovascular Research Translation and Implementation, La Trobe University, Melbourne, VIC 3086, Australia
- Gemma A Figtree
  - Kolling Institute of Medical Research, The University of Sydney, Sydney, NSW 2065, Australia
  - Department of Cardiology, Royal North Shore Hospital, Sydney, NSW 2065, Australia
- Jean Y H Yang
  - School of Mathematics and Statistics, The University of Sydney, Sydney, NSW 2006, Australia
  - Charles Perkins Centre, The University of Sydney, Sydney, NSW 2006, Australia

38
Zhu J, Yang Y. scMEB: a fast and clustering-independent method for detecting differentially expressed genes in single-cell RNA-seq data. BMC Genomics 2023; 24:280. [PMID: 37231345 DOI: 10.1186/s12864-023-09374-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2022] [Accepted: 05/11/2023] [Indexed: 05/27/2023]
Abstract
BACKGROUND Cell clustering is a prerequisite for identifying differentially expressed genes (DEGs) in single-cell RNA sequencing (scRNA-seq) data. Obtaining a perfect clustering result is of central importance for subsequent analyses, but is not easy. Additionally, the increase in cell throughput due to the advancement of scRNA-seq protocols exacerbates many computational issues, especially regarding method runtime. To address these difficulties, a new, accurate, and fast method for detecting DEGs in scRNA-seq data is needed. RESULTS Here, we propose single-cell minimum enclosing ball (scMEB), a novel and fast method for detecting single-cell DEGs without prior cell clustering results. The proposed method utilizes a small set of known non-DEGs (stably expressed genes) to build a minimum enclosing ball and defines the DEGs based on the distance of a mapped gene to the center of the hypersphere in a feature space. CONCLUSIONS We compared scMEB to two different approaches that could be used to identify DEGs without cell clustering. The investigation of 11 real datasets revealed that scMEB outperformed rival methods in terms of cell clustering, predicting genes with biological functions, and identifying marker genes. Moreover, scMEB was much faster than the other methods, making it particularly effective for finding DEGs in high-throughput scRNA-seq data. We have developed a package, scMEB, for the proposed method, which is available at https://github.com/FocusPaka/scMEB.
Affiliation(s)
- Jiadi Zhu
  - School of Mathematics and Statistics, Xidian University, Xi'an, China
- Youlong Yang
  - School of Mathematics and Statistics, Xidian University, Xi'an, China

39
Vasaikar SV, Savage AK, Gong Q, Swanson E, Talla A, Lord C, Heubeck AT, Reading J, Graybuck LT, Meijer P, Torgerson TR, Skene PJ, Bumol TF, Li XJ. A comprehensive platform for analyzing longitudinal multi-omics data. Nat Commun 2023; 14:1684. [PMID: 36973282 PMCID: PMC10041512 DOI: 10.1038/s41467-023-37432-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/21/2021] [Accepted: 03/17/2023] [Indexed: 03/29/2023]
Abstract
Longitudinal bulk and single-cell omics data are increasingly generated for biological and clinical research but are challenging to analyze because of their many intrinsic types of variation. We present PALMO (https://github.com/aifimmunology/PALMO), a platform that contains five analytical modules to examine longitudinal bulk and single-cell multi-omics data from multiple perspectives, including decomposition of sources of variation within the data, collection of stable or variable features across timepoints and participants, identification of up- or down-regulated markers across timepoints of individual participants, and investigation of samples from the same participants for possible outlier events. We have tested PALMO performance on a complex longitudinal multi-omics dataset of five data modalities on the same samples and on six external datasets of diverse backgrounds. Both PALMO and our longitudinal multi-omics dataset can be valuable resources to the scientific community.
Affiliation(s)
- Adam K Savage
  - Allen Institute for Immunology, Seattle, WA, 98109, USA
- Qiuyu Gong
  - Allen Institute for Immunology, Seattle, WA, 98109, USA
- Elliott Swanson
  - Allen Institute for Immunology, Seattle, WA, 98109, USA
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Aarthi Talla
  - Allen Institute for Immunology, Seattle, WA, 98109, USA
- Cara Lord
  - Allen Institute for Immunology, Seattle, WA, 98109, USA
  - GlaxoSmithKline, Collegeville, PA, 19426, USA
- Paul Meijer
  - Allen Institute for Immunology, Seattle, WA, 98109, USA
- Peter J Skene
  - Allen Institute for Immunology, Seattle, WA, 98109, USA
- Xiao-Jun Li
  - Allen Institute for Immunology, Seattle, WA, 98109, USA

40
Nguyen HCT, Baik B, Yoon S, Park T, Nam D. Benchmarking integration of single-cell differential expression. Nat Commun 2023; 14:1570. [PMID: 36944632 PMCID: PMC10030080 DOI: 10.1038/s41467-023-37126-3] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 06/03/2022] [Accepted: 03/03/2023] [Indexed: 03/23/2023]
Abstract
Integration of single-cell RNA sequencing data between different samples has been a major challenge for analyzing cell populations. However, strategies to integrate differential expression analysis of single-cell data remain underinvestigated. Here, we benchmark 46 workflows for differential expression analysis of single-cell data with multiple batches. We show that batch effects, sequencing depth and data sparsity substantially impact their performance. Notably, we find that the use of batch-corrected data rarely improves the analysis for sparse data, whereas batch covariate modeling improves the analysis for substantial batch effects. We show that for low-depth data, single-cell techniques based on zero-inflation models degrade performance, whereas the analysis of uncorrected data using limmatrend, the Wilcoxon test and the fixed effects model performs well. We suggest several high-performance methods under different conditions based on various simulation and real data analyses. Additionally, we demonstrate that differential expression analysis for a specific cell type outperforms that of large-scale bulk sample data in prioritizing disease-related genes.
Affiliation(s)
- Hai C T Nguyen
  - Department of Biological Sciences, Ulsan National Institute of Science and Technology, Ulsan, 44919, Republic of Korea
- Bukyung Baik
  - Department of Biological Sciences, Ulsan National Institute of Science and Technology, Ulsan, 44919, Republic of Korea
- Sora Yoon
  - Department of Biological Sciences, Ulsan National Institute of Science and Technology, Ulsan, 44919, Republic of Korea
  - Department of Genetics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, 19104, USA
- Taesung Park
  - Department of Statistics, Seoul National University, Seoul, 08826, Republic of Korea
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 08826, Republic of Korea
- Dougu Nam
  - Department of Biological Sciences, Ulsan National Institute of Science and Technology, Ulsan, 44919, Republic of Korea
  - Department of Mathematical Sciences, Ulsan National Institute of Science and Technology, Ulsan, 44919, Republic of Korea

41
Hao J, Zou J, Zhang J, Chen K, Wu D, Cao W, Shang G, Yang JYH, Wong-Lin K, Sun H, Zhang Z, Wang X, Chen W, Zou X. scSTAR reveals hidden heterogeneity with a real-virtual cell pair structure across conditions in single-cell RNA sequencing data. Brief Bioinform 2023; 24:bbad062. [PMID: 36813563 DOI: 10.1093/bib/bbad062] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 10/04/2022] [Revised: 12/29/2022] [Accepted: 02/02/2023] [Indexed: 02/24/2023]
Abstract
Cell-state transition can reveal additional information from single-cell ribonucleic acid (RNA)-sequencing data in time-resolved biological phenomena. However, most of the current methods are based on the time derivative of the gene expression state, which restricts them to the short-term evolution of cell states. Here, we present single-cell State Transition Across-samples of RNA-seq data (scSTAR), which overcomes this limitation by constructing a paired-cell projection between biological conditions with an arbitrary time span by maximizing the covariance between two feature spaces using partial least squares and minimum squared error methods. In mouse ageing data, the response to stress in CD4+ memory T cell subtypes was found to be associated with ageing. A novel Treg subtype characterized by mTORC activation was identified to be associated with antitumour immune suppression, which was confirmed by immunofluorescence microscopy and survival analysis in 11 cancers from The Cancer Genome Atlas Program. On melanoma data, scSTAR improved immunotherapy-response prediction accuracy from 0.8 to 0.96.
Affiliation(s)
- Jie Hao
  - Institute of Clinical Science, Zhongshan Hospital, Fudan University, Shanghai, China
- Jiawei Zou
  - Institute of Clinical Science, Zhongshan Hospital, Fudan University, Shanghai, China
- Jiaqiang Zhang
  - Department of Anesthesiology and Perioperative Medicine, Henan Provincial People's Hospital, People's Hospital of Zhengzhou University, Zhengzhou, Henan, 450003, China
- Ke Chen
  - Shanghai Key Laboratory of Plant Functional Genomics and Resources, Shanghai Chenshan Botanical Garden, Shanghai, 201602, China
- Duojiao Wu
  - Institute of Clinical Science, Zhongshan Hospital, Fudan University, Shanghai, China
- Wei Cao
  - Department of Oral Maxillofacial-Head and Neck Oncology, Ninth People's Hospital, Shanghai Key Laboratory of Stomatology & Shanghai Research Institute of Stomatology, National Clinical Research Center of Stomatology, Shanghai Jiao Tong University School of Medicine, Shanghai, 200011, China
- Guoguo Shang
  - Department of Pathology of Zhongshan Hospital, Fudan University, Shanghai, China
- Jean Y H Yang
  - School of Mathematics and Statistics and Charles Perkins Center, The University of Sydney, Australia
- KongFatt Wong-Lin
  - Intelligent Systems Research Centre, Ulster University, Magee Campus, Derry~Londonderry, Northern Ireland, UK
- Hourong Sun
  - Department of Cardiac Surgery, Qilu Hospital, Cheeloo College of Medicine, Shandong University, Jinan City, Shandong, 250012, China
- Zhen Zhang
  - Ninth People's Hospital, Shanghai Key Laboratory of Stomatology & Shanghai Research Institute of Stomatology, National Clinical Research Center of Stomatology, Shanghai Jiao Tong University School of Medicine, Shanghai, 200011, China
- Xiangdong Wang
  - Institute of Clinical Science, Zhongshan Hospital, Fudan University, Shanghai, China
- Wantao Chen
  - Ninth People's Hospital, Shanghai Key Laboratory of Stomatology & Shanghai Research Institute of Stomatology, National Clinical Research Center of Stomatology, Shanghai Jiao Tong University School of Medicine, Shanghai, 200011, China
- Xin Zou
  - Jinshan Hospital Center for Tumor Diagnosis & Therapy, Jinshan Hospital, Fudan University, Shanghai, 201508, China

42
Wang Y, Lê Cao KA. PLSDA-batch: a multivariate framework to correct for batch effects in microbiome data. Brief Bioinform 2023; 24:bbac622. [PMID: 36653900 PMCID: PMC10025448 DOI: 10.1093/bib/bbac622] [Citation(s) in RCA: 27] [Impact Index Per Article: 13.5] [Received: 10/19/2022] [Revised: 12/14/2022] [Accepted: 12/17/2022] [Indexed: 01/20/2023]
Abstract
Microbial communities are highly dynamic and sensitive to changes in the environment. Thus, microbiome data are highly susceptible to batch effects, defined as sources of unwanted variation that are not related to and obscure any factors of interest. Existing batch effect correction methods have been primarily developed for gene expression data. As such, they do not consider the inherent characteristics of microbiome data, including zero inflation, overdispersion and correlation between variables. We introduce new multivariate and non-parametric batch effect correction methods based on Partial Least Squares Discriminant Analysis (PLSDA). PLSDA-batch first estimates treatment and batch variation with latent components, then subtracts batch-associated components from the data. The resulting batch-effect-corrected data can then be input into any downstream statistical analysis. Two variants are proposed to handle unbalanced batch × treatment designs and to avoid overfitting when estimating the components via variable selection. We compare our approaches with popular methods for managing batch effects, namely, removeBatchEffect, ComBat and Surrogate Variable Analysis, in simulated data and three case studies using various visual and numerical assessments. We show that our three methods lead to competitive performance in removing batch variation while preserving treatment variation, especially for unbalanced batch × treatment designs. Our downstream analyses show selections of biologically relevant taxa. This work demonstrates that batch effect correction methods can improve microbiome research outputs. Reproducible code and vignettes are available on GitHub.
Affiliation(s)
- Yiwen Wang
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, 97 Buxin Rd, Shenzhen, 518000, Guangdong, China
  - Melbourne Integrative Genomics, School of Mathematics and Statistics, The University of Melbourne, 30 Royal Parade, Melbourne, 3052, VIC, Australia
- Kim-Anh Lê Cao
  - Melbourne Integrative Genomics, School of Mathematics and Statistics, The University of Melbourne, 30 Royal Parade, Melbourne, 3052, VIC, Australia

43
Yan X, Zheng R, Wu F, Li M. CLAIRE: contrastive learning-based batch correction framework for better balance between batch mixing and preservation of cellular heterogeneity. Bioinformatics 2023; 39:7055295. [PMID: 36821425 PMCID: PMC9985174 DOI: 10.1093/bioinformatics/btad099] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/04/2022] [Revised: 12/27/2022] [Accepted: 02/22/2023] [Indexed: 02/24/2023]
Abstract
MOTIVATION Integration of growing single-cell RNA sequencing datasets helps better understand cellular identity and function. The major challenge for integration is removing batch effects while preserving biological heterogeneities. Advances in contrastive learning have inspired several contrastive learning-based batch correction methods. However, existing contrastive learning-based methods exhibit a noticeably ad hoc trade-off between batch mixing and preservation of cellular heterogeneities (mix-heterogeneity trade-off). Therefore, a deliberate mix-heterogeneity trade-off is expected to yield considerable improvements in scRNA-seq dataset integration. RESULTS We develop a novel contrastive learning-based batch correction framework, CLAIRE, which achieves a superior mix-heterogeneity trade-off. The key contribution of CLAIRE is the proposal of two complementary strategies, a construction strategy and a refinement strategy, to improve the appropriateness of positive pairs. The construction strategy dynamically generates positive pairs by augmenting inter-batch mutual nearest neighbors (MNN) with intra-batch k-nearest neighbors (KNN), which improves the coverage of positive pairs for the whole distribution of shared cell types between batches. The refinement strategy aims to automatically reduce the potential false positive pairs from the construction strategy by exploiting the memory effect of deep neural networks. We demonstrate that CLAIRE possesses a superior mix-heterogeneity trade-off over existing contrastive learning-based methods. Benchmark results on six real datasets also show that CLAIRE achieves the best integration performance against eight state-of-the-art methods. Finally, comprehensive experiments are conducted to validate the effectiveness of CLAIRE. AVAILABILITY AND IMPLEMENTATION The source code and data used in this study can be found at https://github.com/CSUBioGroup/CLAIRE-release.
SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
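The inter-batch mutual nearest neighbor (MNN) idea mentioned in the abstract above can be illustrated with a minimal sketch. This is not CLAIRE's implementation: the function name `mutual_nearest_neighbors`, the brute-force Euclidean distance, and the toy coordinates are all assumptions made for illustration, and the intra-batch KNN augmentation and the refinement step are not shown.

```python
import numpy as np

def mutual_nearest_neighbors(a, b, k=1):
    """Return (i, j) pairs where a[i] and b[j] are in each other's k nearest neighbors.

    Hypothetical helper for illustration; `a` and `b` are (cells x features)
    arrays from two batches.
    """
    # Pairwise squared Euclidean distances, shape (len(a), len(b)).
    d = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    # For each cell in a: indices of its k nearest cells in b.
    a_to_b = np.argsort(d, axis=1)[:, :k]
    # For each cell in b: indices of its k nearest cells in a.
    b_to_a = np.argsort(d, axis=0)[:k, :].T
    pairs = []
    for i in range(a.shape[0]):
        for j in a_to_b[i]:
            # Keep the pair only if the neighbor relation holds in both directions.
            if i in b_to_a[j]:
                pairs.append((i, int(j)))
    return pairs

# Toy data: the third cell of batch_a has no counterpart in batch_b.
batch_a = np.array([[0.0, 0.0], [5.0, 5.0], [9.0, 0.0]])
batch_b = np.array([[0.2, 0.1], [5.1, 4.8]])
pairs = mutual_nearest_neighbors(batch_a, batch_b, k=1)
print(pairs)
```

In this toy case the unmatched cell is left out of the pair list, which is the property that makes MNN pairs a conservative starting point for positive pairs.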
Affiliation(s)
- Xuhua Yan
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Ruiqing Zheng
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Fangxiang Wu
  - Division of Biomedical Engineering, Department of Computer Science, Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, SK S7N 5A9, Canada
- Min Li
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China

44
Ryu Y, Han GH, Jung E, Hwang D. Integration of Single-Cell RNA-Seq Datasets: A Review of Computational Methods. Mol Cells 2023; 46:106-119. [PMID: 36859475 PMCID: PMC9982060 DOI: 10.14348/molcells.2023.0009] [Citation(s) in RCA: 34] [Impact Index Per Article: 17.0] [Received: 01/10/2023] [Revised: 01/19/2023] [Accepted: 01/19/2023] [Indexed: 03/03/2023]
Abstract
With the increased number of single-cell RNA sequencing (scRNA-seq) datasets in public repositories, integrative analysis of multiple scRNA-seq datasets has become commonplace. Batch effects among different datasets are inevitable because of differences in cell isolation and handling protocols, library preparation technology, and sequencing platforms. To remove these batch effects for effective integration of multiple scRNA-seq datasets, a number of methodologies have been developed based on diverse concepts and approaches. These methods have proven useful for examining whether cellular features, such as cell subpopulations and marker genes, identified from a certain dataset, are consistently present, or whether their condition-dependent variations, such as increases in cell subpopulations in particular disease-related conditions, are consistently observed in different datasets generated under similar or distinct conditions. In this review, we summarize the concepts and approaches of the integration methods and their pros and cons as has been reported in previous literature.
Affiliation(s)
- Yeonjae Ryu
  - School of Biological Sciences, Seoul National University, Seoul 08826, Korea
- Geun Hee Han
  - School of Biological Sciences, Seoul National University, Seoul 08826, Korea
- Eunsoo Jung
  - School of Biological Sciences, Seoul National University, Seoul 08826, Korea
- Daehee Hwang
  - School of Biological Sciences, Seoul National University, Seoul 08826, Korea

45
Khan SA, Lehmann R, Martinez-de-Morentin X, Maillo A, Lagani V, Kiani NA, Gomez-Cabrero D, Tegner J. scAEGAN: Unification of single-cell genomics data by adversarial learning of latent space correspondences. PLoS One 2023; 18:e0281315. [PMID: 36735690 PMCID: PMC9897517 DOI: 10.1371/journal.pone.0281315] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/04/2022] [Accepted: 01/19/2023] [Indexed: 02/04/2023]
Abstract
Recent progress in single-cell genomics has produced different library protocols and techniques for molecular profiling. We formulate a unifying, data-driven, integrative, and predictive methodology for different libraries, samples, and paired-unpaired data modalities. Our design of scAEGAN includes an autoencoder (AE) network integrated with adversarial learning by a cycleGAN (cGAN) network. The AE learns a low-dimensional embedding of each condition, whereas the cGAN learns a non-linear mapping between the AE representations. We evaluate scAEGAN using simulated data and real scRNA-seq datasets, different library preparations (Fluidigm C1, CelSeq, CelSeq2, SmartSeq), and several data modalities such as paired scRNA-seq and scATAC-seq. scAEGAN outperforms Seurat 3 in library integration, is more robust against data sparsity, and beats Seurat 4 in integrating paired data from the same cell. Furthermore, in predicting one data modality from another, scAEGAN outperforms Babel. We conclude that scAEGAN surpasses current state-of-the-art methods and unifies integration and prediction challenges.
Affiliation(s)
- Sumeer Ahmad Khan
  - Biological and Environmental Science and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Robert Lehmann
  - Biological and Environmental Science and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Xabier Martinez-de-Morentin
  - Translational Bioinformatics Unit, Navarrabiomed, Complejo Hospitalario de Navarra (CHN), Universidad Pública de Navarra (UPNA), IdiSNA, Pamplona, Spain
- Alberto Maillo
  - Biological and Environmental Science and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Vincenzo Lagani
  - Biological and Environmental Science and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Narsis A. Kiani
  - Department of Oncology and Pathology, Algorithmic Dynamic Lab, Karolinska Institute, Stockholm, Sweden
  - Department of Medicine, Unit of Computational Medicine, Center for Molecular Medicine, Karolinska Institutet, Karolinska University Hospital, Stockholm, Sweden
- David Gomez-Cabrero
  - Biological and Environmental Science and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
  - Translational Bioinformatics Unit, Navarrabiomed, Complejo Hospitalario de Navarra (CHN), Universidad Pública de Navarra (UPNA), IdiSNA, Pamplona, Spain
  - Mucosal and Salivary Biology Division, King’s College London Dental Institute, London, United Kingdom
- Jesper Tegner
  - Biological and Environmental Science and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
  - Department of Medicine, Unit of Computational Medicine, Center for Molecular Medicine, Karolinska Institutet, Karolinska University Hospital, Stockholm, Sweden
  - Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
  - Science for Life Laboratory, Solna, Sweden

46
Liu W, Liao X, Luo Z, Yang Y, Lau MC, Jiao Y, Shi X, Zhai W, Ji H, Yeong J, Liu J. Probabilistic embedding, clustering, and alignment for integrating spatial transcriptomics data with PRECAST. Nat Commun 2023; 14:296. [PMID: 36653349 PMCID: PMC9849443 DOI: 10.1038/s41467-023-35947-w] [Citation(s) in RCA: 46] [Impact Index Per Article: 23.0] [Received: 08/05/2022] [Accepted: 01/09/2023] [Indexed: 01/19/2023]
Abstract
Spatially resolved transcriptomics involves a set of emerging technologies that enable the transcriptomic profiling of tissues with the physical location of expressions. Although a variety of methods have been developed for data integration, most of them are for single-cell RNA-seq datasets without consideration of spatial information. Thus, methods that can integrate spatial transcriptomics data from multiple tissue slides, possibly from multiple individuals, are needed. Here, we present PRECAST, a data integration method for multiple spatial transcriptomics datasets with complex batch effects and/or biological effects between slides. PRECAST unifies spatial factor analysis simultaneously with spatial clustering and embedding alignment, while requiring only partially shared cell/domain clusters across datasets. Using both simulated and four real datasets, we show improved cell/domain detection with outstanding visualization, and the estimated aligned embeddings and cell/domain labels facilitate many downstream analyses. We demonstrate that PRECAST is computationally scalable and applicable to spatial transcriptomics datasets from different platforms.
Affiliation(s)
- Wei Liu
  - Centre for Quantitative Medicine, Health Services & Systems Research, Duke-NUS Medical School, Singapore, Singapore
- Xu Liao
  - Centre for Quantitative Medicine, Health Services & Systems Research, Duke-NUS Medical School, Singapore, Singapore
- Ziye Luo
  - Centre for Quantitative Medicine, Health Services & Systems Research, Duke-NUS Medical School, Singapore, Singapore
  - School of Statistics, Renmin University, Beijing, China
- Yi Yang
  - Centre for Quantitative Medicine, Health Services & Systems Research, Duke-NUS Medical School, Singapore, Singapore
- Mai Chan Lau
  - Institute of Molecular and Cell Biology (IMCB), Agency of Science, Technology and Research (A*STAR), Singapore, Singapore
- Yuling Jiao
  - School of Mathematics and Statistics, Wuhan University, Wuhan, China
- Xingjie Shi
  - Academy of Statistics and Interdisciplinary Sciences, East China Normal University, Shanghai, China
- Weiwei Zhai
  - Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Hongkai Ji
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Joe Yeong
  - Institute of Molecular and Cell Biology (IMCB), Agency of Science, Technology and Research (A*STAR), Singapore, Singapore
  - Department of Anatomical Pathology, Singapore General Hospital, Singapore, Singapore
- Jin Liu
  - Centre for Quantitative Medicine, Health Services & Systems Research, Duke-NUS Medical School, Singapore, Singapore
  - School of Data Science, The Chinese University of Hong Kong-Shenzhen, Shenzhen, China

47
Gan D, Li J. SCIBER: a simple method for removing batch effects from single-cell RNA-sequencing data. Bioinformatics 2023; 39:6957084. [PMID: 36548380 PMCID: PMC9848058 DOI: 10.1093/bioinformatics/btac819] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2022] [Revised: 11/27/2022] [Accepted: 12/21/2022] [Indexed: 12/24/2022]
Abstract
MOTIVATION Integrative analysis of multiple single-cell RNA-sequencing datasets allows for more comprehensive characterizations of cell types, but systematic technical differences between datasets, known as 'batch effects', need to be removed before integration to avoid misleading interpretation of the data. Although many batch-effect-removal methods have been developed, there is still considerable room for improvement: most existing methods only give dimension-reduced data instead of expression data of individual genes, are based on computationally demanding models, and are black-box models that are difficult to interpret or tune. RESULTS Here, we present a new batch-effect-removal method called SCIBER (Single-Cell Integrator and Batch Effect Remover) and study its performance on real datasets. SCIBER matches cell clusters across batches according to the overlap of their differentially expressed genes. As a simple algorithm that scales better to data with a large number of cells and is easy to tune, SCIBER shows comparable and sometimes better accuracy in removing batch effects on real datasets compared to the state-of-the-art methods, which are much more complicated. Moreover, SCIBER outputs expression data in the original space, that is, the expression of individual genes, which can be used directly for downstream analyses. Additionally, SCIBER is a reference-based method, which assigns one of the batches as the reference batch and keeps it untouched during the process, making it especially suitable for integrating user-generated datasets with standard reference data such as the Human Cell Atlas. AVAILABILITY AND IMPLEMENTATION SCIBER is publicly available as an R package on CRAN: https://cran.r-project.org/web/packages/SCIBER/. A vignette is included in the CRAN R package. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
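The cluster-matching step described in the abstract above (pairing clusters across batches by the overlap of their differentially expressed genes) can be sketched as follows. This is only a toy illustration and not the SCIBER package's code: the abstract does not state how overlap is scored, so a raw count of shared genes is assumed here, and the function name `match_clusters`, the cluster labels, and the gene sets are hypothetical.

```python
def match_clusters(ref_degs, query_degs):
    """Map each query cluster to the reference cluster sharing the most DEGs.

    Hypothetical helper: both arguments are dicts of cluster name -> set of
    gene symbols. The overlap score (shared-gene count) is an assumption.
    """
    matches = {}
    for q_name, q_genes in query_degs.items():
        # Pick the reference cluster with the largest set intersection.
        best = max(ref_degs, key=lambda r: len(ref_degs[r] & q_genes))
        matches[q_name] = best
    return matches

# Illustrative marker sets for a reference batch and a query batch.
ref = {"T": {"CD3D", "CD3E", "IL7R"}, "B": {"MS4A1", "CD79A", "CD79B"}}
query = {"c0": {"CD79A", "MS4A1", "JCHAIN"}, "c1": {"CD3D", "IL7R", "CCR7"}}
matches = match_clusters(ref, query)
print(matches)
```

A practical version would also need a rule for query clusters that overlap with no reference cluster; this sketch always returns some match.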
Affiliation(s)
- Dailin Gan
  - Department of Applied and Computational Mathematics and Statistics, University of Notre Dame, Notre Dame, IN 46556, USA
- Jun Li
  - To whom correspondence should be addressed.

48
Zhi Y, Li M, Lv G. Into the multi-omics era: Progress of T cells profiling in the context of solid organ transplantation. Front Immunol 2023; 14:1058296. [PMID: 36798139 PMCID: PMC9927650 DOI: 10.3389/fimmu.2023.1058296] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2022] [Accepted: 01/20/2023] [Indexed: 02/04/2023]
Abstract
T cells are the common type of lymphocyte that mediates allograft rejection, which remains an impediment to long-term allograft survival. However, the heterogeneity of T cells in terms of differentiation and activation status, effector function, and highly diverse T cell receptors (TCRs), combined with the limitations of traditional detection approaches, has precluded tracking these T cells and thereby comprehending their fate in recipients. Recently, with the widespread development of single-cell techniques, T cells have been identified and characterized at single-cell resolution, which has deepened the understanding of T cell heterogeneity through measurements within a single cell, such as gene expression, DNA methylation, chromatin accessibility, surface proteins, and TCR. Although each of these approaches provides valuable insights into an individual cell on its own, a more comprehensive understanding can be obtained when they are analyzed jointly. Multi-omics techniques have been implemented in characterizing T cells in health and disease, including transplantation. This review focuses on the thesis, challenges, and advances in these technologies and highlights their application to the study of alloreactive T cells to improve the understanding of T cell heterogeneity in solid organ transplantation.
Affiliation(s)
- Yao Zhi
- Department of Hepatobiliary and Pancreatic Surgery, The First Hospital of Jilin University, Changchun, China
| | - Mingqian Li
- Department of Hepatobiliary and Pancreatic Surgery, The First Hospital of Jilin University, Changchun, China
| | - Guoyue Lv
- Department of Hepatobiliary and Pancreatic Surgery, The First Hospital of Jilin University, Changchun, China
| |
|
49
|
Kujawa T, Marczyk M, Polanska J. Influence of single-cell RNA sequencing data integration on the performance of differential gene expression analysis. Front Genet 2022; 13:1009316. [PMID: 36386846 PMCID: PMC9663917 DOI: 10.3389/fgene.2022.1009316] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2022] [Accepted: 10/13/2022] [Indexed: 12/02/2022]
Abstract
Large-scale comprehensive single-cell experiments are often resource-intensive and require the involvement of many laboratories and/or taking measurements at various times. This inevitably leads to batch effects: systematic variations in the data that may arise from different technology platforms, reagent lots, or handling personnel. Such technical differences confound biological variations of interest and need to be corrected during the data integration process. Data integration is a challenging task because biological and technical factors overlap, which makes it difficult to distinguish their individual contributions to the overall observed effect. Moreover, the choice of integration method may impact downstream analyses, including the search for differentially expressed genes. From the existing data integration methods, we selected only those that return the full expression matrix. We evaluated six methods in terms of their influence on the performance of differential gene expression analysis in two single-cell datasets with the same biological study design that differ only in the way the measurement was done: one dataset manifests strong batch effects because each sample was measured at a different time. Integrated data were visualized using the UMAP method. The evaluation was done both at the individual-gene level, using parametric and non-parametric approaches for finding differentially expressed genes, and at the gene-set level, using gene set enrichment analysis. As evaluation metrics, we used two correlation coefficients, Pearson and Spearman, of the obtained test statistics between reference, test, and corrected studies. Visual comparison of UMAP plots highlighted ComBat-seq, limma, and MNN, which reduced batch effects and preserved differences between biological conditions. Most of the tested methods changed the data distribution after integration, which negatively impacts the use of parametric methods for the analysis. Two algorithms, MNN and Scanorama, gave very poor results in terms of differential analysis at the gene and gene-set levels. Finally, we highlight ComBat-seq, as it led to the highest correlation of test statistics between the reference and corrected datasets among the tested methods. Moreover, it does not distort the original distribution of gene expression data, so it can be used in all types of downstream analyses.
Affiliation(s)
- Tomasz Kujawa
- Department of Data Science and Engineering, Silesian University of Technology, Gliwice, Poland
| | - Michał Marczyk
- Department of Data Science and Engineering, Silesian University of Technology, Gliwice, Poland
- Yale Cancer Center, Yale School of Medicine, New Haven, CT, United States
| | - Joanna Polanska
- Department of Data Science and Engineering, Silesian University of Technology, Gliwice, Poland
- *Correspondence: Joanna Polanska,
| |
|
50
|
Cao Y, Lin Y, Patrick E, Yang P, Yang JYH. scFeatures: multi-view representations of single-cell and spatial data for disease outcome prediction. Bioinformatics 2022; 38:4745-4753. [PMID: 36040148 PMCID: PMC9563679 DOI: 10.1093/bioinformatics/btac590] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 05/12/2022] [Revised: 07/21/2022] [Accepted: 08/28/2022] [Indexed: 11/13/2022]
Abstract
MOTIVATION With the recent surge of large-cohort-scale single-cell research, it is of critical importance that analytical methods can fully utilize the comprehensive characterization of cellular systems that single-cell technologies produce to provide insights into samples from individuals. Currently, there is little consensus on the best ways to compress information from the complex data structures of these technologies into summary statistics that represent each sample (e.g. individuals). RESULTS Here, we present scFeatures, an approach that creates interpretable cellular and molecular representations of single-cell and spatial data at the sample level. We demonstrate that summarizing a broad collection of features at the sample level is important both for understanding underlying disease mechanisms in different experimental studies and for accurately classifying the disease status of individuals. AVAILABILITY AND IMPLEMENTATION scFeatures is publicly available as an R package at https://github.com/SydneyBioX/scFeatures. All data used in this study are publicly available, with accession IDs reported in Section 2. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yue Cao
- Charles Perkins Centre, The University of Sydney, Sydney, NSW 2006, Australia
- School of Mathematics and Statistics, The University of Sydney, Sydney, NSW 2006, Australia
| | - Yingxin Lin
- Charles Perkins Centre, The University of Sydney, Sydney, NSW 2006, Australia
- School of Mathematics and Statistics, The University of Sydney, Sydney, NSW 2006, Australia
| | - Ellis Patrick
- Charles Perkins Centre, The University of Sydney, Sydney, NSW 2006, Australia
- School of Mathematics and Statistics, The University of Sydney, Sydney, NSW 2006, Australia
- Computational Systems Biology Group, Children’s Medical Research Institute, Westmead, NSW 2145, Australia
| | - Pengyi Yang
- Charles Perkins Centre, The University of Sydney, Sydney, NSW 2006, Australia
- School of Mathematics and Statistics, The University of Sydney, Sydney, NSW 2006, Australia
- Computational Systems Biology Group, Children’s Medical Research Institute, Westmead, NSW 2145, Australia
| | - Jean Yee Hwa Yang
- Charles Perkins Centre, The University of Sydney, Sydney, NSW 2006, Australia
- School of Mathematics and Statistics, The University of Sydney, Sydney, NSW 2006, Australia
- Laboratory of Data Discovery for Health Limited (D24H), Science Park, Hong Kong SAR, China
| |
|