1
Logotheti S, Pavlopoulou A, Rudsari HK, Galow AM, Kafali Y, Kyrodimos E, Giotakis AI, Marquardt S, Velalopoulou A, Verginadis II, Koumenis C, Stiewe T, Zoidakis J, Balasingham I, David R, Georgakilas AG. Intercellular pathways of cancer treatment-related cardiotoxicity and their therapeutic implications: The paradigm of radiotherapy. Pharmacol Ther 2024:108670. [PMID: 38823489 DOI: 10.1016/j.pharmthera.2024.108670] [Received: 11/11/2023] [Revised: 05/16/2024] [Accepted: 05/25/2024] [Indexed: 06/03/2024]
Abstract
Advances in cancer therapeutics have improved patient survival rates. However, cancer survivors may suffer from adverse events either at the time of therapy or later in life. Cardiovascular diseases (CVDs) represent a clinically important but mechanistically understudied complication that interferes with the continuation of best-possible care, induces life-threatening risks, and/or leads to long-term morbidity. These concerns are exacerbated by the fact that targeted therapies and immunotherapies are frequently combined with radiotherapy, which induces durable inflammatory and immunogenic responses, thereby providing fertile ground for the development of CVDs. Stressed and dying irradiated cells produce 'danger' signals including, but not limited to, major histocompatibility complexes, cell-adhesion molecules, proinflammatory cytokines, and damage-associated molecular patterns. These factors activate intercellular signaling pathways that have potentially detrimental effects on heart tissue homeostasis. Herein, we present the clinical crosstalk between cancer and heart diseases, describe how it is potentiated by cancer therapies, and highlight the multifactorial nature of the underlying mechanisms. We particularly focus on radiotherapy, as a case known to often induce cardiovascular complications even decades after treatment. We provide evidence that the secretome of irradiated tumors entails factors that exert systemic, remote effects on the cardiac tissue, potentially predisposing it to CVDs. We suggest how diverse disciplines can utilize pertinent state-of-the-art methods in feasible experimental workflows, to shed light on the molecular mechanisms of radiotherapy-related cardiotoxicity at the organismal level and untangle the desirable immunogenic properties of cancer therapies from their detrimental effects on heart tissue. Results of such highly collaborative efforts hold promise to be translated to next-generation regimens that maximize tumor control, minimize cardiovascular complications, and support quality of life in cancer survivors.
Affiliation(s)
- Stella Logotheti
  - DNA Damage Laboratory, Physics Department, School of Applied Mathematical and Physical Sciences, National Technical University of Athens (NTUA), Zografou, 15780, Athens, Greece
- Athanasia Pavlopoulou
  - Izmir Biomedicine and Genome Center, Izmir, Turkey; Izmir International Biomedicine and Genome Institute, Dokuz Eylül University, Izmir, Turkey
- Anne-Marie Galow
  - Institute for Genome Biology, Research Institute for Farm Animal Biology (FBN), 18196 Dummerstorf, Germany
- Yağmur Kafali
  - Izmir Biomedicine and Genome Center, Izmir, Turkey; Izmir International Biomedicine and Genome Institute, Dokuz Eylül University, Izmir, Turkey
- Efthymios Kyrodimos
  - First Department of Otorhinolaryngology, Head and Neck Surgery, Hippocrateion General Hospital Athens, National and Kapodistrian University of Athens, Athens, Greece
- Aris I Giotakis
  - First Department of Otorhinolaryngology, Head and Neck Surgery, Hippocrateion General Hospital Athens, National and Kapodistrian University of Athens, Athens, Greece
- Stephan Marquardt
  - Institute of Translational Medicine for Health Care Systems, Medical School Berlin, Hochschule Für Gesundheit Und Medizin, 14197 Berlin, Germany
- Anastasia Velalopoulou
  - Department of Radiation Oncology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Ioannis I Verginadis
  - Department of Radiation Oncology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Constantinos Koumenis
  - Department of Radiation Oncology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Thorsten Stiewe
  - Institute of Molecular Oncology, Philipps-University, 35043 Marburg, Germany; German Center for Lung Research (DZL), Universities of Giessen and Marburg Lung Center (UGMLC), 35043 Marburg, Germany; Genomics Core Facility, Philipps-University, 35043 Marburg, Germany; Institute for Lung Health (ILH), Justus Liebig University, 35392 Giessen, Germany
- Jerome Zoidakis
  - Department of Biotechnology, Biomedical Research Foundation, Academy of Athens, Athens, Greece; Department of Biology, National and Kapodistrian University of Athens, Athens, Greece
- Robert David
  - Department of Cardiac Surgery, Rostock University Medical Center, 18057 Rostock, Germany; Department of Life, Light & Matter, Interdisciplinary Faculty, Rostock University, 18059 Rostock, Germany
- Alexandros G Georgakilas
  - DNA Damage Laboratory, Physics Department, School of Applied Mathematical and Physical Sciences, National Technical University of Athens (NTUA), Zografou, 15780, Athens, Greece
2
Xie AX, Tansey W, Reznik E. UnitedMet harnesses RNA-metabolite covariation to impute metabolite levels in clinical samples. medRxiv 2024:2024.05.24.24307903. [PMID: 38826234 PMCID: PMC11142294 DOI: 10.1101/2024.05.24.24307903] [Indexed: 06/04/2024]
Abstract
Comprehensively studying metabolism requires the measurement of metabolite levels. However, in contrast to the broad availability of gene expression data, metabolites are rarely measured in large molecularly-defined cohorts of tissue samples. To address this basic barrier to metabolic discovery, we propose a Bayesian framework ("UnitedMet") which leverages the empirical strength of RNA-metabolite covariation to impute otherwise unmeasured metabolite levels from widely available transcriptomic data. We demonstrate that UnitedMet is equally capable of imputing whole pool sizes as well as the outcomes of isotope tracing experiments. We apply UnitedMet to investigate the metabolic impact of driver mutations in kidney cancer, identifying a novel association between BAP1 and a highly oxidative tumor phenotype. We similarly apply UnitedMet to determine that advanced kidney cancers upregulate oxidative phosphorylation relative to early-stage disease, that oxidative metabolism in kidney cancer is associated with inferior outcomes to combination therapy, and that kidney cancer metastases themselves demonstrate elevated oxidative phosphorylation relative to primary tumors. UnitedMet therefore enables the assessment of metabolic phenotypes in contexts where metabolite measurements were not taken or are otherwise infeasible, opening new avenues for the generation and evaluation of metabolite-centered hypotheses. UnitedMet is open source and publicly available ( https://github.com/reznik-lab/UnitedMet ).
3
Zhou M, Zhang H, Bai Z, Mann-Krzisnik D, Wang F, Li Y. Protocol to perform integrative analysis of high-dimensional single-cell multimodal data using an interpretable deep learning technique. STAR Protoc 2024; 5:103066. [PMID: 38748882 PMCID: PMC11109308 DOI: 10.1016/j.xpro.2024.103066] [Received: 09/29/2023] [Revised: 11/21/2023] [Accepted: 04/24/2024] [Indexed: 05/25/2024] Open
Abstract
The advent of single-cell multi-omics sequencing technology makes it possible for researchers to leverage multiple modalities for individual cells. Here, we present a protocol to perform integrative analysis of high-dimensional single-cell multimodal data using an interpretable deep learning technique called moETM. We describe steps for data preprocessing, multi-omics integration, inclusion of prior pathway knowledge, and cross-omics imputation. As a demonstration, we used the single-cell multi-omics data collected from bone marrow mononuclear cells (GSE194122) as in our original study. For complete details on the use and execution of this protocol, please refer to Zhou et al. [1].
Affiliation(s)
- Manqi Zhou
  - Department of Computational Biology, Cornell University, Ithaca, NY 14853, USA; Institute of Artificial Intelligence for Digital Health, Weill Cornell Medicine, New York, NY 10021, USA
- Hao Zhang
  - Division of Health Informatics, Department of Population Health Sciences, Weill Cornell Medicine, New York, NY 10021, USA
- Zilong Bai
  - Institute of Artificial Intelligence for Digital Health, Weill Cornell Medicine, New York, NY 10021, USA; Division of Health Informatics, Department of Population Health Sciences, Weill Cornell Medicine, New York, NY 10021, USA
- Fei Wang
  - Institute of Artificial Intelligence for Digital Health, Weill Cornell Medicine, New York, NY 10021, USA; Division of Health Informatics, Department of Population Health Sciences, Weill Cornell Medicine, New York, NY 10021, USA
- Yue Li
  - Quantitative Life Science, McGill University, Montréal, QC H3A 0G4, Canada; School of Computer Science, McGill University, Montréal, QC H3A 0G4, Canada; Mila - Quebec AI Institute, Montréal, QC H2S 3H1, Canada
4
Lotfollahi M, Hao Y, Theis FJ, Satija R. The future of rapid and automated single-cell data analysis using reference mapping. Cell 2024; 187:2343-2358. [PMID: 38729109 DOI: 10.1016/j.cell.2024.03.009] [Received: 01/05/2023] [Revised: 03/05/2024] [Accepted: 03/08/2024] [Indexed: 05/12/2024]
Abstract
As the number of single-cell datasets continues to grow rapidly, workflows that map new data to well-curated reference atlases offer enormous promise for the biological community. In this perspective, we discuss key computational challenges and opportunities for single-cell reference-mapping algorithms. We discuss how mapping algorithms will enable the integration of diverse datasets across disease states, molecular modalities, genetic perturbations, and diverse species and will eventually replace manual and laborious unsupervised clustering pipelines.
Affiliation(s)
- Mohammad Lotfollahi
  - Institute of Computational Biology, Helmholtz Center Munich - German Research Center for Environmental Health, Neuherberg, Germany; Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
- Yuhan Hao
  - Center for Genomics and Systems Biology, New York University, New York, NY, USA; New York Genome Center, New York, NY, USA
- Fabian J Theis
  - Institute of Computational Biology, Helmholtz Center Munich - German Research Center for Environmental Health, Neuherberg, Germany; Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK; Department of Mathematics, Technical University of Munich, Garching, Germany
- Rahul Satija
  - Center for Genomics and Systems Biology, New York University, New York, NY, USA; New York Genome Center, New York, NY, USA
5
Wang H, Liu Z, Ma X. Learning Consistency and Specificity of Cells From Single-Cell Multi-Omic Data. IEEE J Biomed Health Inform 2024; 28:3134-3145. [PMID: 38709615 DOI: 10.1109/jbhi.2024.3370868] [Indexed: 05/08/2024]
Abstract
Advances in single-cell technologies make it possible to profile the epigenome and transcriptome concomitantly at the level of individual cells, providing opportunities to explore the underlying biological mechanisms. Despite significant efforts, integrative analysis of single-cell multi-omic data remains challenging because of the heterogeneity, complicated coupling, and limited interpretability of the data. To handle these issues, we propose a novel self-representation Learning-based Multi-omics data Integrative Clustering algorithm (sLMIC) for the integration of single-cell epigenomic profiles (DNA methylation or scATAC-seq) and transcriptomic profiles (scRNA-seq), in which the consistent and specific features of cells are explicitly extracted to facilitate cell clustering. Specifically, sLMIC constructs a graph for each type of single-cell data, thereby transforming omics data into multi-layer networks, which effectively removes the heterogeneity of omic data. Then, sLMIC employs low-rank and exclusivity constraints to separate the self-representation of cells into two parts, i.e., the shared and specific features, which explicitly characterize the consistency and diversity of omic data, providing an effective strategy to model the structure of cell types. Feature extraction and cell clustering are jointly formulated as an overall objective function, where latent features of data are obtained under the guidance of cell clustering. Extensive experimental results on 13 single-cell multi-omics datasets from diverse organisms and tissues indicate that sLMIC clearly outperforms advanced algorithms on various measures.
6
Cai H, Huang W, Yang S, Ding S, Zhang Y, Hu B, Zhang F, Cheung YM. Realize Generative Yet Complete Latent Representation for Incomplete Multi-View Learning. IEEE Trans Pattern Anal Mach Intell 2024; 46:3637-3652. [PMID: 38145535 DOI: 10.1109/tpami.2023.3346869] [Indexed: 12/27/2023]
Abstract
In multi-view settings, observations are often missing because of limitations in the observation process. Most current representation learning methods struggle to explore complete information because they lack either cross-generative capability (when simply filling in missing view data) or solidative capability (when inferring a consistent representation among the existing views). To address this problem, we propose a deep generative model to learn a complete generative latent representation, namely Complete Multi-view Variational Auto-Encoders (CMVAE), which models the generation of the multiple views from a complete latent variable represented by a mixture of Gaussian distributions. Thus, the missing view can be fully characterized by the latent variables and is resolved by estimating its posterior distribution. Accordingly, a novel variational lower bound is introduced to integrate view-invariant information into posterior inference to enhance the solidity of the learned latent representation. The intrinsic correlations between views are mined to seek cross-view generality, and information leading to missing views is fused by view weights to reach solidity. Benchmark experimental results in clustering, classification, and cross-view image generation tasks demonstrate the superiority of CMVAE, while time complexity and parameter sensitivity analyses illustrate its efficiency and robustness. Additionally, application to bioinformatics data exemplifies its practical significance.
7
Cui X, Chen X, Li Z, Gao Z, Chen S, Jiang R. Discrete latent embedding of single-cell chromatin accessibility sequencing data for uncovering cell heterogeneity. Nat Comput Sci 2024; 4:346-359. [PMID: 38730185 DOI: 10.1038/s43588-024-00625-4] [Received: 09/25/2023] [Accepted: 04/05/2024] [Indexed: 05/12/2024]
Abstract
Single-cell epigenomic data has been growing continuously at an unprecedented pace, but their characteristics such as high dimensionality and sparsity pose substantial challenges to downstream analysis. Although deep learning models-especially variational autoencoders-have been widely used to capture low-dimensional feature embeddings, the prevalent Gaussian assumption somewhat disagrees with real data, and these models tend to struggle to incorporate reference information from abundant cell atlases. Here we propose CASTLE, a deep generative model based on the vector-quantized variational autoencoder framework to extract discrete latent embeddings that interpretably characterize single-cell chromatin accessibility sequencing data. We validate the performance and robustness of CASTLE for accurate cell-type identification and reasonable visualization compared with state-of-the-art methods. We demonstrate the advantages of CASTLE for effective incorporation of existing massive reference datasets in a weakly supervised or supervised manner. We further demonstrate CASTLE's capacity for intuitively distilling cell-type-specific feature spectra that unveil cell heterogeneity and biological implications quantitatively.
Affiliation(s)
- Xuejian Cui
  - Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing, China
- Xiaoyang Chen
  - Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing, China
- Zhen Li
  - Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing, China
- Zijing Gao
  - Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing, China
- Shengquan Chen
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin, China
- Rui Jiang
  - Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing, China
8
Shannon CP, Lee AH, Tebbutt SJ, Singh A. A Commentary on Multi-omics Data Integration in Systems Vaccinology. J Mol Biol 2024; 436:168522. [PMID: 38458605 DOI: 10.1016/j.jmb.2024.168522] [Received: 11/01/2023] [Revised: 03/04/2024] [Accepted: 03/04/2024] [Indexed: 03/10/2024]
Affiliation(s)
- Amy Hy Lee
  - Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, Canada
- Scott J Tebbutt
  - PROOF Centre of Excellence, Vancouver, Canada; Department of Medicine, The University of British Columbia, Vancouver, Canada; Centre for Heart Lung Innovation, Vancouver, Canada
- Amrit Singh
  - Centre for Heart Lung Innovation, Vancouver, Canada; Department of Anesthesiology, Pharmacology and Therapeutics, The University of British Columbia, Vancouver, Canada
9
Cao Y, Zhao X, Tang S, Jiang Q, Li S, Li S, Chen S. scButterfly: a versatile single-cell cross-modality translation method via dual-aligned variational autoencoders. Nat Commun 2024; 15:2973. [PMID: 38582890 PMCID: PMC10998864 DOI: 10.1038/s41467-024-47418-x] [Received: 09/23/2023] [Accepted: 03/28/2024] [Indexed: 04/08/2024] Open
Abstract
Recent advancements for simultaneously profiling multi-omics modalities within individual cells have enabled the interrogation of cellular heterogeneity and molecular hierarchy. However, technical limitations lead to highly noisy multi-modal data and substantial costs. Although computational methods have been proposed to translate single-cell data across modalities, broad applications of the methods still remain impeded by formidable challenges. Here, we propose scButterfly, a versatile single-cell cross-modality translation method based on dual-aligned variational autoencoders and data augmentation schemes. With comprehensive experiments on multiple datasets, we provide compelling evidence of scButterfly's superiority over baseline methods in preserving cellular heterogeneity while translating datasets of various contexts and in revealing cell type-specific biological insights. Besides, we demonstrate the extensive applications of scButterfly for integrative multi-omics analysis of single-modality data, data enhancement of poor-quality single-cell multi-omics, and automatic cell type annotation of scATAC-seq data. Moreover, scButterfly can be generalized to unpaired data training, perturbation-response analysis, and consecutive translation.
Affiliation(s)
- Yichuan Cao
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin, 300071, China
- Xiamiao Zhao
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin, 300071, China
- Songming Tang
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin, 300071, China
- Qun Jiang
  - MOE Key Laboratory of Bioinformatics and Bioinformatics Division of BNRIST, Department of Automation, Tsinghua University, 100084, Beijing, China
- Sijie Li
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin, 300071, China
- Siyu Li
  - School of Statistics and Data Science, Nankai University, Tianjin, 300071, China
- Shengquan Chen
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin, 300071, China
10
Tur S, Palii CG, Brand M. Cell fate decision in erythropoiesis: Insights from multiomics studies. Exp Hematol 2024; 131:104167. [PMID: 38262486 PMCID: PMC10939800 DOI: 10.1016/j.exphem.2024.104167] [Received: 11/15/2023] [Revised: 01/10/2024] [Accepted: 01/13/2024] [Indexed: 01/25/2024]
Abstract
Every second, the body produces 2 million red blood cells through a process called erythropoiesis. Erythropoiesis is hierarchical in that it results from a series of cell fate decisions whereby hematopoietic stem cells progress toward the erythroid lineage. Single-cell transcriptomic and proteomic approaches have revolutionized the way we understand erythropoiesis, revealing it to be a gradual process that underlies a progressive restriction of fate potential driven by quantitative changes in lineage-specifying transcription factors. Despite these major advances, we still know very little about what cell fate decision entails at the molecular level. Novel approaches that simultaneously measure additional properties in single cells, including chromatin accessibility, transcription factor binding, and/or cell surface proteins are being developed at a fast pace, providing the means to exciting new advances in the near future. In this review, we briefly summarize the main findings obtained from single-cell studies of erythropoiesis, highlight outstanding questions, and suggest recent technological advances to address them.
Affiliation(s)
- Steven Tur
  - Department of Cell and Regenerative Biology, Wisconsin Blood Cancer Research Institute, Wisconsin Institutes for Medical Research, University of Wisconsin School of Medicine and Public Health, Carbone Cancer Center, Madison, WI; Cellular and Molecular Biology Graduate Program, University of Wisconsin School of Medicine and Public Health, Madison, WI
- Carmen G Palii
  - Department of Cell and Regenerative Biology, Wisconsin Blood Cancer Research Institute, Wisconsin Institutes for Medical Research, University of Wisconsin School of Medicine and Public Health, Carbone Cancer Center, Madison, WI
- Marjorie Brand
  - Department of Cell and Regenerative Biology, Wisconsin Blood Cancer Research Institute, Wisconsin Institutes for Medical Research, University of Wisconsin School of Medicine and Public Health, Carbone Cancer Center, Madison, WI
11
Gao C, Welch JD. Integrating single-cell multimodal epigenomic data using 1D-convolutional neural networks. bioRxiv 2024:2024.02.16.580655. [PMID: 38464242 PMCID: PMC10925154 DOI: 10.1101/2024.02.16.580655] [Indexed: 03/12/2024]
Abstract
Recent experimental developments enable single-cell multimodal epigenomic profiling, which measures multiple histone modifications and chromatin accessibility within the same cell. Such parallel measurements provide exciting new opportunities to investigate how epigenomic modalities vary together across cell types and states. A pivotal step in using this type of data is integrating the epigenomic modalities to learn a unified representation of each cell, but existing approaches are not designed to model the unique nature of this data type. Our key insight is to model single-cell multimodal epigenome data as a multi-channel sequential signal. Based on this insight, we developed ConvNet-VAEs, a novel framework that uses 1D-convolutional variational autoencoders (VAEs) for single-cell multimodal epigenomic data integration. We evaluated ConvNet-VAEs on nano-CT and scNTT-seq data generated from juvenile mouse brain and human bone marrow. We found that ConvNet-VAEs can perform dimension reduction and batch correction better than previous architectures while using significantly fewer parameters. Furthermore, the performance gap between convolutional and fully-connected architectures increases with the number of modalities, and deeper convolutional architectures can increase performance while performance degrades for deeper fully-connected architectures. Our results indicate that convolutional autoencoders are a promising method for integrating current and future single-cell multimodal epigenomic datasets.
Affiliation(s)
- Chao Gao
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor MI 48109, USA
- Joshua D Welch
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor MI 48109, USA
  - Department of Computer Science and Engineering, University of Michigan, Ann Arbor MI 48109, USA
12
Lin Y, Wu TY, Chen X, Wan S, Chao B, Xin J, Yang JYH, Wong WH, Wang YXR. Data integration and inference of gene regulation using single-cell temporal multimodal data with scTIE. Genome Res 2024; 34:119-133. [PMID: 38190633 PMCID: PMC10903952 DOI: 10.1101/gr.277960.123] [Received: 04/06/2023] [Accepted: 12/13/2023] [Indexed: 01/10/2024]
Abstract
Single-cell technologies offer unprecedented opportunities to dissect gene regulatory mechanisms in context-specific ways. Although there are computational methods for extracting gene regulatory relationships from scRNA-seq and scATAC-seq data, the data integration problem, essential for accurate cell type identification, has been mostly treated as a standalone challenge. Here we present scTIE, a unified method that integrates temporal multimodal data and infers regulatory relationships predictive of cellular state changes. scTIE uses an autoencoder to embed cells from all time points into a common space by using iterative optimal transport, followed by extracting interpretable information to predict cell trajectories. Using a variety of synthetic and real temporal multimodal data sets, we show scTIE achieves effective data integration while preserving more biological signals than existing methods, particularly in the presence of batch effects and noise. Furthermore, on the exemplar multiome data set we generated from differentiating mouse embryonic stem cells over time, we show scTIE captures regulatory elements highly predictive of cell transition probabilities, providing new potentials to understand the regulatory landscape driving developmental processes.
Affiliation(s)
- Yingxin Lin
  - School of Mathematics and Statistics, The University of Sydney, NSW 2006, Australia
  - Charles Perkins Centre, The University of Sydney, NSW 2006, Australia
  - Laboratory of Data Discovery for Health Limited (D24H), Science Park, Hong Kong SAR 999077, China
- Tung-Yu Wu
  - Department of Statistics, Stanford University, Stanford, California 94305-4020, USA
- Xi Chen
  - Department of Statistics, Stanford University, Stanford, California 94305-4020, USA
- Sheng Wan
  - Institute of Electronics, National Yang Ming Chiao Tung University, Hsinchu 30010, Taiwan
- Brian Chao
  - Department of Electrical Engineering, Stanford University, Stanford, California 94305-9505, USA
- Jingxue Xin
  - Department of Statistics, Stanford University, Stanford, California 94305-4020, USA
- Jean Y H Yang
  - School of Mathematics and Statistics, The University of Sydney, NSW 2006, Australia
  - Charles Perkins Centre, The University of Sydney, NSW 2006, Australia
  - Laboratory of Data Discovery for Health Limited (D24H), Science Park, Hong Kong SAR 999077, China
- Wing H Wong
  - Department of Statistics, Stanford University, Stanford, California 94305-4020, USA
  - Department of Biomedical Data Science, Stanford University, Stanford, California 94305-5464, USA
  - Bio-X Program, Stanford University, Stanford, California 94305, USA
- Y X Rachel Wang
  - School of Mathematics and Statistics, The University of Sydney, NSW 2006, Australia
13
He Z, Hu S, Chen Y, An S, Zhou J, Liu R, Shi J, Wang J, Dong G, Shi J, Zhao J, Ou-Yang L, Zhu Y, Bo X, Ying X. Mosaic integration and knowledge transfer of single-cell multimodal data with MIDAS. Nat Biotechnol 2024:10.1038/s41587-023-02040-y. [PMID: 38263515 DOI: 10.1038/s41587-023-02040-y] [Received: 12/13/2022] [Accepted: 10/23/2023] [Indexed: 01/25/2024]
Abstract
Integrating single-cell datasets produced by multiple omics technologies is essential for defining cellular heterogeneity. Mosaic integration, in which different datasets share only some of the measured modalities, poses major challenges, particularly regarding modality alignment and batch effect removal. Here, we present a deep probabilistic framework for the mosaic integration and knowledge transfer (MIDAS) of single-cell multimodal data. MIDAS simultaneously achieves dimensionality reduction, imputation and batch correction of mosaic data by using self-supervised modality alignment and information-theoretic latent disentanglement. We demonstrate its superiority to 19 other methods and reliability by evaluating its performance in trimodal and mosaic integration tasks. We also constructed a single-cell trimodal atlas of human peripheral blood mononuclear cells and tailored transfer learning and reciprocal reference mapping schemes to enable flexible and accurate knowledge transfer from the atlas to new data. Applications in mosaic integration, pseudotime analysis and cross-tissue knowledge transfer on bone marrow mosaic datasets demonstrate the versatility and superiority of MIDAS. MIDAS is available at https://github.com/labomics/midas .
Affiliation(s)
- Zhen He
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
| | - Shuofeng Hu
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
| | - Yaowen Chen
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
| | - Sijing An
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
| | - Jiahao Zhou
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
- College of Electronics and Information Engineering, Shenzhen University, Shenzhen, China
| | - Runyan Liu
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
| | - Junfeng Shi
- School of Automation, China University of Geosciences, Wuhan, China
| | - Jing Wang
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
| | - Guohua Dong
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
| | - Jinhui Shi
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
| | - Jiaxin Zhao
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
| | - Le Ou-Yang
- College of Electronics and Information Engineering, Shenzhen University, Shenzhen, China
| | - Yuan Zhu
- School of Automation, China University of Geosciences, Wuhan, China
| | - Xiaochen Bo
- Institute of Health Service and Transfusion Medicine, Beijing, China.
| | - Xiaomin Ying
- Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China.
|
14
|
Zahedi R, Ghamsari R, Argha A, Macphillamy C, Beheshti A, Alizadehsani R, Lovell NH, Lotfollahi M, Alinejad-Rokny H. Deep learning in spatially resolved transcriptomics: a comprehensive technical view. Brief Bioinform 2024; 25:bbae082. [PMID: 38483255 PMCID: PMC10939360 DOI: 10.1093/bib/bbae082] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2023] [Revised: 12/22/2024] [Accepted: 02/13/2024] [Indexed: 03/17/2024] Open
Abstract
Spatially resolved transcriptomics (SRT) is a pioneering method for simultaneously studying morphological contexts and gene expression at single-cell precision. Data emerging from SRT are multifaceted, presenting researchers with intricate gene expression matrices, precise spatial details and comprehensive histology visuals. Such rich and intricate datasets, unfortunately, render many conventional methods like traditional machine learning and statistical models ineffective. The unique challenges posed by the specialized nature of SRT data have led the scientific community to explore more sophisticated analytical avenues. Recent trends indicate an increasing reliance on deep learning algorithms, especially in areas such as spatial clustering, identification of spatially variable genes and data alignment tasks. In this manuscript, we provide a rigorous critique of these advanced deep learning methodologies, probing into their merits, limitations and avenues for further refinement. Our in-depth analysis underscores that while the recent innovations in deep learning tailored for SRT have been promising, there remains a substantial potential for enhancement. A crucial area that demands attention is the development of models that can incorporate intricate biological nuances, such as phylogeny-aware processing or in-depth analysis of minuscule histology image segments. Furthermore, addressing challenges like the elimination of batch effects, perfecting data normalization techniques and countering the overdispersion and zero inflation patterns seen in gene expression is pivotal. To support the broader scientific community in their SRT endeavors, we have meticulously assembled a comprehensive directory of readily accessible SRT databases, hoping to serve as a foundation for future research initiatives.
Affiliation(s)
- Roxana Zahedi
- UNSW BioMedical Machine Learning Lab (BML), The Graduate School of Biomedical Engineering, UNSW Sydney, 2052, NSW, Australia
| | - Reza Ghamsari
- UNSW BioMedical Machine Learning Lab (BML), The Graduate School of Biomedical Engineering, UNSW Sydney, 2052, NSW, Australia
| | - Ahmadreza Argha
- The Graduate School of Biomedical Engineering, UNSW Sydney, 2052, NSW, Australia
- Tyree Institute of Health Engineering (IHealthE), UNSW Sydney, 2052, NSW, Australia
| | - Callum Macphillamy
- School of Animal and Veterinary Sciences, University of Adelaide, Roseworthy, 5371, Australia
| | - Amin Beheshti
- School of Computing, Macquarie University, Sydney, 2109, Australia
| | - Roohallah Alizadehsani
- Institute for Intelligent Systems Research and Innovation (IISRI), Deakin University, Waurn Ponds, Melbourne, VIC, 3216, Australia
| | - Nigel H Lovell
- The Graduate School of Biomedical Engineering, UNSW Sydney, 2052, NSW, Australia
- Tyree Institute of Health Engineering (IHealthE), UNSW Sydney, 2052, NSW, Australia
| | - Mohammad Lotfollahi
- Computational Health Center, Helmholtz Munich, Germany
- Wellcome Sanger Institute, Cambridge, UK
| | - Hamid Alinejad-Rokny
- UNSW BioMedical Machine Learning Lab (BML), The Graduate School of Biomedical Engineering, UNSW Sydney, 2052, NSW, Australia
- Tyree Institute of Health Engineering (IHealthE), UNSW Sydney, 2052, NSW, Australia
|
15
|
Xiao C, Chen Y, Meng Q, Wei L, Zhang X. Benchmarking multi-omics integration algorithms across single-cell RNA and ATAC data. Brief Bioinform 2024; 25:bbae095. [PMID: 38493343 PMCID: PMC10944570 DOI: 10.1093/bib/bbae095] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2023] [Revised: 01/30/2024] [Accepted: 02/16/2024] [Indexed: 03/18/2024] Open
Abstract
Recent advancements in single-cell sequencing technologies have generated extensive omics data in various modalities and revolutionized cell research, especially in the single-cell RNA and ATAC data. The joint analysis across scRNA-seq data and scATAC-seq data has paved the way to comprehending the cellular heterogeneity and complex cellular regulatory networks. Multi-omics integration is gaining attention as an important step in joint analysis, and the number of computational tools in this field is growing rapidly. In this paper, we benchmarked 12 multi-omics integration methods on three integration tasks via qualitative visualization and quantitative metrics, considering six main aspects that matter in multi-omics data analysis. Overall, we found that different methods have their own advantages on different aspects, while some methods outperformed other methods in most aspects. We therefore provided guidelines for selecting appropriate methods for specific scenarios and tasks to help obtain meaningful insights from multi-omics data integration.
Affiliation(s)
- Chuxi Xiao
- MOE Key Laboratory of Bioinformatics and Bioinformatics Division, BNRIST, Department of Automation, Tsinghua University, Beijing 100084, China
| | - Yixin Chen
- MOE Key Laboratory of Bioinformatics and Bioinformatics Division, BNRIST, Department of Automation, Tsinghua University, Beijing 100084, China
| | - Qiuchen Meng
- MOE Key Laboratory of Bioinformatics and Bioinformatics Division, BNRIST, Department of Automation, Tsinghua University, Beijing 100084, China
| | - Lei Wei
- MOE Key Laboratory of Bioinformatics and Bioinformatics Division, BNRIST, Department of Automation, Tsinghua University, Beijing 100084, China
| | - Xuegong Zhang
- MOE Key Laboratory of Bioinformatics and Bioinformatics Division, BNRIST, Department of Automation, Tsinghua University, Beijing 100084, China
- School of Life Sciences and School of Medicine, Center for Synthetic and Systems Biology, Tsinghua University, Beijing 100084, China
|
16
|
Chen Y, Zheng R, Liu J, Li M. scMLC: an accurate and robust multiplex community detection method for single-cell multi-omics data. Brief Bioinform 2024; 25:bbae101. [PMID: 38493339 PMCID: PMC10944569 DOI: 10.1093/bib/bbae101] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2023] [Revised: 01/03/2024] [Accepted: 02/15/2024] [Indexed: 03/18/2024] Open
Abstract
Clustering cells based on single-cell multi-modal sequencing technologies provides an unprecedented opportunity to create high-resolution cell atlas, reveal cellular critical states and study health and diseases. However, effectively integrating different sequencing data for cell clustering remains a challenging task. Motivated by the successful application of Louvain in scRNA-seq data, we propose a single-cell multi-modal Louvain clustering framework, called scMLC, to tackle this problem. scMLC builds multiplex single- and cross-modal cell-to-cell networks to capture modal-specific and consistent information between modalities and then adopts a robust multiplex community detection method to obtain the reliable cell clusters. In comparison with 15 state-of-the-art clustering methods on seven real datasets simultaneously measuring gene expression and chromatin accessibility, scMLC achieves better accuracy and stability in most datasets. Synthetic results also indicate that the cell-network-based integration strategy of multi-omics data is superior to other strategies in terms of generalization. Moreover, scMLC is flexible and can be extended to single-cell sequencing data with more than two modalities.
Affiliation(s)
- Yuxuan Chen
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
| | - Ruiqing Zheng
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
| | - Jin Liu
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
| | - Min Li
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
|
17
|
Year in review 2023. Nat Methods 2024; 21:1-2. [PMID: 38212549 DOI: 10.1038/s41592-023-02158-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/13/2024]
|
18
|
Zhu B, Wang Y, Ku LT, van Dijk D, Zhang L, Hafler DA, Zhao H. scNAT: a deep learning method for integrating paired single-cell RNA and T cell receptor sequencing profiles. Genome Biol 2023; 24:292. [PMID: 38111007 PMCID: PMC10726524 DOI: 10.1186/s13059-023-03129-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2023] [Accepted: 11/27/2023] [Indexed: 12/20/2023] Open
Abstract
Many deep learning-based methods have been proposed to handle complex single-cell data. Deep learning approaches may also prove useful to jointly analyze single-cell RNA sequencing (scRNA-seq) and single-cell T cell receptor sequencing (scTCR-seq) data for novel discoveries. We developed scNAT, a deep learning method that integrates paired scRNA-seq and scTCR-seq data to represent data in a unified latent space for downstream analysis. We demonstrate that scNAT is capable of removing batch effects, and identifying cell clusters and a T cell migration trajectory from blood to cerebrospinal fluid in multiple sclerosis.
Affiliation(s)
- Biqing Zhu
- Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06511, USA
- Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, 20815, USA
| | - Yuge Wang
- Department of Biostatistics, School of Public Health, Yale University, New Haven, CT, 06511, USA
| | - Li-Ting Ku
- Department of Biostatistics, School of Public Health, Yale University, New Haven, CT, 06511, USA
| | - David van Dijk
- Department of Internal Medicine, Yale School of Medicine, New Haven, CT, 06511, USA
- Department of Computer Science, Yale University, New Haven, CT, 06511, USA
- Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, 20815, USA
| | - Le Zhang
- Department of Neuroscience, School of Medicine, Yale University, New Haven, CT, 06511, USA
- Department of Immunobiology, School of Medicine, Yale University, New Haven, CT, 06511, USA
- Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, 20815, USA
| | - David A Hafler
- Department of Neurology, School of Medicine, Yale University, New Haven, CT, 06511, USA
- Department of Immunobiology, School of Medicine, Yale University, New Haven, CT, 06511, USA
- Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, 20815, USA
| | - Hongyu Zhao
- Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06511, USA.
- Department of Biostatistics, School of Public Health, Yale University, New Haven, CT, 06511, USA.
|
19
|
AbouHassan I, Kasabov NK, Jagtap V, Kulkarni P. Spiking neural networks for predictive and explainable modelling of multimodal streaming data with a case study on financial time series and online news. Sci Rep 2023; 13:18367. [PMID: 37884551 PMCID: PMC10603166 DOI: 10.1038/s41598-023-42605-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2022] [Accepted: 09/12/2023] [Indexed: 10/28/2023] Open
Abstract
In a first study, this paper argues and demonstrates that spiking neural networks (SNN) can be successfully used for predictive and explainable modelling of multimodal streaming data. The paper proposes a new method, where both time series and on-line news are integrated as numerical streaming data in the same time domain and then used to train incrementally a SNN model. The connectivity and the spiking activity of the SNN are then analyzed through clustering and dynamic graph extraction to reveal on-line interaction between all input variables in regard to the predicted one. The paper answers the main research question of how to understand the dynamic interaction of time series and on-line news through their integrative modelling. It offers a new method to evaluate the efficiency of using on-line news on the predictive modelling of time series. Results on financial stock time series and online news are presented. In contrast to traditional machine learning techniques, the method reveals the dynamic interaction between stock variables and news and their dynamic impact on model accuracy when compared to models that do not use news information. Along with the used financial data, the method is applicable to a wide range of other multimodal time series and news data, such as economic, medical, environmental and social. The proposed method, being based on SNN, promotes the use of massively parallel and low energy neuromorphic hardware for multivariate on-line data modelling.
Affiliation(s)
- Iman AbouHassan
- Technical University of Sofia, Sofia, Bulgaria.
- Central Bank of Lebanon, Beirut, Lebanon.
| | - Nikola K Kasabov
- KEDRI, SECMS, Auckland University of Technology, Auckland, New Zealand.
- Ulster University, Belfast, UK.
- IICT, Bulgarian Academy of Sciences, Sofia, Bulgaria.
| | | | - Parag Kulkarni
- College of Engineering, Pune, India
- Tokyo International University, Tokyo, Japan
|
20
|
Lee MYY, Kaestner KH, Li M. Benchmarking algorithms for joint integration of unpaired and paired single-cell RNA-seq and ATAC-seq data. Genome Biol 2023; 24:244. [PMID: 37875977 PMCID: PMC10594700 DOI: 10.1186/s13059-023-03073-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2023] [Accepted: 09/25/2023] [Indexed: 10/26/2023] Open
Abstract
BACKGROUND: Single-cell RNA-sequencing (scRNA-seq) measures gene expression in single cells, while single-nucleus ATAC-sequencing (snATAC-seq) quantifies chromatin accessibility in single nuclei. These two data types provide complementary information for deciphering cell types and states. However, when analyzed individually, they sometimes produce conflicting results regarding cell type/state assignment. The power is compromised since the two modalities reflect the same underlying biology. Recently, it has become possible to measure both gene expression and chromatin accessibility from the same nucleus. Such paired data enable the direct modeling of the relationships between the two modalities. Given the availability of the vast amount of single-modality data, it is desirable to integrate the paired and unpaired single-modality datasets to gain a comprehensive view of the cellular complexity. RESULTS: We benchmark nine existing single-cell multi-omic data integration methods. Specifically, we evaluate to what extent the multiome data provide additional guidance for analyzing the existing single-modality data, and whether these methods uncover peak-gene associations from single-modality data. Our results indicate that multiome data are helpful for annotating single-modality data. However, we emphasize that the availability of an adequate number of nuclei in the multiome dataset is crucial for achieving accurate cell type annotation. Insufficient representation of nuclei may compromise the reliability of the annotations. Additionally, when generating a multiome dataset, the number of cells is more important than sequencing depth for cell type annotation. CONCLUSIONS: Seurat v4 is the best currently available platform for integrating scRNA-seq, snATAC-seq, and multiome data even in the presence of complex batch effects.
Affiliation(s)
- Michelle Y Y Lee
- Department of Genetics, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Graduate Group in Genomics and Computational Biology, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, 19104, USA
| | - Klaus H Kaestner
- Department of Genetics, University of Pennsylvania, Philadelphia, PA, 19104, USA.
| | - Mingyao Li
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
|
21
|
Shi Q, Chen X, Zhang Z. Decoding Human Biology and Disease Using Single-cell Omics Technologies. GENOMICS, PROTEOMICS & BIOINFORMATICS 2023; 21:926-949. [PMID: 37739168 PMCID: PMC10928380 DOI: 10.1016/j.gpb.2023.06.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2023] [Revised: 05/22/2023] [Accepted: 06/08/2023] [Indexed: 09/24/2023]
Abstract
Over the past decade, advances in single-cell omics (SCO) technologies have enabled the investigation of cellular heterogeneity at an unprecedented resolution and scale, opening a new avenue for understanding human biology and disease. In this review, we summarize the developments of sequencing-based SCO technologies and computational methods, and focus on considerable insights acquired from SCO sequencing studies to understand normal and diseased properties, with a particular emphasis on cancer research. We also discuss the technological improvements of SCO and its possible contribution to fundamental research of the human, as well as its great potential in clinical diagnoses and personalized therapies of human disease.
Affiliation(s)
- Qiang Shi
- Biomedical Pioneering Innovation Center, School of Life Sciences, Peking University, Beijing 100871, China
| | - Xueyan Chen
- Biomedical Pioneering Innovation Center, School of Life Sciences, Peking University, Beijing 100871, China
| | - Zemin Zhang
- Biomedical Pioneering Innovation Center, School of Life Sciences, Peking University, Beijing 100871, China; Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China; Changping Laboratory, Beijing 102206, China.
|
22
|
Cheng C, Chen W, Jin H, Chen X. A Review of Single-Cell RNA-Seq Annotation, Integration, and Cell-Cell Communication. Cells 2023; 12:1970. [PMID: 37566049 PMCID: PMC10417635 DOI: 10.3390/cells12151970] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2023] [Revised: 07/10/2023] [Accepted: 07/21/2023] [Indexed: 08/12/2023] Open
Abstract
Single-cell RNA sequencing (scRNA-seq) has emerged as a powerful tool for investigating cellular biology at an unprecedented resolution, enabling the characterization of cellular heterogeneity, identification of rare but significant cell types, and exploration of cell-cell communications and interactions. Its broad applications span both basic and clinical research domains. In this comprehensive review, we survey the current landscape of scRNA-seq analysis methods and tools, focusing on count modeling, cell-type annotation, data integration, including spatial transcriptomics, and the inference of cell-cell communication. We review the challenges encountered in scRNA-seq analysis, including issues of sparsity or low expression, reliability of cell annotation, and assumptions in data integration, and discuss the potential impact of suboptimal clustering and differential expression analysis tools on downstream analyses, particularly in identifying cell subpopulations. Finally, we discuss recent advancements and future directions for enhancing scRNA-seq analysis. Specifically, we highlight the development of novel tools for annotating single-cell data, integrating and interpreting multimodal datasets covering transcriptomics, epigenomics, and proteomics, and inferring cellular communication networks. By elucidating the latest progress and innovation, we provide a comprehensive overview of the rapidly advancing field of scRNA-seq analysis.
Affiliation(s)
- Changde Cheng
- Department of Computational Biology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
| | - Wenan Chen
- Center for Applied Bioinformatics, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
| | - Hongjian Jin
- Center for Applied Bioinformatics, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
| | - Xiang Chen
- Department of Computational Biology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
|
23
|
Yu L, Liu C, Yang JYH, Yang P. Ensemble deep learning of embeddings for clustering multimodal single-cell omics data. Bioinformatics 2023; 39:btad382. [PMID: 37314966 PMCID: PMC10287920 DOI: 10.1093/bioinformatics/btad382] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/13/2023] [Revised: 04/16/2023] [Accepted: 06/12/2023] [Indexed: 06/16/2023] Open
Abstract
MOTIVATION: Recent advances in multimodal single-cell omics technologies enable multiple modalities of molecular attributes, such as gene expression, chromatin accessibility, and protein abundance, to be profiled simultaneously at a global level in individual cells. While the increasing availability of multiple data modalities is expected to provide a more accurate clustering and characterization of cells, the development of computational methods that are capable of extracting information embedded across data modalities is still in its infancy. RESULTS: We propose SnapCCESS for clustering cells by integrating data modalities in multimodal single-cell omics data using an unsupervised ensemble deep learning framework. By creating snapshots of embeddings of multimodality using variational autoencoders, SnapCCESS can be coupled with various clustering algorithms for generating consensus clustering of cells. We applied SnapCCESS with several clustering algorithms to various datasets generated from popular multimodal single-cell omics technologies. Our results demonstrate that SnapCCESS is effective and more efficient than conventional ensemble deep learning-based clustering methods and outperforms other state-of-the-art multimodal embedding generation methods in integrating data modalities for clustering cells. The improved clustering of cells from SnapCCESS will pave the way for more accurate characterization of cell identity and types, an essential step for various downstream analyses of multimodal single-cell omics data. AVAILABILITY AND IMPLEMENTATION: SnapCCESS is implemented as a Python package and is freely available from https://github.com/PYangLab/SnapCCESS under the open-source license of GPL-3. The data used in this study are publicly available (see section 'Data availability').
Affiliation(s)
- Lijia Yu
- Computational Systems Biology Group, Children’s Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW 2145, Australia
- School of Mathematics and Statistics, Faculty of Science, University of Sydney, NSW 2006, Australia
- Sydney Precision Data Science Centre, University of Sydney, NSW 2006, Australia
| | - Chunlei Liu
- Computational Systems Biology Group, Children’s Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW 2145, Australia
- Sydney Precision Data Science Centre, University of Sydney, NSW 2006, Australia
| | - Jean Yee Hwa Yang
- School of Mathematics and Statistics, Faculty of Science, University of Sydney, NSW 2006, Australia
- Sydney Precision Data Science Centre, University of Sydney, NSW 2006, Australia
- Charles Perkins Centre, The University of Sydney, Sydney, NSW 2006, Australia
- Laboratory of Data Discovery for Health Limited (D4H), Hong Kong Science Park, Hong Kong SAR, China
| | - Pengyi Yang
- Computational Systems Biology Group, Children’s Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW 2145, Australia
- School of Mathematics and Statistics, Faculty of Science, University of Sydney, NSW 2006, Australia
- Sydney Precision Data Science Centre, University of Sydney, NSW 2006, Australia
- Charles Perkins Centre, The University of Sydney, Sydney, NSW 2006, Australia
- Laboratory of Data Discovery for Health Limited (D4H), Hong Kong Science Park, Hong Kong SAR, China
|
24
|
Gabitto MI, Travaglini KJ, Rachleff VM, Kaplan ES, Long B, Ariza J, Ding Y, Mahoney JT, Dee N, Goldy J, Melief EJ, Brouner K, Campos J, Carr AJ, Casper T, Chakrabarty R, Clark M, Compos J, Cool J, Valera Cuevas NJ, Dalley R, Darvas M, Ding SL, Dolbeare T, Mac Donald CL, Egdorf T, Esposito L, Ferrer R, Gala R, Gary A, Gloe J, Guilford N, Guzman J, Ho W, Jarksy T, Johansen N, Kalmbach BE, Keene LM, Khawand S, Kilgore M, Kirkland A, Kunst M, Lee BR, Malone J, Maltzer Z, Martin N, McCue R, McMillen D, Meyerdierks E, Meyers KP, Mollenkopf T, Montine M, Nolan AL, Nyhus J, Olsen PA, Pacleb M, Pham T, Pom CA, Postupna N, Ruiz A, Schantz AM, Sorensen SA, Staats B, Sullivan M, Sunkin SM, Thompson C, Tieu M, Ting J, Torkelson A, Tran T, Wang MQ, Waters J, Wilson AM, Haynor D, Gatto N, Jayadev S, Mufti S, Ng L, Mukherjee S, Crane PK, Latimer CS, Levi BP, Smith K, Close JL, Miller JA, Hodge RD, Larson EB, Grabowski TJ, Hawrylycz M, Keene CD, Lein ES. Integrated multimodal cell atlas of Alzheimer's disease. RESEARCH SQUARE 2023:rs.3.rs-2921860. [PMID: 37292694 PMCID: PMC10246227 DOI: 10.21203/rs.3.rs-2921860/v1] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 06/10/2023]
Abstract
Alzheimer's disease (AD) is the most common cause of dementia in older adults. Neuropathological and imaging studies have demonstrated a progressive and stereotyped accumulation of protein aggregates, but the underlying molecular and cellular mechanisms driving AD progression and vulnerable cell populations affected by disease remain coarsely understood. The current study harnesses single cell and spatial genomics tools and knowledge from the BRAIN Initiative Cell Census Network to understand the impact of disease progression on middle temporal gyrus cell types. We used image-based quantitative neuropathology to place 84 donors spanning the spectrum of AD pathology along a continuous disease pseudoprogression score and multiomic technologies to profile single nuclei from each donor, mapping their transcriptomes, epigenomes, and spatial coordinates to a common cell type reference with unprecedented resolution. Temporal analysis of cell-type proportions indicated an early reduction of Somatostatin-expressing neuronal subtypes and a late decrease of supragranular intratelencephalic-projecting excitatory and Parvalbumin-expressing neurons, with increases in disease-associated microglial and astrocytic states. We found complex gene expression differences, ranging from global to cell type-specific effects. These effects showed different temporal patterns indicating diverse cellular perturbations as a function of disease progression. A subset of donors showed a particularly severe cellular and molecular phenotype, which correlated with steeper cognitive decline. We have created a freely available public resource to explore these data and to accelerate progress in AD research at SEA-AD.org.
Affiliation(s)
- Victoria M. Rachleff
  - Allen Institute for Brain Science, Seattle, WA, 98109
  - Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA 98104
- Brian Long
  - Allen Institute for Brain Science, Seattle, WA, 98109
- Jeanelle Ariza
  - Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA 98104
- Yi Ding
  - Allen Institute for Brain Science, Seattle, WA, 98109
- Nick Dee
  - Allen Institute for Brain Science, Seattle, WA, 98109
- Jeff Goldy
  - Allen Institute for Brain Science, Seattle, WA, 98109
- Erica J. Melief
  - Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA 98104
- John Campos
  - Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA 98104
- Tamara Casper
  - Allen Institute for Brain Science, Seattle, WA, 98109
- Michael Clark
  - Allen Institute for Brain Science, Seattle, WA, 98109
- Jazmin Compos
  - Allen Institute for Brain Science, Seattle, WA, 98109
- Jonah Cool
  - Chan Zuckerberg Initiative, Redwood City, CA 94063
- Rachel Dalley
  - Allen Institute for Brain Science, Seattle, WA, 98109
- Martin Darvas
  - Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA 98104
- Song-Lin Ding
  - Allen Institute for Brain Science, Seattle, WA, 98109
- Tim Dolbeare
  - Allen Institute for Brain Science, Seattle, WA, 98109
- Tom Egdorf
  - Allen Institute for Brain Science, Seattle, WA, 98109
- Luke Esposito
  - Allen Institute for Brain Science, Seattle, WA, 98109
- Rohan Gala
  - Allen Institute for Brain Science, Seattle, WA, 98109
- Amanda Gary
  - Allen Institute for Brain Science, Seattle, WA, 98109
- Jessica Gloe
  - Allen Institute for Brain Science, Seattle, WA, 98109
- Windy Ho
  - Allen Institute for Brain Science, Seattle, WA, 98109
- Tim Jarksy
  - Allen Institute for Brain Science, Seattle, WA, 98109
- Lisa M. Keene
  - Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA 98104
- Sarah Khawand
  - Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA 98104
- Mitch Kilgore
  - Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA 98104
- Amanda Kirkland
  - Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA 98104
- Michael Kunst
  - Allen Institute for Brain Science, Seattle, WA, 98109
- Brian R. Lee
  - Allen Institute for Brain Science, Seattle, WA, 98109
- Zoe Maltzer
  - Allen Institute for Brain Science, Seattle, WA, 98109
- Naomi Martin
  - Allen Institute for Brain Science, Seattle, WA, 98109
- Rachel McCue
  - Allen Institute for Brain Science, Seattle, WA, 98109
- Kelly P. Meyers
  - Kaiser Permanente Washington Research Institute, Seattle, WA, 98101
- Mark Montine
  - Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA 98104
- Amber L. Nolan
  - Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA 98104
- Julie Nyhus
  - Allen Institute for Brain Science, Seattle, WA, 98109
- Paul A. Olsen
  - Allen Institute for Brain Science, Seattle, WA, 98109
- Maiya Pacleb
  - Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA 98104
- Thanh Pham
  - Allen Institute for Brain Science, Seattle, WA, 98109
- Nadia Postupna
  - Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA 98104
- Augustin Ruiz
  - Allen Institute for Brain Science, Seattle, WA, 98109
- Aimee M. Schantz
  - Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA 98104
- Brian Staats
  - Allen Institute for Brain Science, Seattle, WA, 98109
- Matt Sullivan
  - Allen Institute for Brain Science, Seattle, WA, 98109
- Michael Tieu
  - Allen Institute for Brain Science, Seattle, WA, 98109
- Jonathan Ting
  - Allen Institute for Brain Science, Seattle, WA, 98109
- Amy Torkelson
  - Allen Institute for Brain Science, Seattle, WA, 98109
- Tracy Tran
  - Allen Institute for Brain Science, Seattle, WA, 98109
- Jack Waters
  - Allen Institute for Brain Science, Seattle, WA, 98109
- Angela M. Wilson
  - Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA 98104
- David Haynor
  - Department of Radiology, University of Washington, Seattle, WA 98014
- Nicole Gatto
  - Kaiser Permanente Washington Research Institute, Seattle, WA, 98101
- Suman Jayadev
  - Department of Neurology, University of Washington, Seattle, WA 98104
- Shoaib Mufti
  - Allen Institute for Brain Science, Seattle, WA, 98109
- Lydia Ng
  - Allen Institute for Brain Science, Seattle, WA, 98109
- Paul K. Crane
  - Department of Medicine, University of Washington, Seattle, WA 98104
- Caitlin S. Latimer
  - Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA 98104
- Boaz P. Levi
  - Allen Institute for Brain Science, Seattle, WA, 98109
- Eric B. Larson
  - Department of Medicine, University of Washington, Seattle, WA 98104
- C. Dirk Keene
  - Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA 98104
- Ed S. Lein
  - Allen Institute for Brain Science, Seattle, WA, 98109
|