1
Wu Y, Liu J, Xiao Y, Zhang S, Li L. CoupleVAE: coupled variational autoencoders for predicting perturbational single-cell RNA sequencing data. Brief Bioinform 2025; 26:bbaf126. [PMID: 40178283 PMCID: PMC11966612 DOI: 10.1093/bib/bbaf126] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2024] [Revised: 01/21/2025] [Accepted: 03/03/2025] [Indexed: 04/05/2025]
Abstract
With the rapid advances in single-cell sequencing technology, it is now feasible to conduct in-depth genetic analysis in individual cells. Studying how single cells respond to perturbations is of great significance for understanding the functions and behaviors of living organisms. However, acquiring post-perturbation cellular states through biological experiments is frequently cost-prohibitive, so predicting single-cell perturbation responses poses a critical challenge in computational biology. In this work, we propose a novel deep learning method called coupled variational autoencoders (CoupleVAE), devised to predict post-perturbation single-cell RNA-seq data. CoupleVAE is composed of two VAEs connected by a coupler: two encoders first extract latent features for control and perturbed cells, the coupler then translates between the two latent spaces through two nonlinear mappings, and two separate decoders finally generate control and perturbed data from the encoded and translated features. CoupleVAE thereby enables a more intricate state transformation of single cells within the latent space. Experiments on three real datasets covering infection, stimulation and cross-species prediction show that CoupleVAE outperforms the existing models it was compared against in predicting single-cell RNA-seq data for perturbed cells.
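To make the data flow described in this abstract concrete, the following is a minimal NumPy sketch of a forward pass, not the authors' implementation. The layer sizes, the tanh activation, the single-hidden-layer networks and the names `CoupleVAESketch`, `init_mlp`, `mlp`, `ctrl_to_pert`, `pert_to_ctrl` and `predict_perturbed` are all hypothetical. The weights are random and untrained, and the training objective is omitted because the abstract does not specify it.

```python
import numpy as np


def init_mlp(rng, sizes):
    """Random (W, b) pairs for a fully connected network with the given layer sizes."""
    return [
        (rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]


def mlp(params, x):
    """Apply the network: tanh on hidden layers, linear output layer (an assumption)."""
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1:
            x = np.tanh(x)
    return x


class CoupleVAESketch:
    """Hypothetical forward pass of two VAEs joined by a latent-space coupler.

    Weights are random and untrained; training losses are not modelled.
    """

    def __init__(self, n_genes, n_latent=8, n_hidden=32, seed=0):
        self.rng = np.random.default_rng(seed)
        self.n_latent = n_latent
        # Two encoders, each emitting a latent mean and log-variance.
        self.enc_ctrl = init_mlp(self.rng, [n_genes, n_hidden, 2 * n_latent])
        self.enc_pert = init_mlp(self.rng, [n_genes, n_hidden, 2 * n_latent])
        # Coupler: two nonlinear maps translating between the latent spaces.
        self.ctrl_to_pert = init_mlp(self.rng, [n_latent, n_hidden, n_latent])
        self.pert_to_ctrl = init_mlp(self.rng, [n_latent, n_hidden, n_latent])
        # Two separate decoders back to gene-expression space.
        self.dec_ctrl = init_mlp(self.rng, [n_latent, n_hidden, n_genes])
        self.dec_pert = init_mlp(self.rng, [n_latent, n_hidden, n_genes])

    def encode(self, params, x, sample=True):
        h = mlp(params, x)
        mu, logvar = h[:, : self.n_latent], h[:, self.n_latent :]
        if not sample:
            return mu
        # Reparameterization trick: z = mu + sigma * eps.
        eps = self.rng.standard_normal(mu.shape)
        return mu + np.exp(0.5 * logvar) * eps

    def forward(self, x_ctrl, x_pert):
        z_ctrl = self.encode(self.enc_ctrl, x_ctrl)
        z_pert = self.encode(self.enc_pert, x_pert)
        return {
            # Each decoder processes both encoded and translated features.
            "recon_ctrl": mlp(self.dec_ctrl, z_ctrl),
            "recon_pert": mlp(self.dec_pert, z_pert),
            "pred_pert": mlp(self.dec_pert, mlp(self.ctrl_to_pert, z_ctrl)),
            "pred_ctrl": mlp(self.dec_ctrl, mlp(self.pert_to_ctrl, z_pert)),
        }

    def predict_perturbed(self, x_ctrl):
        # Inference: take the latent mean of control cells, translate, decode.
        z_ctrl = self.encode(self.enc_ctrl, x_ctrl, sample=False)
        return mlp(self.dec_pert, mlp(self.ctrl_to_pert, z_ctrl))


# Toy usage: 5 control and 5 perturbed cells with 20 genes each.
rng = np.random.default_rng(1)
x_ctrl = rng.normal(size=(5, 20))
x_pert = rng.normal(size=(5, 20))
model = CoupleVAESketch(n_genes=20)
outputs = model.forward(x_ctrl, x_pert)
x_pred = model.predict_perturbed(x_ctrl)
```

In this sketch, predicting the perturbed state of new control cells follows the path control encoder, then `ctrl_to_pert`, then perturbed decoder; the reverse map `pert_to_ctrl` is exercised only in `forward`.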
Affiliation(s)
- Yahao Wu
  - School of Mathematics and Statistics, Xi’an Jiaotong University, No. 28 Xianning West Road, Xi’an, Shaanxi 710049, China
- Jing Liu
  - School of Mathematics and Statistics, Xi’an Jiaotong University, No. 28 Xianning West Road, Xi’an, Shaanxi 710049, China
- Yanni Xiao
  - School of Mathematics and Statistics, Xi’an Jiaotong University, No. 28 Xianning West Road, Xi’an, Shaanxi 710049, China
- Shuqin Zhang
  - School of Mathematical Sciences, Center for Applied Mathematics, Research Institute of Intelligent Complex Systems, and Shanghai Key Laboratory for Contemporary Applied Mathematics, Fudan University, 220 Handan Road, 200433 Shanghai, China
- Limin Li
  - School of Mathematics and Statistics, Xi’an Jiaotong University, No. 28 Xianning West Road, Xi’an, Shaanxi 710049, China
2
Gavriilidis GI, Vasileiou V, Orfanou A, Ishaque N, Psomopoulos F. A mini-review on perturbation modelling across single-cell omic modalities. Comput Struct Biotechnol J 2024; 23:1886-1896. [PMID: 38721585 PMCID: PMC11076269 DOI: 10.1016/j.csbj.2024.04.058] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 02/01/2024] [Revised: 04/23/2024] [Accepted: 04/23/2024] [Indexed: 01/06/2025]
Abstract
Recent advances in single-cell omics technology have transformed the landscape of cellular and molecular research, enriching the scope and intricacy of cellular characterisation. Perturbation modelling seeks to comprehensively grasp the effects of external influences, such as disease onset, molecular knock-outs or external stimulants, on cellular physiology, specifically on transcription factors, signal transducers, biological pathways and dynamic cell states. Machine and deep learning tools transform complex perturbational phenomena into algorithmically tractable tasks to formulate predictions based on various types of single-cell datasets. However, the recent surge in tools and datasets makes it challenging for experimental biologists and computational scientists to keep track of recent advances in this rapidly expanding field of single-cell modelling. Here, we recapitulate the main objectives of perturbation modelling and summarise novel single-cell perturbation technologies, based on genetic manipulation such as CRISPR or on compounds, spanning across omic modalities. We then concisely review a burgeoning group of computational methods, ranging from classical statistical inference methodologies, to machine and deep learning architectures such as shallow models or autoencoders, to biologically informed approaches based on gene regulatory networks, and to combinatorial efforts reminiscent of ensemble learning. We also discuss the rising trend of large foundational models in single-cell perturbation modelling inspired by large language models. Lastly, we critically assess the challenges that underlie single-cell perturbation modelling while pointing towards relevant future perspectives like perturbation atlases, multi-omics and spatial datasets, causal machine learning for interpretability, multi-task learning for performance and explainability, as well as prospects for solving interoperability and benchmarking pitfalls.
Affiliation(s)
- George I. Gavriilidis
  - Institute of Applied Biosciences, Centre for Research and Technology Hellas, Thessaloniki, Greece
- Vasileios Vasileiou
  - Institute of Applied Biosciences, Centre for Research and Technology Hellas, Thessaloniki, Greece
  - Department of Molecular Biology and Genetics, Democritus University of Thrace, Alexandroupolis, Greece
- Aspasia Orfanou
  - Institute of Applied Biosciences, Centre for Research and Technology Hellas, Thessaloniki, Greece
- Naveed Ishaque
  - Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Center of Digital Health, Berlin, Germany
- Fotis Psomopoulos
  - Institute of Applied Biosciences, Centre for Research and Technology Hellas, Thessaloniki, Greece
3
Peidli S, Green TD, Shen C, Gross T, Min J, Garda S, Yuan B, Schumacher LJ, Taylor-King JP, Marks DS, Luna A, Blüthgen N, Sander C. scPerturb: harmonized single-cell perturbation data. Nat Methods 2024; 21:531-540. [PMID: 38279009 DOI: 10.1038/s41592-023-02144-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/28/2023] [Accepted: 12/04/2023] [Indexed: 01/28/2024]
Abstract
Analysis across a growing number of single-cell perturbation datasets is hampered by poor data interoperability. To facilitate development and benchmarking of computational methods, we collect a set of 44 publicly available single-cell perturbation-response datasets with molecular readouts, including transcriptomics, proteomics and epigenomics. We apply uniform quality control pipelines and harmonize feature annotations. The resulting information resource, scPerturb, enables development and testing of computational methods, and facilitates comparison and integration across datasets. We describe energy statistics (E-statistics) for quantification of perturbation effects and significance testing, and demonstrate E-distance as a general distance measure between sets of single-cell expression profiles. We illustrate the application of E-statistics for quantifying similarity and efficacy of perturbations. The perturbation-response datasets and E-statistics computation software are publicly available at scperturb.org. This work provides an information resource for researchers working with single-cell perturbation data and recommendations for experimental design, including optimal cell counts and read depth.
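The E-distance mentioned in this abstract builds on the energy distance between two sets of cells X and Y: E(X, Y) = 2*delta(X, Y) - sigma(X) - sigma(Y), where delta(X, Y) is the mean pairwise distance between cells of the two groups and sigma(X), sigma(Y) are the mean pairwise distances within each group. The sketch below computes this with plain Euclidean distance and adds a label-permutation test for significance. It assumes cells are already embedded (for example as PCA coordinates); the distance metric, the averaging convention and the function names `pairwise_dist`, `e_distance` and `e_test` are assumptions, and the software at scperturb.org may differ in these details.

```python
import numpy as np


def pairwise_dist(a, b):
    """Euclidean distances between every row of a and every row of b."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def e_distance(x, y):
    """Energy distance 2*delta_XY - sigma_X - sigma_Y (self-pairs included in the means)."""
    delta_xy = pairwise_dist(x, y).mean()
    sigma_x = pairwise_dist(x, x).mean()
    sigma_y = pairwise_dist(y, y).mean()
    return 2.0 * delta_xy - sigma_x - sigma_y


def e_test(x, y, n_perm=999, seed=0):
    """Permutation test: shuffle group labels and recompute the E-distance."""
    rng = np.random.default_rng(seed)
    observed = e_distance(x, y)
    pooled = np.vstack([x, y])
    n_x = len(x)
    n_extreme = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        if e_distance(pooled[idx[:n_x]], pooled[idx[n_x:]]) >= observed:
            n_extreme += 1
    # Add-one correction keeps the p-value strictly positive.
    p_value = (n_extreme + 1) / (n_perm + 1)
    return observed, p_value


# Toy usage: 40 control cells and 40 perturbed cells in a 10-dimensional embedding.
rng = np.random.default_rng(0)
ctrl = rng.normal(size=(40, 10))
pert = rng.normal(loc=1.5, size=(40, 10))
dist, p = e_test(ctrl, pert, n_perm=199)
```

Because the within-group means include the zero self-distances, the E-distance of a set of cells with itself is exactly zero. The broadcasted difference tensor in `pairwise_dist` uses memory proportional to n_x * n_y * dimensions, which is fine for a few hundred cells but would need chunking for larger groups.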
Affiliation(s)
- Stefan Peidli
  - Institute of Pathology, Charité - Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität, Berlin, Germany
  - Institute of Biology, Humboldt-Universität, Berlin, Germany
- Tessa D Green
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Ciyue Shen
  - Departments of Cell Biology and Systems Biology, Harvard Medical School, Boston, MA, USA
  - Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
  - Broad Institute, Cambridge, MA, USA
- Joseph Min
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Samuele Garda
  - Institute of Biology, Humboldt-Universität, Berlin, Germany
  - Institute for Computer Science, Humboldt-Universität zu Berlin, Berlin, Germany
- Bo Yuan
  - Departments of Cell Biology and Systems Biology, Harvard Medical School, Boston, MA, USA
  - Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
  - Broad Institute, Cambridge, MA, USA
- Linus J Schumacher
  - Centre for Regenerative Medicine, University of Edinburgh, Edinburgh, UK
- Debora S Marks
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
  - Broad Institute, Cambridge, MA, USA
- Augustin Luna
  - Departments of Cell Biology and Systems Biology, Harvard Medical School, Boston, MA, USA
  - Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
  - Broad Institute, Cambridge, MA, USA
  - Computational Biology Branch, National Library of Medicine and Developmental Therapeutics Branch, National Cancer Institute, Bethesda, MD, USA
- Nils Blüthgen
  - Institute of Pathology, Charité - Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität, Berlin, Germany
  - Institute of Biology, Humboldt-Universität, Berlin, Germany
- Chris Sander
  - Departments of Cell Biology and Systems Biology, Harvard Medical School, Boston, MA, USA
  - Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
  - Broad Institute, Cambridge, MA, USA