1
Wu Y, Liu J, Xiao Y, Zhang S, Li L. CoupleVAE: coupled variational autoencoders for predicting perturbational single-cell RNA sequencing data. Brief Bioinform 2025; 26:bbaf126. [PMID: 40178283 PMCID: PMC11966612 DOI: 10.1093/bib/bbaf126] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2024] [Revised: 01/21/2025] [Accepted: 03/03/2025] [Indexed: 04/05/2025]
Abstract
With the rapid advances in single-cell sequencing technology, it is now feasible to conduct in-depth genetic analysis in individual cells. Studying the dynamics of single cells in response to perturbations is of great significance for understanding the functions and behaviors of living organisms. However, acquiring post-perturbation cellular states through biological experiments is frequently cost-prohibitive, so predicting single-cell perturbation responses poses a critical challenge in computational biology. In this work, we propose a novel deep learning method called coupled variational autoencoders (CoupleVAE), devised to predict post-perturbation single-cell RNA-seq data. CoupleVAE is composed of two VAEs connected by a coupler: it first extracts latent features of control and perturbed cells with two encoders, then translates between the two latent spaces with two nonlinear mappings in the coupler, and finally generates control and perturbed data with two separate decoders that process the encoded and translated features. CoupleVAE thereby enables a more intricate state transformation of single cells within the latent space. Experiments on three real datasets covering infection, stimulation and cross-species prediction show that CoupleVAE outperforms existing comparative models in the accuracy of the predicted single-cell RNA-seq data for perturbed cells.
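The encode, couple and decode flow described above can be sketched as a forward pass. This is a minimal, untrained illustration under stated assumptions, not the authors' implementation: the layer sizes, the `tanh` coupler, the ReLU decoder output and every name (`CoupledVAESketch`, `translate`, `dense`, `forward`) are hypothetical, and training (reconstruction, KL and translation losses) is omitted.

```python
import math
import random


def dense(in_dim, out_dim, rng):
    """Untrained dense layer: (weights, bias) with small random weights."""
    scale = 1.0 / math.sqrt(in_dim)
    weights = [[rng.gauss(0.0, scale) for _ in range(in_dim)] for _ in range(out_dim)]
    return weights, [0.0] * out_dim


def forward(layer, x, act=None):
    """Compute act(W x + b) for a single input vector x."""
    weights, bias = layer
    y = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]
    return [act(v) for v in y] if act else y


def relu(v):
    return max(0.0, v)


class CoupledVAESketch:
    """Hypothetical forward pass: two encoders, a two-way coupler, two decoders."""

    def __init__(self, n_genes, n_latent, seed=0):
        self.rng = random.Random(seed)
        r = self.rng
        # One encoder per condition, each producing a mean and a log-variance.
        self.enc = {c: (dense(n_genes, n_latent, r), dense(n_genes, n_latent, r))
                    for c in ("ctrl", "pert")}
        # Coupler: one nonlinear latent-to-latent map in each direction.
        self.couple = {("ctrl", "pert"): dense(n_latent, n_latent, r),
                       ("pert", "ctrl"): dense(n_latent, n_latent, r)}
        # One decoder per condition.
        self.dec = {c: dense(n_latent, n_genes, r) for c in ("ctrl", "pert")}

    def encode(self, cond, x):
        mu_layer, logvar_layer = self.enc[cond]
        mu, logvar = forward(mu_layer, x), forward(logvar_layer, x)
        # Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, 1).
        return [m + math.exp(0.5 * lv) * self.rng.gauss(0.0, 1.0)
                for m, lv in zip(mu, logvar)]

    def translate(self, x, src, dst):
        """Encode x under `src`, map its latent code to `dst`, decode as `dst`."""
        z_src = self.encode(src, x)
        z_dst = forward(self.couple[(src, dst)], z_src, math.tanh)
        return forward(self.dec[dst], z_dst, relu)


model = CoupledVAESketch(n_genes=6, n_latent=3)
x_control = [0.0, 1.2, 3.4, 0.5, 0.0, 2.1]
x_pred = model.translate(x_control, "ctrl", "pert")
```

Calling `model.translate(x_pred, "pert", "ctrl")` uses the coupler's second mapping for the reverse direction; the ReLU on the decoder output is only a convenient way to keep the sketch's expression values non-negative.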
Affiliation(s)
- Yahao Wu
- School of Mathematics and Statistics, Xi’an Jiaotong University, No. 28 Xianning West Road, Xi’an, Shaanxi 710049, China
- Jing Liu
- School of Mathematics and Statistics, Xi’an Jiaotong University, No. 28 Xianning West Road, Xi’an, Shaanxi 710049, China
- Yanni Xiao
- School of Mathematics and Statistics, Xi’an Jiaotong University, No. 28 Xianning West Road, Xi’an, Shaanxi 710049, China
- Shuqin Zhang
- School of Mathematical Sciences, Center for Applied Mathematics, Research Institute of Intelligent Complex Systems, and Shanghai Key Laboratory for Contemporary Applied Mathematics, Fudan University, 220 Handan Road, 200433 Shanghai, China
- Limin Li
- School of Mathematics and Statistics, Xi’an Jiaotong University, No. 28 Xianning West Road, Xi’an, Shaanxi 710049, China
2
Dimitrieva S, Janssens R, Li G, Szalata A, Gopalakrishnan R, Parmar C, Kauffmann A, Durand EY. Biologically relevant integration of transcriptomics profiles from cancer cell lines, patient-derived xenografts, and clinical tumors using deep learning. Sci Adv 2025; 11:eadn5596. [PMID: 39823329 PMCID: PMC11740957 DOI: 10.1126/sciadv.adn5596] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2023] [Accepted: 12/16/2024] [Indexed: 01/19/2025]
Abstract
Cell lines and patient-derived xenografts are essential to cancer research; however, results derived from such models often lack clinical translatability because the models do not fully recapitulate complex cancer biology. Identifying preclinical models that sufficiently resemble the biological characteristics of clinical tumors across different cancers is therefore critically important. Here, we developed MOBER (Multi-Origin Batch Effect Remover), a method that simultaneously extracts biologically meaningful embeddings and removes confounder information. Applying MOBER to 932 cancer cell lines, 434 patient-derived tumor xenografts, and 11,159 clinical tumors, we identified the preclinical models with the greatest transcriptional fidelity to clinical tumors, as well as models that are transcriptionally unrepresentative of their respective clinical tumors. MOBER can transform the transcriptional profiles of preclinical models to resemble those of clinical tumors and can therefore be used to improve the clinical translation of insights gained from preclinical models. MOBER is a versatile batch effect removal method that applies to diverse transcriptomic datasets and can integrate multiple datasets simultaneously.
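The transformation step described above, re-expressing a preclinical profile as if it came from clinical tumors, can be sketched with a source-conditioned decoder. This is a hypothetical, untrained linear stand-in, not MOBER itself: the source labels, layer shapes and names (`ConditionalAutoencoderSketch`, `embed`, `project`) are assumptions, feeding the source label only to the decoder is a choice of this sketch, and the adversarial training that would make the embedding source-invariant is only indicated in a comment.

```python
import random

SOURCES = ["cell_line", "pdx", "tumor"]  # hypothetical data-source labels


def one_hot(source):
    return [1.0 if s == source else 0.0 for s in SOURCES]


def matvec(w, x):
    return [sum(a * b for a, b in zip(row, x)) for row in w]


class ConditionalAutoencoderSketch:
    """Untrained linear stand-in for an encoder/decoder conditioned on data source."""

    def __init__(self, n_genes, n_latent, seed=0):
        rng = random.Random(seed)
        self.w_enc = [[rng.gauss(0.0, 0.3) for _ in range(n_genes)]
                      for _ in range(n_latent)]
        # The decoder sees [embedding, one-hot source], so it can re-add
        # source-specific signal for whichever source it is asked to produce.
        self.w_dec = [[rng.gauss(0.0, 0.3) for _ in range(n_latent + len(SOURCES))]
                      for _ in range(n_genes)]

    def embed(self, x):
        # In a trained model, an adversarial source classifier would push this
        # embedding to carry no information about the data source.
        return matvec(self.w_enc, x)

    def project(self, x, target_source):
        """Re-decode a profile as if it had come from `target_source`."""
        return matvec(self.w_dec, self.embed(x) + one_hot(target_source))


model = ConditionalAutoencoderSketch(n_genes=4, n_latent=2)
x_cell_line = [1.0, 0.0, 2.0, 0.5]
as_tumor = model.project(x_cell_line, "tumor")
as_cell_line = model.project(x_cell_line, "cell_line")
```

The embedding is identical in both calls; only the source label given to the decoder changes, which is what lets one profile be re-expressed in the style of another source.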
Affiliation(s)
- Slavica Dimitrieva
- Disease Area Oncology, Novartis Institutes for Biomedical Research, CH-4002 Basel, Switzerland
- Rens Janssens
- Disease Area Oncology, Novartis Institutes for Biomedical Research, CH-4002 Basel, Switzerland
- Gang Li
- Disease Area Oncology, Novartis Institutes for Biomedical Research, CH-4002 Basel, Switzerland
- Artur Szalata
- Disease Area Oncology, Novartis Institutes for Biomedical Research, CH-4002 Basel, Switzerland
- Chintan Parmar
- Disease Area Oncology, Novartis Institutes for Biomedical Research, Cambridge, MA, USA
- Audrey Kauffmann
- Disease Area Oncology, Novartis Institutes for Biomedical Research, CH-4002 Basel, Switzerland
- Eric Y. Durand
- Disease Area Oncology, Novartis Institutes for Biomedical Research, CH-4002 Basel, Switzerland
3
An L, Zhang C, Wulan N, Zhang S, Chen P, Ji F, Ng KK, Chen C, Zhou JH, Yeo BTT. DeepResBat: Deep residual batch harmonization accounting for covariate distribution differences. Med Image Anal 2025; 99:103354. [PMID: 39368279 DOI: 10.1016/j.media.2024.103354] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2024] [Revised: 09/17/2024] [Accepted: 09/18/2024] [Indexed: 10/07/2024]
Abstract
Pooling MRI data from multiple datasets requires harmonization to reduce undesired inter-site variabilities, while preserving effects of biological variables (or covariates). The popular harmonization approach ComBat uses a mixed-effects regression framework that explicitly accounts for covariate distribution differences across datasets. There is also significant interest in developing harmonization approaches based on deep neural networks (DNNs), such as conditional variational autoencoder (cVAE). However, current DNN approaches do not explicitly account for covariate distribution differences across datasets. Here, we provide mathematical results suggesting that not accounting for covariates can lead to suboptimal harmonization. We propose two DNN-based covariate-aware harmonization approaches: covariate VAE (coVAE) and DeepResBat. The coVAE approach is a natural extension of cVAE by concatenating covariates and site information with site- and covariate-invariant latent representations. DeepResBat adopts a residual framework inspired by ComBat. DeepResBat first removes the effects of covariates with nonlinear regression trees, followed by eliminating site differences with cVAE. Finally, covariate effects are added back to the harmonized residuals. Using three datasets from three continents with a total of 2787 participants and 10,085 anatomical T1 scans, we find that DeepResBat and coVAE outperformed ComBat, CovBat and cVAE in terms of removing dataset differences, while enhancing biological effects of interest. However, coVAE hallucinates spurious associations between anatomical MRI and covariates even when no association exists. Future studies proposing DNN-based harmonization approaches should be aware of this false positive pitfall. Overall, our results suggest that DeepResBat is an effective deep learning alternative to ComBat.
Code for DeepResBat can be found here: https://github.com/ThomasYeoLab/CBIG/tree/master/stable_projects/harmonization/An2024_DeepResBat.
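The three-step residual scheme described above (remove covariate effects, harmonize the residuals across sites, add the covariate effects back) can be illustrated on a single feature. In this minimal sketch, under stated assumptions, a linear within-site regression on one covariate replaces DeepResBat's nonlinear regression trees, and a ComBat-like per-site location-scale adjustment replaces its cVAE step; the function names and toy data are hypothetical.

```python
import math
from statistics import mean, pstdev


def group_indices(site):
    groups = {}
    for i, s in enumerate(site):
        groups.setdefault(s, []).append(i)
    return groups


def within_site_slope(x, y, groups):
    """OLS slope of y on x after centering within each site (site fixed effects)."""
    num = den = 0.0
    for idx in groups.values():
        x_bar = mean([x[i] for i in idx])
        y_bar = mean([y[i] for i in idx])
        for i in idx:
            num += (x[i] - x_bar) * (y[i] - y_bar)
            den += (x[i] - x_bar) ** 2
    return num / den


def residual_harmonize(age, y, site):
    groups = group_indices(site)
    beta = within_site_slope(age, y, groups)
    # Step 1: remove the covariate (age) effect.
    resid = [yi - beta * ai for yi, ai in zip(y, age)]
    # Step 2: remove site differences from the residuals with a location-scale
    # adjustment (a ComBat-like stand-in for the cVAE step).
    site_mu = {s: mean([resid[i] for i in idx]) for s, idx in groups.items()}
    target_mu = mean(resid)
    target_sd = math.sqrt(mean([(r - site_mu[s]) ** 2 for r, s in zip(resid, site)]))
    clean = [0.0] * len(y)
    for s, idx in groups.items():
        sd = pstdev([resid[i] for i in idx]) or 1.0  # guard against zero spread
        for i in idx:
            clean[i] = (resid[i] - site_mu[s]) / sd * target_sd + target_mu
    # Step 3: add the covariate effect back.
    return [c + beta * ai for c, ai in zip(clean, age)]


age = [50, 60, 70, 70, 80, 90]  # site B participants are older
site = ["A", "A", "A", "B", "B", "B"]
y = [2 * a + (10 if s == "B" else 0) for a, s in zip(age, site)]
harmonized = residual_harmonize(age, y, site)
```

In the toy data, site B is older and carries a +10 offset. After harmonization the offset is gone while the age slope of 2 is preserved, so the two 70-year-olds receive the same value. Estimating the slope within sites keeps the age effect from being confused with the site effect when age distributions differ between sites.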
Affiliation(s)
- Lijun An
- Centre for Sleep and Cognition & Centre for Translational MR Research, Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Department of Electrical and Computer Engineering, National University of Singapore, Singapore; Department of Medicine, Healthy Longevity Translational Research Programme, Human Potential Translational Research Programme & Institute for Digital Medicine, Yong Loo Lin School of Medicine, National University of Singapore, Singapore; N.1 Institute for Health, National University of Singapore, Singapore
- Chen Zhang
- Centre for Sleep and Cognition & Centre for Translational MR Research, Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Department of Electrical and Computer Engineering, National University of Singapore, Singapore; Department of Medicine, Healthy Longevity Translational Research Programme, Human Potential Translational Research Programme & Institute for Digital Medicine, Yong Loo Lin School of Medicine, National University of Singapore, Singapore; N.1 Institute for Health, National University of Singapore, Singapore
- Naren Wulan
- Centre for Sleep and Cognition & Centre for Translational MR Research, Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Department of Electrical and Computer Engineering, National University of Singapore, Singapore; Department of Medicine, Healthy Longevity Translational Research Programme, Human Potential Translational Research Programme & Institute for Digital Medicine, Yong Loo Lin School of Medicine, National University of Singapore, Singapore; N.1 Institute for Health, National University of Singapore, Singapore
- Shaoshi Zhang
- Centre for Sleep and Cognition & Centre for Translational MR Research, Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Department of Electrical and Computer Engineering, National University of Singapore, Singapore; Department of Medicine, Healthy Longevity Translational Research Programme, Human Potential Translational Research Programme & Institute for Digital Medicine, Yong Loo Lin School of Medicine, National University of Singapore, Singapore; N.1 Institute for Health, National University of Singapore, Singapore
- Pansheng Chen
- Centre for Sleep and Cognition & Centre for Translational MR Research, Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Department of Electrical and Computer Engineering, National University of Singapore, Singapore; Department of Medicine, Healthy Longevity Translational Research Programme, Human Potential Translational Research Programme & Institute for Digital Medicine, Yong Loo Lin School of Medicine, National University of Singapore, Singapore; N.1 Institute for Health, National University of Singapore, Singapore
- Fang Ji
- Centre for Sleep and Cognition & Centre for Translational MR Research, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Kwun Kei Ng
- Centre for Sleep and Cognition & Centre for Translational MR Research, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Christopher Chen
- Department of Pharmacology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Juan Helen Zhou
- Centre for Sleep and Cognition & Centre for Translational MR Research, Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Department of Electrical and Computer Engineering, National University of Singapore, Singapore; Integrative Sciences and Engineering Programme (ISEP), National University of Singapore, Singapore
- B T Thomas Yeo
- Centre for Sleep and Cognition & Centre for Translational MR Research, Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Department of Electrical and Computer Engineering, National University of Singapore, Singapore; Department of Medicine, Healthy Longevity Translational Research Programme, Human Potential Translational Research Programme & Institute for Digital Medicine, Yong Loo Lin School of Medicine, National University of Singapore, Singapore; N.1 Institute for Health, National University of Singapore, Singapore; Integrative Sciences and Engineering Programme (ISEP), National University of Singapore, Singapore; Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Charlestown, MA, USA
4
Yuan H, Mancuso CA, Johnson K, Braasch I, Krishnan A. Computational strategies for cross-species knowledge transfer and translational biomedicine. arXiv 2024:arXiv:2408.08503v1. [PMID: 39184546 PMCID: PMC11343225] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/27/2024]
Abstract
Research organisms provide invaluable insights into human biology and diseases, serving as essential tools for functional experiments, disease modeling, and drug testing. However, evolutionary divergence between humans and research organisms hinders effective knowledge transfer across species. Here, we review state-of-the-art methods for computationally transferring knowledge across species, primarily focusing on methods that utilize transcriptome data and/or molecular networks. We introduce the term "agnology" to describe the functional equivalence of molecular components regardless of evolutionary origin, as this concept is becoming pervasive in integrative data-driven models where the role of evolutionary origin can become unclear. Our review addresses four key areas of information and knowledge transfer across species: (1) transferring disease and gene annotation knowledge, (2) identifying agnologous molecular components, (3) inferring equivalent perturbed genes or gene sets, and (4) identifying agnologous cell types. We conclude with an outlook on future directions and several key challenges that remain in cross-species knowledge transfer.
Affiliation(s)
- Hao Yuan
- Genetics and Genome Science Program; Ecology, Evolution, and Behavior Program, Michigan State University
- Christopher A. Mancuso
- Department of Biostatistics & Informatics, University of Colorado Anschutz Medical Campus
- Kayla Johnson
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus
- Ingo Braasch
- Department of Integrative Biology; Genetics and Genome Science Program; Ecology, Evolution, and Behavior Program, Michigan State University
- Arjun Krishnan
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus
5
An L, Zhang C, Wulan N, Zhang S, Chen P, Ji F, Ng KK, Chen C, Zhou JH, Yeo BTT. DeepResBat: deep residual batch harmonization accounting for covariate distribution differences. bioRxiv 2024:2024.01.18.574145. [PMID: 38293022 PMCID: PMC10827218 DOI: 10.1101/2024.01.18.574145] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/01/2024]
Abstract
Pooling MRI data from multiple datasets requires harmonization to reduce undesired inter-site variabilities, while preserving effects of biological variables (or covariates). The popular harmonization approach ComBat uses a mixed-effects regression framework that explicitly accounts for covariate distribution differences across datasets. There is also significant interest in developing harmonization approaches based on deep neural networks (DNNs), such as conditional variational autoencoder (cVAE). However, current DNN approaches do not explicitly account for covariate distribution differences across datasets. Here, we provide mathematical results suggesting that not accounting for covariates can lead to suboptimal harmonization. We propose two DNN-based covariate-aware harmonization approaches: covariate VAE (coVAE) and DeepResBat. The coVAE approach is a natural extension of cVAE by concatenating covariates and site information with site- and covariate-invariant latent representations. DeepResBat adopts a residual framework inspired by ComBat. DeepResBat first removes the effects of covariates with nonlinear regression trees, followed by eliminating site differences with cVAE. Finally, covariate effects are added back to the harmonized residuals. Using three datasets from three continents with a total of 2787 participants and 10,085 anatomical T1 scans, we find that DeepResBat and coVAE outperformed ComBat, CovBat and cVAE in terms of removing dataset differences, while enhancing biological effects of interest. However, coVAE hallucinates spurious associations between anatomical MRI and covariates even when no association exists. Future studies proposing DNN-based harmonization approaches should be aware of this false positive pitfall. Overall, our results suggest that DeepResBat is an effective deep learning alternative to ComBat.
Code for DeepResBat can be found here: https://github.com/ThomasYeoLab/CBIG/tree/master/stable_projects/harmonization/An2024_DeepResBat.
Affiliation(s)
- Lijun An
- Centre for Sleep and Cognition & Centre for Translational MR Research, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Department of Electrical and Computer Engineering, National University of Singapore, Singapore
- Department of Medicine, Healthy Longevity Translational Research Programme, Human Potential Translational Research Programme & Institute for Digital Medicine, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- N.1 Institute for Health, National University of Singapore, Singapore
- Chen Zhang
- Centre for Sleep and Cognition & Centre for Translational MR Research, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Department of Electrical and Computer Engineering, National University of Singapore, Singapore
- Department of Medicine, Healthy Longevity Translational Research Programme, Human Potential Translational Research Programme & Institute for Digital Medicine, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- N.1 Institute for Health, National University of Singapore, Singapore
- Naren Wulan
- Centre for Sleep and Cognition & Centre for Translational MR Research, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Department of Electrical and Computer Engineering, National University of Singapore, Singapore
- Department of Medicine, Healthy Longevity Translational Research Programme, Human Potential Translational Research Programme & Institute for Digital Medicine, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- N.1 Institute for Health, National University of Singapore, Singapore
- Shaoshi Zhang
- Centre for Sleep and Cognition & Centre for Translational MR Research, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Department of Electrical and Computer Engineering, National University of Singapore, Singapore
- Department of Medicine, Healthy Longevity Translational Research Programme, Human Potential Translational Research Programme & Institute for Digital Medicine, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- N.1 Institute for Health, National University of Singapore, Singapore
- Pansheng Chen
- Centre for Sleep and Cognition & Centre for Translational MR Research, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Department of Electrical and Computer Engineering, National University of Singapore, Singapore
- Department of Medicine, Healthy Longevity Translational Research Programme, Human Potential Translational Research Programme & Institute for Digital Medicine, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- N.1 Institute for Health, National University of Singapore, Singapore
- Fang Ji
- Centre for Sleep and Cognition & Centre for Translational MR Research, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Kwun Kei Ng
- Centre for Sleep and Cognition & Centre for Translational MR Research, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Christopher Chen
- Department of Pharmacology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Juan Helen Zhou
- Centre for Sleep and Cognition & Centre for Translational MR Research, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Department of Electrical and Computer Engineering, National University of Singapore, Singapore
- Integrative Sciences and Engineering Programme (ISEP), National University of Singapore, Singapore
- B T Thomas Yeo
- Centre for Sleep and Cognition & Centre for Translational MR Research, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Department of Electrical and Computer Engineering, National University of Singapore, Singapore
- Department of Medicine, Healthy Longevity Translational Research Programme, Human Potential Translational Research Programme & Institute for Digital Medicine, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- N.1 Institute for Health, National University of Singapore, Singapore
- Integrative Sciences and Engineering Programme (ISEP), National University of Singapore, Singapore
- Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Charlestown, MA, USA
6
Bai D, Ellington CN, Mo S, Song L, Xing EP. AttentionPert: accurately modeling multiplexed genetic perturbations with multi-scale effects. Bioinformatics 2024; 40:i453-i461. [PMID: 38940174 PMCID: PMC11211811 DOI: 10.1093/bioinformatics/btae244] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/29/2024]
Abstract
Motivation: Genetic perturbations (e.g. knockouts, variants) have laid the foundation for our understanding of many diseases, implicating pathogenic mechanisms and indicating therapeutic targets. However, experimental assays are fundamentally limited by the number of measurable perturbations. Computational methods can fill this gap by predicting perturbation effects under novel conditions, but accurately predicting the transcriptional responses of cells to unseen perturbations remains a significant challenge.
Results: We address this by developing a novel attention-based neural network, AttentionPert, which accurately predicts gene expression under multiplexed perturbations and generalizes to unseen conditions. AttentionPert integrates global and local effects in a multi-scale model, representing both the nonuniform system-wide impact of the genetic perturbation and the localized disturbance in a network of gene-gene similarities, which enhances its ability to predict nuanced transcriptional responses to both single and multi-gene perturbations. In comprehensive experiments, AttentionPert demonstrates superior performance across multiple datasets, outperforming the state-of-the-art method in predicting differential gene expression and revealing novel gene regulation. AttentionPert marks a significant improvement over current methods, particularly in handling the diversity of gene perturbations and in predicting out-of-distribution scenarios.
Availability and implementation: Code is available at https://github.com/BaiDing1234/AttentionPert.
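The idea of combining a nonuniform global effect with a local effect on a gene-gene similarity network can be illustrated with a toy calculation. This is a hypothetical sketch, not AttentionPert's architecture: the gene embeddings, the signed per-perturbation `effects`, the similarity matrix and the plain summation over multiple perturbations are all assumptions, and nothing here is learned.

```python
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_expression(control, gene_emb, perturbed, effects, similarity):
    """Sketch: expression after perturbing genes `perturbed` with signed strengths `effects`.

    Each perturbation contributes at two scales:
      * global: scaled dot-product attention from the perturbed gene to all genes,
        giving a nonuniform system-wide spread of the effect;
      * local: the effect leaks to neighbours in a gene-gene similarity matrix.
    Contributions of several perturbations are simply summed here.
    """
    n, d = len(control), len(gene_emb[0])
    delta = [0.0] * n
    for p, eff in zip(perturbed, effects):
        q = gene_emb[p]
        scores = [sum(a * b for a, b in zip(q, gene_emb[g])) / math.sqrt(d)
                  for g in range(n)]
        attn = softmax(scores)
        for g in range(n):
            delta[g] += eff * (attn[g] + similarity[p][g])
    return [c + dg for c, dg in zip(control, delta)]


control = [5.0, 3.0, 2.0, 4.0]
gene_emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.5, 0.5]]
similarity = [[0.0, 0.8, 0.0, 0.1],
              [0.8, 0.0, 0.0, 0.2],
              [0.0, 0.0, 0.0, 0.3],
              [0.1, 0.2, 0.3, 0.0]]
single = predict_expression(control, gene_emb, [0], [-2.0], similarity)
double = predict_expression(control, gene_emb, [0, 2], [-2.0, 1.0], similarity)
```

Because the local term follows the similarity matrix, gene 1, which is highly similar to the perturbed gene 0, shifts more than the unrelated gene 2.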
Affiliation(s)
- Ding Bai
- Machine Learning Department, Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, 00000, United Arab Emirates
- Caleb N Ellington
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, 15213, United States
- Shentong Mo
- Machine Learning Department, Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, 00000, United Arab Emirates
- Le Song
- Machine Learning Department, Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, 00000, United Arab Emirates
- Eric P Xing
- Machine Learning Department, Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, 00000, United Arab Emirates
- Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA, 15213, United States
7
Kim J, Seok J. ctGAN: combined transformation of gene expression and survival data with generative adversarial network. Brief Bioinform 2024; 25:bbae325. [PMID: 38980369 PMCID: PMC11232285 DOI: 10.1093/bib/bbae325] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/19/2024] [Revised: 05/29/2024] [Accepted: 06/21/2024] [Indexed: 07/10/2024]
Abstract
Recent studies have extensively used deep learning algorithms to analyze gene expression to predict disease diagnosis, treatment effectiveness, and survival outcomes. Survival analysis studies on diseases with high mortality rates, such as cancer, are indispensable. However, deep learning models are plagued by overfitting owing to the limited sample size relative to the large number of genes. Consequently, recent style-transfer deep generative models have been used to generate gene expression data. However, these models are limited in their applicability for clinical purposes because they generate only transcriptomic data. Therefore, this study proposes ctGAN, which enables the combined transformation of gene expression and survival data using a generative adversarial network (GAN). ctGAN improves survival analysis by augmenting data through style transformations between breast cancer and 11 other cancer types. We evaluated improvements in the concordance index (C-index) relative to previous models. Performance improvements were observed in nine of the 11 cancer types. Moreover, ctGAN outperformed previous models in seven of the 11 cancer types, with colon adenocarcinoma (COAD) exhibiting the largest improvement (median C-index increase of ~15.70%). Furthermore, integrating the generated COAD data improved the log-rank p-value (0.041) compared with using only the real COAD data (p = 0.797). Analysis of the data distribution showed that the model generated highly plausible data. In clustering evaluation, ctGAN exhibited the highest performance in most cases (89.62%). These findings suggest that ctGAN can be meaningfully utilized to predict disease progression and select personalized treatments in the medical field.
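The concordance index used above to score survival predictions can be computed directly. The sketch below implements Harrell's C-index as it is commonly defined; pairs with tied survival times are simply skipped, which is one of several tie conventions, and the function name and toy data are illustrative only. Higher `risks` are taken to mean earlier expected events.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index: share of comparable pairs whose predicted risks are correctly ordered.

    A pair (i, j) is comparable when subject i has an observed event (events[i] == 1)
    strictly before time j. It is concordant when i, who failed first, has the higher
    predicted risk; tied risks count as half. Pairs with tied times are skipped here.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored subjects cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


times = [5, 10, 15, 20]
events = [1, 1, 0, 1]  # 0 = censored
perfect = concordance_index(times, events, [0.9, 0.7, 0.4, 0.2])
mixed = concordance_index(times, events, [0.9, 0.2, 0.4, 0.7])
```

A value of 0.5 corresponds to random ordering and 1.0 to a perfect ranking of subjects by risk; in the toy data, 3 of the 5 comparable pairs are ordered correctly in the `mixed` case.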
Affiliation(s)
- Jaeyoon Kim
- School of Electrical and Computer Engineering, Korea University, 145 Anam-ro, Seongbuk-gu, Seoul, 02841, Korea
- Junhee Seok
- School of Electrical and Computer Engineering, Korea University, 145 Anam-ro, Seongbuk-gu, Seoul, 02841, Korea
8
Yeh CH, Chen ZG, Liou CY, Chen MJ. Homogeneous Space Construction and Projection for Single-Cell Expression Prediction Based on Deep Learning. Bioengineering (Basel) 2023; 10:996. [PMID: 37760098 PMCID: PMC10525719 DOI: 10.3390/bioengineering10090996] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2023] [Revised: 08/16/2023] [Accepted: 08/18/2023] [Indexed: 09/29/2023]
Abstract
Predicting cellular responses to perturbations is an unsolved problem in biology. Traditional approaches assume that different cell types respond similarly to perturbations. However, this assumption ignores the context of genome interactions in different cell types, which compromises prediction quality. More recently, deep learning models that discover gene-gene relationships can yield more accurate predictions of cellular responses. However, the large differences in biological information between cell types make it difficult for deep learning models to encode data into a continuous low-dimensional feature space, so the features captured by the latent space may not be continuous. As a result, the mapping between the two condition spaces learned by the model is reliable only where the real reference data reside, and predicted target cells that lie outside the domain of the reference data are mapped incorrectly. In this paper, we propose an information-navigated variational autoencoder (INVAE), a deep neural network for predicting cellular responses to perturbation. INVAE filters out information that does not contribute to predictive performance. For the remaining information, INVAE constructs a homogeneous space of control conditions and finds the mapping between the control condition space and the perturbation condition space. By embedding the target unit into the control space and then mapping it to the perturbation space, we can predict the perturbed state of the target unit. In comparisons with three other state-of-the-art methods on three real datasets, INVAE outperformed existing methods in predicting cell states after perturbation.
Furthermore, we demonstrate that filtering out useless information not only improves prediction accuracy but also reveals similarities in how genes in different cell types are regulated following perturbation.
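The embed-then-map step described above can be illustrated on latent codes. In this sketch, under stated assumptions, the encoder and INVAE's information filtering are not shown: latent codes for reference cell types under control and perturbed conditions are taken as given, and a per-dimension least-squares linear map stands in for INVAE's learned mapping. The names and numbers are hypothetical.

```python
def fit_latent_map(z_ctrl, z_pert):
    """Least-squares fit of z_pert[k] = a_k * z_ctrl[k] + b_k, one dimension at a time."""
    params = []
    for k in range(len(z_ctrl[0])):
        xs = [z[k] for z in z_ctrl]
        ys = [z[k] for z in z_pert]
        x_bar = sum(xs) / len(xs)
        y_bar = sum(ys) / len(ys)
        sxx = sum((x - x_bar) ** 2 for x in xs)
        sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        a = sxy / sxx if sxx > 0 else 1.0  # fall back to a pure shift
        params.append((a, y_bar - a * x_bar))
    return params


def map_to_perturbed(params, z):
    """Apply the fitted per-dimension map to a control-space latent code."""
    return [a * zk + b for (a, b), zk in zip(params, z)]


# Hypothetical latent codes (e.g. cell-type means) for reference cells.
ref_ctrl = [[0.0, 1.0], [1.0, 0.0], [2.0, 3.0]]
ref_pert = [[1.0, 0.5], [3.0, -0.5], [5.0, 2.5]]
params = fit_latent_map(ref_ctrl, ref_pert)

# A held-out target cell type, observed only under the control condition.
target_ctrl = [0.5, 2.0]
predicted = map_to_perturbed(params, target_ctrl)
```

A map fitted this way is only trustworthy near the reference points it was fitted on, which is the limitation the abstract attributes to discontinuous latent spaces; as we read the abstract, INVAE's homogeneous control space is intended to place target cells inside that region.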
Affiliation(s)
- Chia-Hung Yeh
- Department of Electrical Engineering, National Taiwan Normal University, Taipei 10610, Taiwan
- Department of Electrical Engineering, National Sun Yat-sen University, Kaohsiung 80424, Taiwan
- Ze-Guang Chen
- Department of Electrical Engineering, National Taiwan Normal University, Taipei 10610, Taiwan
- Cheng-Yue Liou
- Department of Electrical Engineering, National Taiwan Normal University, Taipei 10610, Taiwan
- Mei-Juan Chen
- Department of Electrical Engineering, National Dong Hwa University, Hualien 97401, Taiwan
9
Chicco D, Cumbo F, Angione C. Ten quick tips for avoiding pitfalls in multi-omics data integration analyses. PLoS Comput Biol 2023; 19:e1011224. [PMID: 37410704 DOI: 10.1371/journal.pcbi.1011224] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Indexed: 07/08/2023]
Abstract
Data are the most important elements of bioinformatics: computational analysis of bioinformatics data can help researchers infer new knowledge about biology, chemistry, biophysics, and sometimes even medicine, influencing treatments and therapies for patients. Bioinformatics and high-throughput biological data coming from different sources can be even more helpful, because each of these data chunks can provide alternative, complementary information about a specific biological phenomenon, similar to multiple photos of the same subject taken from different angles. In this context, the integration of bioinformatics and high-throughput biological data plays a pivotal role in a successful bioinformatics study. Over the last decades, data originating from proteomics, metabolomics, metagenomics, phenomics, transcriptomics, and epigenomics have been collectively labelled omics data, and the integration of these omics data has gained importance in all biological areas. Although omics data integration is useful and relevant, the heterogeneity of the data means that mistakes during the integration phases are not uncommon. We therefore present these ten quick tips to perform omics data integration correctly, avoiding common mistakes we experienced or noticed in published studies in the past. Although we designed our ten guidelines for beginners, using simple language that (we hope) can be understood by anyone, we believe our ten recommendations should be taken into account by all bioinformaticians performing omics data integration, including experts.
Affiliation(s)
- Davide Chicco
- Institute of Health Policy Management and Evaluation, University of Toronto, Toronto, Ontario, Canada
- Fabio Cumbo
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, Ohio, United States of America
- Claudio Angione
- School of Computing Engineering and Digital Technologies, Teesside University, Middlesbrough, United Kingdom

10
Lotfollahi M, Klimovskaia Susmelj A, De Donno C, Hetzel L, Ji Y, Ibarra IL, Srivatsan SR, Naghipourfar M, Daza RM, Martin B, Shendure J, McFaline-Figueroa JL, Boyeau P, Wolf FA, Yakubova N, Günnemann S, Trapnell C, Lopez-Paz D, Theis FJ. Predicting cellular responses to complex perturbations in high-throughput screens. Mol Syst Biol 2023:e11517. [PMID: 37154091 DOI: 10.15252/msb.202211517] [Citation(s) in RCA: 77] [Impact Index Per Article: 38.5] [Received: 12/21/2022] [Revised: 03/23/2023] [Accepted: 03/31/2023] [Indexed: 05/10/2023]
Abstract
Recent advances in multiplexed single-cell transcriptomics experiments facilitate the high-throughput study of drug and genetic perturbations. However, an exhaustive exploration of the combinatorial perturbation space is experimentally unfeasible. Therefore, computational methods are needed to predict, interpret, and prioritize perturbations. Here, we present the compositional perturbation autoencoder (CPA), which combines the interpretability of linear models with the flexibility of deep-learning approaches for single-cell response modeling. CPA learns to predict, in silico, transcriptional perturbation responses at the single-cell level for unseen dosages, cell types, time points, and species. Using newly generated single-cell drug combination data, we validate that CPA can predict unseen drug combinations while outperforming baseline models. Additionally, the architecture's modularity enables incorporating the chemical representation of the drugs, allowing prediction of the cellular response to completely unseen drugs. Furthermore, CPA is also applicable to genetic combinatorial screens. We demonstrate this by imputing in silico 5,329 missing combinations (97.6% of all possibilities) in a single-cell Perturb-seq experiment with diverse genetic interactions. We envision that CPA will facilitate efficient experimental design and hypothesis generation by enabling in silico response prediction at the single-cell level, and thus accelerate therapeutic applications using single-cell technologies.
Affiliation(s)
- Mohammad Lotfollahi
- Helmholtz Center Munich - German Research Center for Environmental Health, Institute of Computational Biology, Munich, Germany
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, UK
- Carlo De Donno
- Helmholtz Center Munich - German Research Center for Environmental Health, Institute of Computational Biology, Munich, Germany
- School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany
- Leon Hetzel
- Helmholtz Center Munich - German Research Center for Environmental Health, Institute of Computational Biology, Munich, Germany
- Department of Mathematics, Technical University of Munich, Munich, Germany
- Yuge Ji
- Helmholtz Center Munich - German Research Center for Environmental Health, Institute of Computational Biology, Munich, Germany
- School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany
- Ignacio L Ibarra
- Helmholtz Center Munich - German Research Center for Environmental Health, Institute of Computational Biology, Munich, Germany
- Sanjay R Srivatsan
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Riza M Daza
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Beth Martin
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Jay Shendure
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Howard Hughes Medical Institute, Seattle, WA, USA
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
- Allen Discovery Center for Cell Lineage Tracing, Seattle, WA, USA
- Pierre Boyeau
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA, USA
- F Alexander Wolf
- Helmholtz Center Munich - German Research Center for Environmental Health, Institute of Computational Biology, Munich, Germany
- Stephan Günnemann
- Department of Computer Science, Technical University of Munich, Munich, Germany
- Cole Trapnell
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
- Allen Discovery Center for Cell Lineage Tracing, Seattle, WA, USA
- David Lopez-Paz
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, UK
- Fabian J Theis
- Helmholtz Center Munich - German Research Center for Environmental Health, Institute of Computational Biology, Munich, Germany
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, UK
- School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany
- Department of Mathematics, Technical University of Munich, Munich, Germany

11
An L, Chen J, Chen P, Zhang C, He T, Chen C, Zhou JH, Yeo BTT. Goal-specific brain MRI harmonization. Neuroimage 2022; 263:119570. [PMID: 35987490 DOI: 10.1016/j.neuroimage.2022.119570] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 03/02/2022] [Revised: 08/05/2022] [Accepted: 08/15/2022] [Indexed: 11/19/2022]
Abstract
There is significant interest in pooling magnetic resonance imaging (MRI) data from multiple datasets to enable mega-analysis. Harmonization is typically performed to reduce heterogeneity when pooling MRI data across datasets. Most MRI harmonization algorithms do not explicitly consider downstream application performance during harmonization. However, the choice of downstream application might influence what is considered a study-specific confound. Therefore, ignoring downstream applications during harmonization might limit downstream performance. Here we propose a goal-specific harmonization framework that uses downstream application performance to regularize the harmonization procedure. Our framework can be integrated with a wide variety of harmonization models based on deep neural networks, such as the recently proposed conditional variational autoencoder (cVAE) harmonization model. Three datasets from three different continents, with a total of 2787 participants and 10,085 anatomical T1 scans, were used for evaluation. We found that cVAE removed more dataset differences than the widely used ComBat model, but at the expense of removing desirable biological information, as measured by downstream prediction of mini mental state examination (MMSE) scores and clinical diagnoses. On the other hand, our goal-specific cVAE (gcVAE) was able to remove as many dataset differences as cVAE, while improving downstream cross-sectional prediction of MMSE scores and clinical diagnoses.
Affiliation(s)
- Lijun An
- Centre for Sleep and Cognition (CSC) & Centre for Translational Magnetic Resonance Research (TMR), Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Department of Electrical and Computer Engineering, National University of Singapore, Singapore; N.1 Institute for Health and Institute for Digital Medicine (WisDM), National University of Singapore, Singapore
- Jianzhong Chen
- Centre for Sleep and Cognition (CSC) & Centre for Translational Magnetic Resonance Research (TMR), Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Department of Electrical and Computer Engineering, National University of Singapore, Singapore; N.1 Institute for Health and Institute for Digital Medicine (WisDM), National University of Singapore, Singapore
- Pansheng Chen
- Centre for Sleep and Cognition (CSC) & Centre for Translational Magnetic Resonance Research (TMR), Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Department of Electrical and Computer Engineering, National University of Singapore, Singapore; N.1 Institute for Health and Institute for Digital Medicine (WisDM), National University of Singapore, Singapore
- Chen Zhang
- Centre for Sleep and Cognition (CSC) & Centre for Translational Magnetic Resonance Research (TMR), Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Department of Electrical and Computer Engineering, National University of Singapore, Singapore; N.1 Institute for Health and Institute for Digital Medicine (WisDM), National University of Singapore, Singapore
- Tong He
- Centre for Sleep and Cognition (CSC) & Centre for Translational Magnetic Resonance Research (TMR), Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Department of Electrical and Computer Engineering, National University of Singapore, Singapore; N.1 Institute for Health and Institute for Digital Medicine (WisDM), National University of Singapore, Singapore
- Christopher Chen
- Department of Pharmacology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Juan Helen Zhou
- Centre for Sleep and Cognition (CSC) & Centre for Translational Magnetic Resonance Research (TMR), Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Department of Electrical and Computer Engineering, National University of Singapore, Singapore; NUS Graduate School for Integrative Sciences and Engineering, National University of Singapore, Singapore
- B T Thomas Yeo
- Centre for Sleep and Cognition (CSC) & Centre for Translational Magnetic Resonance Research (TMR), Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Department of Electrical and Computer Engineering, National University of Singapore, Singapore; N.1 Institute for Health and Institute for Digital Medicine (WisDM), National University of Singapore, Singapore; NUS Graduate School for Integrative Sciences and Engineering, National University of Singapore, Singapore; Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Charlestown, MA, USA.

12
Wei X, Dong J, Wang F. scPreGAN, a deep generative model for predicting the response of single-cell expression to perturbation. Bioinformatics 2022; 38:3377-3384. [PMID: 35639705 DOI: 10.1093/bioinformatics/btac357] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 09/14/2021] [Revised: 04/29/2022] [Accepted: 05/20/2022] [Indexed: 11/15/2022]
Abstract
MOTIVATION Rapid developments in single-cell RNA sequencing technologies allow the study of responses to external perturbations at the individual-cell level. However, in many cases it is hard to collect the perturbed cells, for example when one needs to know the response of a cell type to a drug before actually administering it to a patient. Prediction in silico could alleviate the problem and save cost. Although several tools have been developed, their prediction accuracy leaves much room for improvement. RESULTS In this article, we propose scPreGAN (Single-Cell data Prediction based on GAN), a deep generative model for predicting the response of single-cell expression to perturbation. scPreGAN integrates an autoencoder and a generative adversarial network: the former extracts the information common to the unperturbed and perturbed data, and the latter predicts the perturbed data. Experiments on three real datasets show that scPreGAN outperforms three state-of-the-art methods; it can capture the complicated distribution of cell expression and generate predicted data with the same expression abundance as the real data. AVAILABILITY AND IMPLEMENTATION The implementation of scPreGAN is available via https://github.com/JaneJiayiDong/scPreGAN. To reproduce the results of this article, please visit https://github.com/JaneJiayiDong/scPreGAN-reproducibility. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Xiajie Wei
- Shanghai Key Lab of Intelligent Information Processing, Shanghai, China
- School of Computer Science and Technology, Fudan University, Shanghai, China
- Jiayi Dong
- Shanghai Key Lab of Intelligent Information Processing, Shanghai, China
- School of Computer Science and Technology, Fudan University, Shanghai, China
- Fei Wang
- Shanghai Key Lab of Intelligent Information Processing, Shanghai, China
- School of Computer Science and Technology, Fudan University, Shanghai, China

13
Nan Y, Ser JD, Walsh S, Schönlieb C, Roberts M, Selby I, Howard K, Owen J, Neville J, Guiot J, Ernst B, Pastor A, Alberich-Bayarri A, Menzel MI, Walsh S, Vos W, Flerin N, Charbonnier JP, van Rikxoort E, Chatterjee A, Woodruff H, Lambin P, Cerdá-Alberich L, Martí-Bonmatí L, Herrera F, Yang G. Data harmonisation for information fusion in digital healthcare: A state-of-the-art systematic review, meta-analysis and future research directions. Inf Fusion 2022; 82:99-122. [PMID: 35664012 PMCID: PMC8878813 DOI: 10.1016/j.inffus.2022.01.001] [Citation(s) in RCA: 47] [Impact Index Per Article: 15.7] [Received: 10/24/2021] [Revised: 12/22/2021] [Accepted: 01/07/2022] [Indexed: 05/13/2023]
Abstract
Removing the bias and variance of multicentre data has always been a challenge in large-scale digital healthcare studies, which require the ability to integrate clinical features extracted from data acquired by different scanners and protocols in order to improve stability and robustness. Previous studies have described various computational approaches to fuse single-modality multicentre datasets. However, these surveys rarely focused on evaluation metrics and lacked a checklist for computational data harmonisation studies. In this systematic review, we summarise the computational data harmonisation approaches for multi-modality data in the digital healthcare field, including harmonisation strategies and evaluation metrics based on different theories. In addition, we propose a comprehensive checklist summarising common practices for data harmonisation studies, to guide researchers in reporting their findings more effectively. Finally, we propose flowcharts presenting possible ways to select methods and metrics, and we survey the limitations of different methods to inform future research.
Affiliation(s)
- Yang Nan
- National Heart and Lung Institute, Imperial College London, London, UK
- Javier Del Ser
- Department of Communications Engineering, University of the Basque Country UPV/EHU, Bilbao 48013, Spain
- TECNALIA, Basque Research and Technology Alliance (BRTA), Derio 48160, Spain
- Simon Walsh
- National Heart and Lung Institute, Imperial College London, London, UK
- Carola Schönlieb
- Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, UK
- Michael Roberts
- Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, UK
- Oncology R&D, AstraZeneca, Cambridge, UK
- Ian Selby
- Department of Radiology, University of Cambridge, Cambridge, UK
- Kit Howard
- Clinical Data Interchange Standards Consortium, Austin, TX, United States of America
- John Owen
- Clinical Data Interchange Standards Consortium, Austin, TX, United States of America
- Jon Neville
- Clinical Data Interchange Standards Consortium, Austin, TX, United States of America
- Julien Guiot
- University Hospital of Liège (CHU Liège), Respiratory medicine department, Liège, Belgium
- University of Liege, Department of clinical sciences, Pneumology-Allergology, Liège, Belgium
- Benoit Ernst
- University Hospital of Liège (CHU Liège), Respiratory medicine department, Liège, Belgium
- University of Liege, Department of clinical sciences, Pneumology-Allergology, Liège, Belgium
- Marion I. Menzel
- Technische Hochschule Ingolstadt, Ingolstadt, Germany
- GE Healthcare GmbH, Munich, Germany
- Sean Walsh
- Radiomics (Oncoradiomics SA), Liège, Belgium
- Wim Vos
- Radiomics (Oncoradiomics SA), Liège, Belgium
- Nina Flerin
- Radiomics (Oncoradiomics SA), Liège, Belgium
- Avishek Chatterjee
- Department of Precision Medicine, Maastricht University, Maastricht, The Netherlands
- Henry Woodruff
- Department of Precision Medicine, Maastricht University, Maastricht, The Netherlands
- Philippe Lambin
- Department of Precision Medicine, Maastricht University, Maastricht, The Netherlands
- Leonor Cerdá-Alberich
- Medical Imaging Department, Hospital Universitari i Politècnic La Fe, Valencia, Spain
- Luis Martí-Bonmatí
- Medical Imaging Department, Hospital Universitari i Politècnic La Fe, Valencia, Spain
- Francisco Herrera
- Department of Computer Sciences and Artificial Intelligence, Andalusian Research Institute in Data Science and Computational Intelligence (DaSCI) University of Granada, Granada, Spain
- Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah 21589, Saudi Arabia
- Guang Yang
- National Heart and Lung Institute, Imperial College London, London, UK
- Cardiovascular Research Centre, Royal Brompton Hospital, London, UK
- School of Biomedical Engineering & Imaging Sciences, King's College London, London, UK