1
Selby DA, Sprang M, Ewald J, Vollmer SJ. Beyond the black box with biologically informed neural networks. Nat Rev Genet 2025; 26:371-372. [PMID: 40038452 DOI: 10.1038/s41576-025-00826-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/06/2025]
Affiliation(s)
- David A Selby
  - Data Science and its Applications, German Research Center for Artificial Intelligence (DFKI), Kaiserslautern, Germany
- Maximilian Sprang
  - Department of Dermatology, University Medical Center of the Johannes Gutenberg University, Mainz, Germany
- Jan Ewald
  - Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI) Dresden/Leipzig, Leipzig University, Leipzig, Germany
- Sebastian J Vollmer
  - Data Science and its Applications, German Research Center for Artificial Intelligence (DFKI), Kaiserslautern, Germany
  - University of Kaiserslautern-Landau, Kaiserslautern, Germany
2
Jiang Y, Immadi MS, Wang D, Zeng S, On Chan Y, Zhou J, Xu D, Joshi T. IRnet: Immunotherapy response prediction using pathway knowledge-informed graph neural network. J Adv Res 2025; 72:319-331. [PMID: 39097091 DOI: 10.1016/j.jare.2024.07.036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/26/2024] [Revised: 07/10/2024] [Accepted: 07/30/2024] [Indexed: 08/05/2024] Open
Abstract
INTRODUCTION: Immune checkpoint inhibitors (ICIs) are potent and precise therapies for various cancer types, significantly improving survival rates in patients who respond positively to them. However, only a minority of patients benefit from ICI treatments. OBJECTIVES: Identifying ICI responders before treatment could greatly conserve medical resources, minimize potential drug side effects, and expedite the search for alternative therapies. Our goal is to introduce a novel deep-learning method to predict ICI treatment responses in cancer patients. METHODS: The proposed deep-learning framework leverages a graph neural network and biological pathway knowledge. We trained and tested our method using data from ICI-treated patients in several clinical trials covering melanoma, gastric cancer, and bladder cancer. RESULTS: Our results demonstrate that this predictive model outperforms current state-of-the-art methods and tumor microenvironment-based predictors. Additionally, the model quantifies the importance of pathways, pathway interactions, and genes in its predictions. A web server for IRnet has been developed and deployed, providing broad accessibility to users at https://irnet.missouri.edu. CONCLUSION: IRnet is a competitive tool for predicting patient responses to immunotherapy, specifically ICIs. Its interpretability also offers valuable insights into the mechanisms underlying ICI treatments.
Affiliation(s)
- Yuexu Jiang
  - Department of Electrical Engineering and Computer Science, University of Missouri-Columbia, Columbia, MO, USA; Christopher S. Bond Life Sciences Center, University of Missouri-Columbia, Columbia, MO, USA
- Manish Sridhar Immadi
  - Department of Electrical Engineering and Computer Science, University of Missouri-Columbia, Columbia, MO, USA
- Duolin Wang
  - Department of Electrical Engineering and Computer Science, University of Missouri-Columbia, Columbia, MO, USA; Christopher S. Bond Life Sciences Center, University of Missouri-Columbia, Columbia, MO, USA
- Shuai Zeng
  - Department of Electrical Engineering and Computer Science, University of Missouri-Columbia, Columbia, MO, USA; Christopher S. Bond Life Sciences Center, University of Missouri-Columbia, Columbia, MO, USA
- Yen On Chan
  - Department of Electrical Engineering and Computer Science, University of Missouri-Columbia, Columbia, MO, USA; MU Institute for Data Science and Informatics, University of Missouri-Columbia, Columbia, MO, USA
- Jing Zhou
  - Department of Surgery, University of Missouri-Columbia, Columbia, MO, USA
- Dong Xu
  - Department of Electrical Engineering and Computer Science, University of Missouri-Columbia, Columbia, MO, USA; Christopher S. Bond Life Sciences Center, University of Missouri-Columbia, Columbia, MO, USA; MU Institute for Data Science and Informatics, University of Missouri-Columbia, Columbia, MO, USA
- Trupti Joshi
  - Department of Electrical Engineering and Computer Science, University of Missouri-Columbia, Columbia, MO, USA; Christopher S. Bond Life Sciences Center, University of Missouri-Columbia, Columbia, MO, USA; MU Institute for Data Science and Informatics, University of Missouri-Columbia, Columbia, MO, USA; Department of Biomedical Informatics, Biostatistics and Medical Epidemiology, University of Missouri-Columbia, Columbia, MO, USA
3
He Y, Li S, Lan H, Long W, Zhai S, Li M, Wen Z. A Transfer Learning Framework for Predicting and Interpreting Drug Responses via Single-Cell RNA-Seq Data. Int J Mol Sci 2025; 26:4365. [PMID: 40362602 PMCID: PMC12072357 DOI: 10.3390/ijms26094365] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2025] [Revised: 04/29/2025] [Accepted: 05/02/2025] [Indexed: 05/15/2025] Open
Abstract
Chemotherapy is a fundamental therapy in cancer treatment, yet its effectiveness is often undermined by drug resistance. Understanding the molecular mechanisms underlying drug response remains a major challenge due to tumor heterogeneity, complex cellular interactions, and limited access to clinical samples, which also hinder the performance and interpretability of existing predictive models. Meanwhile, single-cell RNA sequencing (scRNA-seq) has emerged as a powerful tool for uncovering resistance mechanisms, but the systematic collection and utilization of scRNA-seq drug response data remain limited. In this study, we collected scRNA-seq drug response datasets from publicly available web sources and proposed a transfer learning-based framework to align bulk and single-cell sequencing data. A shared encoder was designed to project both bulk and single-cell sequencing data into a unified latent space for drug response prediction, while a sparse decoder guided by prior biological knowledge enhanced interpretability by mapping latent features to predefined pathways. The proposed model achieved superior performance across five curated scRNA-seq datasets and yielded biologically meaningful insights through integrated gradient analysis. This work demonstrates the potential of deep learning to advance drug response prediction and underscores the value of scRNA-seq data in supporting related research.
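The idea of a sparse decoder guided by pathway knowledge can be illustrated with a minimal sketch. This is not the authors' implementation: the gene and pathway names, the weights, and the functions `build_pathway_mask` and `sparse_decode` are hypothetical, and the sketch assumes only that the decoder is a linear map whose weights are zeroed wherever a gene does not belong to a pathway.

```python
def build_pathway_mask(genes, pathways):
    """Return one 0/1 row per pathway; 1.0 marks genes that belong to it."""
    return [
        [1.0 if gene in members else 0.0 for gene in genes]
        for members in (set(m) for m in pathways.values())
    ]


def sparse_decode(latent, weights, mask):
    """Map pathway-level latent values to genes through masked weights only."""
    n_genes = len(mask[0])
    return [
        sum(latent[k] * weights[k][j] * mask[k][j] for k in range(len(latent)))
        for j in range(n_genes)
    ]


# Toy inputs (hypothetical, for illustration only).
genes = ["TP53", "MDM2", "EGFR", "KRAS"]
pathways = {"p53_signaling": ["TP53", "MDM2"], "EGFR_signaling": ["EGFR", "KRAS"]}
weights = [[0.5, -1.0, 2.0, 3.0], [1.5, 0.25, -0.5, 1.0]]

mask = build_pathway_mask(genes, pathways)
# Activate only the first latent unit: genes outside its pathway stay at zero.
reconstruction = sparse_decode([1.0, 0.0], weights, mask)
```

Because each latent unit can only influence the genes of one pathway, a large latent value can be read directly as activity of that pathway, which is the interpretability benefit the abstract refers to.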
Affiliation(s)
- Yujie He
  - College of Chemistry, Sichuan University, Chengdu 610064, China
- Shenghao Li
  - College of Chemistry, Sichuan University, Chengdu 610064, China
- Hao Lan
  - College of Chemistry, Sichuan University, Chengdu 610064, China
- Wulin Long
  - College of Chemistry, Sichuan University, Chengdu 610064, China
- Shengqiu Zhai
  - College of Chemistry, Sichuan University, Chengdu 610064, China
- Menglong Li
  - College of Chemistry, Sichuan University, Chengdu 610064, China
- Zhining Wen
  - College of Chemistry, Sichuan University, Chengdu 610064, China
  - Medical Big Data Center, Sichuan University, Chengdu 610064, China
4
Wang J, Ye F, Chai H, Jiang Y, Wang T, Ran X, Xia Q, Xu Z, Fu Y, Zhang G, Wu H, Guo G, Guo H, Ruan Y, Wang Y, Xing D, Xu X, Zhang Z. Advances and applications in single-cell and spatial genomics. Sci China Life Sci 2025; 68:1226-1282. [PMID: 39792333 DOI: 10.1007/s11427-024-2770-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/20/2024] [Accepted: 10/10/2024] [Indexed: 01/12/2025]
Abstract
The applications of single-cell and spatial technologies in recent times have revolutionized the present understanding of cellular states and the cellular heterogeneity inherent in complex biological systems. These advancements offer unprecedented resolution in the examination of the functional genomics of individual cells and their spatial context within tissues. In this review, we have comprehensively discussed the historical development and recent progress in the field of single-cell and spatial genomics. We have reviewed the breakthroughs in single-cell multi-omics technologies, spatial genomics methods, and the computational strategies employed toward the analyses of single-cell atlas data. Furthermore, we have highlighted the advances made in constructing cellular atlases and their clinical applications, particularly in the context of disease. Finally, we have discussed the emerging trends, challenges, and opportunities in this rapidly evolving field.
Affiliation(s)
- Jingjing Wang
  - Bone Marrow Transplantation Center of the First Affiliated Hospital & Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, 310058, China
- Fang Ye
  - Bone Marrow Transplantation Center of the First Affiliated Hospital & Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, 310058, China
- Haoxi Chai
  - Life Sciences Institute and The Second Affiliated Hospital, Zhejiang University, Hangzhou, 310058, China
- Yujia Jiang
  - BGI Research, Shenzhen, 518083, China
  - BGI Research, Hangzhou, 310030, China
- Teng Wang
  - Biomedical Pioneering Innovation Center (BIOPIC) and School of Life Sciences, Peking University, Beijing, 100871, China
  - Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, 100871, China
- Xia Ran
  - Bone Marrow Transplantation Center of the First Affiliated Hospital & Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, 310058, China
  - Institute of Hematology, Zhejiang University, Hangzhou, 310000, China
- Qimin Xia
  - Biomedical Pioneering Innovation Center (BIOPIC) and School of Life Sciences, Peking University, Beijing, 100871, China
- Ziye Xu
  - Department of Laboratory Medicine of The First Affiliated Hospital & Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, 310058, China
- Yuting Fu
  - Bone Marrow Transplantation Center of the First Affiliated Hospital & Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, 310058, China
  - Center for Stem Cell and Regenerative Medicine, Zhejiang University School of Medicine, Hangzhou, 310058, China
- Guodong Zhang
  - Bone Marrow Transplantation Center of the First Affiliated Hospital & Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, 310058, China
  - Center for Stem Cell and Regenerative Medicine, Zhejiang University School of Medicine, Hangzhou, 310058, China
- Hanyu Wu
  - Bone Marrow Transplantation Center of the First Affiliated Hospital & Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, 310058, China
  - Center for Stem Cell and Regenerative Medicine, Zhejiang University School of Medicine, Hangzhou, 310058, China
- Guoji Guo
  - Bone Marrow Transplantation Center of the First Affiliated Hospital & Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, 310058, China
  - Center for Stem Cell and Regenerative Medicine, Zhejiang University School of Medicine, Hangzhou, 310058, China
  - Zhejiang Provincial Key Lab for Tissue Engineering and Regenerative Medicine, Dr. Li Dak Sum & Yip Yio Chin Center for Stem Cell and Regenerative Medicine, Hangzhou, 310058, China
  - Institute of Hematology, Zhejiang University, Hangzhou, 310000, China
- Hongshan Guo
  - Bone Marrow Transplantation Center of the First Affiliated Hospital & Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, 310058, China
  - Institute of Hematology, Zhejiang University, Hangzhou, 310000, China
- Yijun Ruan
  - Life Sciences Institute and The Second Affiliated Hospital, Zhejiang University, Hangzhou, 310058, China
- Yongcheng Wang
  - Department of Laboratory Medicine of The First Affiliated Hospital & Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, 310058, China
- Dong Xing
  - Biomedical Pioneering Innovation Center (BIOPIC) and School of Life Sciences, Peking University, Beijing, 100871, China
  - Beijing Advanced Innovation Center for Genomics (ICG), Peking University, Beijing, 100871, China
- Xun Xu
  - BGI Research, Shenzhen, 518083, China
  - BGI Research, Hangzhou, 310030, China
  - Guangdong Provincial Key Laboratory of Genome Read and Write, BGI Research, Shenzhen, 518083, China
- Zemin Zhang
  - Biomedical Pioneering Innovation Center (BIOPIC) and School of Life Sciences, Peking University, Beijing, 100871, China
5
Birk S, Bonafonte-Pardàs I, Feriz AM, Boxall A, Agirre E, Memi F, Maguza A, Yadav A, Armingol E, Fan R, Castelo-Branco G, Theis FJ, Bayraktar OA, Talavera-López C, Lotfollahi M. Quantitative characterization of cell niches in spatially resolved omics data. Nat Genet 2025; 57:897-909. [PMID: 40102688 PMCID: PMC11985353 DOI: 10.1038/s41588-025-02120-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2024] [Accepted: 02/05/2025] [Indexed: 03/20/2025]
Abstract
Spatial omics enable the characterization of colocalized cell communities that coordinate specific functions within tissues. These communities, or niches, are shaped by interactions between neighboring cells, yet existing computational methods rarely leverage such interactions for their identification and characterization. To address this gap, here we introduce NicheCompass, a graph deep-learning method that models cellular communication to learn interpretable cell embeddings that encode signaling events, enabling the identification of niches and their underlying processes. Unlike existing methods, NicheCompass quantitatively characterizes niches based on communication pathways and consistently outperforms alternatives. We show its versatility by mapping tissue architecture during mouse embryonic development and delineating tumor niches in human cancers, including a spatial reference mapping application. Finally, we extend its capabilities to spatial multi-omics, demonstrate cross-technology integration with datasets from different sequencing platforms and construct a whole mouse brain spatial atlas comprising 8.4 million cells, highlighting NicheCompass' scalability. Overall, NicheCompass provides a scalable framework for identifying and analyzing niches through signaling events.
Affiliation(s)
- Sebastian Birk
  - Institute of AI for Health, Helmholtz Center Munich-German Research Center for Environmental Health, Neuherberg, Germany
  - School of Computation, Information and Technology, Technical University of Munich, Munich, Germany
  - Würzburg Institute of Systems Immunology (WüSI), University of Würzburg, Würzburg, Germany
  - Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
- Irene Bonafonte-Pardàs
  - Institute of Computational Biology, Helmholtz Center Munich-German Research Center for Environmental Health, Neuherberg, Germany
  - Biomedical Center (BMC), Physiological Chemistry, Faculty of Medicine, Ludwig Maximilian University of Munich, Planegg-Martinsried, Germany
- Adam Boxall
  - Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
- Eneritz Agirre
  - Laboratory of Molecular Neurobiology, Department of Medical Biochemistry and Biophysics, Karolinska Institutet, Stockholm, Sweden
- Fani Memi
  - Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
- Anna Maguza
  - Würzburg Institute of Systems Immunology (WüSI), University of Würzburg, Würzburg, Germany
  - Faculty of Medicine, University of Würzburg, Würzburg, Germany
- Anamika Yadav
  - Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
- Erick Armingol
  - Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
- Rong Fan
  - Department of Biomedical Engineering, Yale University, New Haven, CT, USA
  - Yale Stem Cell Center and Yale Cancer Center, Yale University School of Medicine, New Haven, CT, USA
  - Department of Pathology, Yale University School of Medicine, New Haven, CT, USA
  - Human and Translational Immunology Program, Yale University School of Medicine, New Haven, CT, USA
- Gonçalo Castelo-Branco
  - Laboratory of Molecular Neurobiology, Department of Medical Biochemistry and Biophysics, Karolinska Institutet, Stockholm, Sweden
  - Ming Wai Lau Centre for Reparative Medicine, Stockholm Node, Karolinska Institutet, Stockholm, Sweden
- Fabian J Theis
  - School of Computation, Information and Technology, Technical University of Munich, Munich, Germany
  - Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
  - Institute of Computational Biology, Helmholtz Center Munich-German Research Center for Environmental Health, Neuherberg, Germany
  - School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany
- Carlos Talavera-López
  - Würzburg Institute of Systems Immunology (WüSI), University of Würzburg, Würzburg, Germany
  - Faculty of Medicine, University of Würzburg, Würzburg, Germany
- Mohammad Lotfollahi
  - Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
  - Institute of Computational Biology, Helmholtz Center Munich-German Research Center for Environmental Health, Neuherberg, Germany
6
Sadria M, Layton A. scVAEDer: integrating deep diffusion models and variational autoencoders for single-cell transcriptomics analysis. Genome Biol 2025; 26:64. [PMID: 40119479 PMCID: PMC11927372 DOI: 10.1186/s13059-025-03519-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2023] [Accepted: 02/27/2025] [Indexed: 03/24/2025] Open
Abstract
Discovering a lower-dimensional embedding of single-cell data can improve downstream analysis. The embedding should encapsulate both the high-level features and low-level variations. While existing generative models attempt to learn such low-dimensional representations, they have limitations. Here, we introduce scVAEDer, a scalable deep-learning model that combines the power of variational autoencoders and deep diffusion models to learn a meaningful representation that retains both global structure and local variations. Using the learned embeddings, scVAEDer can generate novel scRNA-seq data, predict perturbation response on various cell types, identify changes in gene expression during dedifferentiation, and detect master regulators in biological processes.
Affiliation(s)
- Mehrshad Sadria
  - Department of Applied Mathematics, University of Waterloo, Waterloo, ON, Canada
- Anita Layton
  - Department of Applied Mathematics, University of Waterloo, Waterloo, ON, Canada
  - Cheriton School of Computer Science, University of Waterloo, Waterloo, ON, Canada
  - Department of Biology, University of Waterloo, Waterloo, ON, Canada
  - School of Pharmacy, University of Waterloo, Waterloo, ON, Canada
7
Thapa K, Kinali M, Pei S, Luna A, Babur Ö. Strategies to include prior knowledge in omics analysis with deep neural networks. Patterns (N Y) 2025; 6:101203. [PMID: 40182174 PMCID: PMC11963003 DOI: 10.1016/j.patter.2025.101203] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/05/2025]
Abstract
High-throughput molecular profiling technologies have revolutionized molecular biology research in the past decades. One important use of molecular data is to make predictions of phenotypes and other features of the organisms using machine learning algorithms. Deep learning models have become increasingly popular for this task due to their ability to learn complex non-linear patterns. Applying deep learning to molecular profiles, however, is challenging due to the very high dimensionality of the data and relatively small sample sizes, causing models to overfit. A solution is to incorporate biological prior knowledge to guide the learning algorithm for processing the functionally related input together. This helps regularize the models and improve their generalizability and interpretability. Here, we describe three major strategies proposed to use prior knowledge in deep learning models to make predictions based on molecular profiles. We review the related deep learning architectures, including the major ideas in relatively new graph neural networks.
Affiliation(s)
- Kisan Thapa
  - Computer Science Department, University of Massachusetts Boston, 100 Morrissey Boulevard, Boston, MA 02125, USA
- Meric Kinali
  - Computer Science Department, University of Massachusetts Boston, 100 Morrissey Boulevard, Boston, MA 02125, USA
- Shichao Pei
  - Computer Science Department, University of Massachusetts Boston, 100 Morrissey Boulevard, Boston, MA 02125, USA
- Augustin Luna
  - Developmental Therapeutics Branch, Center for Cancer Research, National Cancer Institute, NIH, 9000 Rockville Pike, Bethesda, MD 20892, USA
  - Computational Biology Branch, National Library of Medicine, NIH, 9000 Rockville Pike, Bethesda, MD 20892, USA
- Özgün Babur
  - Computer Science Department, University of Massachusetts Boston, 100 Morrissey Boulevard, Boston, MA 02125, USA
8
Monzó C, Aguerralde-Martin M, Martínez-Mira C, Arzalluz-Luque Á, Conesa A, Tarazona S. MOSim: bulk and single-cell multilayer regulatory network simulator. Brief Bioinform 2025; 26:bbaf110. [PMID: 40116657 PMCID: PMC11926980 DOI: 10.1093/bib/bbaf110] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2024] [Revised: 02/13/2025] [Accepted: 02/21/2025] [Indexed: 03/23/2025] Open
Abstract
As multi-omics sequencing technologies advance, the need for simulation tools capable of generating realistic and diverse (bulk and single-cell) multi-omics datasets for method testing and benchmarking becomes increasingly important. We present MOSim, an R package that simulates both bulk (via mosim function) and single-cell (via sc_mosim function) multi-omics data. The mosim function generates bulk transcriptomics data (RNA-seq) and additional regulatory omics layers (ATAC-seq, miRNA-seq, ChIP-seq, Methyl-seq, and transcription factors), while sc_mosim simulates single-cell transcriptomics data (scRNA-seq) with scATAC-seq and transcription factors as regulatory layers. The tool supports various experimental designs, including simulation of gene co-expression patterns, biological replicates, and differential expression between conditions. MOSim enables users to generate quantification matrices for each simulated omics data type, capturing the heterogeneity and complexity of bulk and single-cell multi-omics datasets. Furthermore, MOSim provides differentially abundant features within each omics layer and elucidates the active regulatory relationships between regulatory omics and gene expression data at both bulk and single-cell levels. By leveraging MOSim, researchers will be able to generate realistic and customizable bulk and single-cell multi-omics datasets to benchmark and validate analytical methods specifically designed for the integrative analysis of diverse regulatory omics data.
Affiliation(s)
- Carolina Monzó
  - Genomics of Gene Expression Lab, Institute for Integrative Systems Biology, Spanish National Research Council (CSIC-UV), C/ Catedràtic Agustín Escardino Benlloch, Paterna 46980, Spain
- Maider Aguerralde-Martin
  - Applied Statistics, Operational Research and Quality Department, Universitat Politècnica de València, Camí de Vera s/n, València 46022, Spain
- Carlos Martínez-Mira
  - Biobam Bioinformatics S.L., Marina de Valencia Base 5, BioHub, C/ de la Travesía, s/n, Sector Puerto 14 E, València 46024, Spain
- Ángeles Arzalluz-Luque
  - Genomics of Gene Expression Lab, Institute for Integrative Systems Biology, Spanish National Research Council (CSIC-UV), C/ Catedràtic Agustín Escardino Benlloch, Paterna 46980, Spain
  - Applied Statistics, Operational Research and Quality Department, Universitat Politècnica de València, Camí de Vera s/n, València 46022, Spain
- Ana Conesa
  - Genomics of Gene Expression Lab, Institute for Integrative Systems Biology, Spanish National Research Council (CSIC-UV), C/ Catedràtic Agustín Escardino Benlloch, Paterna 46980, Spain
- Sonia Tarazona
  - Applied Statistics, Operational Research and Quality Department, Universitat Politècnica de València, Camí de Vera s/n, València 46022, Spain
9
Rodov A, Baniadam H, Zeiser R, Amit I, Yosef N, Wertheimer T, Ingelfinger F. Towards the Next Generation of Data-Driven Therapeutics Using Spatially Resolved Single-Cell Technologies and Generative AI. Eur J Immunol 2025; 55:e202451234. [PMID: 39964048 PMCID: PMC11834372 DOI: 10.1002/eji.202451234] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2024] [Revised: 01/28/2025] [Accepted: 02/03/2025] [Indexed: 02/21/2025]
Abstract
Recent advances in multi-omics and spatially resolved single-cell technologies have revolutionised our ability to profile millions of cellular states, offering unprecedented opportunities to understand the complex molecular landscapes of human tissues in both health and disease. These developments hold immense potential for precision medicine, particularly in the rational design of novel therapeutics for treating inflammatory and autoimmune diseases. However, the vast, high-dimensional data generated by these technologies present significant analytical challenges, such as distinguishing technical variation from biological variation or defining relevant questions that leverage the added spatial dimension to improve our understanding of tissue organisation. Generative artificial intelligence (AI), specifically variational autoencoder- or transformer-based latent variable models, provides a powerful and flexible approach to addressing these challenges. These models make inferences about a cell's intrinsic state by effectively identifying complex patterns, reducing data dimensionality and modelling the biological variability in single-cell datasets. This review explores the current landscape of single-cell and spatial multi-omics technologies, the application of generative AI in data analysis and modelling and their transformative impact on our understanding of autoimmune diseases. By combining spatial and single-cell data with advanced AI methodologies, we highlight novel insights into the pathogenesis of autoimmune disorders and outline future directions for leveraging these technologies to achieve the goal of AI-powered personalised medicine.
Affiliation(s)
- Avital Rodov
  - Department of Systems Immunology, Weizmann Institute of Science, Rehovot, Israel
- Robert Zeiser
  - Department of Internal Medicine I, Medical Center-University of Freiburg, Freiburg, Germany
- Ido Amit
  - Department of Systems Immunology, Weizmann Institute of Science, Rehovot, Israel
- Nir Yosef
  - Department of Systems Immunology, Weizmann Institute of Science, Rehovot, Israel
- Tobias Wertheimer
  - Department of Internal Medicine I, Medical Center-University of Freiburg, Freiburg, Germany
- Florian Ingelfinger
  - Department of Systems Immunology, Weizmann Institute of Science, Rehovot, Israel
  - Department of Internal Medicine I, Medical Center-University of Freiburg, Freiburg, Germany
10
Davidson NR, Zhang F, Greene CS. BuDDI: Bulk Deconvolution with Domain Invariance to predict cell-type-specific perturbations from bulk. PLoS Comput Biol 2025; 21:e1012742. [PMID: 39823522 PMCID: PMC11790236 DOI: 10.1371/journal.pcbi.1012742] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2024] [Revised: 02/03/2025] [Accepted: 12/20/2024] [Indexed: 01/19/2025] Open
Abstract
While single-cell experiments provide deep cellular resolution within a single sample, some single-cell experiments are inherently more challenging than bulk experiments due to dissociation difficulties, cost, or limited tissue availability. This creates a situation where we have deep cellular profiles of one sample or condition, and bulk profiles across multiple samples and conditions. To bridge this gap, we propose BuDDI (BUlk Deconvolution with Domain Invariance). BuDDI utilizes domain adaptation techniques to effectively integrate available corpora of case-control bulk and reference scRNA-seq observations to infer cell-type-specific perturbation effects. BuDDI achieves this by learning independent latent spaces within a single variational autoencoder (VAE) encompassing at least four sources of variability: 1) cell type proportion, 2) perturbation effect, 3) structured experimental variability, and 4) remaining variability. Since each latent space is encouraged to be independent, we simulate perturbation responses by independently composing each latent space to simulate cell-type-specific perturbation responses. We evaluated BuDDI's performance on simulated and real data with experimental designs of increasing complexity. We first validated that BuDDI could learn domain invariant latent spaces on data with matched samples across each source of variability. Then we validated that BuDDI could accurately predict cell-type-specific perturbation response when no single-cell perturbed profiles were used during training; instead, only bulk samples had both perturbed and non-perturbed observations. Finally, we validated BuDDI on predicting sex-specific differences, an experimental design where it is not possible to have matched samples. In each experiment, BuDDI outperformed all other comparative methods and baselines. 
As more reference atlases are completed, BuDDI provides a path to combine these resources with bulk-profiled treatment or disease signatures to study perturbations, sex differences, or other factors at single-cell resolution.
Affiliation(s)
- Natalie R. Davidson
  - Department of Biomedical Informatics, University of Colorado Anschutz School of Medicine, Aurora, Colorado, United States of America
- Fan Zhang
  - Department of Biomedical Informatics, University of Colorado Anschutz School of Medicine, Aurora, Colorado, United States of America
  - Department of Medicine Rheumatology, University of Colorado Anschutz School of Medicine, Aurora, Colorado, United States of America
- Casey S. Greene
  - Department of Biomedical Informatics, University of Colorado Anschutz School of Medicine, Aurora, Colorado, United States of America
11
Gomez C, Uhrig L, Frouin V, Duchesnay E, Jarraya B, Grigis A. Deep learning models reveal the link between dynamic brain connectivity patterns and states of consciousness. Sci Rep 2024; 14:31606. [PMID: 39738114 PMCID: PMC11686193 DOI: 10.1038/s41598-024-76695-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/08/2024] [Accepted: 10/16/2024] [Indexed: 01/01/2025] Open
Abstract
Decoding states of consciousness from brain activity is a central challenge in neuroscience. Dynamic functional connectivity (dFC) allows the study of short-term temporal changes in functional connectivity (FC) between distributed brain areas. By clustering dFC matrices from resting-state fMRI, we previously described "brain patterns" that underlie different functional configurations of the brain at rest. The networks associated with these patterns have been extensively analyzed. However, the overall dynamic organization and how it relates to consciousness remain unclear. We hypothesized that deep learning networks would help to model this relationship. Recent studies have used low-dimensional variational autoencoders (VAE) to learn meaningful representations that can help explain consciousness. Here, we investigated the complexity of selecting such a generative model to study brain dynamics, and extended the available methods for latent space characterization and modeling. Our contributions are threefold. First, compared with probabilistic principal component analysis and sparse VAE, we showed that the selected low-dimensional VAE exhibits balanced performance in reconstructing dFCs and classifying brain patterns. We then explored the organization of the obtained low-dimensional dFC latent representations. We showed how these representations stratify the dynamic organization of the brain patterns as well as the experimental conditions. Finally, we examined the proposed computational brain model in more depth. We first applied a receptive field analysis to identify preferred directions in the latent space to move from one brain pattern to another. Then, we performed an ablation study in which we virtually inactivated specific brain areas. We demonstrated the model's efficiency in summarizing consciousness-specific information encoded in key inter-areal connections, as described in the global neuronal workspace theory of consciousness. 
The proposed framework advocates the possibility of developing an interpretable computational brain model of interest for disorders of consciousness, paving the way for a dynamic diagnostic support tool.
Affiliation(s)
- Chloé Gomez
  - Cognitive Neuroimaging Unit, NeuroSpin center, CEA, INSERM U992, Université Paris-Saclay, Gif-sur-Yvette, France.
- Lynn Uhrig
  - Cognitive Neuroimaging Unit, NeuroSpin center, CEA, INSERM U992, Université Paris-Saclay, Gif-sur-Yvette, France
  - Department of Anesthesiology and Critical Care, Necker Hospital, AP-HP, Université de Paris Cité, Paris, France
- Vincent Frouin
  - BAOBAB Unit, NeuroSpin center, CEA, Université Paris-Saclay, Gif-sur-Yvette, France
- Edouard Duchesnay
  - BAOBAB Unit, NeuroSpin center, CEA, Université Paris-Saclay, Gif-sur-Yvette, France
- Béchir Jarraya
  - Cognitive Neuroimaging Unit, NeuroSpin center, CEA, INSERM U992, Université Paris-Saclay, Gif-sur-Yvette, France.
  - Neuroscience Pole, Foch Hospital, Université Paris-Saclay, UVSQ, Suresnes, France.
- Antoine Grigis
  - BAOBAB Unit, NeuroSpin center, CEA, Université Paris-Saclay, Gif-sur-Yvette, France.
12
Gavriilidis GI, Vasileiou V, Orfanou A, Ishaque N, Psomopoulos F. A mini-review on perturbation modelling across single-cell omic modalities. Comput Struct Biotechnol J 2024; 23:1886-1896. [PMID: 38721585 PMCID: PMC11076269 DOI: 10.1016/j.csbj.2024.04.058] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 02/01/2024] [Revised: 04/23/2024] [Accepted: 04/23/2024] [Indexed: 01/06/2025] Open
Abstract
Recent advances in single-cell omics technology have transformed the landscape of cellular and molecular research, enriching the scope and intricacy of cellular characterisation. Perturbation modelling seeks to comprehensively grasp the effects of external influences such as disease onset, molecular knock-outs or external stimulants on cellular physiology, specifically on transcription factors, signal transducers, biological pathways, and dynamic cell states. Machine and deep learning tools transform complex perturbational phenomena into algorithmically tractable tasks to formulate predictions based on various types of single-cell datasets. However, the recent surge in tools and datasets makes it challenging for experimental biologists and computational scientists to keep track of the recent advances in this rapidly expanding field of single-cell modelling. Here, we recapitulate the main objectives of perturbation modelling and summarise novel single-cell perturbation technologies based on genetic manipulation like CRISPR or compounds, spanning across omic modalities. We then concisely review a burgeoning group of computational methods extending from classical statistical inference methodologies to various machine and deep learning architectures like shallow models or autoencoders, to biologically informed approaches based on gene regulatory networks, and to combinatorial efforts reminiscent of ensemble learning. We also discuss the rising trend of large foundational models in single-cell perturbation modelling inspired by large language models. Lastly, we critically assess the challenges that underlie single-cell perturbation modelling while pointing towards relevant future perspectives like perturbation atlases, multi-omics and spatial datasets, causal machine learning for interpretability, multi-task learning for performance and explainability as well as prospects for solving interoperability and benchmarking pitfalls.
Affiliation(s)
- George I. Gavriilidis
  - Institute of Applied Biosciences, Centre for Research and Technology Hellas, Thessaloniki, Greece
- Vasileios Vasileiou
  - Institute of Applied Biosciences, Centre for Research and Technology Hellas, Thessaloniki, Greece
  - Department of Molecular Biology and Genetics, Democritus University of Thrace, Alexandroupolis, Greece
- Aspasia Orfanou
  - Institute of Applied Biosciences, Centre for Research and Technology Hellas, Thessaloniki, Greece
- Naveed Ishaque
  - Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Center of Digital Health, Berlin, Germany
- Fotis Psomopoulos
  - Institute of Applied Biosciences, Centre for Research and Technology Hellas, Thessaloniki, Greece
13
Wang FA, Li Y, Zeng T. Deep Learning of radiology-genomics integration for computational oncology: A mini review. Comput Struct Biotechnol J 2024; 23:2708-2716. [PMID: 39035833 PMCID: PMC11260400 DOI: 10.1016/j.csbj.2024.06.019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2024] [Revised: 06/18/2024] [Accepted: 06/18/2024] [Indexed: 07/23/2024] Open
Abstract
In the field of computational oncology, patient status is often assessed using radiology-genomics, which combines two key technologies and their data: radiology and genomics. Recent advances in deep learning have facilitated the integration of radiology-genomics data, and even new omics data, significantly improving the robustness and accuracy of clinical predictions. These factors are driving artificial intelligence (AI) closer to practical clinical applications. In particular, deep learning models are crucial in identifying new radiology-genomics biomarkers and therapeutic targets, supported by explainable AI (xAI) methods. This review focuses on recent developments in deep learning for radiology-genomics integration, highlights current challenges, and outlines research directions for multimodal integration and biomarker discovery in radiology-genomics or radiology-omics that are urgently needed in computational oncology.
Affiliation(s)
- Feng-ao Wang
  - Key Laboratory of Systems Health Science of Zhejiang Province, School of Life Science, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310024, China
- Yixue Li
  - Key Laboratory of Systems Health Science of Zhejiang Province, School of Life Science, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310024, China
  - Guangzhou National Laboratory, Guangzhou, China
  - GMU-GIBH Joint School of Life Sciences, The Guangdong-Hong Kong-Macau Joint Laboratory for Cell Fate Regulation and Diseases, Guangzhou Laboratory, Guangzhou Medical University, Guangzhou, China
- Tao Zeng
  - Guangzhou National Laboratory, Guangzhou, China
  - GMU-GIBH Joint School of Life Sciences, The Guangdong-Hong Kong-Macau Joint Laboratory for Cell Fate Regulation and Diseases, Guangzhou Laboratory, Guangzhou Medical University, Guangzhou, China
14
Hsieh KL, Zhang K, Chu Y, Yu L, Li X, Hu N, Kawosa I, Pilié PG, Bhattacharya PK, Zhi D, Jiang X, Zhao Z, Dai Y. iGTP: Learning interpretable cellular embedding for inferring biological mechanisms underlying single-cell transcriptomics. medRxiv 2024:2024.03.29.24305092. [PMID: 39649598 PMCID: PMC11623718 DOI: 10.1101/2024.03.29.24305092] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 12/11/2024]
Abstract
Deep-learning models like Variational AutoEncoder have enabled low dimensional cellular embedding representation for large-scale single-cell transcriptomes and shown great flexibility in downstream tasks. However, biologically meaningful latent space is usually missing if no specific structure is designed. Here, we engineered a novel interpretable generative transcriptional program (iGTP) framework that could model the importance of transcriptional program (TP) space and protein-protein interactions (PPI) between different biological states. We demonstrated the performance of iGTP in a diverse biological context using gene ontology, canonical pathway, and different PPI curation. iGTP not only elucidated the ground truth of cellular responses but also surpassed other deep learning models and traditional bioinformatics methods in functional enrichment tasks. By integrating the latent layer with a graph neural network framework, iGTP could effectively infer cellular responses to perturbations. Lastly, we applied iGTP TP embeddings with a latent diffusion model to accurately generate cell embeddings for specific cell types and states. We anticipate that iGTP will offer insights at both PPI and TP levels and holds promise for predicting responses to novel perturbations.
15
de Weerd HA, Guala D, Gustafsson M, Synnergren J, Tegnér J, Lubovac-Pilav Z, Magnusson R. Latent space arithmetic on data embeddings from healthy multi-tissue human RNA-seq decodes disease modules. Patterns (New York, N.Y.) 2024; 5:101093. [PMID: 39568475 PMCID: PMC11573900 DOI: 10.1016/j.patter.2024.101093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/24/2024] [Revised: 08/26/2024] [Accepted: 10/11/2024] [Indexed: 11/22/2024]
Abstract
Computational analyses of transcriptomic data have dramatically improved our understanding of complex diseases. However, such approaches are limited by small sample sets of disease-affected material. We asked if a variational autoencoder trained on large groups of healthy human RNA sequencing (RNA-seq) data can capture the fundamental gene regulation system and generalize to unseen disease changes. Importantly, we found this model to successfully compress unseen transcriptomic changes from 25 independent disease datasets. We decoded disease-specific signals from the latent space and found them to contain more disease-specific genes than the corresponding differential expression analysis in 20 of 25 cases. Finally, we matched these disease signals with known drug targets and extracted sets of known and potential pharmaceutical candidates. In summary, our study demonstrates how data-driven representation learning enables the arithmetic deconstruction of the latent space, facilitating the dissection of disease mechanisms and drug targets.
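As a rough illustration of what latent space arithmetic means here, the following Python sketch subtracts the mean latent code of healthy samples from that of disease samples and decodes the difference back to gene space. It is a toy under stated assumptions, not the authors' pipeline: the linear `encode` and `decode` functions stand in for a trained variational autoencoder, and the weight matrix, gene names, and expression values are invented for the example.

```python
from statistics import fmean


def encode(x, W):
    """Toy linear encoder (assumption): z = W x, one row of W per latent dim."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def decode(z, W):
    """Toy linear decoder (assumption): x_hat = W^T z."""
    n_genes = len(W[0])
    return [sum(W[k][g] * z[k] for k in range(len(W))) for g in range(n_genes)]


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    return [fmean(col) for col in zip(*vectors)]


def disease_signal(healthy, disease, W):
    """Decode the difference between mean disease and mean healthy latent codes."""
    z_healthy = mean_vector([encode(x, W) for x in healthy])
    z_disease = mean_vector([encode(x, W) for x in disease])
    delta = [d - h for d, h in zip(z_disease, z_healthy)]
    return decode(delta, W)


def top_genes(signal, gene_names, k):
    """Rank genes by the absolute size of their decoded disease signal."""
    ranked = sorted(zip(gene_names, signal), key=lambda p: abs(p[1]), reverse=True)
    return [name for name, _ in ranked[:k]]


# Invented data: 4 genes, 2 latent dimensions, 2 samples per group.
W = [[1, 0, 0, 0],
     [0, 1, 0, 0]]
genes = ["A", "B", "C", "D"]
healthy = [[1, 1, 1, 1], [1, 1, 1, 1]]
disease = [[1, 4, 1, 1], [1, 4, 1, 1]]

signal = disease_signal(healthy, disease, W)
print(signal)
print(top_genes(signal, genes, k=1))
```

The ranked genes from such a decoded difference play the role of the disease-specific signal that the abstract compares against differential expression analysis.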
Affiliation(s)
- Hendrik A de Weerd
  - School of Bioscience, Systems Biology Research Center, University of Skövde, 541 45 Skövde, Sweden
  - Department of Physics, Chemistry and Biology, Linköping University, 581 83 Linköping, Sweden
  - Department of Biomedical Engineering, Linköping University, 581 83 Linköping, Sweden
- Dimitri Guala
  - Department of Biochemistry and Biophysics, Stockholm University, 171 21 Solna, Sweden
  - Merck AB, 169 70 Solna, Sweden
- Mika Gustafsson
  - Department of Physics, Chemistry and Biology, Linköping University, 581 83 Linköping, Sweden
- Jane Synnergren
  - School of Bioscience, Systems Biology Research Center, University of Skövde, 541 45 Skövde, Sweden
  - Department of Molecular and Clinical Medicine, Institute of Medicine, The Sahlgrenska Academy at University of Gothenburg, 413 45 Gothenburg, Sweden
- Jesper Tegnér
  - Biological and Environmental Science and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
  - Unit of Computational Medicine, Department of Medicine, Center for Molecular Medicine, Karolinska Institutet, Karolinska University Hospital, L8:05, 171 76, Stockholm, Sweden
  - Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
  - Science for Life Laboratory, Tomtebodavägen 23A, 171 65, Solna, Sweden
- Zelmina Lubovac-Pilav
  - School of Bioscience, Systems Biology Research Center, University of Skövde, 541 45 Skövde, Sweden
- Rasmus Magnusson
  - School of Bioscience, Systems Biology Research Center, University of Skövde, 541 45 Skövde, Sweden
  - Department of Biomedical Engineering, Linköping University, 581 83 Linköping, Sweden
16
Almet AA, Tsai YC, Watanabe M, Nie Q. Inferring pattern-driving intercellular flows from single-cell and spatial transcriptomics. Nat Methods 2024; 21:1806-1817. [PMID: 39187683 PMCID: PMC11466815 DOI: 10.1038/s41592-024-02380-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2023] [Accepted: 07/23/2024] [Indexed: 08/28/2024]
Abstract
From single-cell RNA-sequencing (scRNA-seq) and spatial transcriptomics (ST), one can extract high-dimensional gene expression patterns that can be described by intercellular communication networks or decoupled gene modules. These two descriptions of information flow are often assumed to occur independently. However, intercellular communication drives directed flows of information that are mediated by intracellular gene modules, in turn triggering outflows of other signals. Methodologies to describe such intercellular flows are lacking. We present FlowSig, a method that infers communication-driven intercellular flows from scRNA-seq or ST data using graphical causal modeling and conditional independence. We benchmark FlowSig using newly generated experimental cortical organoid data and synthetic data generated from mathematical modeling. We demonstrate FlowSig's utility by applying it to various studies, showing that FlowSig can capture stimulation-induced changes to paracrine signaling in pancreatic islets, demonstrate shifts in intercellular flows due to increasing COVID-19 severity and reconstruct morphogen-driven activator-inhibitor patterns in mouse embryogenesis.
Affiliation(s)
- Axel A Almet
  - Department of Mathematics, University of California, Irvine, Irvine, CA, USA
  - NSF-Simons Center for Multiscale Cell Fate Research, University of California, Irvine, Irvine, CA, USA
- Yuan-Chen Tsai
  - Department of Anatomy & Neurobiology, University of California, Irvine, Irvine, CA, USA
  - Sue & Bill Gross Stem Cell Research Center, University of California, Irvine, Irvine, CA, USA
  - School of Medicine, University of California, Irvine, Irvine, CA, USA
- Momoko Watanabe
  - Department of Anatomy & Neurobiology, University of California, Irvine, Irvine, CA, USA
  - Sue & Bill Gross Stem Cell Research Center, University of California, Irvine, Irvine, CA, USA
  - School of Medicine, University of California, Irvine, Irvine, CA, USA
- Qing Nie
  - Department of Mathematics, University of California, Irvine, Irvine, CA, USA.
  - NSF-Simons Center for Multiscale Cell Fate Research, University of California, Irvine, Irvine, CA, USA.
  - Department of Developmental and Cell Biology, University of California, Irvine, Irvine, CA, USA.
17
Joy MT, Carmichael ST. Activity-dependent transcriptional programs in memory regulate motor recovery after stroke. Commun Biol 2024; 7:1048. [PMID: 39183218 PMCID: PMC11345429 DOI: 10.1038/s42003-024-06723-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2023] [Accepted: 08/12/2024] [Indexed: 08/27/2024] Open
Abstract
Stroke causes death of brain tissue leading to long-term deficits. Behavioral evidence from neurorehabilitative therapies suggests that learning-induced neuroplasticity can lead to beneficial outcomes. However, molecular and cellular mechanisms that link learning and stroke recovery are unknown. We show that in a mouse model of stroke, which exhibits enhanced recovery of function due to genetic perturbations of learning and memory genes, animals display activity-dependent transcriptional programs that are normally active during formation or storage of new memories. The expression of neuronal activity-dependent genes is predictive of recovery and occupies a molecular latent space unique to motor recovery. With motor recovery, networks of activity-dependent genes are co-expressed with their transcription factor targets, forming gene regulatory networks that support activity-dependent transcription and that are normally diminished after stroke. Neuronal activity-dependent changes at the circuit level are influenced by interactions with microglia. At the molecular level, we show that enrichment of activity-dependent programs in neurons leads to transcriptional changes in microglia where they differentially interact to support intercellular signaling pathways for axon guidance, growth and synaptogenesis. Together, these studies identify activity-dependent transcriptional programs as a fundamental mechanism for neural repair post-stroke.
Affiliation(s)
- Mary T Joy
  - The Jackson Laboratory, Bar Harbor, ME, 04609, USA.
- S Thomas Carmichael
  - Department of Neurology, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, 90095, USA
18
Maruhashi K, Kashima H, Miyano S, Park H. Meta graphical lasso: uncovering hidden interactions among latent mechanisms. Sci Rep 2024; 14:18105. [PMID: 39103384 PMCID: PMC11300637 DOI: 10.1038/s41598-024-68959-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2024] [Accepted: 07/30/2024] [Indexed: 08/07/2024] Open
Abstract
In complex systems, it's crucial to uncover latent mechanisms and their context-dependent relationships. This is especially true in medical research, where identifying unknown cancer mechanisms and their impact on phenomena like drug resistance is vital. Directly observing these mechanisms is challenging due to measurement complexities, leading to an approach that infers latent mechanisms from observed variable distributions. Despite machine learning advancements enabling sophisticated generative models, their black-box nature complicates the interpretation of complex latent mechanisms. A promising method for understanding these mechanisms involves estimating latent factors through linear projection, though there's no assurance that inferences made under specific conditions will remain valid across contexts. We propose a novel solution, suggesting data, even from systems appearing complex, can often be explained by sparse dependencies among a few common latent factors, regardless of the situation. This simplification allows for modeling that yields significant insights across diverse fields. We demonstrate this with datasets from finance, where we capture societal trends from stock price movements, and medicine, where we uncover new insights into cancer drug resistance through gene expression analysis.
Affiliation(s)
- Koji Maruhashi
  - Fujitsu Research, 4-1-1 Kamikodanaka, Nakahara-ku, Kawasaki, 2118588, Kanagawa, Japan.
- Satoru Miyano
  - M&D Data Science Center, Tokyo Medical and Dental University, 1-5-45 Yushima, Bunkyo-ku, Tokyo, Japan
- Heewon Park
  - M&D Data Science Center, Tokyo Medical and Dental University, 1-5-45 Yushima, Bunkyo-ku, Tokyo, Japan.
  - School of Mathematics, Statistics and Data Science, Sungshin Women's University, 2, Bomun-ro 34da-gil, Seongbuk-gu, Seoul, 02844, Republic of Korea.
19
van Hilten A, Katz S, Saccenti E, Niessen WJ, Roshchupkin GV. Designing interpretable deep learning applications for functional genomics: a quantitative analysis. Brief Bioinform 2024; 25:bbae449. [PMID: 39293804 PMCID: PMC11410376 DOI: 10.1093/bib/bbae449] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2024] [Revised: 08/07/2024] [Accepted: 08/28/2024] [Indexed: 09/20/2024] Open
Abstract
Deep learning applications have had a profound impact on many scientific fields, including functional genomics. Deep learning models can learn complex interactions between and within omics data; however, interpreting and explaining these models can be challenging. Interpretability is essential not only to help progress our understanding of the biological mechanisms underlying traits and diseases but also for establishing trust in these models' efficacy for healthcare applications. Recognizing this importance, recent years have seen the development of numerous diverse interpretability strategies, making it increasingly difficult to navigate the field. In this review, we present a quantitative analysis of the challenges arising when designing interpretable deep learning solutions in functional genomics. We explore design choices related to the characteristics of genomics data, the neural network architectures applied, and strategies for interpretation. By quantifying the current state of the field with a predefined set of criteria, we find the most frequent solutions, highlight exceptional examples, and identify unexplored opportunities for developing interpretable deep learning models in genomics.
Affiliation(s)
- Arno van Hilten
  - Department of Radiology and Nuclear Medicine, Erasmus MC, 3015 GD Rotterdam, The Netherlands
- Sonja Katz
  - Department of Radiology and Nuclear Medicine, Erasmus MC, 3015 GD Rotterdam, The Netherlands
  - Laboratory of Systems and Synthetic Biology, Wageningen University & Research, 6700 HB Wageningen WE, The Netherlands
- Edoardo Saccenti
  - Laboratory of Systems and Synthetic Biology, Wageningen University & Research, 6700 HB Wageningen WE, The Netherlands
- Wiro J Niessen
  - Department of Imaging Physics, Delft University of Technology, 2628 CD Delft, The Netherlands
- Gennady V Roshchupkin
  - Department of Radiology and Nuclear Medicine, Erasmus MC, 3015 GD Rotterdam, The Netherlands
  - Department of Epidemiology, Erasmus MC, 3015 GD Rotterdam, The Netherlands
20
Wagle MM, Long S, Chen C, Liu C, Yang P. Interpretable deep learning in single-cell omics. Bioinformatics 2024; 40:btae374. [PMID: 38889275 PMCID: PMC11211213 DOI: 10.1093/bioinformatics/btae374] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/25/2024] [Revised: 05/11/2024] [Accepted: 06/12/2024] [Indexed: 06/20/2024] Open
Abstract
MOTIVATION Single-cell omics technologies have enabled the quantification of molecular profiles in individual cells at an unparalleled resolution. Deep learning, a rapidly evolving sub-field of machine learning, has attracted significant interest in single-cell omics research due to its remarkable success in analysing heterogeneous high-dimensional single-cell omics data. Nevertheless, the inherent multi-layer nonlinear architecture of deep learning models often makes them 'black boxes', as the reasoning behind predictions is often unknown and not transparent to the user. This has stimulated an increasing body of research addressing the lack of interpretability in deep learning models, especially in single-cell omics data analyses, where the identification and understanding of molecular regulators are crucial for interpreting model predictions and directing downstream experimental validations. RESULTS In this work, we introduce the basics of single-cell omics technologies and the concept of interpretable deep learning. This is followed by a review of the recent interpretable deep learning models applied across single-cell omics research. Lastly, we highlight the current limitations and discuss potential future directions.
Affiliation(s)
- Manoj M Wagle
  - Computational Systems Biology Unit, Children’s Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW 2145, Australia
  - School of Mathematics and Statistics, Faculty of Science, The University of Sydney, Camperdown, NSW 2006, Australia
  - Sydney Precision Data Science Centre, The University of Sydney, Camperdown, NSW 2006, Australia
- Siqu Long
  - Computational Systems Biology Unit, Children’s Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW 2145, Australia
  - School of Mathematics and Statistics, Faculty of Science, The University of Sydney, Camperdown, NSW 2006, Australia
  - Sydney Precision Data Science Centre, The University of Sydney, Camperdown, NSW 2006, Australia
- Carissa Chen
  - Computational Systems Biology Unit, Children’s Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW 2145, Australia
  - Sydney Precision Data Science Centre, The University of Sydney, Camperdown, NSW 2006, Australia
- Chunlei Liu
  - Computational Systems Biology Unit, Children’s Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW 2145, Australia
  - Sydney Precision Data Science Centre, The University of Sydney, Camperdown, NSW 2006, Australia
- Pengyi Yang
  - Computational Systems Biology Unit, Children’s Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW 2145, Australia
  - School of Mathematics and Statistics, Faculty of Science, The University of Sydney, Camperdown, NSW 2006, Australia
  - Sydney Precision Data Science Centre, The University of Sydney, Camperdown, NSW 2006, Australia
  - Charles Perkins Centre, The University of Sydney, Camperdown, NSW 2006, Australia
21
Rivero-Garcia I, Torres M, Sánchez-Cabo F. Deep generative models in single-cell omics. Comput Biol Med 2024; 176:108561. [PMID: 38749321 DOI: 10.1016/j.compbiomed.2024.108561] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2024] [Revised: 04/30/2024] [Accepted: 05/05/2024] [Indexed: 05/31/2024]
Abstract
Deep Generative Models (DGMs) are becoming instrumental for inferring probability distributions inherent to complex processes, such as most questions in biomedical research. For many years, there was a lack of mathematical methods that would allow this inference in the scarce data scenario of biomedical research. The advent of single-cell omics has finally squared the so-called "skinny matrix", making it possible to apply mathematical methods already extensively used in other areas. Moreover, it is now possible to integrate data at different molecular levels in thousands or even millions of samples, thanks to the number of single-cell atlases being collaboratively generated. Additionally, DGMs have proven useful in other frequent tasks in single-cell analysis pipelines, from dimensionality reduction and cell type annotation to RNA velocity inference. In spite of their promise, DGMs need to be used with caution in biomedical research, paying special attention to their use to answer the right questions and to the definition of appropriate error metrics and validation checkpoints that confirm not only their correct use but also their relevance. All in all, DGMs provide an exciting tool that opens a bright future for the integrative analysis of single-cell omics to understand health and disease.
Affiliation(s)
- Inés Rivero-Garcia
  - Universidad Politécnica de Madrid, Madrid, 28040, Spain; Centro Nacional de Investigaciones Cardiovasculares (CNIC), Madrid, 28029, Spain
- Miguel Torres
  - Centro Nacional de Investigaciones Cardiovasculares (CNIC), Madrid, 28029, Spain
- Fátima Sánchez-Cabo
  - Centro Nacional de Investigaciones Cardiovasculares (CNIC), Madrid, 28029, Spain.
22
Yang Y, Seninge L, Wang Z, Oro A, Stuart JM, Ding H. The manatee variational autoencoder model for predicting gene expression alterations caused by transcription factor perturbations. Sci Rep 2024; 14:11794. [PMID: 38782963 PMCID: PMC11116378 DOI: 10.1038/s41598-024-62620-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2023] [Accepted: 05/20/2024] [Indexed: 05/25/2024] Open
Abstract
We present the Manatee variational autoencoder model to predict transcription factor (TF) perturbation-induced transcriptomes. We demonstrate that the Manatee in silico perturbation analysis recapitulates target transcriptomic phenotypes in diverse cellular lineage transitions. We further propose the Manatee in silico screening analysis for prioritizing TF combinations targeting desired transcriptomic phenotypes.
Affiliation(s)
- Ying Yang
  - Program in Epithelial Biology and Center for Definitive and Curative Medicine, Stanford University, Stanford, CA, USA
- Lucas Seninge
  - Department of Biomolecular Engineering and Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Ziyuan Wang
  - Department of Pharmacy Practice and Science, University of Arizona, Tucson, AZ, USA
- Anthony Oro
  - Program in Epithelial Biology and Center for Definitive and Curative Medicine, Stanford University, Stanford, CA, USA.
- Joshua M Stuart
  - Department of Biomolecular Engineering and Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA.
- Hongxu Ding
  - Department of Pharmacy Practice and Science, University of Arizona, Tucson, AZ, USA.
23
Chen H, Lu Y, Dai Z, Yang Y, Li Q, Rao Y. Comprehensive single-cell RNA-seq analysis using deep interpretable generative modeling guided by biological hierarchy knowledge. Brief Bioinform 2024; 25:bbae314. [PMID: 38960404 PMCID: PMC11221887 DOI: 10.1093/bib/bbae314] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2023] [Revised: 12/13/2023] [Accepted: 06/20/2024] [Indexed: 07/05/2024] Open
Abstract
Recent advances in microfluidics and sequencing technologies allow researchers to explore cellular heterogeneity at single-cell resolution. In recent years, deep learning frameworks, such as generative models, have brought great changes to the analysis of transcriptomic data. Nevertheless, relying on the potential space of these generative models alone is insufficient to generate biological explanations. In addition, most of the previous work based on generative models is limited to shallow neural networks with one to three layers of latent variables, which may limit the capabilities of the models. Here, we propose a deep interpretable generative model called d-scIGM for single-cell data analysis. d-scIGM combines sawtooth connectivity techniques and residual networks, thereby constructing a deep generative framework. In addition, d-scIGM incorporates hierarchical prior knowledge of biological domains to enhance the interpretability of the model. We show that d-scIGM achieves excellent performance in a variety of fundamental tasks, including clustering, visualization, and pseudo-temporal inference. Through topic pathway studies, we found that d-scIGM-learned topics are better enriched for biologically meaningful pathways compared to the baseline models. Furthermore, the analysis of drug response data shows that d-scIGM can capture drug response patterns in large-scale experiments, which provides a promising way to elucidate the underlying biological mechanisms. Lastly, in the melanoma dataset, d-scIGM accurately identified different cell types and revealed multiple melanin-related driver genes and key pathways, which are critical for understanding disease mechanisms and drug development.
Affiliation(s)
- Hegang Chen
- School of Computer Science and Engineering, Sun Yat-sen University, 132 Waihuan East Road, Guangzhou University Town, 510006, Guangzhou, China
| | - Yuyin Lu
- School of Computer Science and Engineering, Sun Yat-sen University, 132 Waihuan East Road, Guangzhou University Town, 510006, Guangzhou, China
| | - Zhiming Dai
- School of Computer Science and Engineering, Sun Yat-sen University, 132 Waihuan East Road, Guangzhou University Town, 510006, Guangzhou, China
| | - Yuedong Yang
- School of Computer Science and Engineering, Sun Yat-sen University, 132 Waihuan East Road, Guangzhou University Town, 510006, Guangzhou, China
| | - Qing Li
- Department of Computing, The Hong Kong Polytechnic University, PQ806, Mong Man Wai Building, 999077, Hong Kong SAR
| | - Yanghui Rao
- School of Computer Science and Engineering, Sun Yat-sen University, 132 Waihuan East Road, Guangzhou University Town, 510006, Guangzhou, China
|
24
|
Luo X, Niyakan S, Johnstone P, McCorkle S, Park G, López-Marrero V, Yoo S, Dougherty ER, Qian X, Alexander FJ, Jha S, Yoon BJ. Pathway-based analyses of gene expression profiles at low doses of ionizing radiation. Frontiers in Bioinformatics 2024; 4:1280971. [PMID: 38812660 PMCID: PMC11135168 DOI: 10.3389/fbinf.2024.1280971] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2023] [Accepted: 04/16/2024] [Indexed: 05/31/2024] Open
Abstract
Radiation exposure poses a significant threat to human health. Emerging research indicates that even low-dose radiation, once believed to be safe, may have harmful effects. This perception has spurred a growing interest in investigating the potential risks associated with low-dose radiation exposure across various scenarios. To comprehensively explore the health consequences of low-dose radiation, our study employs a robust statistical framework that examines whether specific groups of genes, belonging to known pathways, exhibit coordinated expression patterns that align with the radiation levels. Notably, our findings reveal the existence of intricate yet consistent signatures that reflect the molecular response to radiation exposure, distinguishing between low-dose and high-dose radiation. Moreover, we leverage a pathway-constrained variational autoencoder to capture the nonlinear interactions within gene expression data. By comparing these two analytical approaches, our study aims to gain valuable insights into the impact of low-dose radiation on gene expression patterns, identify pathways that are differentially affected, and harness the potential of machine learning to uncover hidden activity within biological networks. This comparative analysis contributes to a deeper understanding of the molecular consequences of low-dose radiation exposure.
Affiliation(s)
- Xihaier Luo
- Computational Science Initiative, Brookhaven National Laboratory, Upton, NY, United States
| | - Seyednami Niyakan
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, United States
| | - Patrick Johnstone
- Computational Science Initiative, Brookhaven National Laboratory, Upton, NY, United States
| | - Sean McCorkle
- Computational Science Initiative, Brookhaven National Laboratory, Upton, NY, United States
| | - Gilchan Park
- Computational Science Initiative, Brookhaven National Laboratory, Upton, NY, United States
| | - Vanessa López-Marrero
- Computational Science Initiative, Brookhaven National Laboratory, Upton, NY, United States
| | - Shinjae Yoo
- Computational Science Initiative, Brookhaven National Laboratory, Upton, NY, United States
| | - Edward R. Dougherty
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, United States
| | - Xiaoning Qian
- Computational Science Initiative, Brookhaven National Laboratory, Upton, NY, United States
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, United States
| | | | - Shantenu Jha
- Computational Science Initiative, Brookhaven National Laboratory, Upton, NY, United States
- Department of Electrical and Computer Engineering, Rutgers University, New Brunswick, NJ, United States
| | - Byung-Jun Yoon
- Computational Science Initiative, Brookhaven National Laboratory, Upton, NY, United States
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, United States
|
25
|
Pancotti C, Rollo C, Codicè F, Birolo G, Fariselli P, Sanavia T. MUSE-XAE: MUtational Signature Extraction with eXplainable AutoEncoder enhances tumour types classification. Bioinformatics 2024; 40:btae320. [PMID: 38754097 PMCID: PMC11139523 DOI: 10.1093/bioinformatics/btae320] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Revised: 04/08/2024] [Accepted: 05/15/2024] [Indexed: 05/18/2024] Open
Abstract
MOTIVATION Mutational signatures are a critical component in deciphering the genetic alterations that underlie cancer development and have become a valuable resource to understand the genomic changes during tumorigenesis. Therefore, it is essential to employ precise and accurate methods for their extraction to ensure that the underlying patterns are reliably identified and can be effectively utilized in new strategies for diagnosis, prognosis, and treatment of cancer patients. RESULTS We present MUSE-XAE, a novel method for mutational signature extraction from cancer genomes using an explainable autoencoder. Our approach employs a hybrid architecture consisting of a nonlinear encoder that can capture nonlinear interactions among features, and a linear decoder which ensures the interpretability of the active signatures. We evaluated and compared MUSE-XAE with other available tools on both synthetic and real cancer datasets and demonstrated that it achieves superior performance in terms of precision and sensitivity in recovering mutational signature profiles. MUSE-XAE extracts highly discriminative mutational signature profiles by enhancing the classification of primary tumour types and subtypes in real world settings. This approach could facilitate further research in this area, with neural networks playing a critical role in advancing our understanding of cancer genomics. AVAILABILITY AND IMPLEMENTATION MUSE-XAE software is freely available at https://github.com/compbiomed-unito/MUSE-XAE.
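The hybrid design described in this abstract (a nonlinear encoder feeding a linear decoder) can be illustrated with a minimal sketch. This is not the MUSE-XAE implementation: the function names, the single tanh hidden layer, the softplus used to keep exposures non-negative, and the toy weights are all assumptions made for illustration. The point it shows is why a linear decoder is readable: the reconstruction is a weighted sum of signature profiles, so each decoder row can be inspected directly as one signature.

```python
import math


def softplus(z):
    """Smooth non-negative activation (an illustrative choice, not from the abstract)."""
    return math.log1p(math.exp(z))


def encode(x, w_hidden, w_out):
    """Nonlinear encoder: one tanh hidden layer, then softplus to get exposures."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    return [softplus(sum(w * h for w, h in zip(row, hidden))) for row in w_out]


def decode(exposures, signatures):
    """Linear decoder: reconstruction = sum over signatures of exposure * profile."""
    n_channels = len(signatures[0])
    return [
        sum(e * sig[c] for e, sig in zip(exposures, signatures))
        for c in range(n_channels)
    ]


# Hypothetical toy problem: 4 mutation channels, 2 signatures.
signatures = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.25, 0.75]]
w_hidden = [[0.1, 0.1, 0.0, 0.0], [0.0, 0.0, 0.1, 0.1], [0.05, 0.0, 0.05, 0.0]]
w_out = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]

counts = [3.0, 1.0, 0.0, 4.0]
exposures = encode(counts, w_hidden, w_out)
reconstruction = decode(exposures, signatures)
print(exposures, reconstruction)
```

Because `decode` has no activation function, doubling an exposure doubles that signature's contribution to every channel, which is the property that makes the learned signatures interpretable.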
Affiliation(s)
- Corrado Pancotti
- Computational Biomedicine Unit, Department of Medical Sciences, University of Torino, via Santena 19, Torino 10126, Italy
| | - Cesare Rollo
- Computational Biomedicine Unit, Department of Medical Sciences, University of Torino, via Santena 19, Torino 10126, Italy
| | - Francesco Codicè
- Computational Biomedicine Unit, Department of Medical Sciences, University of Torino, via Santena 19, Torino 10126, Italy
| | - Giovanni Birolo
- Computational Biomedicine Unit, Department of Medical Sciences, University of Torino, via Santena 19, Torino 10126, Italy
| | - Piero Fariselli
- Computational Biomedicine Unit, Department of Medical Sciences, University of Torino, via Santena 19, Torino 10126, Italy
| | - Tiziana Sanavia
- Computational Biomedicine Unit, Department of Medical Sciences, University of Torino, via Santena 19, Torino 10126, Italy
|
26
|
Davidson NR, Zhang F, Greene CS. BuDDI: Bulk Deconvolution with Domain Invariance to predict cell-type-specific perturbations from bulk. bioRxiv 2024:2023.07.20.549951. [PMID: 37503097 PMCID: PMC10370205 DOI: 10.1101/2023.07.20.549951] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/29/2023]
Abstract
While single-cell experiments provide deep cellular resolution within a single sample, some single-cell experiments are inherently more challenging than bulk experiments due to dissociation difficulties, cost, or limited tissue availability. This creates a situation where we have deep cellular profiles of one sample or condition, and bulk profiles across multiple samples and conditions. To bridge this gap, we propose BuDDI (BUlk Deconvolution with Domain Invariance). BuDDI utilizes domain adaptation techniques to effectively integrate available corpora of case-control bulk and reference scRNA-seq observations to infer cell-type-specific perturbation effects. BuDDI achieves this by learning independent latent spaces within a single variational autoencoder (VAE) encompassing at least four sources of variability: 1) cell type proportion, 2) perturbation effect, 3) structured experimental variability, and 4) remaining variability. Since each latent space is encouraged to be independent, we simulate cell-type-specific perturbation responses by independently composing each latent space. We evaluated BuDDI's performance on simulated and real data with experimental designs of increasing complexity. We first validated that BuDDI could learn domain invariant latent spaces on data with matched samples across each source of variability. Then we validated that BuDDI could accurately predict cell-type-specific perturbation response when no single-cell perturbed profiles were used during training; instead, only bulk samples had both perturbed and non-perturbed observations. Finally, we validated BuDDI on predicting sex-specific differences, an experimental design where it is not possible to have matched samples. In each experiment, BuDDI outperformed all other comparative methods and baselines.
As more reference atlases are completed, BuDDI provides a path to combine these resources with bulk-profiled treatment or disease signatures to study perturbations, sex differences, or other factors at single-cell resolution.
Affiliation(s)
- Natalie R Davidson
- Department of Biomedical Informatics, University of Colorado Anschutz School of Medicine, Aurora, Colorado, United States of America · Funded by the Gordon and Betty Moore Foundation (GBMF 4552), NHGRI of the National Institutes of Health (K99HG012945), NCI of the National Institutes of Health (R01CA237170, R01CA243188, R01CA200854)
| | - Fan Zhang
- Department of Medicine Rheumatology, University of Colorado Anschutz School of Medicine, Aurora, Colorado, United States of America; Department of Biomedical Informatics, University of Colorado Anschutz School of Medicine, Aurora, Colorado, United States of America · Funded by the Arthritis National Research Foundation Award, the PhRMA foundation, and the University of Colorado Translational Research Scholars Program Award
| | - Casey S Greene
- Department of Biomedical Informatics, University of Colorado Anschutz School of Medicine, Aurora, Colorado, United States of America · Funded by the Gordon and Betty Moore Foundation (GBMF 4552), NCI of the National Institutes of Health (R01CA237170, R01CA243188, R01CA200854)
|
27
|
Wei Z, Chenjun W, Feiyang X, Mingfeng J, Yixuan Z, Qi L, Zhuoxing S, Qi D. scHybridBERT: integrating gene regulation and cell graph for spatiotemporal dynamics in single-cell clustering. Brief Bioinform 2024; 25:bbae018. [PMID: 38517692 PMCID: PMC10959234 DOI: 10.1093/bib/bbae018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2023] [Revised: 12/19/2023] [Accepted: 01/09/2024] [Indexed: 03/24/2024] Open
Abstract
Graph learning models have received increasing attention in the computational analysis of single-cell RNA sequencing (scRNA-seq) data. Compared with conventional deep neural networks, graph neural networks and language models have exhibited superior performance by extracting graph-structured data from raw gene count matrices. Established deep neural network-based clustering approaches generally focus on temporal expression patterns while ignoring inherent interactions at the gene level as well as the cell level, which could be regarded as spatial dynamics in single-cell data. Both gene-gene and cell-cell interactions are able to boost the performance of cell type detection, under the framework of multi-view modeling. In this study, spatiotemporal embedding and cell graphs are extracted to capture spatial dynamics at the molecular level. In order to enhance the accuracy of cell type detection, this study proposes the scHybridBERT architecture to conduct multi-view modeling of scRNA-seq data using extracted spatiotemporal patterns. In this scHybridBERT method, graph learning models are employed to deal with cell graphs and the Performer model employs spatiotemporal embeddings. Experimental results on benchmark scRNA-seq datasets indicate that the proposed scHybridBERT method is able to enhance the accuracy of single-cell clustering tasks by integrating spatiotemporal embeddings and cell graphs.
Affiliation(s)
- Zhang Wei
- Zhejiang Sci-Tech University, 310028, Hangzhou, China
| | - Wu Chenjun
- Zhejiang Sci-Tech University, 310028, Hangzhou, China
| | - Xing Feiyang
- Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, 200092, Shanghai, China
| | | | - Zhang Yixuan
- Zhejiang Sci-Tech University, 310028, Hangzhou, China
| | - Liu Qi
- Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, 200092, Shanghai, China
| | - Shi Zhuoxing
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-Sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, 510060, Guangzhou, China
| | - Dai Qi
- Zhejiang Sci-Tech University, 310028, Hangzhou, China
|
28
|
Hu T, Allam M, Cai S, Henderson W, Yueh B, Garipcan A, Ievlev AV, Afkarian M, Beyaz S, Coskun AF. Single-cell spatial metabolomics with cell-type specific protein profiling for tissue systems biology. Nat Commun 2023; 14:8260. [PMID: 38086839 PMCID: PMC10716522 DOI: 10.1038/s41467-023-43917-5] [Citation(s) in RCA: 31] [Impact Index Per Article: 15.5] [Received: 01/06/2023] [Accepted: 11/23/2023] [Indexed: 12/18/2023] Open
Abstract
Metabolic reprogramming in cancer and immune cells occurs to support their increasing energy needs in biological tissues. Here we propose the Single Cell Spatially resolved Metabolic (scSpaMet) framework for joint protein-metabolite profiling of single immune and cancer cells in male human tissues by incorporating untargeted spatial metabolomics and targeted multiplexed protein imaging in a single pipeline. We utilized scSpaMet to profile cell types and spatial metabolomic maps of 19507, 31156, and 8215 single cells in human lung cancer, tonsil, and endometrium tissues, respectively. The scSpaMet analysis revealed cell type-dependent metabolite profiles and local metabolite competition of neighboring single cells in human tissues. Deep learning-based joint embedding revealed unique metabolite states within cell types. Trajectory inference showed metabolic patterns along cell differentiation paths. Here we show scSpaMet's ability to quantify and visualize the cell-type specific and spatially resolved metabolic-protein mapping as an emerging tool for systems-level understanding of tissue biology.
Affiliation(s)
- Thomas Hu
- Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, USA
- School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA, USA
| | - Mayar Allam
- Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, USA
| | - Shuangyi Cai
- Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, USA
| | - Walter Henderson
- Institute for Electronics and Nanotechnology, Georgia Institute of Technology, Atlanta, GA, USA
| | - Brian Yueh
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
| | | | - Anton V Ievlev
- Oak Ridge National Laboratory, Center for Nanophase Materials Sciences, Oak Ridge, TN, USA
| | - Maryam Afkarian
- Division of Nephrology, Department of Internal Medicine, University of California, Davis, CA, USA
| | - Semir Beyaz
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
| | - Ahmet F Coskun
- Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, USA.
- Interdisciplinary Bioengineering Graduate Program, Georgia Institute of Technology, Atlanta, GA, USA.
- Winship Cancer Institute, Emory University, Atlanta, GA, USA.
- Parker H. Petit Institute for Bioengineering and Bioscience, Georgia Institute of Technology, Atlanta, GA, USA.
|
29
|
Baig Y, Ma HR, Xu H, You L. Autoencoder neural networks enable low dimensional structure analyses of microbial growth dynamics. Nat Commun 2023; 14:7937. [PMID: 38049401 PMCID: PMC10696002 DOI: 10.1038/s41467-023-43455-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2023] [Accepted: 11/09/2023] [Indexed: 12/06/2023] Open
Abstract
The ability to effectively represent microbiome dynamics is a crucial challenge in their quantitative analysis and engineering. By using autoencoder neural networks, we show that microbial growth dynamics can be compressed into low-dimensional representations and reconstructed with high fidelity. These low-dimensional embeddings are as effective as, if not better than, raw data for tasks such as identifying bacterial strains, predicting traits like antibiotic resistance, and predicting community dynamics. Additionally, we demonstrate that essential dynamical information of these systems can be captured using far fewer variables than traditional mechanistic models. Our work suggests that machine learning can enable the creation of concise representations of high-dimensional microbiome dynamics to facilitate data analysis and gain new biological insights.
Affiliation(s)
- Yasa Baig
- Department of Physics, Duke University, Durham, NC, USA
- Department of Computer Science, Duke University, Durham, NC, USA
| | - Helena R Ma
- Department of Biomedical Engineering, Duke University, Durham, NC, USA
- Center for Quantitative Biodesign, Duke University, Durham, NC, USA
| | - Helen Xu
- Department of Computer Science, Duke University, Durham, NC, USA
| | - Lingchong You
- Department of Biomedical Engineering, Duke University, Durham, NC, USA.
- Center for Quantitative Biodesign, Duke University, Durham, NC, USA.
- Department of Molecular Genetics and Microbiology, Duke University School of Medicine, Durham, NC, USA.
|
30
|
Toussaint PA, Leiser F, Thiebes S, Schlesner M, Brors B, Sunyaev A. Explainable artificial intelligence for omics data: a systematic mapping study. Brief Bioinform 2023; 25:bbad453. [PMID: 38113073 PMCID: PMC10729786 DOI: 10.1093/bib/bbad453] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2022] [Revised: 07/28/2023] [Accepted: 11/08/2023] [Indexed: 12/21/2023] Open
Abstract
Researchers increasingly turn to explainable artificial intelligence (XAI) to analyze omics data and gain insights into the underlying biological processes. Yet, given the interdisciplinary nature of the field, many findings have only been shared in their respective research community. An overview of XAI for omics data is needed to highlight promising approaches and help detect common issues. Toward this end, we conducted a systematic mapping study. To identify relevant literature, we queried Scopus, PubMed, Web of Science, BioRxiv, MedRxiv and arXiv. Based on keywording, we developed a coding scheme with 10 facets regarding the studies' AI methods, explainability methods and omics data. Our mapping study resulted in 405 included papers published between 2010 and 2023. The inspected papers analyze DNA-based (mostly genomic), transcriptomic, proteomic or metabolomic data by means of neural networks, tree-based methods, statistical methods and further AI methods. The preferred post-hoc explainability methods are feature relevance (n = 166) and visual explanation (n = 52), while papers using interpretable approaches often resort to the use of transparent models (n = 83) or architecture modifications (n = 72). With many research gaps still apparent for XAI for omics data, we deduced eight research directions and discuss their potential for the field. We also provide exemplary research questions for each direction. Many problems with the adoption of XAI for omics data in clinical practice are yet to be resolved. This systematic mapping study outlines extant research on the topic and provides research directions for researchers and practitioners.
Affiliation(s)
- Philipp A Toussaint
- Department of Economics and Management, Karlsruhe Institute of Technology, Karlsruhe, Germany
- HIDSS4Health – Helmholtz Information and Data Science School for Health, Karlsruhe, Heidelberg, Germany
| | - Florian Leiser
- Department of Economics and Management, Karlsruhe Institute of Technology, Karlsruhe, Germany
| | - Scott Thiebes
- Department of Economics and Management, Karlsruhe Institute of Technology, Karlsruhe, Germany
| | - Matthias Schlesner
- Biomedical Informatics, Data Mining and Data Analytics, Faculty of Applied Computer Science and Medical Faculty, University of Augsburg, Augsburg, Germany
| | - Benedikt Brors
- Division of Applied Bioinformatics, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Translational Oncology, National Center for Tumor Diseases, German Cancer Research Center (DKFZ), Heidelberg, Germany
| | - Ali Sunyaev
- Department of Economics and Management, Karlsruhe Institute of Technology, Karlsruhe, Germany
|
31
|
Li S, Guo H, Zhang S, Li Y, Li M. Attention-based deep clustering method for scRNA-seq cell type identification. PLoS Comput Biol 2023; 19:e1011641. [PMID: 37948464 PMCID: PMC10703402 DOI: 10.1371/journal.pcbi.1011641] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2023] [Revised: 12/07/2023] [Accepted: 10/30/2023] [Indexed: 11/12/2023] Open
Abstract
Single-cell RNA sequencing (scRNA-seq) technology provides higher resolution of cellular differences than bulk RNA sequencing and reveals the heterogeneity in biological research. The analysis of scRNA-seq datasets is premised on the subpopulation assignment. When an appropriate reference, such as specific marker genes or a single-cell reference atlas, is not available, unsupervised clustering approaches become the predominant option. However, the inherent sparsity and high dimensionality of scRNA-seq datasets pose specific analytical challenges to traditional clustering methods. Therefore, various deep learning-based methods have been proposed to address these challenges. As each method offers only partial improvements, a comprehensive method is needed. In this article, we propose a novel scRNA-seq data clustering method named AttentionAE-sc (Attention fusion AutoEncoder for single-cell). Two different scRNA-seq clustering strategies are combined through an attention mechanism: zero-inflated negative binomial (ZINB)-based methods, which deal with the impact of dropout events, and graph autoencoder (GAE)-based methods, which rely on information from neighbors to guide the dimension reduction. Based on an iterative fusion between denoising and topological embeddings, AttentionAE-sc can easily acquire clustering-friendly cell representations in which similar cells are closer in the hidden embedding. Compared with several state-of-the-art baseline methods, AttentionAE-sc demonstrated excellent clustering performance on 16 real scRNA-seq datasets without the need to specify the number of groups. Additionally, AttentionAE-sc learned improved cell representations and exhibited enhanced stability and robustness. Furthermore, AttentionAE-sc achieved remarkable identification in a breast cancer single-cell atlas dataset and provided valuable insights into the heterogeneity among different cell subtypes.
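For readers unfamiliar with the zero-inflated negative binomial (ZINB) model mentioned in this abstract, the following minimal sketch writes out its log-likelihood for a single gene. It is not taken from AttentionAE-sc: the function names and the mean/dispersion parameterization (`mu`, `theta`, with dropout probability `pi`) are assumptions chosen for illustration, and the code expects `mu > 0`, `theta > 0` and `0 <= pi < 1`. A count is either a structural zero (a dropout event, with probability `pi`) or a draw from a negative binomial.

```python
import math


def nb_log_pmf(x, mu, theta):
    """Log-probability of count x under a negative binomial with mean mu
    and dispersion theta (variance = mu + mu**2 / theta)."""
    return (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )


def zinb_log_pmf(x, mu, theta, pi):
    """Log-probability under ZINB: a structural zero with probability pi,
    otherwise a negative binomial draw."""
    nb = nb_log_pmf(x, mu, theta)
    if x == 0:
        # A zero can come from dropout or from the NB component itself.
        return math.log(pi + (1.0 - pi) * math.exp(nb))
    return math.log(1.0 - pi) + nb


def zinb_nll(counts, mu, theta, pi):
    """Negative log-likelihood of a list of counts sharing one parameter set."""
    return -sum(zinb_log_pmf(x, mu, theta, pi) for x in counts)


# Hypothetical toy counts for one gene across five cells.
print(zinb_nll([0, 0, 3, 1, 0], mu=2.0, theta=1.5, pi=0.3))
```

In a ZINB-based autoencoder, a decoder would typically output per-gene, per-cell values for these three parameters and a loss of this form would replace mean squared error; how AttentionAE-sc wires this up is not stated in the abstract.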
Affiliation(s)
- Shenghao Li
- College of Chemistry, Sichuan University, Chengdu, Sichuan, China
| | - Hui Guo
- College of Chemistry, Sichuan University, Chengdu, Sichuan, China
| | - Simai Zhang
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Sichuan, China
| | - Yizhou Li
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Sichuan, China
- School of Cyber Science and Engineering, Sichuan University, Chengdu, Sichuan, China
| | - Menglong Li
- College of Chemistry, Sichuan University, Chengdu, Sichuan, China
|
32
|
Yang Y, McCullough CG, Seninge L, Guo L, Kwon WJ, Zhang Y, Li NY, Gaddam S, Pan C, Zhen H, Torkelson J, Glass IA, Charville G, Que J, Stuart J, Ding H, Oro A. A Spatiotemporal and Machine-Learning Platform Accelerates the Manufacturing of hPSC-derived Esophageal Mucosa. bioRxiv 2023:2023.10.24.563664. [PMID: 37961271 PMCID: PMC10634774 DOI: 10.1101/2023.10.24.563664] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2023]
Abstract
Human pluripotent stem cell-derived tissue engineering offers great promise in designer cell-based personalized therapeutics. To harness such potential, a broader approach requires a deeper understanding of tissue-level interactions. We previously developed a manufacturing system for the ectoderm-derived skin epithelium for cell replacement therapy. However, it remains challenging to manufacture the endoderm-derived esophageal epithelium, despite both possessing similar stratified structure. Here we employ single cell and spatial technologies to generate a spatiotemporal multi-omics cell atlas for human esophageal development. We illuminate the cellular diversity, dynamics and signal communications for the developing esophageal epithelium and stroma. Using the machine-learning based Manatee, we prioritize the combinations of candidate human developmental signals for in vitro derivation of esophageal basal cells. Functional validation of the Manatee predictions leads to a clinically-compatible system for manufacturing human esophageal mucosa. Our approach creates a versatile platform to accelerate human tissue manufacturing for future cell replacement therapies to treat human genetic defects and wounds.
|
33
|
Yelmen B, Decelle A, Boulos LL, Szatkownik A, Furtlehner C, Charpiat G, Jay F. Deep convolutional and conditional neural networks for large-scale genomic data generation. PLoS Comput Biol 2023; 19:e1011584. [PMID: 37903158 PMCID: PMC10635570 DOI: 10.1371/journal.pcbi.1011584] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/13/2023] [Revised: 11/09/2023] [Accepted: 10/09/2023] [Indexed: 11/01/2023] Open
Abstract
Applications of generative models for genomic data have gained significant momentum in the past few years, with scopes ranging from data characterization to generation of genomic segments and functional sequences. In our previous study, we demonstrated that generative adversarial networks (GANs) and restricted Boltzmann machines (RBMs) can be used to create novel high-quality artificial genomes (AGs) which can preserve the complex characteristics of real genomes such as population structure, linkage disequilibrium and selection signals. However, a major drawback of these models is scalability, since the large feature space of genome-wide data increases computational complexity vastly. To address this issue, we implemented a novel convolutional Wasserstein GAN (WGAN) model along with a novel conditional RBM (CRBM) framework for generating AGs with high SNP number. These networks implicitly learn the varying landscape of haplotypic structure in order to capture complex correlation patterns along the genome and generate a wide diversity of plausible haplotypes. We performed comparative analyses to assess both the quality of these generated haplotypes and the amount of possible privacy leakage from the training data. As the importance of genetic privacy becomes more prevalent, the need for effective privacy protection measures for genomic data increases. We used generative neural networks to create large artificial genome segments which possess many characteristics of real genomes without substantial privacy leakage from the training dataset. In the near future, with further improvements in haplotype quality and privacy preservation, large-scale artificial genome databases can be assembled to provide easily accessible surrogates of real databases, allowing researchers to conduct studies with diverse genomic data within a safe ethical framework in terms of donor privacy.
Affiliation(s)
- Burak Yelmen
- Université Paris-Saclay, CNRS, INRIA, LISN, Paris, France
- University of Tartu, Institute of Genomics, Tartu, Estonia
| | - Aurélien Decelle
- Université Paris-Saclay, CNRS, INRIA, LISN, Paris, France
- Universidad Complutense de Madrid, Departamento de Física Teórica, Madrid, Spain
| | - Leila Lea Boulos
- Université Paris-Saclay, CNRS, INRIA, LISN, Paris, France
- Université d’Évry Val-d’Essonne, Évry-Courcouronnes, France
| | - Flora Jay
- Université Paris-Saclay, CNRS, INRIA, LISN, Paris, France
|
34
|
Martínez-Enguita D, Dwivedi SK, Jörnsten R, Gustafsson M. NCAE: data-driven representations using a deep network-coherent DNA methylation autoencoder identify robust disease and risk factor signatures. Brief Bioinform 2023; 24:bbad293. [PMID: 37587790 PMCID: PMC10516364 DOI: 10.1093/bib/bbad293] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2023] [Revised: 07/25/2023] [Accepted: 07/29/2023] [Indexed: 08/18/2023] Open
Abstract
Precision medicine relies on the identification of robust disease and risk factor signatures from omics data. However, current knowledge-driven approaches may overlook novel or unexpected phenomena due to the inherent biases in biological knowledge. In this study, we present a data-driven signature discovery workflow for DNA methylation analysis utilizing network-coherent autoencoders (NCAEs) with biologically relevant latent embeddings. First, we explored the architecture space of autoencoders trained on a large-scale pan-tissue compendium (n = 75 272) of human epigenome-wide association studies. We observed the emergence of co-localized patterns in the deep autoencoder latent space representations that corresponded to biological network modules. We determined the NCAE configuration with the strongest co-localization and centrality signals in the human protein interactome. Leveraging the NCAE embeddings, we then trained interpretable deep neural networks for risk factor (aging, smoking) and disease (systemic lupus erythematosus) prediction and classification tasks. Remarkably, our NCAE embedding-based models outperformed existing predictors, revealing novel DNA methylation signatures enriched in gene sets and pathways associated with the studied condition in each case. Our data-driven biomarker discovery workflow provides a generally applicable pipeline to capture relevant risk factor and disease information. By surpassing the limitations of knowledge-driven methods, our approach enhances the understanding of complex epigenetic processes, facilitating the development of more effective diagnostic and therapeutic strategies.
Affiliation(s)
- David Martínez-Enguita
- Bioinformatics, Department of Physics, Chemistry and Biology, Linköping University, Sweden
| | - Sanjiv K Dwivedi
- Bioinformatics, Department of Physics, Chemistry and Biology, Linköping University, Sweden
| | - Rebecka Jörnsten
- Department of Mathematical Sciences, Chalmers University of Technology, Sweden
| | - Mika Gustafsson
- Bioinformatics, Department of Physics, Chemistry and Biology, Linköping University, Sweden
|
35
|
Schuster V, Krogh A. The Deep Generative Decoder: MAP estimation of representations improves modelling of single-cell RNA data. Bioinformatics 2023; 39:btad497. [PMID: 37572301 PMCID: PMC10483129 DOI: 10.1093/bioinformatics/btad497] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2022] [Revised: 07/12/2023] [Accepted: 08/10/2023] [Indexed: 08/14/2023] Open
Abstract
MOTIVATION Learning low-dimensional representations of single-cell transcriptomics has become instrumental to its downstream analysis. The state of the art is currently represented by neural network models, such as variational autoencoders, which use a variational approximation of the likelihood for inference. RESULTS We here present the Deep Generative Decoder (DGD), a simple generative model that computes model parameters and representations directly via maximum a posteriori estimation. The DGD naturally handles complex parameterized latent distributions, unlike variational autoencoders, which typically use a fixed Gaussian distribution because of the complexity of adding other types. We first show its general functionality on a commonly used benchmark set, Fashion-MNIST. Second, we apply the model to multiple single-cell datasets. Here, the DGD learns low-dimensional, meaningful, and well-structured latent representations with sub-clustering beyond the provided labels. The advantages of this approach are its simplicity and its capability to provide representations of much smaller dimensionality than a comparable variational autoencoder. AVAILABILITY AND IMPLEMENTATION scDGD is available as a Python package at https://github.com/Center-for-Health-Data-Science/scDGD. The remaining code is made available here: https://github.com/Center-for-Health-Data-Science/dgd.
Affiliation(s)
- Viktoria Schuster
- Center for Health Data Science, University of Copenhagen, 2200 Copenhagen, Denmark
| | - Anders Krogh
- Center for Health Data Science, University of Copenhagen, 2200 Copenhagen, Denmark
- Department of Computer Science, University of Copenhagen, 2100 Copenhagen, Denmark
|
36
|
Almet AA, Yuan H, Annusver K, Ramos R, Liu Y, Wiedemann J, Sorkin DH, Landén NX, Sonkoly E, Haniffa M, Nie Q, Lichtenberger BM, Luecken MD, Andersen B, Tsoi LC, Watt FM, Gudjonsson JE, Plikus MV, Kasper M. A Roadmap for a Consensus Human Skin Cell Atlas and Single-Cell Data Standardization. J Invest Dermatol 2023; 143:1667-1677. [PMID: 37612031 PMCID: PMC10610458 DOI: 10.1016/j.jid.2023.03.1679] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 03/24/2023] [Revised: 03/24/2023] [Accepted: 03/29/2023] [Indexed: 08/25/2023]
Abstract
Single-cell technologies have become essential to driving discovery in both basic and translational investigative dermatology. Despite the multitude of available datasets, a central reference atlas of normal human skin, which can serve as a reference resource for skin cell types, cell states, and their molecular signatures, is still lacking. For any such atlas to receive broad acceptance, participation by many investigators during atlas construction is an essential prerequisite. As part of the Human Cell Atlas project, we have assembled a Skin Biological Network to build a consensus Human Skin Cell Atlas and outline a roadmap toward that goal. We define the drivers of skin diversity to be considered when selecting sequencing datasets for the atlas. We also list practical hurdles during skin sampling that can result in data gaps and impede comprehensive representation, as well as technical considerations for tissue processing and computational analysis, the accounting for which should minimize biases in cell type enrichments and exclusions and decrease batch effects. By outlining our goals for Atlas 1.0, we discuss how it will uncover new aspects of skin biology.
Affiliation(s)
- Axel A Almet
- Department of Mathematics, University of California, Irvine, Irvine, California, USA; NSF-Simons Center for Multiscale Cell Fate Research, University of California, Irvine, Irvine, California, USA
| | - Hao Yuan
- Department of Cell and Molecular Biology, Karolinska Institute, Stockholm, Sweden
| | - Karl Annusver
- Department of Cell and Molecular Biology, Karolinska Institute, Stockholm, Sweden
| | - Raul Ramos
- NSF-Simons Center for Multiscale Cell Fate Research, University of California, Irvine, Irvine, California, USA; Department of Developmental and Cell Biology, School of Biological Sciences, University of California, Irvine, Irvine, California, USA; Sue and Bill Gross Stem Cell Research Center, University of California, Irvine, Irvine, California, USA
| | - Yingzi Liu
- Department of Developmental and Cell Biology, School of Biological Sciences, University of California, Irvine, Irvine, California, USA; Sue and Bill Gross Stem Cell Research Center, University of California, Irvine, Irvine, California, USA
| | - Julie Wiedemann
- Department of Developmental and Cell Biology, School of Biological Sciences, University of California, Irvine, Irvine, California, USA; Mathematical, Computational & Systems Biology, Department of Medicine, University of California, Irvine, Irvine, California, USA
| | - Dara H Sorkin
- Institute for Clinical & Translational Science, University of California, Irvine, Irvine, California, USA; Department of Medicine, School of Medicine, University of California, Irvine, Irvine, California, USA
| | - Ning Xu Landén
- Dermatology and Venereology Division, Department of Medicine, Solna, Karolinska Institute, Stockholm, Sweden; Center for Molecular Medicine, Karolinska Institute, Stockholm, Sweden; Ming Wai Lau Centre for Reparative Medicine, Karolinska Institute, Stockholm, Sweden
| | - Enikö Sonkoly
- Dermatology and Venereology Division, Department of Medicine, Solna, Karolinska Institute, Stockholm, Sweden; Center for Molecular Medicine, Karolinska Institute, Stockholm, Sweden; Dermatology and Venereology, Department of Medical Sciences, Uppsala University, Uppsala, Sweden
| | - Muzlifah Haniffa
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom; Biosciences Institute, Newcastle University, Newcastle Upon Tyne, United Kingdom; Department of Dermatology and NIHR Newcastle Biomedical Research Centre, Newcastle Hospitals NHS Foundation Trust, Newcastle Upon Tyne, United Kingdom
| | - Qing Nie
- Department of Mathematics, University of California, Irvine, Irvine, California, USA; NSF-Simons Center for Multiscale Cell Fate Research, University of California, Irvine, Irvine, California, USA; Department of Developmental and Cell Biology, School of Biological Sciences, University of California, Irvine, Irvine, California, USA
| | - Beate M Lichtenberger
- Skin & Endothelium Research Division (SERD), Department of Dermatology, Medical University of Vienna, Vienna, Austria
| | - Malte D Luecken
- Institute of Computational Biology, Helmholtz Munich, Neuherberg, Germany; Institute of Lung Health and Immunity, Helmholtz Munich, Member of the German Center for Lung Research (DZL), Munich, Germany
| | - Bogi Andersen
- NSF-Simons Center for Multiscale Cell Fate Research, University of California, Irvine, Irvine, California, USA; Sue and Bill Gross Stem Cell Research Center, University of California, Irvine, Irvine, California, USA; Department of Medicine, School of Medicine, University of California, Irvine, Irvine, California, USA; Department of Biological Chemistry, School of Medicine, University of California, Irvine, Irvine, California, USA
| | - Lam C Tsoi
- Department of Dermatology, University of Michigan, Ann Arbor, Michigan, USA; Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA; Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, Michigan, USA; Center for Statistical Genetics, School of Public Health, University of Michigan, Ann Arbor, Michigan, USA
| | - Fiona M Watt
- Centre for Gene Therapy & Regenerative Medicine, Faculty of Life Sciences & Medicine, School of Basic & Medical Biosciences, King's College London, London, United Kingdom; Directors' Research Unit, European Molecular Biology Laboratory, Heidelberg, Germany
| | - Maksim V Plikus
- NSF-Simons Center for Multiscale Cell Fate Research, University of California, Irvine, Irvine, California, USA; Department of Developmental and Cell Biology, School of Biological Sciences, University of California, Irvine, Irvine, California, USA; Sue and Bill Gross Stem Cell Research Center, University of California, Irvine, Irvine, California, USA.
| | - Maria Kasper
- Department of Cell and Molecular Biology, Karolinska Institute, Stockholm, Sweden.
|
37
|
Abstract
Following the widespread use of deep learning for genomics, deep generative modeling is also becoming a viable methodology for the broad field. Deep generative models (DGMs) can learn the complex structure of genomic data and allow researchers to generate novel genomic instances that retain the real characteristics of the original dataset. Aside from data generation, DGMs can also be used for dimensionality reduction by mapping the data space to a latent space, as well as for prediction tasks via exploitation of this learned mapping or supervised/semi-supervised DGM designs. In this review, we briefly introduce generative modeling and two currently prevailing architectures, we present conceptual applications along with notable examples in functional and evolutionary genomics, and we provide our perspective on potential challenges and future directions.
Affiliation(s)
- Burak Yelmen
- Laboratoire Interdisciplinaire des Sciences du Numérique, CNRS UMR 9015, INRIA, Université Paris-Saclay, Orsay, France
- Institute of Genomics, University of Tartu, Tartu, Estonia
| | - Flora Jay
- Laboratoire Interdisciplinaire des Sciences du Numérique, CNRS UMR 9015, INRIA, Université Paris-Saclay, Orsay, France
|
38
|
Wysocka M, Wysocki O, Zufferey M, Landers D, Freitas A. A systematic review of biologically-informed deep learning models for cancer: fundamental trends for encoding and interpreting oncology data. BMC Bioinformatics 2023; 24:198. [PMID: 37189058 PMCID: PMC10186658 DOI: 10.1186/s12859-023-05262-8] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 08/17/2022] [Accepted: 03/30/2023] [Indexed: 05/17/2023] Open
Abstract
BACKGROUND There is an increasing interest in the use of Deep Learning (DL) based methods as a supporting analytical framework in oncology. However, most direct applications of DL will deliver models with limited transparency and explainability, which constrain their deployment in biomedical settings. METHODS This systematic review discusses DL models used to support inference in cancer biology with a particular emphasis on multi-omics analysis. It focuses on how existing models address the need for better dialogue with prior knowledge, biological plausibility and interpretability, fundamental properties in the biomedical domain. For this, we retrieved and analyzed 42 studies focusing on emerging architectural and methodological advances, the encoding of biological domain knowledge and the integration of explainability methods. RESULTS We discuss the recent evolutionary arc of DL models in the direction of integrating prior biological relational and network knowledge (e.g. pathways or protein-protein interaction networks) to support better generalisation and interpretability. This represents a fundamental functional shift towards models which can integrate mechanistic and statistical inference aspects. We introduce a concept of bio-centric interpretability and, according to its taxonomy, discuss representational methodologies for the integration of domain prior knowledge in such models. CONCLUSIONS The paper provides a critical outlook into contemporary methods for explainability and interpretability used in DL for cancer. The analysis points in the direction of a convergence between encoding prior knowledge and improved interpretability. We introduce bio-centric interpretability, which is an important step towards formalisation of biological interpretability of DL models and developing methods that are less problem- or application-specific.
Affiliation(s)
- Magdalena Wysocka
- Digital Experimental Cancer Medicine Team, Cancer Biomarker Centre, CRUK Manchester Institute, University of Manchester, Oxford Rd, Manchester, M13 9 PL UK
- Department of Computer Science, University of Manchester, Oxford Rd, Manchester, M13 9 PL UK
| | - Oskar Wysocki
- Digital Experimental Cancer Medicine Team, Cancer Biomarker Centre, CRUK Manchester Institute, University of Manchester, Oxford Rd, Manchester, M13 9 PL UK
- Department of Computer Science, University of Manchester, Oxford Rd, Manchester, M13 9 PL UK
- Idiap Research Institute, National University of Sciences, Rue Marconi 19, CH - 1920 Martigny, Switzerland
| | - Marie Zufferey
- Idiap Research Institute, National University of Sciences, Rue Marconi 19, CH - 1920 Martigny, Switzerland
| | - Dónal Landers
- DeLondra Oncology Ltd, 38 Carlton Avenue, Wilmslow, SK9 4EP UK
| | - André Freitas
- Digital Experimental Cancer Medicine Team, Cancer Biomarker Centre, CRUK Manchester Institute, University of Manchester, Oxford Rd, Manchester, M13 9 PL UK
- Department of Computer Science, University of Manchester, Oxford Rd, Manchester, M13 9 PL UK
- Idiap Research Institute, National University of Sciences, Rue Marconi 19, CH - 1920 Martigny, Switzerland
|
39
|
Paylar B, Längkvist M, Jass J, Olsson PE. Utilization of Computer Classification Methods for Exposure Prediction and Gene Selection in Daphnia magna Toxicogenomics. BIOLOGY 2023; 12:biology12050692. [PMID: 37237504 DOI: 10.3390/biology12050692] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2023] [Revised: 05/02/2023] [Accepted: 05/06/2023] [Indexed: 05/28/2023]
Abstract
Zinc (Zn) is an essential element that influences many cellular functions. Depending on bioavailability, Zn can cause both deficiency and toxicity. Zn bioavailability is influenced by water hardness. Therefore, water quality analysis for health-risk assessment should consider both Zn concentration and water hardness. However, exposure media for traditional toxicology tests are set to defined hardness levels and do not represent the diverse water chemistry compositions observed in nature. Moreover, these tests commonly use whole organism endpoints, such as survival and reproduction, which require high numbers of test animals and are labor intensive. Gene expression stands out as a promising alternative to provide insight into molecular events that can be used for risk assessment. In this work, we apply machine learning techniques to classify Zn concentration and water hardness from Daphnia magna gene expression measured using quantitative PCR. A method for gene ranking was explored using techniques from game theory, namely, Shapley values. The results show that standard machine learning classifiers can classify both Zn concentration and water hardness simultaneously, and that Shapley values are a versatile and useful alternative for gene ranking that can provide insight about the importance of individual genes.
Affiliation(s)
- Berkay Paylar
- The Life Science Center-Biology, School of Science and Technology, Örebro University, SE-701 82 Örebro, Sweden
| | - Martin Längkvist
- Center for Applied Autonomous Sensor Systems, Örebro University, SE-701 82 Örebro, Sweden
| | - Jana Jass
- The Life Science Center-Biology, School of Science and Technology, Örebro University, SE-701 82 Örebro, Sweden
| | - Per-Erik Olsson
- The Life Science Center-Biology, School of Science and Technology, Örebro University, SE-701 82 Örebro, Sweden
|
40
|
Janizek JD, Spiro A, Celik S, Blue BW, Russell JC, Lee TI, Kaeberlin M, Lee SI. PAUSE: principled feature attribution for unsupervised gene expression analysis. Genome Biol 2023; 24:81. [PMID: 37076856 PMCID: PMC10114348 DOI: 10.1186/s13059-023-02901-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2022] [Accepted: 03/17/2023] [Indexed: 04/21/2023] Open
Abstract
As interest in using unsupervised deep learning models to analyze gene expression data has grown, an increasing number of methods have been developed to make these models more interpretable. These methods can be separated into two groups: post hoc analyses of black box models through feature attribution methods and approaches to build inherently interpretable models through biologically-constrained architectures. We argue that these approaches are not mutually exclusive, but can in fact be usefully combined. We propose PAUSE ( https://github.com/suinleelab/PAUSE ), an unsupervised pathway attribution method that identifies major sources of transcriptomic variation when combined with biologically-constrained neural network models.
Affiliation(s)
- Joseph D Janizek
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, USA
- Medical Scientist Training Program, University of Washington, Seattle, USA
| | - Anna Spiro
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, USA
| | - Ben W Blue
- Department of Pathology, University of Washington, Seattle, USA
| | - John C Russell
- Department of Pathology, University of Washington, Seattle, USA
| | - Ting-I Lee
- Department of Pathology, University of Washington, Seattle, USA
| | - Matt Kaeberlin
- Department of Pathology, University of Washington, Seattle, USA
- Department of Genome Sciences, University of Washington, Seattle, USA
| | - Su-In Lee
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, USA.
|
41
|
Utriainen M, Morris JH. clusterMaker2: a major update to clusterMaker, a multi-algorithm clustering app for Cytoscape. BMC Bioinformatics 2023; 24:134. [PMID: 37020209 PMCID: PMC10074866 DOI: 10.1186/s12859-023-05225-z] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 11/16/2022] [Accepted: 03/11/2023] [Indexed: 04/07/2023] Open
Abstract
BACKGROUND Since the initial publication of clusterMaker, the need for tools to analyze large biological datasets has only increased. New datasets are significantly larger than a decade ago, and new experimental techniques such as single-cell transcriptomics continue to drive the need for clustering or classification techniques to focus on portions of datasets of interest. While many libraries and packages exist that implement various algorithms, there remains the need for clustering packages that are easy to use, integrated with visualization of the results, and integrated with other commonly used tools for biological data analysis. clusterMaker2 has added several new algorithms, including two entirely new categories of analyses: node ranking and dimensionality reduction. Furthermore, many of the new algorithms have been implemented using the Cytoscape jobs API, which provides a mechanism for executing remote jobs from within Cytoscape. Together, these advances facilitate meaningful analyses of modern biological datasets despite their ever-increasing size and complexity. RESULTS The use of clusterMaker2 is exemplified by reanalyzing the yeast heat shock expression experiment that was included in our original paper; however, here we explored this dataset in significantly more detail. Combining this dataset with the yeast protein-protein interaction network from STRING, we were able to perform a variety of analyses and visualizations from within clusterMaker2, including Leiden clustering to break the entire network into smaller clusters, hierarchical clustering to look at the overall expression dataset, dimensionality reduction using UMAP to find correlations between our hierarchical visualization and the UMAP plot, fuzzy clustering, and cluster ranking. Using these techniques, we were able to explore the highest-ranking cluster and determine that it represents a strong contender for proteins working together in response to heat shock. We found a series of clusters that, when re-explored as fuzzy clusters, provide a better presentation of mitochondrial processes. CONCLUSIONS clusterMaker2 represents a significant advance over the previously published version, and most importantly, provides an easy-to-use tool to perform clustering and to visualize clusters within the Cytoscape network context. The new algorithms should be welcome to the large population of Cytoscape users, particularly the new dimensionality reduction and fuzzy clustering techniques.
Affiliation(s)
- John H Morris
- Department of Pharmaceutical Chemistry, University of California San Francisco, San Francisco, CA, USA.
|
42
|
Choi Y, Li R, Quon G. siVAE: interpretable deep generative models for single-cell transcriptomes. Genome Biol 2023; 24:29. [PMID: 36803416 PMCID: PMC9940350 DOI: 10.1186/s13059-023-02850-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 04/08/2022] [Accepted: 01/06/2023] [Indexed: 02/22/2023] Open
Abstract
Neural networks such as variational autoencoders (VAE) perform dimensionality reduction for the visualization and analysis of genomic data, but are limited in their interpretability: it is unknown which data features are represented by each embedding dimension. We present siVAE, a VAE that is interpretable by design, thereby enhancing downstream analysis tasks. Through interpretation, siVAE also identifies gene modules and hubs without explicit gene network inference. We use siVAE to identify gene modules whose connectivity is associated with diverse phenotypes such as iPSC neuronal differentiation efficiency and dementia, showcasing the wide applicability of interpretable generative models for genomic data analysis.
Affiliation(s)
- Yongin Choi
- Graduate Group in Biomedical Engineering, University of California, Davis, Davis, CA, USA
- Genome Center, University of California, Davis, Davis, CA, USA
| | - Ruoxin Li
- Genome Center, University of California, Davis, Davis, CA, USA
- Graduate Group in Biostatistics, University of California, Davis, Davis, CA, USA
| | - Gerald Quon
- Graduate Group in Biomedical Engineering, University of California, Davis, Davis, CA, USA.
- Genome Center, University of California, Davis, Davis, CA, USA.
- Department of Molecular and Cellular Biology, University of California, Davis, Davis, CA, USA.
|
43
|
Lotfollahi M, Rybakov S, Hrovatin K, Hediyeh-Zadeh S, Talavera-López C, Misharin AV, Theis FJ. Biologically informed deep learning to query gene programs in single-cell atlases. Nat Cell Biol 2023; 25:337-350. [PMID: 36732632 PMCID: PMC9928587 DOI: 10.1038/s41556-022-01072-x] [Citation(s) in RCA: 27] [Impact Index Per Article: 13.5] [Received: 03/02/2022] [Accepted: 12/08/2022] [Indexed: 02/04/2023]
Abstract
The increasing availability of large-scale single-cell atlases has enabled the detailed description of cell states. In parallel, advances in deep learning allow rapid analysis of newly generated query datasets by mapping them into reference atlases. However, existing data transformations learned to map query data are not easily explainable using biologically known concepts such as genes or pathways. Here we propose expiMap, a biologically informed deep-learning architecture that enables single-cell reference mapping. ExpiMap learns to map cells into biologically understandable components representing known 'gene programs'. The activity of each cell for a gene program is learned while simultaneously refining them and learning de novo programs. We show that expiMap compares favourably to existing methods while bringing an additional layer of interpretability to integrative single-cell analysis. Furthermore, we demonstrate its applicability to analyse single-cell perturbation responses in different tissues and species and resolve responses of patients who have coronavirus disease 2019 to different treatments across cell types.
Affiliation(s)
- Mohammad Lotfollahi
- Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
- Wellcome Sanger Institute, Cambridge, UK
| | - Sergei Rybakov
- Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
- Department of Mathematics, Technical University of Munich, Munich, Germany
| | - Karin Hrovatin
- Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
- TUM School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany
| | - Soroor Hediyeh-Zadeh
- Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
- Bioinformatics Division, WEHI, Melbourne, Victoria, Australia
| | - Carlos Talavera-López
- Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
- Division of Infectious Diseases and Tropical Medicine, Ludwig-Maximilian-Universität Klinikum, Munich, Germany
| | - Alexander V Misharin
- Division of Pulmonary and Critical Care Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
| | - Fabian J Theis
- Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany.
- Wellcome Sanger Institute, Cambridge, UK.
- Department of Mathematics, Technical University of Munich, Munich, Germany.
- TUM School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany.
|
44
|
Zhang Y, Wang M, Wang Z, Liu Y, Xiong S, Zou Q. MetaSEM: Gene Regulatory Network Inference from Single-Cell RNA Data by Meta-Learning. Int J Mol Sci 2023; 24:2595. [PMID: 36768917 PMCID: PMC9916710 DOI: 10.3390/ijms24032595] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/11/2022] [Revised: 01/23/2023] [Accepted: 01/26/2023] [Indexed: 01/31/2023] Open
Abstract
Regulators in gene regulatory networks (GRNs) are crucial for identifying cell states. However, GRN inference based on scRNA-seq data has several problems, including high dimensionality and sparsity, and requires more labeled data. Therefore, we propose a meta-learning GRN inference framework to identify regulatory factors. Specifically, meta-learning solves the parameter optimization problem caused by high-dimensional sparse data features. In addition, a few-shot solution was used to solve the problem of lack of labeled data. A structural equation model (SEM) was embedded in the model to identify important regulators. We integrated the parameter optimization strategy into the bi-level optimization to extract features consistent with GRN reasoning. This unique design makes our model robust to small-scale data. By studying the GRN inference task, we confirmed that the selected regulators were closely related to gene expression specificity. We further analyzed the inferred GRN to find the important regulators in cell type identification. Extensive experimental results showed that our model effectively captured the regulators in single-cell GRN inference. Finally, the visualization results verified the importance of the selected regulators for cell type recognition.
Affiliation(s)
- Yongqing Zhang
- School of Computer Science, Chengdu University of Information Technology, Chengdu 610225, China
- Maocheng Wang
- School of Computer Science, Chengdu University of Information Technology, Chengdu 610225, China
- Zixuan Wang
- School of Computer Science, Chengdu University of Information Technology, Chengdu 610225, China
- Yuhang Liu
- School of Computer Science, Chengdu University of Information Technology, Chengdu 610225, China
- Shuwen Xiong
- School of Computer Science, Chengdu University of Information Technology, Chengdu 610225, China
- Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610051, China
| |
|
45
|
Wang L, Nie R, Zhang J, Cai J. scCapsNet-mask: an updated version of scCapsNet with extended applicability in functional analysis related to scRNA-seq data. BMC Bioinformatics 2022; 23:539. [PMID: 36510124 PMCID: PMC9743530 DOI: 10.1186/s12859-022-05098-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/18/2022] [Accepted: 12/03/2022] [Indexed: 12/14/2022]
Abstract
BACKGROUND With the rapid accumulation of scRNA-seq data, more and more automatic cell type identification methods have been developed, especially those based on deep learning. Although these methods have reached relatively high prediction accuracy, many issues remain. One is interpretability. The second is how to deal with non-standard test samples that are not encountered during training. RESULTS Here we introduce scCapsNet-mask, an updated version of scCapsNet that provides a reasonable solution to both issues. First, scCapsNet-mask uses a mask to ease model interpretation relative to the original scCapsNet. The results show that scCapsNet-mask can constrain the coupling coefficients and establish a one-to-one correspondence between the primary capsules and the type capsules. Second, scCapsNet-mask can process non-standard samples more reasonably. In one example, scCapsNet-mask was trained on committed cells and then tested on less differentiated cells as the non-standard samples. It could not only estimate the lineage bias of less differentiated cells, but also distinguish developmental stages more accurately than traditional machine learning models. The pseudo-temporal order of cells for each lineage could therefore be established. Following this pseudo-temporal order, lineage-specific genes exhibit a gradually increasing expression pattern and stem cell-associated genes exhibit a gradually decreasing expression pattern. In another example, scCapsNet-mask was trained on scRNA-seq data and then used to assign cell types in spatial transcriptomics data, which may contain doublets as non-standard samples. The results show that scCapsNet-mask not only restored the spatial map but also identified several doublets as non-standard samples. CONCLUSIONS scCapsNet-mask offers a suitable solution to the challenges of interpretability and non-standard test samples. By adding a mask, it gains automatic processing and easier interpretation compared with the original scCapsNet. In addition, scCapsNet-mask reflects the composition of non-standard test samples more accurately than traditional machine learning methods. It can therefore be applied to functional analyses such as fate bias prediction in less differentiated cells and cell type assignment in spatial transcriptomics.
Affiliation(s)
- Lifei Wang
- Shulan (Hangzhou) Hospital Affiliated to Zhejiang Shuren University Shulan International Medical College, Hangzhou, China; China National Center for Bioinformation, Beijing 100101, China; Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Rui Nie
- China National Center for Bioinformation, Beijing 100101, China; Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Jiang Zhang
- School of Systems Science, Beijing Normal University, Beijing 100875, China
- Jun Cai
- China National Center for Bioinformation, Beijing 100101, China; Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
| |
|