1
Sadria M, Layton A. scVAEDer: integrating deep diffusion models and variational autoencoders for single-cell transcriptomics analysis. Genome Biol 2025; 26:64. [PMID: 40119479 PMCID: PMC11927372 DOI: 10.1186/s13059-025-03519-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2023] [Accepted: 02/27/2025] [Indexed: 03/24/2025] Open
Abstract
Discovering a lower-dimensional embedding of single-cell data can improve downstream analysis. The embedding should encapsulate both the high-level features and low-level variations. While existing generative models attempt to learn such low-dimensional representations, they have limitations. Here, we introduce scVAEDer, a scalable deep-learning model that combines the power of variational autoencoders and deep diffusion models to learn a meaningful representation that retains both global structure and local variations. Using the learned embeddings, scVAEDer can generate novel scRNA-seq data, predict perturbation response on various cell types, identify changes in gene expression during dedifferentiation, and detect master regulators in biological processes.
Affiliation(s)
- Mehrshad Sadria
- Department of Applied Mathematics, University of Waterloo, Waterloo, ON, Canada.
- Anita Layton
- Department of Applied Mathematics, University of Waterloo, Waterloo, ON, Canada
- Cheriton School of Computer Science, University of Waterloo, Waterloo, ON, Canada
- Department of Biology, University of Waterloo, Waterloo, ON, Canada
- School of Pharmacy, University of Waterloo, Waterloo, ON, Canada
2
Mirizio G, Sampson S, Iwafuchi M. Interplay between pioneer transcription factors and epigenetic modifiers in cell reprogramming. Regen Ther 2025; 28:246-252. [PMID: 39834592 PMCID: PMC11745816 DOI: 10.1016/j.reth.2024.12.014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2024] [Revised: 12/05/2024] [Accepted: 12/20/2024] [Indexed: 01/22/2025] Open
Abstract
The generation of induced pluripotent stem cells (iPSCs) from differentiated somatic cells by Yamanaka factors, including pioneer transcription factors (TFs), has greatly reshaped our traditional understanding of cell plasticity and demonstrated the remarkable potential of pioneer TFs. In addition to iPSC reprogramming, pioneer TFs are pivotal in direct reprogramming or transdifferentiation where somatic cells are converted into different cell types without passing through a pluripotent state. Pioneer TFs initiate a reprogramming process through chromatin opening, thereby establishing competence for new gene regulatory programs. The action of pioneer TFs is both influenced by and exerts influence on epigenetic regulation. Despite significant advances, many direct reprogramming processes remain inefficient, which limits their reliability for clinical applications. In this review, we discuss the molecular mechanisms underlying pioneer TF-driven reprogramming, with a focus on their interactions with epigenetic modifiers, including Polycomb repressive complexes (PRCs), nucleosome remodeling and deacetylase (NuRD) complexes, and the DNA methylation machinery. A deeper understanding of the dynamic interplay between pioneer TFs and epigenetic modifiers will be essential for advancing reprogramming technologies and unlocking their full clinical potential.
Affiliation(s)
- Gerardo Mirizio
- Division of Developmental Biology, Center for Stem Cell & Organoid Medicine, Cincinnati Children's Hospital Medical Center, Cincinnati, USA
- Department of Pediatrics, College of Medicine, University of Cincinnati, OH, 45229, USA
- Samuel Sampson
- Division of Developmental Biology, Center for Stem Cell & Organoid Medicine, Cincinnati Children's Hospital Medical Center, Cincinnati, USA
- Department of Pediatrics, College of Medicine, University of Cincinnati, OH, 45229, USA
- Makiko Iwafuchi
- Division of Developmental Biology, Center for Stem Cell & Organoid Medicine, Cincinnati Children's Hospital Medical Center, Cincinnati, USA
- Department of Pediatrics, College of Medicine, University of Cincinnati, OH, 45229, USA
3
Rainey RN, Houman SD, Menendez L, Chang R, Tao L, Bugacov H, McMahon AP, Kalluri R, Oghalai JS, Groves AK, Segil N. Inducible, virus-free direct lineage reprogramming enhances scalable generation of human inner ear hair cell-like cells. bioRxiv 2025:2025.02.20.639352. [PMID: 40060658 PMCID: PMC11888184 DOI: 10.1101/2025.02.20.639352] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/18/2025]
Abstract
Mammalian inner ear sensory hair cells are highly sensitive to environmental stress and do not regenerate, making hearing loss progressive and permanent. The paucity and extreme inaccessibility of these cells hinder the development of regenerative and otoprotective strategies. Direct lineage reprogramming to generate large quantities of hair cell-like cells in vitro offers a promising approach to overcome these experimental bottlenecks. Previously, we identified four transcription factors, Six1, Atoh1, Pou4f3, and Gfi1 (SAPG), capable of converting mouse embryonic fibroblasts, adult tail tip fibroblasts, and postnatal mouse supporting cells into induced hair cell-like cells through retroviral or lentiviral transduction (Menendez et al., 2020). Here, we developed a virus-free, inducible system using a stable human induced pluripotent stem (iPS) cell line carrying doxycycline-inducible SAPG. Our inducible system significantly increases reprogramming efficiency compared to retroviral methods, achieving a ~19-fold greater conversion to a hair cell fate in half the time. Immunostaining, Western blot, and single-nucleus RNA-seq analyses confirm the expression of hair cell-specific markers and activation of hair cell gene networks in reprogrammed cells. The reprogrammed hair cells closely resemble developing fetal hair cells, as evidenced by comparison with a human fetal inner ear dataset. Electrophysiological analysis reveals that the induced hair cell-like cells exhibit diverse voltage-dependent ion currents, including robust, quick-activating, slowly inactivating currents characteristic of primary hair cells. This virus-free approach improves scalability, reproducibility, and the modeling of hair cell differentiation, offering significant potential for hair cell regenerative strategies and preclinical drug discovery targeting ototoxicity and otoprotection.
Affiliation(s)
- Robert N Rainey
- Department of Stem Cell Biology and Regenerative Medicine, Eli and Edythe Broad Center for Regenerative Medicine and Stem Cell Research, Keck School of Medicine of University of Southern California, Los Angeles, California, United States
- Sam D Houman
- Department of Stem Cell Biology and Regenerative Medicine, Eli and Edythe Broad Center for Regenerative Medicine and Stem Cell Research, Keck School of Medicine of University of Southern California, Los Angeles, California, United States
- Present address: Touro University of California, College of Osteopathic Medicine, Vallejo, California, United States
- Louise Menendez
- Department of Stem Cell Biology and Regenerative Medicine, Eli and Edythe Broad Center for Regenerative Medicine and Stem Cell Research, Keck School of Medicine of University of Southern California, Los Angeles, California, United States
- Ryan Chang
- Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, California, United States
- Litao Tao
- Biomedical Sciences Department, School of Medicine, Creighton University, Omaha, Nebraska, United States
- Helena Bugacov
- Department of Stem Cell Biology and Regenerative Medicine, Eli and Edythe Broad Center for Regenerative Medicine and Stem Cell Research, Keck School of Medicine of University of Southern California, Los Angeles, California, United States
- Present address: Icahn School of Medicine at Mount Sinai, New York, New York, United States
- Andrew P McMahon
- Department of Stem Cell Biology and Regenerative Medicine, Eli and Edythe Broad Center for Regenerative Medicine and Stem Cell Research, Keck School of Medicine of University of Southern California, Los Angeles, California, United States
- Present address: Biology and Biological Engineering, California Institute of Technology, Pasadena, California, United States
- Radha Kalluri
- Zilkha Neurogenetic Institute, Keck School of Medicine of University of Southern California, Los Angeles, California, United States
- USC Caruso Department of Otolaryngology - Head and Neck Surgery, Keck School of Medicine of University of Southern California, Los Angeles, California, United States
- John S Oghalai
- USC Caruso Department of Otolaryngology - Head and Neck Surgery, Keck School of Medicine of University of Southern California, Los Angeles, California, United States
- Andrew K Groves
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States
- Department of Neuroscience, Baylor College of Medicine, Houston, Texas, United States
- Department of Developmental Biology, Washington University School of Medicine, St. Louis, Missouri, United States
- Neil Segil
- Department of Stem Cell Biology and Regenerative Medicine, Eli and Edythe Broad Center for Regenerative Medicine and Stem Cell Research, Keck School of Medicine of University of Southern California, Los Angeles, California, United States
- Zilkha Neurogenetic Institute, Keck School of Medicine of University of Southern California, Los Angeles, California, United States
4
Gavriilidis GI, Vasileiou V, Orfanou A, Ishaque N, Psomopoulos F. A mini-review on perturbation modelling across single-cell omic modalities. Comput Struct Biotechnol J 2024; 23:1886-1896. [PMID: 38721585 PMCID: PMC11076269 DOI: 10.1016/j.csbj.2024.04.058] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 02/01/2024] [Revised: 04/23/2024] [Accepted: 04/23/2024] [Indexed: 01/06/2025] Open
Abstract
Recent advances in single-cell omics technology have transformed the landscape of cellular and molecular research, enriching the scope and intricacy of cellular characterisation. Perturbation modelling seeks to comprehensively grasp the effects of external influences, such as disease onset, molecular knock-outs, or external stimulants, on cellular physiology, specifically on transcription factors, signal transducers, biological pathways, and dynamic cell states. Machine and deep learning tools transform complex perturbational phenomena into algorithmically tractable tasks to formulate predictions based on various types of single-cell datasets. However, the recent surge in tools and datasets makes it challenging for experimental biologists and computational scientists to keep track of the recent advances in this rapidly expanding field of single-cell modelling. Here, we recapitulate the main objectives of perturbation modelling and summarise novel single-cell perturbation technologies based on genetic manipulation like CRISPR or compounds, spanning across omic modalities. We then concisely review a burgeoning group of computational methods extending from classical statistical inference methodologies to various machine and deep learning architectures like shallow models or autoencoders, to biologically informed approaches based on gene regulatory networks, and to combinatorial efforts reminiscent of ensemble learning. We also discuss the rising trend of large foundational models in single-cell perturbation modelling inspired by large language models. Lastly, we critically assess the challenges that underline single-cell perturbation modelling while pointing towards relevant future perspectives like perturbation atlases, multi-omics and spatial datasets, causal machine learning for interpretability, multi-task learning for performance and explainability, as well as prospects for solving interoperability and benchmarking pitfalls.
Affiliation(s)
- George I. Gavriilidis
- Institute of Applied Biosciences, Centre for Research and Technology Hellas, Thessaloniki, Greece
- Vasileios Vasileiou
- Institute of Applied Biosciences, Centre for Research and Technology Hellas, Thessaloniki, Greece
- Department of Molecular Biology and Genetics, Democritus University of Thrace, Alexandroupolis, Greece
- Aspasia Orfanou
- Institute of Applied Biosciences, Centre for Research and Technology Hellas, Thessaloniki, Greece
- Naveed Ishaque
- Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Center of Digital Health, Berlin, Germany
- Fotis Psomopoulos
- Institute of Applied Biosciences, Centre for Research and Technology Hellas, Thessaloniki, Greece
5
Belair-Hickey JJ, Fahmy A, Zhang W, Sajid RS, Coles BLK, Salter MW, van der Kooy D. Neural crest precursors from the skin are the primary source of directly reprogrammed neurons. Stem Cell Reports 2024; 19:1620-1634. [PMID: 39486406 PMCID: PMC11589197 DOI: 10.1016/j.stemcr.2024.10.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2024] [Revised: 10/04/2024] [Accepted: 10/04/2024] [Indexed: 11/04/2024] Open
Abstract
Direct reprogramming involves the conversion of differentiated cell types without returning to an earlier developmental state. Here, we explore how heterogeneity in developmental lineage and maturity of the starting cell population contributes to direct reprogramming using the conversion of murine fibroblasts into neurons. Our hypothesis is that a single lineage of cells contributes to most reprogramming and that a rare elite precursor with intrinsic bias is the source of reprogrammed neurons. We find that nearly all reprogrammed neurons are derived from the neural crest (NC) lineage. Moreover, when rare proliferating NC precursors are selectively ablated, there is a large reduction in the number of reprogrammed neurons. Previous interpretations of this paradigm are that it demonstrates a cell fate conversion across embryonic germ layers (mesoderm to ectoderm). Our interpretation is that this is actually directed differentiation of a neural lineage stem cell in the skin that has intrinsic bias to produce neuronal progeny.
Affiliation(s)
- Justin J Belair-Hickey
- Donnelly Centre, Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada.
- Ahmed Fahmy
- Donnelly Centre, Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- Wenbo Zhang
- Program in Neurosciences and Mental Health, The Hospital for Sick Children, Toronto, ON, Canada
- Rifat S Sajid
- Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, ON, Canada
- Brenda L K Coles
- Donnelly Centre, Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- Michael W Salter
- Program in Neurosciences and Mental Health, The Hospital for Sick Children, Toronto, ON, Canada; Department of Physiology, University of Toronto, Toronto, ON, Canada
- Derek van der Kooy
- Donnelly Centre, Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada.
6
Cho B, Kim J, Kim S, An S, Hwang Y, Kim Y, Kwon D, Kim J. Epigenetic Dynamics in Reprogramming to Dopaminergic Neurons for Parkinson's Disease. Adv Sci (Weinh) 2024; 11:e2403105. [PMID: 39279468 PMCID: PMC11538697 DOI: 10.1002/advs.202403105] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2024] [Revised: 08/28/2024] [Indexed: 09/18/2024]
Abstract
Direct lineage reprogramming into dopaminergic (DA) neurons holds great promise for the more effective production of DA neurons, offering potential therapeutic benefits for conditions such as Parkinson's disease. However, the reprogramming pathway for fully reprogrammed DA neurons remains largely unclear, resulting in immature and dead-end states with low efficiency. In this study, the trajectory of reprogramming DA neurons is analyzed at multiple time points using single-cell RNA sequencing, identifying a continuous pathway for their reprogramming. Intermediate cell populations are identified as crucial for resetting host cell fate during early DA neuronal reprogramming. Further, longitudinal dissection uncovered two distinct trajectories: one leading to successful reprogramming and the other to a dead end. Notably, Arid4b, a histone modifier, is identified as a crucial regulator at this branch point, essential for the successful trajectory and the acquisition of mature dopaminergic neuronal identity. Consistently, overexpressing Arid4b in the DA neuronal reprogramming process increases the yield of iDA neurons and effectively reverses the disease phenotypes observed in the PD mouse brain. Thus, gaining insights into the cellular trajectory holds significant importance for devising regenerative medicine strategies, particularly in the context of addressing neurodegenerative disorders like Parkinson's disease.
Affiliation(s)
- Byounggook Cho
- Laboratory of Stem Cells & Cell Reprogramming, Department of Chemistry and Biomedical Engineering, Dongguk University, Seoul 04620, Republic of Korea
- Junyeop Kim
- Laboratory of Stem Cells & Cell Reprogramming, Department of Chemistry and Biomedical Engineering, Dongguk University, Seoul 04620, Republic of Korea
- Sumin Kim
- Laboratory of Stem Cells & Cell Reprogramming, Department of Chemistry and Biomedical Engineering, Dongguk University, Seoul 04620, Republic of Korea
- Saemin An
- Laboratory of Stem Cells & Cell Reprogramming, Department of Chemistry and Biomedical Engineering, Dongguk University, Seoul 04620, Republic of Korea
- Yerim Hwang
- Laboratory of Stem Cells & Cell Reprogramming, Department of Chemistry and Biomedical Engineering, Dongguk University, Seoul 04620, Republic of Korea
- Yunkyung Kim
- Laboratory of Stem Cells & Cell Reprogramming, Department of Chemistry and Biomedical Engineering, Dongguk University, Seoul 04620, Republic of Korea
- Daeyeol Kwon
- Laboratory of Stem Cells & Cell Reprogramming, Department of Chemistry and Biomedical Engineering, Dongguk University, Seoul 04620, Republic of Korea
- Jongpil Kim
- Laboratory of Stem Cells & Cell Reprogramming, Department of Chemistry and Biomedical Engineering, Dongguk University, Seoul 04620, Republic of Korea
7
Majima K, Kojima Y, Minoura K, Abe K, Hirose H, Shimamura T. LineageVAE: reconstructing historical cell states and transcriptomes toward unobserved progenitors. Bioinformatics 2024; 40:btae520. [PMID: 39172488 PMCID: PMC11494380 DOI: 10.1093/bioinformatics/btae520] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2024] [Revised: 07/22/2024] [Accepted: 08/20/2024] [Indexed: 08/23/2024] Open
Abstract
MOTIVATION: Single-cell RNA sequencing (scRNA-seq) enables comprehensive characterization of the cell state. However, its destructive nature prohibits measuring gene expression changes during dynamic processes such as embryogenesis or cell state divergence due to injury or disease. Although recent studies integrating scRNA-seq with lineage tracing have provided clonal insights between progenitor and mature cells, challenges remain. Because of their experimental nature, observations are sparse, and cells observed in the early state are not the exact progenitors of cells observed at later time points. To overcome these limitations, we developed LineageVAE, a novel computational methodology that utilizes deep learning based on the property that cells sharing barcodes have identical progenitors.
RESULTS: LineageVAE is a deep generative model that transforms scRNA-seq observations with identical lineage barcodes into sequential trajectories toward a common progenitor in a latent cell state space. This method enables the reconstruction of unobservable cell state transitions, historical transcriptomes, and regulatory dynamics at a single-cell resolution. Applied to hematopoiesis and reprogrammed fibroblast datasets, LineageVAE demonstrated its ability to restore backward cell state transitions and infer progenitor heterogeneity and transcription factor activity along differentiation trajectories.
AVAILABILITY AND IMPLEMENTATION: The LineageVAE model was implemented in Python using the PyTorch deep learning library. The code is available on GitHub at https://github.com/LzrRacer/LineageVAE/.
Affiliation(s)
- Koichiro Majima
- Division of Systems Biology, Nagoya University Graduate School of Medicine, Nagoya, Aichi 466-8550, Japan
- Yasuhiro Kojima
- Laboratory of Computational Life Science, National Cancer Center Research Institute, Tokyo, Tokyo 104-0045, Japan
- Kodai Minoura
- Japanese Red Cross Aichi Medical Center Nagoya Daiichi Hospital, Nagoya, Aichi 466-8550, Japan
- Ko Abe
- Department of Computational and Systems Biology, Tokyo Medical and Dental University Medical Research Institute, Tokyo, Tokyo 113-8510, Japan
- Haruka Hirose
- Department of Computational and Systems Biology, Tokyo Medical and Dental University Medical Research Institute, Tokyo, Tokyo 113-8510, Japan
- Teppei Shimamura
- Division of Systems Biology, Nagoya University Graduate School of Medicine, Nagoya, Aichi 466-8550, Japan
- Department of Computational and Systems Biology, Tokyo Medical and Dental University Medical Research Institute, Tokyo, Tokyo 113-8510, Japan
8
He R, Feng B, Zhang Y, Li Y, Wang D, Yu L. IGFBP7 promotes endothelial cell repair in the recovery phase of acute lung injury. Clin Sci (Lond) 2024; 138:797-815. [PMID: 38840498 PMCID: PMC11196208 DOI: 10.1042/cs20240179] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2024] [Revised: 05/30/2024] [Accepted: 06/05/2024] [Indexed: 06/07/2024]
Abstract
IGFBP7 has been found to play an important role in inflammatory diseases, such as acute lung injury (ALI). However, the role of IGFBP7 in different stages of inflammation remains unclear. Transcriptome sequencing was used to identify the regulatory genes of IGFBP7, and endothelial IGFBP7 expression was knocked down using Aplnr-Dre mice to evaluate the endothelial proliferation capacity. The expression of proliferation-related genes was detected by Western blotting and RT-PCR assays. In the present study, we found that knockdown of IGFBP7 in endothelial cells significantly decreases the expression of endothelial cell proliferation-related genes and cell number in the recovery phase but not in the acute phase of ALI. Mechanistically, using bulk RNA sequencing and co-immunoprecipitation (co-IP), we found that IGFBP7 promotes phosphorylation of FOS and subsequently up-regulates YAP1, thereby promoting endothelial cell proliferation. This study indicates that IGFBP7 has diverse roles in different stages of ALI and suggests that the use of IGFBP7 as a potential therapeutic target in ALI needs to take into account the stage of the disease.
Affiliation(s)
- Rui He
- Department of Respiratory Medicine, the Second Affiliated Hospital of Chongqing Medical University, Chongqing, China
- Bo Feng
- Department of Respiratory Medicine, People’s Hospital of Tongnan District, Chongqing, China
- Yuezhou Zhang
- Department of Hepatobiliary Surgery, The Second Affiliated Hospital of Chongqing Medical University, Chongqing, China
- Yuqing Li
- Department of Respiratory Medicine, the Second Affiliated Hospital of Chongqing Medical University, Chongqing, China
- Daoxing Wang
- Department of Respiratory Medicine, the Second Affiliated Hospital of Chongqing Medical University, Chongqing, China
- Chongqing Health Commission Key Laboratory for Respiratory Inflammation Damage and Precision Medicine
- Linchao Yu
- Department of Respiratory Medicine, the Second Affiliated Hospital of Chongqing Medical University, Chongqing, China
- Chongqing Health Commission Key Laboratory for Respiratory Inflammation Damage and Precision Medicine
9
Salignon J, Millan-Ariño L, Garcia MU, Riedel CG. Cactus: A user-friendly and reproducible ATAC-Seq and mRNA-Seq analysis pipeline for data preprocessing, differential analysis, and enrichment analysis. Genomics 2024; 116:110858. [PMID: 38735595 DOI: 10.1016/j.ygeno.2024.110858] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2024] [Revised: 05/06/2024] [Accepted: 05/09/2024] [Indexed: 05/14/2024]
Abstract
The ever-decreasing cost of Next-Generation Sequencing coupled with the emergence of efficient and reproducible analysis pipelines has rendered genomic methods more accessible. However, downstream analyses are basic or missing in most workflows, creating a significant barrier for non-bioinformaticians. To help close this gap, we developed Cactus, an end-to-end pipeline for analyzing ATAC-Seq and mRNA-Seq data, either separately or jointly. Its Nextflow-, container-, and virtual environment-based architecture ensures efficient and reproducible analyses. Cactus preprocesses raw reads, conducts differential analyses between conditions, and performs enrichment analyses in various databases, including DNA-binding motifs, ChIP-Seq binding sites, chromatin states, and ontologies. We demonstrate the utility of Cactus in a multi-modal and multi-species case study as well as by showcasing its unique capabilities as compared to other ATAC-Seq pipelines. In conclusion, Cactus can assist researchers in gaining comprehensive insights from chromatin accessibility and gene expression data in a quick, user-friendly, and reproducible manner.
Affiliation(s)
- Jérôme Salignon
- Department of Bioscience and Nutrition, Karolinska Institute, Blickagången 16, Huddinge SE-141 83, Sweden.
- Lluís Millan-Ariño
- Department of Bioscience and Nutrition, Karolinska Institute, Blickagången 16, Huddinge SE-141 83, Sweden
- Maxime U Garcia
- National Genomics Infrastructure, Science for Life Laboratory, Tomtebodavägen 23A, Solna SE-171 65, Sweden; Department of Oncology-Pathology, Karolinska Institute, Visionsgatan 4, Solna SE-171 64, Sweden
- Christian G Riedel
- Department of Bioscience and Nutrition, Karolinska Institute, Blickagången 16, Huddinge SE-141 83, Sweden.
10
Zhou X, Pan J, Chen L, Zhang S, Chen Y. DeepIMAGER: Deeply Analyzing Gene Regulatory Networks from scRNA-seq Data. Biomolecules 2024; 14:766. [PMID: 39062480 PMCID: PMC11274664 DOI: 10.3390/biom14070766] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2024] [Revised: 06/22/2024] [Accepted: 06/25/2024] [Indexed: 07/28/2024] Open
Abstract
Understanding the dynamics of gene regulatory networks (GRNs) across diverse cell types poses a challenge yet holds immense value in unraveling the molecular mechanisms governing cellular processes. Current computational methods, which rely solely on expression changes from bulk RNA-seq and/or scRNA-seq data, often result in high rates of false positives and low precision. Here, we introduce an advanced computational tool, DeepIMAGER, for inferring cell-specific GRNs through deep learning and data integration. DeepIMAGER employs a supervised approach that transforms the co-expression patterns of gene pairs into image-like representations and leverages transcription factor (TF) binding information for model training. It is trained using comprehensive datasets that encompass scRNA-seq profiles and ChIP-seq data, capturing TF-gene pair information across various cell types. Comprehensive validations on six cell lines show that DeepIMAGER exhibits superior performance compared with ten popular GRN inference tools and has remarkable robustness against dropout-zero events. DeepIMAGER was applied to scRNA-seq datasets of multiple myeloma (MM) and detected potential GRNs for the TFs RORC, MITF, and FOXD2 in MM dendritic cells. This technical innovation, combined with its capability to accurately decode GRNs from scRNA-seq, establishes DeepIMAGER as a valuable tool for unraveling complex regulatory networks in various cell types.
Affiliation(s)
- Xiguo Zhou
- College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China
- Jingyi Pan
- College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China
- Liang Chen
- College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China
- Shaoqiang Zhang
- College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China
- Yong Chen
- Department of Biological and Biomedical Sciences, Rowan University, Glassboro, NJ 08028, USA
11
Jindal K, Adil MT, Yamaguchi N, Yang X, Wang HC, Kamimoto K, Rivera-Gonzalez GC, Morris SA. Single-cell lineage capture across genomic modalities with CellTag-multi reveals fate-specific gene regulatory changes. Nat Biotechnol 2024; 42:946-959. [PMID: 37749269 PMCID: PMC11180607 DOI: 10.1038/s41587-023-01931-4] [Citation(s) in RCA: 25] [Impact Index Per Article: 25.0] [Received: 11/22/2022] [Accepted: 07/31/2023] [Indexed: 09/27/2023]
Abstract
Complex gene regulatory mechanisms underlie differentiation and reprogramming. Contemporary single-cell lineage-tracing (scLT) methods use expressed, heritable DNA barcodes to combine cell lineage readout with single-cell transcriptomics. However, reliance on transcriptional profiling limits adaptation to other single-cell assays. With CellTag-multi, we present an approach that enables direct capture of heritable random barcodes expressed as polyadenylated transcripts, in both single-cell RNA sequencing and single-cell Assay for Transposase Accessible Chromatin using sequencing assays, allowing for independent clonal tracking of transcriptional and epigenomic cell states. We validate CellTag-multi to characterize progenitor cell lineage priming during mouse hematopoiesis. Additionally, in direct reprogramming of fibroblasts to endoderm progenitors, we identify core regulatory programs underlying on-target and off-target fates. Furthermore, we reveal the transcription factor Zfp281 as a regulator of reprogramming outcome, biasing cells toward an off-target mesenchymal fate. Our results establish CellTag-multi as a lineage-tracing method compatible with multiple single-cell modalities and demonstrate its utility in revealing fate-specifying gene regulatory changes across diverse paradigms of differentiation and reprogramming.
Affiliation(s)
- Kunal Jindal
- Department of Developmental Biology, Washington University School of Medicine, St. Louis, MO, USA
- Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
- Center of Regenerative Medicine, Washington University School of Medicine, St. Louis, MO, USA
| | - Mohd Tayyab Adil
- Department of Developmental Biology, Washington University School of Medicine, St. Louis, MO, USA
- Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
- Center of Regenerative Medicine, Washington University School of Medicine, St. Louis, MO, USA
| | - Naoto Yamaguchi
- Department of Developmental Biology, Washington University School of Medicine, St. Louis, MO, USA
- Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
- Center of Regenerative Medicine, Washington University School of Medicine, St. Louis, MO, USA
| | - Xue Yang
- Department of Developmental Biology, Washington University School of Medicine, St. Louis, MO, USA
- Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
- Center of Regenerative Medicine, Washington University School of Medicine, St. Louis, MO, USA
| | - Helen C Wang
- Department of Pediatrics, Division of Hematology and Oncology, Washington University School of Medicine, St. Louis, MO, USA
| | - Kenji Kamimoto
- Department of Developmental Biology, Washington University School of Medicine, St. Louis, MO, USA
- Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
- Center of Regenerative Medicine, Washington University School of Medicine, St. Louis, MO, USA
| | - Guillermo C Rivera-Gonzalez
- Department of Developmental Biology, Washington University School of Medicine, St. Louis, MO, USA
- Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
- Center of Regenerative Medicine, Washington University School of Medicine, St. Louis, MO, USA
| | - Samantha A Morris
- Department of Developmental Biology, Washington University School of Medicine, St. Louis, MO, USA
- Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
- Center of Regenerative Medicine, Washington University School of Medicine, St. Louis, MO, USA
| |
|
12
|
Duan M, Wang Y, Zhao D, Liu H, Zhang G, Li K, Zhang H, Huang L, Zhang R, Zhou F. Orchestrating information across tissues via a novel multitask GAT framework to improve quantitative gene regulation relation modeling for survival analysis. Brief Bioinform 2023; 24:bbad238. [PMID: 37427963 DOI: 10.1093/bib/bbad238] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/09/2023] [Revised: 05/29/2023] [Accepted: 06/08/2023] [Indexed: 07/11/2023] Open
Abstract
Survival analysis is critical to cancer prognosis estimation. High-throughput technologies have increased the dimensionality of gene features, but the number of clinical samples in cohorts remains relatively small for various reasons, including difficulties in participant recruitment and high data-generation costs. The transcriptome is one of the most abundantly available OMIC data types (OMIC refers to high-throughput data, including genomic, transcriptomic, proteomic and epigenomic data). This study introduced a multitask graph attention network (GAT) framework, DQSurv, for the survival analysis task. We first used a large dataset of healthy tissue samples to pretrain the GAT-based HealthModel for the quantitative measurement of gene regulatory relations. The multitask survival analysis framework DQSurv used transfer learning to initialize the GAT model with the pretrained HealthModel and further fine-tuned this model on two tasks, i.e., the main task of survival analysis and the auxiliary task of gene expression prediction. This refined GAT was denoted as DiseaseModel. We fused the original transcriptomic features with the difference vector between the latent features encoded by the HealthModel and the DiseaseModel for the final task of survival analysis. The proposed DQSurv model consistently outperformed the existing models for the survival analysis of 10 benchmark cancer types and an independent dataset. The ablation study also supported the necessity of the main modules. We released the code and the pretrained HealthModel to facilitate feature encoding and survival analysis in future transcriptome-based studies, especially on small datasets. The model and the code are available at http://www.healthinformaticslab.org/supp/.
Affiliation(s)
- Meiyu Duan
- College of Computer Science and Technology, Jilin University, Changchun, Jilin, China, 130012
| | - Yueying Wang
- College of Computer Science and Technology, Jilin University, Changchun, Jilin, China, 130012
| | - Dong Zhao
- School of Biology and Engineering, and Engineering Research Center of Medical Biotechnology, Guizhou Medical University, Guiyang, Guizhou 550025, China
| | - Hongmei Liu
- School of Biology and Engineering, and Engineering Research Center of Medical Biotechnology, Guizhou Medical University, Guiyang, Guizhou 550025, China
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, China, 130012
| | - Gongyou Zhang
- School of Biology and Engineering, and Engineering Research Center of Medical Biotechnology, Guizhou Medical University, Guiyang, Guizhou 550025, China
| | - Kewei Li
- College of Computer Science and Technology, Jilin University, Changchun, Jilin, China, 130012
| | - Haotian Zhang
- College of Computer Science and Technology, Jilin University, Changchun, Jilin, China, 130012
| | - Lan Huang
- College of Computer Science and Technology, Jilin University, Changchun, Jilin, China, 130012
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, China, 130012
| | - Ruochi Zhang
- School of Artificial Intelligence, Jilin University, Changchun, China, 130012
| | - Fengfeng Zhou
- College of Computer Science and Technology, Jilin University, Changchun, Jilin, China, 130012
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, China, 130012
| |
|
13
|
Kamimoto K, Stringa B, Hoffmann CM, Jindal K, Solnica-Krezel L, Morris SA. Dissecting cell identity via network inference and in silico gene perturbation. Nature 2023; 614:742-751. [PMID: 36755098 PMCID: PMC9946838 DOI: 10.1038/s41586-022-05688-9] [Citation(s) in RCA: 218] [Impact Index Per Article: 109.0] [Received: 01/04/2022] [Accepted: 12/28/2022] [Indexed: 02/10/2023]
Abstract
Cell identity is governed by the complex regulation of gene expression, represented as gene-regulatory networks. Here we use gene-regulatory networks inferred from single-cell multi-omics data to perform in silico transcription factor perturbations, simulating the consequent changes in cell identity using only unperturbed wild-type data. We apply this machine-learning-based approach, CellOracle, to well-established paradigms (mouse and human haematopoiesis, and zebrafish embryogenesis), and we correctly model reported changes in phenotype that occur as a result of transcription factor perturbation. Through systematic in silico transcription factor perturbation in the developing zebrafish, we simulate and experimentally validate a previously unreported phenotype that results from the loss of noto, an established notochord regulator. Furthermore, we identify an axial mesoderm regulator, lhx1a. Together, these results show that CellOracle can be used to analyse the regulation of cell identity by transcription factors, and can provide mechanistic insights into development and differentiation.
Affiliation(s)
- Kenji Kamimoto
- Department of Developmental Biology, Washington University School of Medicine in St Louis, St Louis, MO, USA
- Department of Genetics, Washington University School of Medicine in St Louis, St Louis, MO, USA
- Center of Regenerative Medicine, Washington University School of Medicine in St Louis, St Louis, MO, USA
| | - Blerta Stringa
- Department of Developmental Biology, Washington University School of Medicine in St Louis, St Louis, MO, USA
- Center of Regenerative Medicine, Washington University School of Medicine in St Louis, St Louis, MO, USA
| | - Christy M Hoffmann
- Department of Developmental Biology, Washington University School of Medicine in St Louis, St Louis, MO, USA
- Department of Genetics, Washington University School of Medicine in St Louis, St Louis, MO, USA
- Center of Regenerative Medicine, Washington University School of Medicine in St Louis, St Louis, MO, USA
| | - Kunal Jindal
- Department of Developmental Biology, Washington University School of Medicine in St Louis, St Louis, MO, USA
- Department of Genetics, Washington University School of Medicine in St Louis, St Louis, MO, USA
- Center of Regenerative Medicine, Washington University School of Medicine in St Louis, St Louis, MO, USA
| | - Lilianna Solnica-Krezel
- Department of Developmental Biology, Washington University School of Medicine in St Louis, St Louis, MO, USA
- Center of Regenerative Medicine, Washington University School of Medicine in St Louis, St Louis, MO, USA
| | - Samantha A Morris
- Department of Developmental Biology, Washington University School of Medicine in St Louis, St Louis, MO, USA
- Department of Genetics, Washington University School of Medicine in St Louis, St Louis, MO, USA
- Center of Regenerative Medicine, Washington University School of Medicine in St Louis, St Louis, MO, USA
| |
|
14
|
An oracle predicts regulators of cell identity. Nature 2023; 614:630-632. [PMID: 36755144 DOI: 10.1038/d41586-023-00251-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/10/2023]
|