1
Hrovatin K, Moinfar AA, Zappia L, Lapuerta AT, Lengerich B, Kellis M, Theis FJ. Integrating single-cell RNA-seq datasets with substantial batch effects. bioRxiv 2024:2023.11.03.565463. [PMID: 37961672 PMCID: PMC10635119 DOI: 10.1101/2023.11.03.565463] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2023]
Abstract
Integration of single-cell RNA-sequencing (scRNA-seq) datasets has become a standard part of the analysis, with conditional variational autoencoders (cVAE) being among the most popular approaches. Increasingly, researchers are asking to map cells across challenging cases such as cross-organs, species, or organoids and primary tissue, as well as different scRNA-seq protocols, including single-cell and single-nuclei. Current computational methods struggle to harmonize datasets with such substantial differences, driven by technical or biological variation. Here, we propose to address these challenges for the popular cVAE-based approaches by introducing and comparing a series of regularization constraints. The two commonly used strategies for increasing batch correction in cVAEs, that is, Kullback-Leibler divergence (KL) regularization strength tuning and adversarial learning, suffer from substantial loss of biological information. Therefore, we adapt, implement, and assess alternative regularization strategies for cVAEs and investigate how they improve batch effect removal or better preserve biological variation, enabling us to propose an optimal cVAE-based integration strategy for complex systems. We show that using a VampPrior instead of the commonly used Gaussian prior not only improves the preservation of biological variation but also, unexpectedly, batch correction. Moreover, we show that our implementation of cycle-consistency loss leads to significantly better biological preservation than adversarial learning implemented in the previously proposed GLUE model. Additionally, we do not recommend relying only on the KL regularization strength tuning for increasing batch correction, as it removes both biological and batch information without discriminating between the two. Based on our findings, we propose a new model that combines VampPrior and cycle-consistency loss. We show that using it for datasets with substantial batch effects improves downstream interpretation of cell states and biological conditions. To ease the use of the newly proposed model, we make it available in the scvi-tools package as an external model named sysVI. Moreover, in the future, these regularization techniques could be added to other established cVAE-based models to improve the integration of datasets with substantial batch effects.
Affiliation(s)
- Karin Hrovatin
  - Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg, Germany
  - TUM School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany
  - Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA
  - Broad Institute of MIT and Harvard, Cambridge, MA
- Amir Ali Moinfar
  - Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg, Germany
  - School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Luke Zappia
  - Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg, Germany
  - School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Alejandro Tejada Lapuerta
  - Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg, Germany
  - School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Ben Lengerich
  - Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA
  - Broad Institute of MIT and Harvard, Cambridge, MA
- Manolis Kellis
  - Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA
  - Broad Institute of MIT and Harvard, Cambridge, MA
- Fabian J Theis
  - Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg, Germany
  - TUM School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany
  - School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
2
Hrovatin K, Bastidas-Ponce A, Bakhti M, Zappia L, Büttner M, Salinno C, Sterr M, Böttcher A, Migliorini A, Lickert H, Theis FJ. Delineating mouse β-cell identity during lifetime and in diabetes with a single cell atlas. Nat Metab 2023; 5:1615-1637. [PMID: 37697055 PMCID: PMC10513934 DOI: 10.1038/s42255-023-00876-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 12/21/2022] [Accepted: 07/26/2023] [Indexed: 09/13/2023]
Abstract
Although multiple pancreatic islet single-cell RNA-sequencing (scRNA-seq) datasets have been generated, a consensus on pancreatic cell states in development, homeostasis and diabetes as well as the value of preclinical animal models is missing. Here, we present an scRNA-seq cross-condition mouse islet atlas (MIA), a curated resource for interactive exploration and computational querying. We integrate over 300,000 cells from nine scRNA-seq datasets consisting of 56 samples, varying in age, sex and diabetes models, including an autoimmune type 1 diabetes model (NOD), a glucotoxicity/lipotoxicity type 2 diabetes model (db/db) and a chemical streptozotocin β-cell ablation model. The β-cell landscape of MIA reveals new cell states during disease progression and cross-publication differences between previously suggested marker genes. We show that β-cells in the streptozotocin model transcriptionally correlate with those in human type 2 diabetes and mouse db/db models, but are less similar to human type 1 diabetes and mouse NOD β-cells. We also report pathways that are shared between β-cells in immature, aged and diabetes models. MIA enables a comprehensive analysis of β-cell responses to different stressors, providing a roadmap for the understanding of β-cell plasticity, compensation and demise.
Affiliation(s)
- Karin Hrovatin
  - Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg, Germany
  - TUM School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany
- Aimée Bastidas-Ponce
  - Institute of Diabetes and Regeneration Research, Helmholtz Zentrum München, Neuherberg, Germany
  - German Center for Diabetes Research (DZD), Neuherberg, Germany
  - Medical Faculty, Technical University of Munich, Munich, Germany
- Mostafa Bakhti
  - Institute of Diabetes and Regeneration Research, Helmholtz Zentrum München, Neuherberg, Germany
  - German Center for Diabetes Research (DZD), Neuherberg, Germany
- Luke Zappia
  - Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg, Germany
  - Department of Mathematics, Technical University of Munich, Garching, Germany
- Maren Büttner
  - Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg, Germany
  - Genomics and Immunoregulation, Life & Medical Sciences (LIMES) Institute, University of Bonn, Bonn, Germany
  - Systems Medicine, Deutsches Zentrum für Neurodegenerative Erkrankungen (DZNE), Bonn, Germany
- Ciro Salinno
  - Institute of Diabetes and Regeneration Research, Helmholtz Zentrum München, Neuherberg, Germany
  - German Center for Diabetes Research (DZD), Neuherberg, Germany
  - Medical Faculty, Technical University of Munich, Munich, Germany
- Michael Sterr
  - Institute of Diabetes and Regeneration Research, Helmholtz Zentrum München, Neuherberg, Germany
  - German Center for Diabetes Research (DZD), Neuherberg, Germany
- Anika Böttcher
  - Institute of Diabetes and Regeneration Research, Helmholtz Zentrum München, Neuherberg, Germany
  - German Center for Diabetes Research (DZD), Neuherberg, Germany
- Adriana Migliorini
  - Institute of Diabetes and Regeneration Research, Helmholtz Zentrum München, Neuherberg, Germany
  - German Center for Diabetes Research (DZD), Neuherberg, Germany
  - McEwen Stem Cell Institute, University Health Network (UHN), Toronto, Ontario, Canada
- Heiko Lickert
  - Institute of Diabetes and Regeneration Research, Helmholtz Zentrum München, Neuherberg, Germany
  - German Center for Diabetes Research (DZD), Neuherberg, Germany
  - Medical Faculty, Technical University of Munich, Munich, Germany
- Fabian J Theis
  - Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg, Germany
  - TUM School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany
  - Department of Mathematics, Technical University of Munich, Garching, Germany
3
Lotfollahi M, Rybakov S, Hrovatin K, Hediyeh-Zadeh S, Talavera-López C, Misharin AV, Theis FJ. Biologically informed deep learning to query gene programs in single-cell atlases. Nat Cell Biol 2023; 25:337-350. [PMID: 36732632 PMCID: PMC9928587 DOI: 10.1038/s41556-022-01072-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 03/02/2022] [Accepted: 12/08/2022] [Indexed: 02/04/2023]
Abstract
The increasing availability of large-scale single-cell atlases has enabled the detailed description of cell states. In parallel, advances in deep learning allow rapid analysis of newly generated query datasets by mapping them into reference atlases. However, existing data transformations learned to map query data are not easily explainable using biologically known concepts such as genes or pathways. Here we propose expiMap, a biologically informed deep-learning architecture that enables single-cell reference mapping. ExpiMap learns to map cells into biologically understandable components representing known 'gene programs'. The activity of each cell for a gene program is learned while simultaneously refining them and learning de novo programs. We show that expiMap compares favourably to existing methods while bringing an additional layer of interpretability to integrative single-cell analysis. Furthermore, we demonstrate its applicability to analyse single-cell perturbation responses in different tissues and species and resolve responses of patients who have coronavirus disease 2019 to different treatments across cell types.
Affiliation(s)
- Mohammad Lotfollahi
  - Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
  - Wellcome Sanger Institute, Cambridge, UK
- Sergei Rybakov
  - Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
  - Department of Mathematics, Technical University of Munich, Munich, Germany
- Karin Hrovatin
  - Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
  - TUM School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany
- Soroor Hediyeh-Zadeh
  - Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
  - Bioinformatics Division, WEHI, Melbourne, Victoria, Australia
- Carlos Talavera-López
  - Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
  - Division of Infectious Diseases and Tropical Medicine, Ludwig-Maximilian-Universität Klinikum, Munich, Germany
- Alexander V Misharin
  - Division of Pulmonary and Critical Care Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
- Fabian J Theis
  - Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
  - Wellcome Sanger Institute, Cambridge, UK
  - Department of Mathematics, Technical University of Munich, Munich, Germany
  - TUM School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany
4
Katoh-Kurasawa M, Hrovatin K, Hirose S, Webb A, Ho HI, Zupan B, Shaulsky G. Transcriptional milestones in Dictyostelium development. Genome Res 2021; 31:1498-1511. [PMID: 34183452 PMCID: PMC8327917 DOI: 10.1101/gr.275496.121] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 03/12/2021] [Accepted: 06/23/2021] [Indexed: 02/02/2023]
Abstract
Dictyostelium development begins with single-cell starvation and ends with multicellular fruiting bodies. Developmental morphogenesis is accompanied by sweeping transcriptional changes, encompassing nearly half of the 13,000 genes in the genome. We performed time-series RNA-sequencing analyses of the wild type and 20 mutants to explore the relationships between transcription and morphogenesis. These strains show developmental arrest at different stages, accelerated development, or atypical morphologies. Considering eight major morphological transitions, we identified 1371 milestone genes whose expression changes sharply between consecutive transitions. We also identified 1099 genes as members of 21 regulons, which are groups of genes that remain coordinately regulated despite the genetic, temporal, and developmental perturbations. The gene annotations in these groups validate known transitions and reveal new developmental events. For example, DNA replication genes are tightly coregulated with cell division genes, so they are expressed in mid-development although chromosomal DNA is not replicated. Our data set includes 486 transcriptional profiles that can help identify new relationships between transcription and development and improve gene annotations. We show its utility by showing that cycles of aggregation and disaggregation in allorecognition-defective mutants involve dedifferentiation. We also show sensitivity to genetic and developmental conditions in two commonly used actin genes, act6 and act15, and robustness of the coaA gene. Finally, we propose that gpdA is a better mRNA quantitation standard because it is less sensitive to external conditions than commonly used standards. The data set is available for democratized exploration through the web application dictyExpress and the data mining environment Orange.
Affiliation(s)
- Mariko Katoh-Kurasawa
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
- Karin Hrovatin
  - Faculty of Computer and Information Science, University of Ljubljana, SI-1000 Ljubljana, Slovenia
- Shigenori Hirose
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
- Amanda Webb
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
- Hsing-I Ho
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
- Blaž Zupan
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
  - Faculty of Computer and Information Science, University of Ljubljana, SI-1000 Ljubljana, Slovenia
- Gad Shaulsky
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
5
Hrovatin K, Kunej T, Dolžan V. Genetic variability of serotonin pathway associated with schizophrenia onset, progression, and treatment. Am J Med Genet B Neuropsychiatr Genet 2020; 183:113-127. [PMID: 31674148 DOI: 10.1002/ajmg.b.32766] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 01/12/2019] [Revised: 09/11/2019] [Accepted: 10/07/2019] [Indexed: 12/22/2022]
Abstract
Schizophrenia (SZ) onset and treatment outcome have important genetic components; however, individual genes do not have strong effects on the SZ phenotype. Therefore, it is important to use a pathway-based approach and study metabolic and signaling pathways, such as the dopaminergic and serotonergic pathways. The serotonin pathway has an important role in brain signaling; nevertheless, its role in SZ has not been examined as thoroughly as that of the dopamine pathway. In this study, we reviewed serotonin pathway genes and genetic variations associated with SZ, including variations at the DNA, RNA, and epigenetic levels. We obtained 30 serotonin pathway genes from the Kyoto Encyclopedia of Genes and Genomes and used these genes for the literature review. We extracted 20 protein-coding serotonin pathway genes with genetic variations associated with SZ onset, development, and treatment from 31 research papers. Genes associated with SZ are present at all levels of the serotonin pathway: serotonin synthesis, transport, receptor binding, intracellular signaling, and reuptake; however, regulatory genes are poorly researched. We summarized common challenges of genetic association studies and presented some solutions. The analysis of reported serotonin pathway-SZ associations revealed a lack of information about certain serotonin pathway genes potentially associated with SZ. Furthermore, it is becoming clear that interactions among serotonin pathway genes and their regulators may bring further knowledge about their involvement in SZ.
Affiliation(s)
- Karin Hrovatin
  - University of Ljubljana, Biotechnical Faculty, Department of Animal Science, Ljubljana, Slovenia
- Tanja Kunej
  - University of Ljubljana, Biotechnical Faculty, Department of Animal Science, Ljubljana, Slovenia
- Vita Dolžan
  - University of Ljubljana, Faculty of Medicine, Institute of Biochemistry, Pharmacogenetics Laboratory, Ljubljana, Slovenia
6
Abstract
The miRNA regulome is the whole set of regulatory elements that regulate miRNA expression or are under the control of miRNAs. Understanding it is vital for comprehending miRNA functions. Classification of miRNA-related genetic variability is challenging because miRNAs interact with different genomic elements and are studied at different omics levels. In the present study, miRNA-associated genetic variability is presented at three levels: miRNA genes and their upstream regulation, miRNA silencing machinery, and miRNA targets. Several types of miRNA-associated genetic variations are known, including short and structural polymorphisms and epimutations. Differential expression can also affect miRNA regulome function. Classification of miRNA-associated genetic variability presents a baseline for complementing sequence variant nomenclature, planning of experiments, protocols for multi-omics data integration, and development of biomarkers.
Affiliation(s)
- Karin Hrovatin
  - Department of Animal Science, Biotechnical Faculty, University of Ljubljana, Domžale, 1230, Slovenia
- Tanja Kunej
  - Department of Animal Science, Biotechnical Faculty, University of Ljubljana, Domžale, 1230, Slovenia
7
Hrovatin K, Kunej T. Genetic sex determination assays in 53 mammalian species: Literature analysis and guidelines for reporting standardization. Ecol Evol 2018; 8:1009-1018. [PMID: 29375774 PMCID: PMC5773321 DOI: 10.1002/ece3.3707] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 08/31/2017] [Revised: 10/27/2017] [Accepted: 11/10/2017] [Indexed: 01/21/2023]
Abstract
Formerly, sex was determined by observation, which is not always feasible. Nowadays, genetic methods prevail due to their accuracy, simplicity, low cost, and time-efficiency. However, there is no comprehensive review providing an overview of the field and supporting its development. The studies are heterogeneous and lack a standardized reporting strategy. Therefore, our aim was to collect genetic sexing assays for mammals and assemble them in a catalogue with unified terminology. Publications were extracted from online databases using key words such as sexing and molecular. The collected data were supplemented with species and gene IDs and the type of sex-specific sequence variant (SSSV). We developed a catalogue and graphic presentation of diagnostic tests for molecular sex determination of mammals, based on 58 papers published from 2/1991 to 10/2016. The catalogue consists of five categories: species, genes, SSSVs, methods, and references. Based on the analysis of published literature, we propose minimal requirements for reporting, consisting of: species scientific name and ID, genetic sequence with name and ID, SSSV, methodology, genomic coordinates (e.g., restriction sites, SSSVs), amplification system, and description of detected amplicon and controls. The present study summarizes vast knowledge that has up to now been scattered across databases, representing the first step toward standardization regarding molecular sexing, enabling a better overview of existing tests and facilitating planned designs of novel tests. The project is ongoing; collecting additional publications, optimizing field development, and standardizing data presentation are needed.
Affiliation(s)
- Karin Hrovatin
  - Department of Animal Science, Biotechnical Faculty, University of Ljubljana, Domzale, Slovenia
- Tanja Kunej
  - Department of Animal Science, Biotechnical Faculty, University of Ljubljana, Domzale, Slovenia