1
Lotfollahi M, Hao Y, Theis FJ, Satija R. The future of rapid and automated single-cell data analysis using reference mapping. Cell 2024; 187:2343-2358. [PMID: 38729109 DOI: 10.1016/j.cell.2024.03.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2023] [Revised: 03/05/2024] [Accepted: 03/08/2024] [Indexed: 05/12/2024]
Abstract
As the number of single-cell datasets continues to grow rapidly, workflows that map new data to well-curated reference atlases offer enormous promise for the biological community. In this perspective, we discuss key computational challenges and opportunities for single-cell reference-mapping algorithms. We discuss how mapping algorithms will enable the integration of diverse datasets across disease states, molecular modalities, genetic perturbations, and diverse species and will eventually replace manual and laborious unsupervised clustering pipelines.
Affiliation(s)
- Mohammad Lotfollahi
- Institute of Computational Biology, Helmholtz Center Munich - German Research Center for Environmental Health, Neuherberg, Germany; Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
- Yuhan Hao
- Center for Genomics and Systems Biology, New York University, New York, NY, USA; New York Genome Center, New York, NY, USA
- Fabian J Theis
- Institute of Computational Biology, Helmholtz Center Munich - German Research Center for Environmental Health, Neuherberg, Germany; Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK; Department of Mathematics, Technical University of Munich, Garching, Germany.
- Rahul Satija
- Center for Genomics and Systems Biology, New York University, New York, NY, USA; New York Genome Center, New York, NY, USA.
2
Zahedi R, Ghamsari R, Argha A, Macphillamy C, Beheshti A, Alizadehsani R, Lovell NH, Lotfollahi M, Alinejad-Rokny H. Deep learning in spatially resolved transcriptomics: a comprehensive technical view. Brief Bioinform 2024; 25:bbae082. [PMID: 38483255 PMCID: PMC10939360 DOI: 10.1093/bib/bbae082] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2023] [Revised: 12/22/2024] [Accepted: 02/13/2024] [Indexed: 03/17/2024] Open
Abstract
Spatially resolved transcriptomics (SRT) is a pioneering method for simultaneously studying morphological contexts and gene expression at single-cell precision. Data emerging from SRT are multifaceted, presenting researchers with intricate gene expression matrices, precise spatial details and comprehensive histology visuals. Such rich and intricate datasets, unfortunately, render many conventional methods like traditional machine learning and statistical models ineffective. The unique challenges posed by the specialized nature of SRT data have led the scientific community to explore more sophisticated analytical avenues. Recent trends indicate an increasing reliance on deep learning algorithms, especially in areas such as spatial clustering, identification of spatially variable genes and data alignment tasks. In this manuscript, we provide a rigorous critique of these advanced deep learning methodologies, probing into their merits, limitations and avenues for further refinement. Our in-depth analysis underscores that while the recent innovations in deep learning tailored for SRT have been promising, there remains a substantial potential for enhancement. A crucial area that demands attention is the development of models that can incorporate intricate biological nuances, such as phylogeny-aware processing or in-depth analysis of minuscule histology image segments. Furthermore, addressing challenges like the elimination of batch effects, perfecting data normalization techniques and countering the overdispersion and zero inflation patterns seen in gene expression is pivotal. To support the broader scientific community in their SRT endeavors, we have meticulously assembled a comprehensive directory of readily accessible SRT databases, hoping to serve as a foundation for future research initiatives.
Affiliation(s)
- Roxana Zahedi
- UNSW BioMedical Machine Learning Lab (BML), The Graduate School of Biomedical Engineering, UNSW Sydney, 2052, NSW, Australia
- Reza Ghamsari
- UNSW BioMedical Machine Learning Lab (BML), The Graduate School of Biomedical Engineering, UNSW Sydney, 2052, NSW, Australia
- Ahmadreza Argha
- The Graduate School of Biomedical Engineering, UNSW Sydney, 2052, NSW, Australia
- Tyree Institute of Health Engineering (IHealthE), UNSW Sydney, 2052, NSW, Australia
- Callum Macphillamy
- School of Animal and Veterinary Sciences, University of Adelaide, Roseworthy, 5371, Australia
- Amin Beheshti
- School of Computing, Macquarie University, Sydney, 2109, Australia
- Roohallah Alizadehsani
- Institute for Intelligent Systems Research and Innovation (IISRI), Deakin University, Waurn Ponds, Melbourne, VIC, 3216, Australia
- Nigel H Lovell
- The Graduate School of Biomedical Engineering, UNSW Sydney, 2052, NSW, Australia
- Tyree Institute of Health Engineering (IHealthE), UNSW Sydney, 2052, NSW, Australia
- Mohammad Lotfollahi
- Computational Health Center, Helmholtz Munich, Germany
- Wellcome Sanger Institute, Cambridge, UK
- Hamid Alinejad-Rokny
- UNSW BioMedical Machine Learning Lab (BML), The Graduate School of Biomedical Engineering, UNSW Sydney, 2052, NSW, Australia
- Tyree Institute of Health Engineering (IHealthE), UNSW Sydney, 2052, NSW, Australia
3
Gayoso A, Weiler P, Lotfollahi M, Klein D, Hong J, Streets A, Theis FJ, Yosef N. Deep generative modeling of transcriptional dynamics for RNA velocity analysis in single cells. Nat Methods 2024; 21:50-59. [PMID: 37735568 PMCID: PMC10776389 DOI: 10.1038/s41592-023-01994-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 08/12/2022] [Accepted: 08/08/2023] [Indexed: 09/23/2023]
Abstract
RNA velocity has been rapidly adopted to guide interpretation of transcriptional dynamics in snapshot single-cell data; however, current approaches for estimating RNA velocity lack effective strategies for quantifying uncertainty and determining the overall applicability to the system of interest. Here, we present veloVI (velocity variational inference), a deep generative modeling framework for estimating RNA velocity. veloVI learns a gene-specific dynamical model of RNA metabolism and provides a transcriptome-wide quantification of velocity uncertainty. We show that veloVI compares favorably to previous approaches with respect to goodness of fit, consistency across transcriptionally similar cells and stability across preprocessing pipelines for quantifying RNA abundance. Further, we demonstrate that veloVI's posterior velocity uncertainty can be used to assess whether velocity analysis is appropriate for a given dataset. Finally, we highlight veloVI as a flexible framework for modeling transcriptional dynamics by adapting the underlying dynamical model to use time-dependent transcription rates.
Affiliation(s)
- Adam Gayoso
- Center for Computational Biology, University of California, Berkeley, Berkeley, CA, USA
- Philipp Weiler
- Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
- Department of Mathematics, Technical University of Munich, Munich, Germany
- Mohammad Lotfollahi
- Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
- Wellcome Sanger Institute, Cambridge, UK
- Dominik Klein
- Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
- Department of Mathematics, Technical University of Munich, Munich, Germany
- Justin Hong
- Center for Computational Biology, University of California, Berkeley, Berkeley, CA, USA
- Department of Computer Science, Columbia University, New York, NY, USA
- Aaron Streets
- Center for Computational Biology, University of California, Berkeley, Berkeley, CA, USA
- Department of Bioengineering, University of California, Berkeley, Berkeley, CA, USA
- Chan Zuckerberg Biohub, San Francisco, CA, USA
- Fabian J Theis
- Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany.
- Department of Mathematics, Technical University of Munich, Munich, Germany.
- TUM School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany.
- Nir Yosef
- Center for Computational Biology, University of California, Berkeley, Berkeley, CA, USA.
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, Berkeley, CA, USA.
4
De Donno C, Hediyeh-Zadeh S, Moinfar AA, Wagenstetter M, Zappia L, Lotfollahi M, Theis FJ. Population-level integration of single-cell datasets enables multi-scale analysis across samples. Nat Methods 2023; 20:1683-1692. [PMID: 37813989 PMCID: PMC10630133 DOI: 10.1038/s41592-023-02035-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 12/08/2022] [Accepted: 09/05/2023] [Indexed: 10/11/2023]
Abstract
The increasing generation of population-level single-cell atlases has the potential to link sample metadata with cellular data. Constructing such references requires integration of heterogeneous cohorts with varying metadata. Here we present single-cell population level integration (scPoli), an open-world learner that incorporates generative models to learn sample and cell representations for data integration, label transfer and reference mapping. We applied scPoli on population-level atlases of lung and peripheral blood mononuclear cells, the latter consisting of 7.8 million cells across 2,375 samples. We demonstrate that scPoli can explain sample-level biological and technical variations using sample embeddings revealing genes associated with batch effects and biological effects. scPoli is further applicable to single-cell sequencing assay for transposase-accessible chromatin and cross-species datasets, offering insights into chromatin accessibility and comparative genomics. We envision scPoli becoming an important tool for population-level single-cell data integration facilitating atlas use but also interpretation by means of multi-scale analyses.
Affiliation(s)
- Carlo De Donno
- Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
- School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany
- Amir Ali Moinfar
- Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
- School of Computing, Information and Technology, Technical University of Munich, Munich, Germany
- Marco Wagenstetter
- Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
- Luke Zappia
- Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
- School of Computing, Information and Technology, Technical University of Munich, Munich, Germany
- Mohammad Lotfollahi
- Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany.
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK.
- Fabian J Theis
- Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany.
- School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany.
- School of Computing, Information and Technology, Technical University of Munich, Munich, Germany.
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK.
5
Michielsen L, Lotfollahi M, Strobl D, Sikkema L, Reinders MJT, Theis FJ, Mahfouz A. Single-cell reference mapping to construct and extend cell-type hierarchies. NAR Genom Bioinform 2023; 5:lqad070. [PMID: 37502708 PMCID: PMC10370450 DOI: 10.1093/nargab/lqad070] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 04/24/2023] [Accepted: 07/10/2023] [Indexed: 07/29/2023] Open
Abstract
Single-cell genomics is now producing an ever-increasing number of datasets that, when integrated, could provide large-scale reference atlases of tissue in health and disease. Such large-scale atlases increase the scale and generalizability of analyses and enable combining knowledge generated by individual studies. Specifically, individual studies often differ regarding cell annotation terminology and depth, with different groups specializing in different cell type compartments, often using distinct terminology. Understanding how these distinct sets of annotations are related and complement each other would mark a major step towards a consensus-based cell-type annotation reflecting the latest knowledge in the field. Whereas recent computational techniques, referred to as 'reference mapping' methods, facilitate the usage and expansion of existing reference atlases by mapping new datasets (i.e. queries) onto an atlas, a systematic approach towards harmonizing dataset-specific cell-type terminology and annotation depth is still lacking. Here, we present 'treeArches', a framework to automatically build and extend reference atlases while enriching them with an updatable hierarchy of cell-type annotations across different datasets. We demonstrate various use cases for treeArches, from automatically resolving relations between reference and query cell types to identifying unseen cell types absent in the reference, such as disease-associated cell states. We envision treeArches enabling data-driven construction of consensus atlas-level cell-type hierarchies and facilitating efficient usage of reference atlases.
Affiliation(s)
- Daniel Strobl
- Institute of Computational Biology, Helmholtz Zentrum München, Munich, Germany
- Institute of Clinical Chemistry and Pathobiochemistry, TUM School of Medicine, Technical University of Munich, 81675 Munich, Germany
- Lisa Sikkema
- Institute of Computational Biology, Helmholtz Zentrum München, Munich, Germany
- TUM School of Life Sciences Weihenstephan, Technical University of Munich, Germany
- Marcel J T Reinders
- Department of Human Genetics, Leiden University Medical Center, 2333ZC Leiden, The Netherlands
- Leiden Computational Biology Center, Leiden University Medical Center, 2333ZC Leiden, The Netherlands
- Delft Bioinformatics Lab, Delft University of Technology, 2628XE Delft, The Netherlands
- Fabian J Theis
- To whom correspondence should be addressed. Tel: +49 89 3187 43260;
- Ahmed Mahfouz
- Correspondence may also be addressed to Ahmed Mahfouz. Tel: +31 71 52 69513;
6
Sikkema L, Ramírez-Suástegui C, Strobl DC, Gillett TE, Zappia L, Madissoon E, Markov NS, Zaragosi LE, Ji Y, Ansari M, Arguel MJ, Apperloo L, Banchero M, Bécavin C, Berg M, Chichelnitskiy E, Chung MI, Collin A, Gay ACA, Gote-Schniering J, Hooshiar Kashani B, Inecik K, Jain M, Kapellos TS, Kole TM, Leroy S, Mayr CH, Oliver AJ, von Papen M, Peter L, Taylor CJ, Walzthoeni T, Xu C, Bui LT, De Donno C, Dony L, Faiz A, Guo M, Gutierrez AJ, Heumos L, Huang N, Ibarra IL, Jackson ND, Kadur Lakshminarasimha Murthy P, Lotfollahi M, Tabib T, Talavera-López C, Travaglini KJ, Wilbrey-Clark A, Worlock KB, Yoshida M, van den Berge M, Bossé Y, Desai TJ, Eickelberg O, Kaminski N, Krasnow MA, Lafyatis R, Nikolic MZ, Powell JE, Rajagopal J, Rojas M, Rozenblatt-Rosen O, Seibold MA, Sheppard D, Shepherd DP, Sin DD, Timens W, Tsankov AM, Whitsett J, Xu Y, Banovich NE, Barbry P, Duong TE, Falk CS, Meyer KB, Kropski JA, Pe'er D, Schiller HB, Tata PR, Schultze JL, Teichmann SA, Misharin AV, Nawijn MC, Luecken MD, Theis FJ. An integrated cell atlas of the lung in health and disease. Nat Med 2023; 29:1563-1577. [PMID: 37291214 PMCID: PMC10287567 DOI: 10.1038/s41591-023-02327-2] [Citation(s) in RCA: 73] [Impact Index Per Article: 73.0] [Received: 03/10/2022] [Accepted: 03/30/2023] [Indexed: 06/10/2023]
Abstract
Single-cell technologies have transformed our understanding of human tissues. Yet, studies typically capture only a limited number of donors and disagree on cell type definitions. Integrating many single-cell datasets can address these limitations of individual studies and capture the variability present in the population. Here we present the integrated Human Lung Cell Atlas (HLCA), combining 49 datasets of the human respiratory system into a single atlas spanning over 2.4 million cells from 486 individuals. The HLCA presents a consensus cell type re-annotation with matching marker genes, including annotations of rare and previously undescribed cell types. Leveraging the number and diversity of individuals in the HLCA, we identify gene modules that are associated with demographic covariates such as age, sex and body mass index, as well as gene modules changing expression along the proximal-to-distal axis of the bronchial tree. Mapping new data to the HLCA enables rapid data annotation and interpretation. Using the HLCA as a reference for the study of disease, we identify shared cell states across multiple lung diseases, including SPP1+ profibrotic monocyte-derived macrophages in COVID-19, pulmonary fibrosis and lung carcinoma. Overall, the HLCA serves as an example for the development and use of large-scale, cross-dataset organ atlases within the Human Cell Atlas.
Grants
- R01 HL153375 NHLBI NIH HHS
- R01 HL127349 NHLBI NIH HHS
- U54 HL165443 NHLBI NIH HHS
- P01 HL107202 NHLBI NIH HHS
- U01 HL148856 NHLBI NIH HHS
- R21 HL156124 NHLBI NIH HHS
- U54 AG075931 NIA NIH HHS
- Wellcome Trust
- R01 HL146557 NHLBI NIH HHS
- R01 HL123766 NHLBI NIH HHS
- U01 HL148861 NHLBI NIH HHS
- R01 HL141852 NHLBI NIH HHS
- R01 ES034350 NIEHS NIH HHS
- UL1 TR001863 NCATS NIH HHS
- R01 HL126176 NHLBI NIH HHS
- R21 HL161760 NHLBI NIH HHS
- R01 HL145372 NHLBI NIH HHS
- P01 AG049665 NIA NIH HHS
- K12 HD105271 NICHD NIH HHS
- U19 AI135964 NIAID NIH HHS
- P30 CA008748 NCI NIH HHS
- R01 HL142568 NHLBI NIH HHS
- R01 HL153312 NHLBI NIH HHS
- U54 AG079754 NIA NIH HHS
- R56 HL157632 NHLBI NIH HHS
- R01 HL158139 NHLBI NIH HHS
- R01 HL135156 NHLBI NIH HHS
- R01 HL153045 NHLBI NIH HHS
- U54 HL145608 NHLBI NIH HHS
- P50 AR060780 NIAMS NIH HHS
- R01 HL128439 NHLBI NIH HHS
- R01 HL146519 NHLBI NIH HHS
- R01 HL117004 NHLBI NIH HHS
- R01 HL068702 NHLBI NIH HHS
- U01 HL145567 NHLBI NIH HHS
- P01 HL132821 NHLBI NIH HHS
- MR/R015635/1 Medical Research Council
- R01 MD010443 NIMHD NIH HHS
- Chan Zuckerberg Initiative, LLC Seed Network grant (CZF2019-002438) “Lung Cell Atlas 1.0” NIH 1U54HL145608-01 CZIF2022-007488 from the Chan Zuckerberg Initiative Foundation
- ESPOD fellowship of EMBL-EBI and Sanger Institute
- 3IA Cote d’Azur PhD program
- The Ministry of Economic Affairs and Climate Policy by means of the PPP
- EC | Horizon 2020 Framework Programme (EU Framework Programme for Research and Innovation H2020)
- Joachim Herz Stiftung (Joachim Herz Foundation)
- P50 AR060780-06A1
- University College London, Birkbeck MRC Doctoral Training Programme
- Jikei University School of Medicine (Jikei University)
- 5R01HL14254903, 4UH3CA25513503
- R01HL127349, R01HL141852, U01HL145567 and CZI
- MRC Clinician Scientist Fellowship (MR/W00111X/1)
- Chan Zuckerberg Initiative, LLC Seed Network grant (CZF2019-002438) “Lung Cell Atlas 1.0” 2R01HL068702
- R01 HL135156, R01 MD010443, R01 HL128439, P01 HL132821, P01 HL107202, R01 HL117004, and DOD Grant W81WH-16-2-0018
- HL142568 and HL14507 from the NHLBI
- Chan Zuckerberg Initiative, LLC Seed Network grant (CZF2019-002438) “Lung Cell Atlas 1.0”, 2R01HL068702
- Wellcome (WT211276/Z/18/Z) Sanger core grant WT206194 CZIF2022-007488 from the Chan Zuckerberg Initiative Foundation
- R21HL156124, R56HL157632, and R21HL161760
- CZI, 5U01HL148856
- CZI, 5U01HL148856, R01 HL153045
- U.S. Department of Defense (United States Department of Defense)
- The National Institute of Health R01HL145372
- Fondation pour la Recherche Médicale (Foundation for Medical Research in France)
- Conseil Départemental des Alpes Maritimes
- Inserm Cross-cutting Scientific Program HuDeCA 2018, ANR SAHARRA (ANR-19-CE14–0027), ANR-19-P3IA-0002–3IA, the National Infrastructure France Génomique (ANR-10-INBS-09-03), PPIA 4D-OMICS (21-ESRE-0052), and the Chan Zuckerberg Initiative, LLC Seed Network grant (CZF2019-002438) “Lung Cell Atlas 1.0”.
- Wellcome Trust (Wellcome)
- Sanger core grant WT206194 Chan Zuckerberg Initiative, LLC Seed Network grant (CZF2019-002438) “Lung Cell Atlas 1.0” CZIF2022-007488 from the Chan Zuckerberg Initiative Foundation
- Doris Duke Charitable Foundation (DDCF)
- The National Institute of Health R01HL145372 Department of Defense W81XWH-19-1-0416
- The National Institute of Health R01HL146557 and R01HL153375 and funds from Chan Zuckerberg Initiative - Human Lung Cell Atlas-pilot award
- 1U54HL145608-01
- CZI Deep Visual Proteomics
- 1U54HL145608-01, U01HL148861-03
- 1) the Chan Zuckerberg Initiative, LLC Seed Network grant CZF2019-002438 “Lung Cell Atlas 1.0”; 2) R01 HL153312; 3) U19 AI135964; 4) P01 AG049665
- Netherlands Lung Foundation project nos. 5.1.14.020 and 4.1.18.226, LLC Seed Network grant CZF2019-002438 “Lung Cell Atlas 1.0”
- grant number 2019-002438 from the Chan Zuckerberg Foundation, by the Helmholtz Association’s Initiative and Networking Fund through Helmholtz AI [ZT-I-PF-5-01] and by the Bavarian Ministry of Science and the Arts in the framework of the Bavarian Research Association “ForInter” (Interaction of human brain cells)
- 1 U01 HL14555-01, R01 HL123766-04
- NIH U54 AG075931, 5R01 HL146519
Affiliation(s)
- Lisa Sikkema
- Department of Computational Health, Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
- TUM School of Life Sciences, Technical University of Munich, Munich, Germany
- Ciro Ramírez-Suástegui
- Department of Computational Health, Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
- La Jolla Institute for Allergy and Immunology, La Jolla, CA, USA
- Daniel C Strobl
- Department of Computational Health, Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
- Institute of Clinical Chemistry and Pathobiochemistry, TUM School of Medicine, Technical University of Munich, Munich, Germany
- Tessa E Gillett
- Experimental Pulmonary and Inflammatory Research, Department of Pathology and Medical Biology, University Medical Centre Groningen, University of Groningen, Groningen, the Netherlands
- Groningen Research Institute for Asthma and COPD, University Medical Center Groningen, University of Groningen, Groningen, the Netherlands
- Luke Zappia
- Department of Computational Health, Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
- Department of Mathematics, Technical University of Munich, Garching, Germany
- Nikolay S Markov
- Division of Pulmonary and Critical Care Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
- Laure-Emmanuelle Zaragosi
- Institut de Pharmacologie Moléculaire et Cellulaire, Université Côte d'Azur and Centre National de la Recherche Scientifique, Valbonne, France
- Yuge Ji
- Department of Computational Health, Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
- TUM School of Life Sciences, Technical University of Munich, Munich, Germany
- Meshal Ansari
- Department of Computational Health, Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
- Institute of Lung Health and Immunity (a member of the German Center for Lung Research) and Comprehensive Pneumology Center with the CPC-M bioArchive, Helmholtz Center Munich, Munich, Germany
- Marie-Jeanne Arguel
- Institut de Pharmacologie Moléculaire et Cellulaire, Université Côte d'Azur and Centre National de la Recherche Scientifique, Valbonne, France
- Leonie Apperloo
- Groningen Research Institute for Asthma and COPD, University Medical Center Groningen, University of Groningen, Groningen, the Netherlands
- Department of Pathology and Medical Biology, University Medical Center Groningen, University of Groningen, Groningen, the Netherlands
- Martin Banchero
- Groningen Research Institute for Asthma and COPD, University Medical Center Groningen, University of Groningen, Groningen, the Netherlands
- Department of Pathology and Medical Biology, University Medical Center Groningen, University of Groningen, Groningen, the Netherlands
- Christophe Bécavin
- Institut de Pharmacologie Moléculaire et Cellulaire, Université Côte d'Azur and Centre National de la Recherche Scientifique, Valbonne, France
- Marijn Berg
- Groningen Research Institute for Asthma and COPD, University Medical Center Groningen, University of Groningen, Groningen, the Netherlands
- Department of Pathology and Medical Biology, University Medical Center Groningen, University of Groningen, Groningen, the Netherlands
- Mei-I Chung
- Translational Genomics Research Institute, Phoenix, AZ, USA
- Antoine Collin
- Institut de Pharmacologie Moléculaire et Cellulaire, Université Côte d'Azur and Centre National de la Recherche Scientifique, Valbonne, France
- 3IA Côte d'Azur, Nice, France
- Aurore C A Gay
- Groningen Research Institute for Asthma and COPD, University Medical Center Groningen, University of Groningen, Groningen, the Netherlands
- Department of Pathology and Medical Biology, University Medical Center Groningen, University of Groningen, Groningen, the Netherlands
- Janine Gote-Schniering
- Institute of Lung Health and Immunity (a member of the German Center for Lung Research) and Comprehensive Pneumology Center with the CPC-M bioArchive, Helmholtz Center Munich, Munich, Germany
- Baharak Hooshiar Kashani
- Institute of Lung Health and Immunity (a member of the German Center for Lung Research) and Comprehensive Pneumology Center with the CPC-M bioArchive, Helmholtz Center Munich, Munich, Germany
- Kemal Inecik
- Department of Computational Health, Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
- TUM School of Life Sciences, Technical University of Munich, Munich, Germany
- Manu Jain
- Division of Pulmonary and Critical Care Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
- Theodore S Kapellos
- Institute of Lung Health and Immunity (a member of the German Center for Lung Research) and Comprehensive Pneumology Center with the CPC-M bioArchive, Helmholtz Center Munich, Munich, Germany
- Department of Genomics and Immunoregulation, Life and Medical Sciences Institute, University of Bonn, Bonn, Germany
- Tessa M Kole
- Groningen Research Institute for Asthma and COPD, University Medical Center Groningen, University of Groningen, Groningen, the Netherlands
- Department of Pulmonary Diseases, University Medical Center Groningen, University of Groningen, Groningen, the Netherlands
- Sylvie Leroy
- Pulmonology Department, Fédération Hospitalo-Universitaire OncoAge, Centre Hospitalier Universitaire de Nice, Université Côte d'Azur, Nice, France
- Christoph H Mayr
- Institute of Lung Health and Immunity (a member of the German Center for Lung Research) and Comprehensive Pneumology Center with the CPC-M bioArchive, Helmholtz Center Munich, Munich, Germany
- Lance Peter
- Translational Genomics Research Institute, Phoenix, AZ, USA
- Chase J Taylor
- Division of Allergy, Pulmonary, and Critical Care Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Chuan Xu
- Wellcome Sanger Institute, Hinxton, Cambridge, UK
- Linh T Bui
- Translational Genomics Research Institute, Phoenix, AZ, USA
- Carlo De Donno
- Department of Computational Health, Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
- Leander Dony
- Department of Computational Health, Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
- TUM School of Life Sciences, Technical University of Munich, Munich, Germany
- Department of Translational Psychiatry, Max Planck Institute of Psychiatry and International Max Planck Research School for Translational Psychiatry, Munich, Germany
- Alen Faiz
- Groningen Research Institute for Asthma and COPD, University Medical Center Groningen, University of Groningen, Groningen, the Netherlands
- School of Life Sciences, Respiratory Bioinformatics and Molecular Biology, University of Technology Sydney, Sydney, Australia
- Minzhe Guo
- Division of Neonatology and Pulmonary Biology, Cincinnati Children's Hospital Medical Center, University of Cincinnati College of Medicine, Cincinnati, OH, USA
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, USA
- Lukas Heumos
- Department of Computational Health, Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
- TUM School of Life Sciences, Technical University of Munich, Munich, Germany
- Institute of Lung Health and Immunity (a member of the German Center for Lung Research) and Comprehensive Pneumology Center with the CPC-M bioArchive, Helmholtz Center Munich, Munich, Germany
- Ni Huang
- Wellcome Sanger Institute, Hinxton, Cambridge, UK
- Ignacio L Ibarra
- Department of Computational Health, Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
- Nathan D Jackson
- Center for Genes, Environment, and Health, National Jewish Health, Denver, CO, USA
- Preetish Kadur Lakshminarasimha Murthy
- Department of Cell Biology, Duke University School of Medicine, Durham, NC, USA
- Department of Pharmacology and Regenerative Medicine, University of Illinois Chicago, Chicago, IL, USA
- Mohammad Lotfollahi
- Department of Computational Health, Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
- Wellcome Sanger Institute, Hinxton, Cambridge, UK
- Tracy Tabib
- Division of Rheumatology and Clinical Immunology, Department of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
- Carlos Talavera-López
- Department of Computational Health, Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
- Division of Infectious Diseases and Tropical Medicine, Klinikum der Ludwig-Maximilians-Universität, Munich, Germany
- Kyle J Travaglini
- Department of Biochemistry, Stanford University School of Medicine, Stanford, CA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Allen Institute for Brain Science, Seattle, WA, USA
- Kaylee B Worlock
- Department of Respiratory Medicine, Division of Medicine, University College London, London, UK
- Masahiro Yoshida
- Department of Respiratory Medicine, Division of Medicine, University College London, London, UK
- Maarten van den Berge
- Groningen Research Institute for Asthma and COPD, University Medical Center Groningen, University of Groningen, Groningen, the Netherlands
- Department of Pulmonary Diseases, University Medical Center Groningen, University of Groningen, Groningen, the Netherlands
| | - Yohan Bossé
- Institut Universitaire de Cardiologie et de Pneumologie de Québec, Department of Molecular Medicine, Laval University, Quebec City, Quebec, Canada
| | - Tushar J Desai
- Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
| | - Oliver Eickelberg
- Division of Pulmonary, Allergy, and Critical Care Medicine, University of Pittsburgh, Pittsburgh, PA, USA
| | - Naftali Kaminski
- Pulmonary, Critical Care and Sleep Medicine, Yale School of Medicine, New Haven, CT, USA
| | - Mark A Krasnow
- Department of Biochemistry, Stanford University School of Medicine, Stanford, CA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
| | - Robert Lafyatis
- Division of Rheumatology and Clinical Immunology, Department of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
| | - Marko Z Nikolic
- Department of Respiratory Medicine, Division of Medicine, University College London, London, UK
| | - Joseph E Powell
- Garvan Institute of Medical Research, Sydney, New South Wales, Australia
- Cellular Genomics Futures Institute, University of New South Wales, Sydney, New South Wales, Australia
| | - Jayaraj Rajagopal
- Center for Regenerative Medicine, Massachusetts General Hospital, Harvard Medical School, Cambridge, MA, USA
| | - Mauricio Rojas
- Department of Internal Medicine, Division of Pulmonary, Critical Care and Sleep Medicine, The Ohio State University, Columbus, OH, USA
| | - Orit Rozenblatt-Rosen
- Klarman Cell Observatory, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Cellular and Tissue Genomics, Genentech, South San Francisco, CA, USA
| | - Max A Seibold
- Center for Genes, Environment, and Health, National Jewish Health, Denver, CO, USA
- Department of Pediatrics, National Jewish Health, Denver, CO, USA
- Division of Pulmonary Sciences and Critical Care Medicine, University of Colorado School of Medicine, Aurora, CO, USA
| | - Dean Sheppard
- Division of Pulmonary, Critical Care, Allergy and Sleep Medicine, University of California, San Francisco, San Francisco, CA, USA
| | - Douglas P Shepherd
- Department of Physics and Center for Biological Physics, Arizona State University, Tempe, AZ, USA
| | - Don D Sin
- Centre for Heart Lung Innovation, St. Paul's Hospital, University of British Columbia, Vancouver, British Columbia, Canada
| | - Wim Timens
- Groningen Research Institute for Asthma and COPD, University Medical Center Groningen, University of Groningen, Groningen, the Netherlands
- Department of Pathology and Medical Biology, University Medical Center Groningen, University of Groningen, Groningen, the Netherlands
| | - Alexander M Tsankov
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Jeffrey Whitsett
- Division of Neonatology and Pulmonary Biology, Cincinnati Children's Hospital Medical Center, University of Cincinnati College of Medicine, Cincinnati, OH, USA
| | - Yan Xu
- Division of Neonatology and Pulmonary Biology, Cincinnati Children's Hospital Medical Center, University of Cincinnati College of Medicine, Cincinnati, OH, USA
| | | | - Pascal Barbry
- Institut de Pharmacologie Moléculaire et Cellulaire, Université Côte d'Azur and Centre National de la Recherche Scientifique, Valbonne, France
- 3IA Côte d'Azur, Nice, France
| | - Thu Elizabeth Duong
- Department of Pediatrics, Division of Respiratory Medicine, University of California, San Diego, La Jolla, CA, USA
| | - Christine S Falk
- Institute for Transplant Immunology, Hannover Medical School, Hannover, Germany
| | | | - Jonathan A Kropski
- Division of Allergy, Pulmonary, and Critical Care Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Department of Cell and Developmental Biology, Vanderbilt University, Nashville, TN, USA
| | - Dana Pe'er
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA
| | - Herbert B Schiller
- Institute of Lung Health and Immunity (a member of the German Center for Lung Research) and Comprehensive Pneumology Center with the CPC-M bioArchive, Helmholtz Center Munich, Munich, Germany
| | | | - Joachim L Schultze
- Department of Genomics and Immunoregulation, Life and Medical Sciences Institute, University of Bonn, Bonn, Germany
- PRECISE Platform for Single Cell Genomics and Epigenomics, Deutsches Zentrum für Neurodegenerative Erkrankungen and University of Bonn, Bonn, Germany
| | - Sara A Teichmann
- Wellcome Sanger Institute, Hinxton, Cambridge, UK
- Department of Physics, Cavendish Laboratory, University of Cambridge, Cambridge, UK
| | - Alexander V Misharin
- Division of Pulmonary and Critical Care Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
| | - Martijn C Nawijn
- Groningen Research Institute for Asthma and COPD, University Medical Center Groningen, University of Groningen, Groningen, the Netherlands
- Department of Pathology and Medical Biology, University Medical Center Groningen, University of Groningen, Groningen, the Netherlands
| | - Malte D Luecken
- Department of Computational Health, Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany.
- Institute of Lung Health and Immunity (a member of the German Center for Lung Research) and Comprehensive Pneumology Center with the CPC-M bioArchive, Helmholtz Center Munich, Munich, Germany.
| | - Fabian J Theis
- Department of Computational Health, Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany.
- TUM School of Life Sciences, Technical University of Munich, Munich, Germany.
- Department of Mathematics, Technical University of Munich, Garching, Germany.
| |
|
7
|
Lotfollahi M, Klimovskaia Susmelj A, De Donno C, Hetzel L, Ji Y, Ibarra IL, Srivatsan SR, Naghipourfar M, Daza RM, Martin B, Shendure J, McFaline-Figueroa JL, Boyeau P, Wolf FA, Yakubova N, Günnemann S, Trapnell C, Lopez-Paz D, Theis FJ. Predicting cellular responses to complex perturbations in high-throughput screens. Mol Syst Biol 2023:e11517. [PMID: 37154091 DOI: 10.15252/msb.202211517] [Citation(s) in RCA: 25] [Impact Index Per Article: 25.0] [Received: 12/21/2022] [Revised: 03/23/2023] [Accepted: 03/31/2023] [Indexed: 05/10/2023] Open
Abstract
Recent advances in multiplexed single-cell transcriptomics experiments facilitate the high-throughput study of drug and genetic perturbations. However, an exhaustive exploration of the combinatorial perturbation space is experimentally unfeasible. Therefore, computational methods are needed to predict, interpret, and prioritize perturbations. Here, we present the compositional perturbation autoencoder (CPA), which combines the interpretability of linear models with the flexibility of deep-learning approaches for single-cell response modeling. CPA learns to in silico predict transcriptional perturbation response at the single-cell level for unseen dosages, cell types, time points, and species. Using newly generated single-cell drug combination data, we validate that CPA can predict unseen drug combinations while outperforming baseline models. Additionally, the architecture's modularity enables incorporating the chemical representation of the drugs, allowing the prediction of cellular response to completely unseen drugs. Furthermore, CPA is also applicable to genetic combinatorial screens. We demonstrate this by imputing in silico 5,329 missing combinations (97.6% of all possibilities) in a single-cell Perturb-seq experiment with diverse genetic interactions. We envision CPA will facilitate efficient experimental design and hypothesis generation by enabling in silico response prediction at the single-cell level and thus accelerate therapeutic applications using single-cell technologies.
Affiliation(s)
- Mohammad Lotfollahi
- Helmholtz Center Munich - German Research Center for Environmental Health, Institute of Computational Biology, Munich, Germany
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, UK
| | | | - Carlo De Donno
- Helmholtz Center Munich - German Research Center for Environmental Health, Institute of Computational Biology, Munich, Germany
- School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany
| | - Leon Hetzel
- Helmholtz Center Munich - German Research Center for Environmental Health, Institute of Computational Biology, Munich, Germany
- Department of Mathematics, Technical University of Munich, Munich, Germany
| | - Yuge Ji
- Helmholtz Center Munich - German Research Center for Environmental Health, Institute of Computational Biology, Munich, Germany
- School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany
| | - Ignacio L Ibarra
- Helmholtz Center Munich - German Research Center for Environmental Health, Institute of Computational Biology, Munich, Germany
| | - Sanjay R Srivatsan
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | | | - Riza M Daza
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Beth Martin
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Jay Shendure
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Howard Hughes Medical Institute, Seattle, WA, USA
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
- Allen Discovery Center for Cell Lineage Tracing, Seattle, WA, USA
| | | | - Pierre Boyeau
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA, USA
| | - F Alexander Wolf
- Helmholtz Center Munich - German Research Center for Environmental Health, Institute of Computational Biology, Munich, Germany
| | | | - Stephan Günnemann
- Department of Computer Science, Technical University of Munich, Munich, Germany
| | - Cole Trapnell
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
- Allen Discovery Center for Cell Lineage Tracing, Seattle, WA, USA
| | - David Lopez-Paz
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, UK
| | - Fabian J Theis
- Helmholtz Center Munich - German Research Center for Environmental Health, Institute of Computational Biology, Munich, Germany
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, UK
- School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany
- Department of Mathematics, Technical University of Munich, Munich, Germany
| |
|
8
|
Feriz AM, Khosrojerdi A, Lotfollahi M, Shamsaki N, GhasemiGol M, HosseiniGol E, Fereidouni M, Rohban MH, Sebzari AR, Saghafi S, Leone P, Silvestris N, Safarpour H, Racanelli V. Single-cell RNA sequencing uncovers heterogeneous transcriptional signatures in tumor-infiltrated dendritic cells in prostate cancer. Heliyon 2023; 9:e15694. [PMID: 37144199 PMCID: PMC10151421 DOI: 10.1016/j.heliyon.2023.e15694] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/31/2023] [Revised: 04/11/2023] [Accepted: 04/19/2023] [Indexed: 05/06/2023] Open
Abstract
Prostate cancer (PCa) is one of the two solid malignancies in which a higher T cell infiltration in the tumor microenvironment (TME) corresponds with a worse prognosis for the tumor. The inability of T cells to eliminate tumor cells despite an increase in their number reinforces the possibility of impaired antigen presentation. In this study, we investigated the TME at single-cell resolution to understand the molecular function and communication of dendritic cells (DCs) (as professional antigen-presenting cells). According to our data, tumor cells stimulate the migration of immature DCs to the tumor site by inducing inflammatory chemokines. Many signaling pathways such as TNF-α/NF-κB, IL2/STAT5, and E2F are up-regulated after DCs enter the tumor location. In addition, some molecules such as GPR34 and SLCO2B1 decreased on the surface of DCs. The analysis of molecular and signaling alterations in DCs revealed some suppression mechanisms of tumors, such as removing mature DCs, reducing DC survival, inducing anergy or exhaustion in the effector T cells, and enhancing the differentiation of T cells to Th2 and Tregs. In addition, we investigated the cellular and molecular communication between DCs and macrophages in the tumor site and found three molecular pairs including CCR5/CCL5, CD52/SIGLEC10, and HLA-DPB1/TNFSF13B. These molecular pairs are involved in the migration of immature DCs to the TME and disrupt the antigen-presenting function of DCs. Furthermore, we presented new therapeutic targets by the construction of a gene co-expression network. These data increase our knowledge of the heterogeneity and the role of DCs in PCa TME.
Affiliation(s)
- Adib Miraki Feriz
- Birjand University of Medical Sciences (BUMS), Birjand, Iran
- Cellular and Molecular Research Center, BUMS, Birjand, Iran
| | | | - Mohammad Lotfollahi
- Computational Health Center, Helmholtz Munich, Germany
- Wellcome Sanger Institute, Cambridge, UK
| | - Neusha Shamsaki
- Birjand University of Medical Sciences (BUMS), Birjand, Iran
| | - Mohammad GhasemiGol
- College of Engineering & Mines, University of North Dakota, North Dakota, USA
| | - Edris HosseiniGol
- Department of Computer Engineering, University of Birjand, Birjand, Iran
| | | | | | - Ahmad Reza Sebzari
- Radiation Oncology, Clinical Research Development Unit (CRDU), ValiAsr Hospital, BUMS, Birjand, Iran
| | - Samira Saghafi
- Birjand University of Medical Sciences (BUMS), Birjand, Iran
| | - Patrizia Leone
- Department of Biomedical Sciences and Human Oncology, University of Bari “Aldo Moro”, Bari, Italy
| | - Nicola Silvestris
- Medical Oncology Unit, Department of Human Pathology “G. Barresi”, University of Messina, Messina, Italy
| | - Hossein Safarpour
- Cellular and Molecular Research Center, BUMS, Birjand, Iran
- Corresponding author.
| | - Vito Racanelli
- Department of Biomedical Sciences and Human Oncology, University of Bari “Aldo Moro”, Bari, Italy
- Corresponding author.
| |
|
9
|
Lotfollahi M, Rybakov S, Hrovatin K, Hediyeh-Zadeh S, Talavera-López C, Misharin AV, Theis FJ. Biologically informed deep learning to query gene programs in single-cell atlases. Nat Cell Biol 2023; 25:337-350. [PMID: 36732632 PMCID: PMC9928587 DOI: 10.1038/s41556-022-01072-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 03/02/2022] [Accepted: 12/08/2022] [Indexed: 02/04/2023]
Abstract
The increasing availability of large-scale single-cell atlases has enabled the detailed description of cell states. In parallel, advances in deep learning allow rapid analysis of newly generated query datasets by mapping them into reference atlases. However, existing data transformations learned to map query data are not easily explainable using biologically known concepts such as genes or pathways. Here we propose expiMap, a biologically informed deep-learning architecture that enables single-cell reference mapping. ExpiMap learns to map cells into biologically understandable components representing known 'gene programs'. The activity of each cell for a gene program is learned while simultaneously refining them and learning de novo programs. We show that expiMap compares favourably to existing methods while bringing an additional layer of interpretability to integrative single-cell analysis. Furthermore, we demonstrate its applicability to analyse single-cell perturbation responses in different tissues and species and resolve responses of patients who have coronavirus disease 2019 to different treatments across cell types.
Affiliation(s)
- Mohammad Lotfollahi
- Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
- Wellcome Sanger Institute, Cambridge, UK
| | - Sergei Rybakov
- Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
- Department of Mathematics, Technical University of Munich, Munich, Germany
| | - Karin Hrovatin
- Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
- TUM School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany
| | - Soroor Hediyeh-Zadeh
- Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
- Bioinformatics Division, WEHI, Melbourne, Victoria, Australia
| | - Carlos Talavera-López
- Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
- Division of Infectious Diseases and Tropical Medicine, Ludwig-Maximilian-Universität Klinikum, Munich, Germany
| | - Alexander V Misharin
- Division of Pulmonary and Critical Care Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
| | - Fabian J Theis
- Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany.
- Wellcome Sanger Institute, Cambridge, UK.
- Department of Mathematics, Technical University of Munich, Munich, Germany.
- TUM School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany.
| |
|
10
|
Gayoso A, Lopez R, Xing G, Boyeau P, Valiollah Pour Amiri V, Hong J, Wu K, Jayasuriya M, Mehlman E, Langevin M, Liu Y, Samaran J, Misrachi G, Nazaret A, Clivio O, Xu C, Ashuach T, Gabitto M, Lotfollahi M, Svensson V, da Veiga Beltrame E, Kleshchevnikov V, Talavera-López C, Pachter L, Theis FJ, Streets A, Jordan MI, Regier J, Yosef N. A Python library for probabilistic analysis of single-cell omics data. Nat Biotechnol 2022; 40:163-166. [DOI: 10.1038/s41587-021-01206-w] [Citation(s) in RCA: 37] [Impact Index Per Article: 18.5] [Indexed: 02/06/2023]
|
11
|
Lotfollahi M, Naghipourfar M, Luecken MD, Khajavi M, Büttner M, Wagenstetter M, Avsec Ž, Gayoso A, Yosef N, Interlandi M, Rybakov S, Misharin AV, Theis FJ. Mapping single-cell data to reference atlases by transfer learning. Nat Biotechnol 2022; 40:121-130. [PMID: 34462589 PMCID: PMC8763644 DOI: 10.1038/s41587-021-01001-7] [Citation(s) in RCA: 151] [Impact Index Per Article: 75.5] [Received: 07/30/2020] [Accepted: 06/28/2021] [Indexed: 02/07/2023]
Abstract
Large single-cell atlases are now routinely generated to serve as references for analysis of smaller-scale studies. Yet learning from reference data is complicated by batch effects between datasets, limited availability of computational resources and sharing restrictions on raw data. Here we introduce a deep learning strategy for mapping query datasets on top of a reference called single-cell architectural surgery (scArches). scArches uses transfer learning and parameter optimization to enable efficient, decentralized, iterative reference building and contextualization of new datasets with existing references without sharing raw data. Using examples from mouse brain, pancreas, immune and whole-organism atlases, we show that scArches preserves biological state information while removing batch effects, despite using four orders of magnitude fewer parameters than de novo integration. scArches generalizes to multimodal reference mapping, allowing imputation of missing modalities. Finally, scArches retains coronavirus disease 2019 (COVID-19) disease variation when mapping to a healthy reference, enabling the discovery of disease-specific cell states. scArches will facilitate collaborative projects by enabling iterative construction, updating, sharing and efficient use of reference atlases.
Affiliation(s)
- Mohammad Lotfollahi
- Helmholtz Center Munich-German Research Center for Environmental Health, Institute of Computational Biology, Neuherberg, Germany
- School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany
| | - Mohsen Naghipourfar
- Helmholtz Center Munich-German Research Center for Environmental Health, Institute of Computational Biology, Neuherberg, Germany
| | - Malte D Luecken
- Helmholtz Center Munich-German Research Center for Environmental Health, Institute of Computational Biology, Neuherberg, Germany
| | - Matin Khajavi
- Helmholtz Center Munich-German Research Center for Environmental Health, Institute of Computational Biology, Neuherberg, Germany
| | - Maren Büttner
- Helmholtz Center Munich-German Research Center for Environmental Health, Institute of Computational Biology, Neuherberg, Germany
| | - Marco Wagenstetter
- Helmholtz Center Munich-German Research Center for Environmental Health, Institute of Computational Biology, Neuherberg, Germany
| | - Žiga Avsec
- Department of Computer Science, Technical University of Munich, Munich, Germany
| | - Adam Gayoso
- Center for Computational Biology, University of California, Berkeley, Berkeley, CA, USA
| | - Nir Yosef
- Center for Computational Biology, University of California, Berkeley, Berkeley, CA, USA
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, Berkeley, CA, USA
- Chan Zuckerberg Biohub, San Francisco, CA, USA
- Ragon Institute of MGH, MIT and Harvard, Cambridge, MA, USA
| | - Marta Interlandi
- Institute of Medical Informatics, University of Münster, Münster, Germany
| | - Sergei Rybakov
- Helmholtz Center Munich-German Research Center for Environmental Health, Institute of Computational Biology, Neuherberg, Germany
- Department of Mathematics, Technical University of Munich, Munich, Germany
| | - Alexander V Misharin
- Division of Pulmonary and Critical Care Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
| | - Fabian J Theis
- Helmholtz Center Munich-German Research Center for Environmental Health, Institute of Computational Biology, Neuherberg, Germany.
- School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany.
- Department of Mathematics, Technical University of Munich, Munich, Germany.
| |
|
12
|
Ji Y, Lotfollahi M, Wolf FA, Theis FJ. Machine learning for perturbational single-cell omics. Cell Syst 2021; 12:522-537. [PMID: 34139164 DOI: 10.1016/j.cels.2021.05.016] [Citation(s) in RCA: 36] [Impact Index Per Article: 12.0] [Received: 01/13/2021] [Revised: 05/04/2021] [Accepted: 05/19/2021] [Indexed: 12/18/2022]
Abstract
Cell biology is fundamentally limited in its ability to collect complete data on cellular phenotypes and the wide range of responses to perturbation. Areas such as computer vision and speech recognition have addressed this problem of characterizing unseen or unlabeled conditions with the combined advances of big data, deep learning, and computing resources in the past 5 years. Similarly, recent advances in machine learning approaches enabled by single-cell data start to address prediction tasks in perturbation response modeling. We first define objectives in learning perturbation response in single-cell omics; survey existing approaches, resources, and datasets (https://github.com/theislab/sc-pert); and discuss how a perturbation atlas can enable deep learning models to construct an informative perturbation latent space. We then examine future avenues toward more powerful and explainable modeling using deep neural networks, which enable the integration of disparate information sources and an understanding of heterogeneous, complex, and unseen systems.
Affiliation(s)
- Yuge Ji
- Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany; Department of Mathematics, Technical University of Munich, Munich, Germany
| | - Mohammad Lotfollahi
- Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany; TUM School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany
| | - F Alexander Wolf
- Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany; Cellarity, Cambridge, MA, USA
| | - Fabian J Theis
- Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany; Department of Mathematics, Technical University of Munich, Munich, Germany; Cellarity, Cambridge, MA, USA.
| |
|
13
|
Abstract
Motivation
While generative models have shown great success in sampling high-dimensional samples conditional on low-dimensional descriptors (stroke thickness in MNIST, hair color in CelebA, speaker identity in WaveNet), their generation out-of-distribution poses fundamental problems due to the difficulty of learning compact joint distribution across conditions. The canonical example of the conditional variational autoencoder (CVAE), for instance, does not explicitly relate conditions during training and, hence, has no explicit incentive of learning such a compact representation.
Results
We overcome the limitation of the CVAE by matching distributions across conditions using maximum mean discrepancy in the decoder layer that follows the bottleneck. This introduces a strong regularization both for reconstructing samples within the same condition and for transforming samples across conditions, resulting in much improved generalization. As this amounts to solving a style-transfer problem, we refer to the model as transfer VAE (trVAE). Benchmarking trVAE on high-dimensional image and single-cell RNA-seq data, we demonstrate higher robustness and higher accuracy than existing approaches. We also show qualitatively improved predictions by tackling previously problematic minority classes and multiple conditions in the context of cellular perturbation response to treatment and disease based on high-dimensional single-cell gene expression data. For generic tasks, we improve Pearson correlations of high-dimensional estimated means and variances with their ground truths from 0.89 to 0.97 and 0.75 to 0.87, respectively. We further demonstrate that trVAE learns cell-type-specific responses after perturbation and improves the prediction of most cell-type-specific genes by 65%.
Availability and implementation
The trVAE implementation is available via github.com/theislab/trvae. The results of this article can be reproduced via github.com/theislab/trvae_reproducibility.
Affiliation(s)
- Mohammad Lotfollahi
- Institute of Computational Biology, Helmholtz Center Munich, Neuherberg, Germany
- School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany
| | - Mohsen Naghipourfar
- Institute of Computational Biology, Helmholtz Center Munich, Neuherberg, Germany
| | - Fabian J Theis
- Institute of Computational Biology, Helmholtz Center Munich, Neuherberg, Germany
- School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany
- Department of Mathematics, Technische Universität München, Munich, Germany
| | - F Alexander Wolf
- Institute of Computational Biology, Helmholtz Center Munich, Neuherberg, Germany
| |
|
14
|
Ebrahimzadeh S, Barzi M, Lotfollahi M, Tabatabaei SN, Sarikhani S. Attosecond pulse generation from $H_2^+$ ions using a multicolor beam superposition method. Opt Lett 2020; 45:923-926. [PMID: 32058507 DOI: 10.1364/ol.378494] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2019] [Accepted: 11/11/2019] [Indexed: 06/10/2023]
Abstract
The behavior of the high-order harmonics and output attosecond pulses from hydrogen molecular ions with various internuclear distances that are exposed to high-intensity incoming pulses is investigated. The spectrally wide incoming pulses result from a superposition of monochromatic beams with a constant frequency spacing. Our simulations show that the most intense and shortest attosecond pulses can result from hydrogen molecular ions with large internuclear distances that are exposed to intense pulses with a frequency width greater than 0.03 a.u.
|
15
|
Abstract
Accurately modeling cellular response to perturbations is a central goal of computational biology. While such modeling has been based on statistical, mechanistic and machine learning models in specific settings, no generalization of predictions to phenomena absent from training data (out-of-sample) has yet been demonstrated. Here, we present scGen (https://github.com/theislab/scgen), a model combining variational autoencoders and latent space vector arithmetics for high-dimensional single-cell gene expression data. We show that scGen accurately models perturbation and infection response of cells across cell types, studies and species. In particular, we demonstrate that scGen learns cell-type and species-specific responses implying that it captures features that distinguish responding from non-responding genes and cells. With the upcoming availability of large-scale atlases of organs in a healthy state, we envision scGen to become a tool for experimental design through in silico screening of perturbation response in the context of disease and drug treatment.
Affiliation(s)
- Mohammad Lotfollahi
- Helmholtz Zentrum München - German Research Center for Environmental Health, Institute of Computational Biology, Neuherberg, Germany; School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany
| | - F Alexander Wolf
- Helmholtz Zentrum München - German Research Center for Environmental Health, Institute of Computational Biology, Neuherberg, Germany.
| | - Fabian J Theis
- Helmholtz Zentrum München - German Research Center for Environmental Health, Institute of Computational Biology, Neuherberg, Germany; School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany; Department of Mathematics, Technical University of Munich, Munich, Germany.
| |
|
16
|
Lotfollahi M, Jafari Siavoshani M, Shirali Hossein Zade R, Saberian M. Deep packet: a novel approach for encrypted traffic classification using deep learning. Soft comput 2019. [DOI: 10.1007/s00500-019-04030-2] [Citation(s) in RCA: 90] [Impact Index Per Article: 18.0] [Indexed: 10/26/2022]
|
17
|
Lotfollahi M, Alston AM, McDonald GK. Effect of nitrogen fertiliser placement on grain protein concentration of wheat under different water regimes. ACTA ACUST UNITED AC 1997. [DOI: 10.1071/a96066] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.4] [Indexed: 11/23/2022]
Abstract
Two experiments were conducted in pots 105 cm deep and 11 cm in diameter to determine the effects of subsoil nitrogen (N) on grain yield and grain protein concentration (GPC) of wheat (Triticum aestivum L. cv. Molineux). In both experiments, KNO₃ was applied in solution at different times and depths in the profile. In the first experiment, in which a sandy soil low in available N was used, application of 150 mg N at 60 cm, 2 weeks after anthesis, significantly increased grain yield and GPC. The N was taken up gradually by the plant after N was applied. Adding N to the subsoil increased root growth and this resulted in increased water use and water use efficiency. Although there was an increase in the rate of N uptake by the roots, the main factor that influenced the utilisation of subsoil N was the root length density. In the second experiment, the effects of depth and time of N application, and of a reduction in post-anthesis water supply, were determined. A more fertile soil was used than the one in the first experiment. There were 5 KNO₃ treatments: nil N; 150 mg N applied to the topsoil at sowing; 75 mg N to the topsoil and 75 mg N to the subsoil (60 cm depth) at sowing; 150 mg N to the subsoil at sowing; 75 mg N to the topsoil at sowing and 75 mg N to the subsoil 1 week after anthesis. The effect of post-anthesis water stress was assessed by allowing the topsoil to dry and then supplying half the amount of water used by the well-watered control treatment at 60 cm in half of the pots. Adding N increased yield and GPC but there was no significant difference in yield and GPC between the different N treatments. When N was applied to the topsoil only, most of it was used by the wheat plants or leached to the subsoil by anthesis; post-anthesis uptake of N depended on the amount of N in the subsoil. Adding N, irrespective of the depth of placement or time of application, increased water use and water use efficiency. In both experiments, increasing the availability of N in the soil after anthesis reduced the amount of N that was remobilised from the roots and stem to the grain. The recovery of applied N in both experiments was high (about 80%). These experiments have shown that N available in the subsoil after anthesis can be used very efficiently and can contribute to both grain yield and GPC. A critical factor in the efficient use of this N appears to be root length density in the subsoil.
|