1
Ye F, Wang J, Li J, Mei Y, Guo G. Mapping Cell Atlases at the Single-Cell Level. Advanced Science (Weinheim, Baden-Wurttemberg, Germany) 2024; 11:e2305449. [PMID: 38145338 PMCID: PMC10885669 DOI: 10.1002/advs.202305449] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2023] [Revised: 12/01/2023] [Indexed: 12/26/2023]
Abstract
Recent advancements in single-cell technologies have led to rapid developments in the construction of cell atlases. These atlases have the potential to provide detailed information about every cell type in different organisms, enabling the characterization of cellular diversity at the single-cell level. Global efforts in developing comprehensive cell atlases have profound implications for both basic research and clinical applications. This review provides a broad overview of the cellular diversity and dynamics across various biological systems. In addition, the incorporation of machine learning techniques into cell atlas analyses opens up exciting prospects for the field of integrative biology.
Affiliation(s)
- Fang Ye
  - Bone Marrow Transplantation Center of the First Affiliated Hospital, and Center for Stem Cell and Regenerative Medicine, Zhejiang University School of Medicine, Hangzhou, Zhejiang 310000, China
  - Liangzhu Laboratory, Zhejiang University, Hangzhou, Zhejiang 311121, China
- Jingjing Wang
  - Bone Marrow Transplantation Center of the First Affiliated Hospital, and Center for Stem Cell and Regenerative Medicine, Zhejiang University School of Medicine, Hangzhou, Zhejiang 310000, China
  - Liangzhu Laboratory, Zhejiang University, Hangzhou, Zhejiang 311121, China
- Jiaqi Li
  - Bone Marrow Transplantation Center of the First Affiliated Hospital, and Center for Stem Cell and Regenerative Medicine, Zhejiang University School of Medicine, Hangzhou, Zhejiang 310000, China
- Yuqing Mei
  - Bone Marrow Transplantation Center of the First Affiliated Hospital, and Center for Stem Cell and Regenerative Medicine, Zhejiang University School of Medicine, Hangzhou, Zhejiang 310000, China
- Guoji Guo
  - Bone Marrow Transplantation Center of the First Affiliated Hospital, and Center for Stem Cell and Regenerative Medicine, Zhejiang University School of Medicine, Hangzhou, Zhejiang 310000, China
  - Liangzhu Laboratory, Zhejiang University, Hangzhou, Zhejiang 311121, China
  - Zhejiang Provincial Key Lab for Tissue Engineering and Regenerative Medicine, Dr. Li Dak Sum & Yip Yio Chin Center for Stem Cell and Regenerative Medicine, Hangzhou, Zhejiang 310058, China
  - Institute of Hematology, Zhejiang University, Hangzhou, Zhejiang 310000, China
2
Bilous M, Tran L, Cianciaruso C, Gabriel A, Michel H, Carmona SJ, Pittet MJ, Gfeller D. Metacells untangle large and complex single-cell transcriptome networks. BMC Bioinformatics 2022; 23:336. [PMID: 35963997 PMCID: PMC9375201 DOI: 10.1186/s12859-022-04861-1] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 06/10/2022] [Accepted: 07/23/2022] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND: Single-cell RNA sequencing (scRNA-seq) technologies offer unique opportunities for exploring heterogeneous cell populations. However, in-depth single-cell transcriptomic characterization of complex tissues often requires profiling tens to hundreds of thousands of cells. Such large numbers of cells represent an important hurdle for downstream analyses, interpretation and visualization. RESULTS: We develop a framework called SuperCell to merge highly similar cells into metacells and perform standard scRNA-seq data analyses at the metacell level. Our systematic benchmarking demonstrates that metacells not only preserve but often improve the results of downstream analyses including visualization, clustering, differential expression, cell type annotation, gene correlation, imputation, RNA velocity and data integration. By capitalizing on the redundancy inherent to scRNA-seq data, metacells significantly facilitate and accelerate the construction and interpretation of single-cell atlases, as demonstrated by the integration of 1.46 million cells from COVID-19 patients in less than two hours on a standard desktop. CONCLUSIONS: SuperCell is a framework to build and analyze metacells in a way that efficiently preserves the results of scRNA-seq data analyses while significantly accelerating and facilitating them.
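The general idea of merging similar cells into metacells can be illustrated with a small sketch. This is not the SuperCell implementation (SuperCell is an R framework and the abstract does not specify its grouping procedure); it assumes a plain k-means grouping as a stand-in, and the function name `build_metacells` and its parameters are hypothetical.

```python
import numpy as np

def build_metacells(X, n_metacells, n_iter=20, seed=0):
    """Toy metacell construction: group similar cells, then average each group.

    X is a (cells x genes) matrix of normalized expression values.
    Grouping uses plain k-means here, which is an assumption made for
    illustration and not the method used by SuperCell.
    Returns (profiles, membership, sizes).
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Start from randomly chosen cells as initial group centers.
    centers = X[rng.choice(X.shape[0], size=n_metacells, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every cell to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        membership = d2.argmin(axis=1)
        for k in range(n_metacells):
            members = X[membership == k]
            if len(members) > 0:  # an emptied group keeps its previous center
                centers[k] = members.mean(axis=0)
    # Number of single cells merged into each metacell.
    sizes = np.bincount(membership, minlength=n_metacells)
    return centers, membership, sizes
```

Downstream analyses would then run on the much smaller `profiles` matrix, with `sizes` available as weights so that large metacells count for more than small ones.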
Affiliation(s)
- Mariia Bilous
  - Department of Oncology, Ludwig Institute for Cancer Research, University of Lausanne, Lausanne, Switzerland
  - Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Loc Tran
  - Department of Oncology, Ludwig Institute for Cancer Research, University of Lausanne, Lausanne, Switzerland
  - Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Chiara Cianciaruso
  - Department of Pathology and Immunology, University of Geneva, Geneva, Switzerland
- Aurélie Gabriel
  - Department of Oncology, Ludwig Institute for Cancer Research, University of Lausanne, Lausanne, Switzerland
  - Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Hugo Michel
  - Department of Oncology, Ludwig Institute for Cancer Research, University of Lausanne, Lausanne, Switzerland
- Santiago J Carmona
  - Department of Oncology, Ludwig Institute for Cancer Research, University of Lausanne, Lausanne, Switzerland
  - Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Mikael J Pittet
  - Department of Pathology and Immunology, University of Geneva, Geneva, Switzerland
  - Department of Oncology, Geneva University Hospitals, Geneva, Switzerland
  - Center for Systems Biology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- David Gfeller
  - Department of Oncology, Ludwig Institute for Cancer Research, University of Lausanne, Lausanne, Switzerland
  - Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
3
Miller BF, Huang F, Atta L, Sahoo A, Fan J. Reference-free cell type deconvolution of multi-cellular pixel-resolution spatially resolved transcriptomics data. Nat Commun 2022; 13:2339. [PMID: 35487922 PMCID: PMC9055051 DOI: 10.1038/s41467-022-30033-z] [Citation(s) in RCA: 89] [Impact Index Per Article: 29.7] [Received: 06/16/2021] [Accepted: 04/12/2022] [Indexed: 12/12/2022] Open
Abstract
Recent technological advancements have enabled spatially resolved transcriptomic profiling but at multi-cellular pixel resolution, thereby hindering the identification of cell-type-specific spatial patterns and gene expression variation. To address this challenge, we develop STdeconvolve as a reference-free approach to deconvolve underlying cell types comprising such multi-cellular pixel resolution spatial transcriptomics (ST) datasets. Using simulated as well as real ST datasets from diverse spatial transcriptomics technologies comprising a variety of spatial resolutions such as Spatial Transcriptomics, 10X Visium, DBiT-seq, and Slide-seq, we show that STdeconvolve can effectively recover cell-type transcriptional profiles and their proportional representation within pixels without reliance on external single-cell transcriptomics references. STdeconvolve provides comparable performance to existing reference-based methods when suitable single-cell references are available, as well as potentially superior performance when suitable single-cell references are not available. STdeconvolve is available as an open-source R software package with the source code available at https://github.com/JEFworks-Lab/STdeconvolve. Identifying cell-type-specific spatial patterns in ST data is critical for understanding tissue organization but current methods rely on external references. Here the authors develop a reference-free method to effectively recover cell-type transcriptional profiles and proportions.
Affiliation(s)
- Brendan F Miller
  - Center for Computational Biology, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD, 21211, United States
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, 21218, United States
- Feiyang Huang
  - Center for Computational Biology, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD, 21211, United States
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, 21218, United States
- Lyla Atta
  - Center for Computational Biology, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD, 21211, United States
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, 21218, United States
- Arpan Sahoo
  - Center for Computational Biology, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD, 21211, United States
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, 21218, United States
- Jean Fan
  - Center for Computational Biology, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD, 21211, United States
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, 21218, United States
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, 21218, United States
4
Abstract
Cell atlases are essential companions to the genome as they elucidate how genes are used in a cell type-specific manner or how the usage of genes changes over the lifetime of an organism. This review explores recent advances in whole-organism single-cell atlases, which enable understanding of cell heterogeneity and tissue and cell fate, both in health and disease. Here we provide an overview of recent efforts to build cell atlases across species and discuss the challenges that the field is currently facing. Moreover, we propose the concept of having a knowledgebase that can scale with the number of experiments and computational approaches and a new feedback loop for development and benchmarking of computational methods that includes contributions from the users. These two aspects are key for community efforts in single-cell biology that will help produce a comprehensive annotated map of cell types and states with unparalleled resolution.
Affiliation(s)
- Bruno Tojo
  - Instituto Superior Técnico, Universidade de Lisboa, 1049-001 Lisbon, Portugal
- Aaron McGeever
  - Chan Zuckerberg Biohub, San Francisco, California 94103, USA
5
Chen Y, Chen S, Zhang X. Using DenseFly algorithm for cell searching on massive scRNA-seq datasets. BMC Genomics 2020; 21:222. [PMID: 33327944 PMCID: PMC7739457 DOI: 10.1186/s12864-020-6651-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 02/21/2020] [Accepted: 03/04/2020] [Indexed: 11/25/2022] Open
Abstract
BACKGROUND: High-throughput single-cell transcriptomic technology produces massive high-dimensional data, enabling high-resolution cell type definition and identification. To uncover the expression patterns beneath the big data, a transcriptional landscape searching algorithm at the single-cell level is desirable. RESULTS: We explored the feasibility of using the DenseFly algorithm for cell searching on scRNA-seq data. DenseFly is a locality-sensitive hashing algorithm inspired by the fruit fly olfactory system. The experiments indicate that DenseFly outperforms the baseline methods FlyHash and SimHash in classification tasks, and the performance is robust to dropout events and batch effects. CONCLUSION: We developed a method for mapping cells across scRNA-seq datasets based on the DenseFly algorithm. It can be an efficient tool for cell atlas searching.
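How locality-sensitive hashing supports cell searching can be sketched with SimHash, one of the baselines named in this abstract. This is not the DenseFly algorithm; it is a minimal illustration assuming random-hyperplane hashing and a brute-force Hamming search, and the function names (`make_planes`, `simhash`, `nearest_cells`) are hypothetical.

```python
import numpy as np

def make_planes(n_genes, n_bits=64, seed=0):
    """Draw the random hyperplanes shared by reference and query cells."""
    rng = np.random.default_rng(seed)
    return rng.normal(size=(n_genes, n_bits))

def simhash(X, planes):
    """Hash each cell to a binary code: one bit per hyperplane side.

    X is (cells x genes). Expression values are non-negative, so in practice
    the features would be centered first (an assumption of this sketch).
    """
    return (np.atleast_2d(np.asarray(X, dtype=float)) @ planes) >= 0

def nearest_cells(query, ref_codes, planes, top_k=5):
    """Return indices of the reference cells whose codes are closest to the query."""
    q = simhash(query, planes)[0]
    # Hamming distance: number of bits that differ from the query code.
    hamming = (ref_codes != q).sum(axis=1)
    return np.argsort(hamming, kind="stable")[:top_k]
```

Cells pointing in similar directions in expression space tend to fall on the same side of most hyperplanes, so a small Hamming distance between codes suggests similar profiles. The reference atlas is hashed once with `simhash`, and each query is then compared against compact binary codes instead of full expression vectors.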
Affiliation(s)
- Yixin Chen
  - Department of Automation, MOE Key Laboratory of Bioinformatics; Bioinformatics Division and Center for Synthetic & Systems Biology, BNRist, Tsinghua University, Beijing, 100084, China
- Sijie Chen
  - Department of Automation, MOE Key Laboratory of Bioinformatics; Bioinformatics Division and Center for Synthetic & Systems Biology, BNRist, Tsinghua University, Beijing, 100084, China
- Xuegong Zhang
  - Department of Automation, MOE Key Laboratory of Bioinformatics; Bioinformatics Division and Center for Synthetic & Systems Biology, BNRist, Tsinghua University, Beijing, 100084, China
  - School of Life Sciences, Tsinghua University, Beijing, 100084, China
6
Fan J, Slowikowski K, Zhang F. Single-cell transcriptomics in cancer: computational challenges and opportunities. Exp Mol Med 2020; 52:1452-1465. [PMID: 32929226 PMCID: PMC8080633 DOI: 10.1038/s12276-020-0422-0] [Citation(s) in RCA: 107] [Impact Index Per Article: 21.4] [Received: 11/30/2019] [Revised: 02/26/2020] [Accepted: 03/10/2020] [Indexed: 02/07/2023] Open
Abstract
Intratumor heterogeneity is a common characteristic across diverse cancer types and presents challenges to current standards of treatment. Advancements in high-throughput sequencing and imaging technologies provide opportunities to identify and characterize these aspects of heterogeneity. Notably, transcriptomic profiling at a single-cell resolution enables quantitative measurements of the molecular activity that underlies the phenotypic diversity of cells within a tumor. Such high-dimensional data require computational analysis to extract relevant biological insights about the cell types and states that drive cancer development, pathogenesis, and clinical outcomes. In this review, we highlight emerging themes in the computational analysis of single-cell transcriptomics data and their applications to cancer research. We focus on downstream analytical challenges relevant to cancer research, including how to computationally perform unified analysis across many patients and disease states, distinguish neoplastic from nonneoplastic cells, infer communication with the tumor microenvironment, and delineate tumoral and microenvironmental evolution with trajectory and RNA velocity analysis. We include discussions of challenges and opportunities for future computational methodological advancements necessary to realize the translational potential of single-cell transcriptomic profiling in cancer.
Affiliation(s)
- Jean Fan
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland, USA
- Kamil Slowikowski
  - Center for Immunology and Inflammatory Diseases, Massachusetts General Hospital, Charlestown, MA, USA
- Fan Zhang
  - Center for Data Sciences, Brigham and Women's Hospital, Boston, MA, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
7
Lähnemann D, Köster J, Szczurek E, McCarthy DJ, Hicks SC, Robinson MD, Vallejos CA, Campbell KR, Beerenwinkel N, Mahfouz A, Pinello L, Skums P, Stamatakis A, Attolini CSO, Aparicio S, Baaijens J, Balvert M, Barbanson BD, Cappuccio A, Corleone G, Dutilh BE, Florescu M, Guryev V, Holmer R, Jahn K, Lobo TJ, Keizer EM, Khatri I, Kielbasa SM, Korbel JO, Kozlov AM, Kuo TH, Lelieveldt BP, Mandoiu II, Marioni JC, Marschall T, Mölder F, Niknejad A, Rączkowska A, Reinders M, Ridder JD, Saliba AE, Somarakis A, Stegle O, Theis FJ, Yang H, Zelikovsky A, McHardy AC, Raphael BJ, Shah SP, Schönhuth A. Eleven grand challenges in single-cell data science. Genome Biol 2020; 21:31. [PMID: 32033589 PMCID: PMC7007675 DOI: 10.1186/s13059-020-1926-6] [Citation(s) in RCA: 675] [Impact Index Per Article: 135.0] [Received: 08/02/2019] [Accepted: 01/02/2020] [Indexed: 02/08/2023] Open
Abstract
The recent boom in microfluidics and combinatorial indexing strategies, combined with low sequencing costs, has empowered single-cell sequencing technology. Thousands, or even millions, of cells analyzed in a single experiment amount to a data revolution in single-cell biology and pose unique data science problems. Here, we outline eleven challenges that will be central to bringing this emerging field of single-cell data science forward. For each challenge, we highlight motivating research questions, review prior work, and formulate open problems. This compendium is for established researchers, newcomers, and students alike, highlighting interesting and rewarding problems for the coming years.
Affiliation(s)
- David Lähnemann
  - Algorithms for Reproducible Bioinformatics, Genome Informatics, Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Essen, Germany
  - Department of Paediatric Oncology, Haematology and Immunology, Medical Faculty, Heinrich Heine University, University Hospital, Düsseldorf, Germany
  - Computational Biology of Infection Research Group, Helmholtz Centre for Infection Research, Braunschweig, Germany
- Johannes Köster
  - Algorithms for Reproducible Bioinformatics, Genome Informatics, Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Essen, Germany
  - Medical Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, USA
- Ewa Szczurek
  - Institute of Informatics, Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warszawa, Poland
- Davis J. McCarthy
  - Bioinformatics and Cellular Genomics, St Vincent’s Institute of Medical Research, Fitzroy, Australia
  - Melbourne Integrative Genomics, School of BioSciences–School of Mathematics & Statistics, Faculty of Science, University of Melbourne, Melbourne, Australia
- Stephanie C. Hicks
  - Department of Biostatistics, Johns Hopkins University, Baltimore, MD, USA
- Mark D. Robinson
  - Institute of Molecular Life Sciences and SIB Swiss Institute of Bioinformatics, University of Zürich, Zürich, Switzerland
- Catalina A. Vallejos
  - MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Edinburgh, UK
  - The Alan Turing Institute, British Library, London, UK
- Kieran R. Campbell
  - Department of Statistics, University of British Columbia, Vancouver, Canada
  - Department of Molecular Oncology, BC Cancer Agency, Vancouver, Canada
  - Data Science Institute, University of British Columbia, Vancouver, Canada
- Niko Beerenwinkel
  - Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Ahmed Mahfouz
  - Leiden Computational Biology Center, Leiden University Medical Center, Leiden, The Netherlands
  - Delft Bioinformatics Lab, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, Delft, The Netherlands
- Luca Pinello
  - Molecular Pathology Unit and Center for Cancer Research, Massachusetts General Hospital Research Institute, Charlestown, USA
  - Department of Pathology, Harvard Medical School, Boston, USA
  - Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Pavel Skums
  - Department of Computer Science, Georgia State University, Atlanta, USA
- Alexandros Stamatakis
  - Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
  - Institute for Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe, Germany
- Samuel Aparicio
  - Department of Molecular Oncology, BC Cancer Agency, Vancouver, Canada
  - Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, Canada
- Jasmijn Baaijens
  - Life Sciences and Health, Centrum Wiskunde & Informatica, Amsterdam, The Netherlands
- Marleen Balvert
  - Life Sciences and Health, Centrum Wiskunde & Informatica, Amsterdam, The Netherlands
  - Theoretical Biology and Bioinformatics, Science for Life, Utrecht University, Utrecht, The Netherlands
- Buys de Barbanson
  - Center for Molecular Medicine, University Medical Center Utrecht, Utrecht, The Netherlands
  - Oncode Institute, Utrecht, The Netherlands
  - Quantitative biology, Hubrecht Institute, Utrecht, The Netherlands
- Antonio Cappuccio
  - Institute for Advanced Study, University of Amsterdam, Amsterdam, The Netherlands
- Giacomo Corleone
  - Department of Surgery and Cancer, The Imperial Centre for Translational and Experimental Medicine, Imperial College London, London, UK
- Bas E. Dutilh
  - Theoretical Biology and Bioinformatics, Science for Life, Utrecht University, Utrecht, The Netherlands
  - Centre for Molecular and Biomolecular Informatics, Radboud University Medical Center, Nijmegen, The Netherlands
- Maria Florescu
  - Center for Molecular Medicine, University Medical Center Utrecht, Utrecht, The Netherlands
  - Oncode Institute, Utrecht, The Netherlands
  - Quantitative biology, Hubrecht Institute, Utrecht, The Netherlands
- Victor Guryev
  - European Research Institute for the Biology of Ageing, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- Rens Holmer
  - Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
- Katharina Jahn
  - Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Thamar Jessurun Lobo
  - European Research Institute for the Biology of Ageing, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- Emma M. Keizer
  - Biometris, Wageningen University & Research, Wageningen, The Netherlands
- Indu Khatri
  - Department of Immunohematology and Blood Transfusion, Leiden University Medical Center, Leiden, The Netherlands
- Szymon M. Kielbasa
  - Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, The Netherlands
- Jan O. Korbel
  - Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Alexey M. Kozlov
  - Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
- Tzu-Hao Kuo
  - Computational Biology of Infection Research Group, Helmholtz Centre for Infection Research, Braunschweig, Germany
- Boudewijn P.F. Lelieveldt
  - PRB lab, Delft University of Technology, Delft, The Netherlands
  - Division of Image Processing, Department of Radiology, Leiden University Medical Center, Leiden, The Netherlands
- Ion I. Mandoiu
  - Computer Science & Engineering Department, University of Connecticut, Storrs, USA
- John C. Marioni
  - Cancer Research UK Cambridge Institute, Li Ka Shing Centre, University of Cambridge, Cambridge, UK
  - Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, UK
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Tobias Marschall
  - Center for Bioinformatics, Saarland University, Saarbrücken, Germany
  - Max Planck Institute for Informatics, Saarbrücken, Germany
- Felix Mölder
  - Algorithms for Reproducible Bioinformatics, Genome Informatics, Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Essen, Germany
  - Institute of Pathology, University Hospital Essen, University of Duisburg-Essen, Essen, Germany
- Amir Niknejad
  - Computation molecular design, Zuse Institute Berlin, Berlin, Germany
  - Mathematics Department, Mount Saint Vincent, New York, USA
- Alicja Rączkowska
  - Institute of Informatics, Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warszawa, Poland
- Marcel Reinders
  - Leiden Computational Biology Center, Leiden University Medical Center, Leiden, The Netherlands
  - Delft Bioinformatics Lab, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, Delft, The Netherlands
- Jeroen de Ridder
  - Center for Molecular Medicine, University Medical Center Utrecht, Utrecht, The Netherlands
  - Oncode Institute, Utrecht, The Netherlands
- Antoine-Emmanuel Saliba
  - Helmholtz Institute for RNA-based Infection Research, Helmholtz-Center for Infection Research, Würzburg, Germany
- Antonios Somarakis
  - Division of Image Processing, Department of Radiology, Leiden University Medical Center, Leiden, The Netherlands
- Oliver Stegle
  - Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
  - Division of Computational Genomics and Systems Genetics, German Cancer Research Center–DKFZ, Heidelberg, Germany
- Fabian J. Theis
  - Institute of Computational Biology, Helmholtz Zentrum München–German Research Center for Environmental Health, Neuherberg, Germany
- Huan Yang
  - Division of Drug Discovery and Safety, Leiden Academic Center for Drug Research–LACDR–Leiden University, Leiden, The Netherlands
- Alex Zelikovsky
  - Department of Computer Science, Georgia State University, Atlanta, USA
  - The Laboratory of Bioinformatics, I.M. Sechenov First Moscow State Medical University, Moscow, Russia
- Alice C. McHardy
  - Computational Biology of Infection Research Group, Helmholtz Centre for Infection Research, Braunschweig, Germany
- Sohrab P. Shah
  - Computational Oncology, Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, USA
- Alexander Schönhuth
  - Life Sciences and Health, Centrum Wiskunde & Informatica, Amsterdam, The Netherlands
  - Theoretical Biology and Bioinformatics, Science for Life, Utrecht University, Utrecht, The Netherlands
8
Melamud E, Taylor DL, Sethi A, Cule M, Baryshnikova A, Saleheen D, van Bruggen N, FitzGerald GA. The promise and reality of therapeutic discovery from large cohorts. J Clin Invest 2020; 130:575-581. [PMID: 31929188 PMCID: PMC6994121 DOI: 10.1172/jci129196] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 01/08/2023] Open
Abstract
Technological advances in rapid data acquisition have transformed medical biology into a data mining field, where new data sets are routinely dissected and analyzed by statistical models of ever-increasing complexity. Many hypotheses can be generated and tested within a single large data set, and even small effects can be statistically discriminated from a sea of noise. On the other hand, the development of therapeutic interventions moves at a much slower pace. They are determined from carefully randomized and well-controlled experiments with explicitly stated outcomes as the principal mechanism by which a single hypothesis is tested. In this paradigm, only a small fraction of interventions can be tested, and an even smaller fraction are ultimately deemed therapeutically successful. In this Review, we propose strategies to leverage large-cohort data to inform the selection of targets and the design of randomized trials of novel therapeutics. Ultimately, the incorporation of big data and experimental medicine approaches should aim to reduce the failure rate of clinical trials as well as expedite and lower the cost of drug development.
Affiliation(s)
- Eugene Melamud
  - Calico Life Sciences LLC, South San Francisco, California, USA
- Anurag Sethi
  - Calico Life Sciences LLC, South San Francisco, California, USA
- Madeleine Cule
  - Calico Life Sciences LLC, South San Francisco, California, USA
- Garret A. FitzGerald
  - Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
9
Raredon MSB, Adams TS, Suhail Y, Schupp JC, Poli S, Neumark N, Leiby KL, Greaney AM, Yuan Y, Horien C, Linderman G, Engler AJ, Boffa DJ, Kluger Y, Rosas IO, Levchenko A, Kaminski N, Niklason LE. Single-cell connectomic analysis of adult mammalian lungs. Science Advances 2019; 5:eaaw3851. [PMID: 31840053 PMCID: PMC6892628 DOI: 10.1126/sciadv.aaw3851] [Citation(s) in RCA: 147] [Impact Index Per Article: 24.5] [Received: 12/14/2018] [Accepted: 09/18/2019] [Indexed: 05/17/2023]
Abstract
Efforts to decipher chronic lung disease and to reconstitute functional lung tissue through regenerative medicine have been hampered by an incomplete understanding of cell-cell interactions governing tissue homeostasis. Because the structure of mammalian lungs is highly conserved at the histologic level, we hypothesized that there are evolutionarily conserved homeostatic mechanisms that keep the fine architecture of the lung in balance. We have leveraged single-cell RNA sequencing techniques to identify conserved patterns of cell-cell cross-talk in adult mammalian lungs, analyzing mouse, rat, pig, and human pulmonary tissues. Specific stereotyped functional roles for each cell type in the distal lung are observed, with alveolar type I cells having a major role in the regulation of tissue homeostasis. This paper provides a systems-level portrait of signaling between alveolar cell populations. These methods may be applicable to other organs, providing a roadmap for identifying key pathways governing pathophysiology and informing regenerative efforts.
Collapse
Affiliation(s)
- Micha Sam Brickman Raredon
- Department of Biomedical Engineering, Yale University, New Haven, CT 06511, USA
- Vascular Biology and Therapeutics, Yale University, New Haven, CT 06520, USA
- Medical Scientist Training Program, Yale School of Medicine, New Haven, CT 06510, USA
| | - Taylor Sterling Adams
- Section of Pulmonary, Critical Care and Sleep Medicine, Yale University, New Haven, CT 06520, USA
| | - Yasir Suhail
- Yale Systems Biology Institute, Yale University, West Haven, CT 06516, USA
| | - Jonas Christian Schupp
- Section of Pulmonary, Critical Care and Sleep Medicine, Yale University, New Haven, CT 06520, USA
| | - Sergio Poli
- Pulmonary and Critical Care Division, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, USA
| | - Nir Neumark
- Section of Pulmonary, Critical Care and Sleep Medicine, Yale University, New Haven, CT 06520, USA
- Interdepartmental Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
| | - Katherine L. Leiby
- Department of Biomedical Engineering, Yale University, New Haven, CT 06511, USA
- Vascular Biology and Therapeutics, Yale University, New Haven, CT 06520, USA
- Medical Scientist Training Program, Yale School of Medicine, New Haven, CT 06510, USA
| | - Allison Marie Greaney
- Department of Biomedical Engineering, Yale University, New Haven, CT 06511, USA
- Vascular Biology and Therapeutics, Yale University, New Haven, CT 06520, USA
| | - Yifan Yuan
- Department of Anesthesiology, Yale University, New Haven, CT 06510, USA
| | - Corey Horien
- Medical Scientist Training Program, Yale School of Medicine, New Haven, CT 06510, USA
- Interdepartmental Neuroscience Program, Yale University, New Haven, CT 06510, USA
| | - George Linderman
- Medical Scientist Training Program, Yale School of Medicine, New Haven, CT 06510, USA
- Applied Mathematics Program, Yale University, New Haven, CT 06511, USA
| | - Alexander J. Engler
- Department of Biomedical Engineering, Yale University, New Haven, CT 06511, USA
- Vascular Biology and Therapeutics, Yale University, New Haven, CT 06520, USA
| | - Daniel J. Boffa
- Thoracic Surgery, Yale School of Medicine, New Haven, CT 06510, USA
| | - Yuval Kluger
- Interdepartmental Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Applied Mathematics Program, Yale University, New Haven, CT 06511, USA
- Department of Pathology, Yale University, New Haven, CT 06520, USA
| | - Ivan O. Rosas
- Pulmonary and Critical Care Division, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, USA
| | - Andre Levchenko
- Department of Biomedical Engineering, Yale University, New Haven, CT 06511, USA
- Yale Systems Biology Institute, Yale University, West Haven, CT 06516, USA
| | - Naftali Kaminski
- Section of Pulmonary, Critical Care and Sleep Medicine, Yale University, New Haven, CT 06520, USA
| | - Laura E. Niklason
- Department of Biomedical Engineering, Yale University, New Haven, CT 06511, USA
- Vascular Biology and Therapeutics, Yale University, New Haven, CT 06520, USA
- Department of Anesthesiology, Yale University, New Haven, CT 06510, USA
| |
|
10
|
Günther P, Schultze JL. Mind the Map: Technology Shapes the Myeloid Cell Space. Front Immunol 2019; 10:2287. [PMID: 31636632 PMCID: PMC6787770 DOI: 10.3389/fimmu.2019.02287] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 07/01/2019] [Accepted: 09/10/2019] [Indexed: 12/14/2022] Open
Abstract
The myeloid cell system shows very high plasticity, which is crucial to quickly adapt to changes during an immune response. From the beginning, this high plasticity has made cell type classification within the myeloid cell system difficult. Not surprisingly, naming schemes have changed frequently. Recent advancements in multidimensional technologies, including mass cytometry and single-cell RNA sequencing, are challenging our current understanding of cell types, cell subsets, and functional states of cells. Despite the power of these technologies to create new reference maps for the myeloid cell system, it is essential to put these new results into context with previous knowledge that was established over decades. Here we report on earlier attempts of cell type classification in the myeloid cell system, discuss current approaches and their pros and cons, and propose future strategies for cell type classification within the myeloid cell system that can be easily extended to other cell types.
Affiliation(s)
- Patrick Günther
- Genomics and Immunoregulation, Life and Medical Sciences Institute (LIMES), University of Bonn, Bonn, Germany; Platform for Single Cell Genomics and Epigenomics, German Center for Neurodegenerative Diseases and University of Bonn, Bonn, Germany
| | - Joachim L Schultze
- Genomics and Immunoregulation, Life and Medical Sciences Institute (LIMES), University of Bonn, Bonn, Germany; Platform for Single Cell Genomics and Epigenomics, German Center for Neurodegenerative Diseases and University of Bonn, Bonn, Germany
| |
|
11
|
Abstract
Background: Single-cell RNA-seq is a powerful tool for measuring gene expression at the resolution of individual cells. A challenge in the analysis of this data is the large amount of zero values, representing either missing data or no expression. Several imputation approaches have been proposed to address this issue, but because they generally rely on structure inherent to the dataset under consideration, they may not provide any additional information and hence are limited by the information contained therein and the validity of their assumptions. Methods: We evaluated the risk of generating false positive or irreproducible differential expression when imputing data with six different methods. We applied each method to a variety of simulated datasets as well as to permuted real single-cell RNA-seq datasets and considered the number of false positive gene-gene correlations and differentially expressed genes. Using matched 10X and Smart-seq2 data, we examined whether cell-type specific markers were reproducible across datasets derived from the same tissue before and after imputation. Results: The extent of false positives introduced by imputation varied considerably by method. Data-smoothing-based methods (MAGIC, knn-smooth and dca) generated many false positives in both real and simulated data. Model-based imputation methods typically generated fewer false positives, but this varied greatly depending on the diversity of cell types in the sample. All imputation methods decreased the reproducibility of cell-type specific markers, although this could be mitigated by selecting markers with large effect size and significance. Conclusions: Imputation of single-cell RNA-seq data introduces circularity that can generate false-positive results. Thus, statistical tests applied to imputed data should be treated with care. Additional filtering by effect size can reduce but not fully eliminate these effects. Of the methods we considered, SAVER was the least likely to generate false or irreproducible results and thus should be favoured over alternatives if imputation is necessary.
Affiliation(s)
| | - Martin Hemberg
- Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, CB10 1SA, UK
| |
|
12
|
Landegren U, Al-Amin RA, Björkesten J. A myopic perspective on the future of protein diagnostics. N Biotechnol 2018; 45:14-18. [DOI: 10.1016/j.nbt.2018.01.002] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 10/10/2017] [Revised: 01/02/2018] [Accepted: 01/04/2018] [Indexed: 01/09/2023]
|
13
|
Farack L, Egozi A, Itzkovitz S. Single molecule approaches for studying gene regulation in metabolic tissues. Diabetes Obes Metab 2018; 20 Suppl 2:145-156. [PMID: 30230176 DOI: 10.1111/dom.13390] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 04/17/2018] [Revised: 05/16/2018] [Accepted: 05/30/2018] [Indexed: 12/25/2022]
Abstract
Gene expression in metabolic tissues can be regulated at multiple levels, ranging from the control of promoter accessibility and transcription rates to mRNA degradation rates and mRNA localization. Modulating these processes can differentially affect important performance criteria of cells. These include precision, cellular economy, rapid response and maintenance of DNA integrity. In this review we will describe how distinct strategies of gene regulation impact the trade-offs between the cells' performance criteria. We will highlight tools based on single molecule visualization of transcripts that can be used to measure promoter states, transcription rates and mRNA degradation rates in intact tissues. These approaches revealed surprising recurrent patterns in mammalian tissues that include transcriptional bursting, nuclear retention of mRNA, and coordination of mRNA lifetimes to facilitate rapid adaptation to changing metabolic inputs. The ability to characterize gene expression at the single molecule level can uncover the design principles of gene regulation in metabolic tissues such as the liver and the pancreas.
Affiliation(s)
- Lydia Farack
- Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot, Israel
| | - Adi Egozi
- Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot, Israel
| | - Shalev Itzkovitz
- Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot, Israel
| |
|
14
|
Sinitcyn P, Rudolph JD, Cox J. Computational Methods for Understanding Mass Spectrometry–Based Shotgun Proteomics Data. Annu Rev Biomed Data Sci 2018. [DOI: 10.1146/annurev-biodatasci-080917-013516] [Citation(s) in RCA: 88] [Impact Index Per Article: 12.6] [Indexed: 11/09/2022]
Abstract
Computational proteomics is the data science concerned with the identification and quantification of proteins from high-throughput data and the biological interpretation of their concentration changes, posttranslational modifications, interactions, and subcellular localizations. Today, these data most often originate from mass spectrometry–based shotgun proteomics experiments. In this review, we survey computational methods for the analysis of such proteomics data, focusing on the explanation of the key concepts. Starting with mass spectrometric feature detection, we then cover methods for the identification of peptides. Protein inference and the control of false discovery rates, both highly important topics, are covered next. We then discuss methods for the quantification of peptides and proteins. A section on downstream data analysis covers exploratory statistics, network analysis, machine learning, and multiomics data integration. Finally, we discuss current developments and provide an outlook on what the near future of computational proteomics might bear.
Affiliation(s)
- Pavel Sinitcyn
- Computational Systems Biochemistry Research Group, Max Planck Institute of Biochemistry, 82152 Martinsried, Germany
| | - Jan Daniel Rudolph
- Computational Systems Biochemistry Research Group, Max Planck Institute of Biochemistry, 82152 Martinsried, Germany
| | - Jürgen Cox
- Computational Systems Biochemistry Research Group, Max Planck Institute of Biochemistry, 82152 Martinsried, Germany
| |
|
15
|
Chen X, Teichmann SA, Meyer KB. From Tissues to Cell Types and Back: Single-Cell Gene Expression Analysis of Tissue Architecture. Annu Rev Biomed Data Sci 2018. [DOI: 10.1146/annurev-biodatasci-080917-013452] [Citation(s) in RCA: 67] [Impact Index Per Article: 9.6] [Indexed: 11/09/2022]
Abstract
With the recent transformative developments in single-cell genomics and, in particular, single-cell gene expression analysis, it is now possible to study tissues at the single-cell level, rather than having to rely on data from bulk measurements. Here we review the rapid developments in single-cell RNA sequencing (scRNA-seq) protocols that have the potential for unbiased identification and profiling of all cell types within a tissue or organism. In addition, novel approaches for spatial profiling of gene expression allow us to map individual cells and cell types back into the three-dimensional context of organs. The combination of in-depth single-cell and spatial gene expression data will reveal tissue architecture in unprecedented detail, generating a wealth of biological knowledge and a better understanding of many diseases.
Affiliation(s)
- Xi Chen
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, United Kingdom
| | - Sarah A. Teichmann
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, United Kingdom
- European Molecular Biology Laboratory (EMBL)–European Bioinformatics Institute (EBI), Wellcome Genome Campus, Hinxton CB10 1SD, United Kingdom
- Theory of Condensed Matter Research Group, Cavendish Laboratory, University of Cambridge, Cambridge CB3 0HE, United Kingdom
| | - Kerstin B. Meyer
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, United Kingdom
| |
|
16
|
Finotello F, Trajanoski Z. Quantifying tumor-infiltrating immune cells from transcriptomics data. Cancer Immunol Immunother 2018; 67:1031-1040. [PMID: 29541787 PMCID: PMC6006237 DOI: 10.1007/s00262-018-2150-z] [Citation(s) in RCA: 277] [Impact Index Per Article: 39.6] [Received: 12/11/2017] [Accepted: 03/09/2018] [Indexed: 12/22/2022]
Abstract
By exerting pro- and anti-tumorigenic actions, tumor-infiltrating immune cells can profoundly influence tumor progression, as well as the success of anti-cancer therapies. Therefore, the quantification of tumor-infiltrating immune cells holds the promise to unveil the multi-faceted role of the immune system in human cancers and its involvement in tumor escape mechanisms and response to therapy. Tumor-infiltrating immune cells can be quantified from RNA sequencing data of human tumors using bioinformatics approaches. In this review, we describe state-of-the-art computational methods for the quantification of immune cells from transcriptomics data and discuss the open challenges that must be addressed to accurately quantify immune infiltrates from RNA sequencing data of human bulk tumors.
Affiliation(s)
- Francesca Finotello
- Biocenter, Division for Bioinformatics, Medical University of Innsbruck, Innrain 80, 6020, Innsbruck, Austria.
| | - Zlatko Trajanoski
- Biocenter, Division for Bioinformatics, Medical University of Innsbruck, Innrain 80, 6020, Innsbruck, Austria.
| |
|
17
|
Zappia L, Phipson B, Oshlack A. Exploring the single-cell RNA-seq analysis landscape with the scRNA-tools database. PLoS Comput Biol 2018; 14:e1006245. [PMID: 29939984 PMCID: PMC6034903 DOI: 10.1371/journal.pcbi.1006245] [Citation(s) in RCA: 187] [Impact Index Per Article: 26.7] [Received: 12/06/2017] [Revised: 07/06/2018] [Accepted: 05/30/2018] [Indexed: 01/19/2023] Open
Abstract
As single-cell RNA-sequencing (scRNA-seq) datasets have become more widespread, the number of tools designed to analyse these data has dramatically increased. Navigating the vast sea of tools now available is becoming increasingly challenging for researchers. In order to better facilitate selection of appropriate analysis tools, we have created the scRNA-tools database (www.scRNA-tools.org) to catalogue and curate analysis tools as they become available. Our database collects a range of information on each scRNA-seq analysis tool and categorises them according to the analysis tasks they perform. Exploration of this database gives insights into the areas of rapid development of analysis methods for scRNA-seq data. We see that many tools perform tasks specific to scRNA-seq analysis, particularly clustering and ordering of cells. We also find that the scRNA-seq community embraces an open-source and open-science approach, with most tools available under open-source licenses and preprints being extensively used as a means to describe methods. The scRNA-tools database provides a valuable resource for researchers embarking on scRNA-seq analysis and records the growth of the field over time.
Affiliation(s)
- Luke Zappia
- Bioinformatics, Murdoch Children’s Research Institute, Melbourne, Victoria, Australia
- School of Biosciences, Faculty of Science, University of Melbourne, Melbourne, Victoria, Australia
| | - Belinda Phipson
- Bioinformatics, Murdoch Children’s Research Institute, Melbourne, Victoria, Australia
| | - Alicia Oshlack
- Bioinformatics, Murdoch Children’s Research Institute, Melbourne, Victoria, Australia
- School of Biosciences, Faculty of Science, University of Melbourne, Melbourne, Victoria, Australia
| |
|
18
|
Chemical Processing of Brain Tissues for Large-Volume, High-Resolution Optical Imaging. ACTA ACUST UNITED AC 2018. [DOI: 10.1007/978-981-10-9020-2_15] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 01/28/2023]
|
19
|
Kelley DR, Reshef YA, Bileschi M, Belanger D, McLean CY, Snoek J. Sequential regulatory activity prediction across chromosomes with convolutional neural networks. Genome Res 2018; 28:739-750. [PMID: 29588361 PMCID: PMC5932613 DOI: 10.1101/gr.227819.117] [Citation(s) in RCA: 280] [Impact Index Per Article: 40.0] [Received: 07/17/2017] [Accepted: 03/23/2018] [Indexed: 01/10/2023]
Abstract
Models for predicting phenotypic outcomes from genotypes have important applications to understanding genomic function and improving human health. Here, we develop a machine-learning system to predict cell-type-specific epigenetic and transcriptional profiles in large mammalian genomes from DNA sequence alone. By use of convolutional neural networks, this system identifies promoters and distal regulatory elements and synthesizes their content to make effective gene expression predictions. We show that model predictions for the influence of genomic variants on gene expression align well to causal variants underlying eQTLs in human populations and can be useful for generating mechanistic hypotheses to enable fine mapping of disease loci.
Affiliation(s)
| | - Yakir A Reshef
- Department of Computer Science, Harvard University, Cambridge, Massachusetts 02138, USA
| | | | | | | | - Jasper Snoek
- Google Brain, Cambridge, Massachusetts 02142, USA
| |
|
20
|
Zhou JX, Cisneros L, Knijnenburg T, Trachana K, Davies P, Huang S. Phylostratigraphic analysis of tumor and developmental transcriptomes reveals relationship between oncogenesis, phylogenesis and ontogenesis. CONVERGENT SCIENCE PHYSICAL ONCOLOGY 2018. [DOI: 10.1088/2057-1739/aab1b0] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Indexed: 12/12/2022]
|
21
|
Crow M, Paul A, Ballouz S, Huang ZJ, Gillis J. Characterizing the replicability of cell types defined by single cell RNA-sequencing data using MetaNeighbor. Nat Commun 2018; 9:884. [PMID: 29491377 PMCID: PMC5830442 DOI: 10.1038/s41467-018-03282-0] [Citation(s) in RCA: 194] [Impact Index Per Article: 27.7] [Received: 11/29/2017] [Accepted: 02/02/2018] [Indexed: 12/19/2022] Open
Abstract
Single-cell RNA-sequencing (scRNA-seq) technology provides a new avenue to discover and characterize cell types; however, the experiment-specific technical biases and analytic variability inherent to current pipelines may undermine its replicability. Meta-analysis is further hampered by the use of ad hoc naming conventions. Here we demonstrate our replication framework, MetaNeighbor, that quantifies the degree to which cell types replicate across datasets, and enables rapid identification of clusters with high similarity. We first measure the replicability of neuronal identity, comparing results across eight technically and biologically diverse datasets to define best practices for more complex assessments. We then apply this to novel interneuron subtypes, finding that 24/45 subtypes have evidence of replication, which enables the identification of robust candidate marker genes. Across tasks we find that large sets of variably expressed genes can identify replicable cell types with high accuracy, suggesting a general route forward for large-scale evaluation of scRNA-seq data.
Affiliation(s)
- Megan Crow
- Cold Spring Harbor Laboratory, One Bungtown Road, Cold Spring Harbor, NY, 11724, USA
| | - Anirban Paul
- Cold Spring Harbor Laboratory, One Bungtown Road, Cold Spring Harbor, NY, 11724, USA
| | - Sara Ballouz
- Cold Spring Harbor Laboratory, One Bungtown Road, Cold Spring Harbor, NY, 11724, USA
| | - Z Josh Huang
- Cold Spring Harbor Laboratory, One Bungtown Road, Cold Spring Harbor, NY, 11724, USA
| | - Jesse Gillis
- Cold Spring Harbor Laboratory, One Bungtown Road, Cold Spring Harbor, NY, 11724, USA.
| |
|
22
|
Lein E, Borm LE, Linnarsson S. The promise of spatial transcriptomics for neuroscience in the era of molecular cell typing. Science 2018; 358:64-69. [PMID: 28983044 DOI: 10.1126/science.aan6827] [Citation(s) in RCA: 257] [Impact Index Per Article: 36.7] [Indexed: 12/13/2022]
Abstract
The stereotyped spatial architecture of the brain is both beautiful and fundamentally related to its function, extending from gross morphology to individual neuron types, where soma position, dendritic architecture, and axonal projections determine their roles in functional circuitry. Our understanding of the cell types that make up the brain is rapidly accelerating, driven in particular by recent advances in single-cell transcriptomics. However, understanding brain function, development, and disease will require linking molecular cell types to morphological, physiological, and behavioral correlates. Emerging spatially resolved transcriptomic methods promise to fill this gap by localizing molecularly defined cell types in tissues, with simultaneous detection of morphology, activity, or connectivity. Here, we review the requirements for spatial transcriptomic methods toward these goals, consider the challenges ahead, and describe promising applications.
Affiliation(s)
- Ed Lein
- Allen Institute for Brain Science, Seattle, WA 98109, USA.
| | - Lars E Borm
- Laboratory of Molecular Neurobiology, Department of Medical Biochemistry and Biophysics, Karolinska Institutet, 171 77 Stockholm, Sweden; Science for Life Laboratory, 171 21 Solna, Sweden
| | - Sten Linnarsson
- Laboratory of Molecular Neurobiology, Department of Medical Biochemistry and Biophysics, Karolinska Institutet, 171 77 Stockholm, Sweden; Science for Life Laboratory, 171 21 Solna, Sweden
| |
|
23
|
Huang X, Liu S, Wu L, Jiang M, Hou Y. High Throughput Single Cell RNA Sequencing, Bioinformatics Analysis and Applications. ADVANCES IN EXPERIMENTAL MEDICINE AND BIOLOGY 2018; 1068:33-43. [PMID: 29943294 DOI: 10.1007/978-981-13-0502-3_4] [Citation(s) in RCA: 37] [Impact Index Per Article: 5.3] [Indexed: 01/06/2023]
Abstract
Single cell sequencing (SCS) can be harnessed to acquire the genomes, transcriptomes and epigenomes from individual cells. Next generation sequencing (NGS) technology is the driving force for single cell sequencing. scRNA-seq requires a lengthy pipeline comprising single cell sorting, RNA extraction, reverse transcription, amplification, library construction, sequencing and subsequent bioinformatic analysis. Computational algorithms are essential to fulfill many tasks of interest using scRNA-seq data. scRNA-seq has already enabled researchers to revisit long-standing questions in cancer biology, including cancer metastasis, heterogeneity and evolution. Circulating Tumor Cells (CTC) are not only an important mechanism for cancer metastasis, but also provide a possibility to diagnose and monitor cancer in a convenient way independent of surgical resection of the cancer.
|
24
|
Sarntivijai S, Diehl AD, He Y. Cells in experimental life sciences - challenges and solution to the rapid evolution of knowledge. BMC Bioinformatics 2017; 18:560. [PMID: 29322916 PMCID: PMC5763506 DOI: 10.1186/s12859-017-1976-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/01/2022] Open
Abstract
Cell cultures used in biomedical experiments come in the form of both sample biopsy primary cells, and maintainable immortalised cell lineages. The rise of bioinformatics and high-throughput technologies has led us to the requirement of ontology representation of cell types and cell lines. The Cell Ontology (CL) and Cell Line Ontology (CLO) have long been established as reference ontologies in the OBO framework. We have compiled a series of the challenges and the proposals of solutions in this CELLS (Cells in ExperimentaL Life Sciences) thematic series that cover the grounds of standing issues and the directions, which were discussed in the First International Workshop on CELLS at the International Conference on Biomedical Ontology (ICBO). This workshop focused on the extension of the current CL and CLO to cover a wider set of biological questions and challenges needing semantic infrastructure for information modeling. We discussed data-driven use cases that leverage linkage of CL, CLO and other bio-ontologies. This is an established approach in data-driven ontologies such as the Experimental Factor Ontology (EFO), and the Ontology for Biomedical Investigation (OBI). The First International Workshop on CELLS at the International Conference on Biomedical Ontology has brought together experimental biologists and biomedical ontologists to discuss solutions to organizing and representing the rapidly evolving knowledge gained from experimental cells. The workshop has successfully identified the areas of challenge, and the gap in connecting the two domains of knowledge. The outcome of this workshop yielded practical implementation plans to fill in this gap. This CELLS workshop also provided a venue for panel discussions of innovative solutions as well as challenges in the development and applications of biomedical ontologies to represent and analyze experimental cell data.
Affiliation(s)
- Sirarat Sarntivijai
- European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD UK
| | - Alexander D. Diehl
- Department of Biomedical Informatics, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, The State University of New York, Buffalo, New York 14203 USA
| | - Yongqun He
- Unit for Laboratory Animal Medicine, Department of Microbiology and Immunology, Center for Computational Medicine and Bioinformatics, and Comprehensive Cancer Center, University of Michigan Medical School, Ann Arbor, MI 48109 USA
| |
|
25
|
Abstract
BACKGROUND Cell lines and cell types are extensively studied in biomedical research, yielding a significant number of publications each year. Identifying cell lines and cell types precisely in publications is crucial for science reproducibility and knowledge integration. There are efforts for standardisation of the cell nomenclature based on ontology development to support FAIR principles of the cell knowledge. However, it is important to analyse the usage of cell nomenclature in publications at a large scale for understanding the level of uptake of cell nomenclature in literature by scientists. In this study, we analyse the usage of cell nomenclature, both in vivo and in vitro, in biomedical literature by using text mining methods and present our results. RESULTS We identified 59% of the cell type classes in the Cell Ontology and 13% of the cell line classes in the Cell Line Ontology in the literature. Our analysis showed that cell line nomenclature is much more ambiguous compared to the cell type nomenclature. However, trends indicate that standardised nomenclature for cell lines and cell types is being increasingly used by scientists in publications. CONCLUSIONS Our findings provide insight into how experimental cells are described in publications, may allow for an improved standardisation of cell type and cell line nomenclature, and can be utilised to develop efficient text mining applications on cell types and cell lines. All data generated in this study is available at https://github.com/shenay/CellNomenclatureStudy.
Affiliation(s)
- Şenay Kafkas
- Computational Bioscience Research Center, Computer, Electrical and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, 23955-6900 Saudi Arabia
| | - Sirarat Sarntivijai
- The European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD UK
| | - Robert Hoehndorf
- Computational Bioscience Research Center, Computer, Electrical and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, 23955-6900 Saudi Arabia
| |
|
26
|
Ponting CP. Big knowledge from big data in functional genomics. Emerg Top Life Sci 2017; 1:245-248. [PMID: 33525805 PMCID: PMC7288990 DOI: 10.1042/etls20170129] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 09/06/2017] [Revised: 09/12/2017] [Accepted: 09/12/2017] [Indexed: 02/07/2023]
Abstract
With so much genomics data being produced, it might be wise to pause and consider what purpose this data can or should serve. Some improve annotations, others predict molecular interactions, but few add directly to existing knowledge. This is because sequence annotations do not always implicate function, and molecular interactions are often irrelevant to a cell's or organism's survival or propagation. Merely correlative relationships found in big data fail to provide answers to the Why questions of human biology. Instead, those answers are expected from methods that causally link DNA changes to downstream effects without being confounded by reverse causation. These approaches require the controlled measurement of the consequences of DNA variants, for example, either those introduced in single cells using CRISPR/Cas9 genome editing or that are already present across the human population. Inferred causal relationships between genetic variation and cellular phenotypes or disease show promise to rapidly grow and underpin our knowledge base.
Affiliation(s)
- Chris P Ponting
- MRC Human Genetics Unit, The Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Crewe Road, Edinburgh EH4 2XU, U.K
| |
|
27
|
Rozenblatt-Rosen O, Stubbington MJ, Regev A, Teichmann SA. The Human Cell Atlas: from vision to reality. Nature 2017; 550:451-453. [DOI: 10.1038/550451a] [Citation(s) in RCA: 378] [Impact Index Per Article: 47.3] [Indexed: 12/14/2022]
|
28
|
Abstract
New methods for simultaneously quantifying protein and gene expression at the single-cell level have the power to identify cell types and to classify cell populations.
Affiliation(s)
- Maayan Baron
- Institute for Computational Medicine, NYU School of Medicine, 430 East 29th St., New York, NY, 10016, USA
| | - Itai Yanai
- Institute for Computational Medicine, NYU School of Medicine, 430 East 29th St., New York, NY, 10016, USA.
| |
|
29
|
Haque A, Engel J, Teichmann SA, Lönnberg T. A practical guide to single-cell RNA-sequencing for biomedical research and clinical applications. Genome Med 2017; 9:75. [PMID: 28821273 PMCID: PMC5561556 DOI: 10.1186/s13073-017-0467-4] [Citation(s) in RCA: 626] [Impact Index Per Article: 78.3] [Indexed: 12/29/2022] Open
Abstract
RNA sequencing (RNA-seq) is a genomic approach for the detection and quantitative analysis of messenger RNA molecules in a biological sample and is useful for studying cellular responses. RNA-seq has fueled much discovery and innovation in medicine over recent years. For practical reasons, the technique is usually conducted on samples comprising thousands to millions of cells. However, this has hindered direct assessment of the fundamental unit of biology: the cell. Since the first single-cell RNA-sequencing (scRNA-seq) study was published in 2009, many more have been conducted, mostly by specialist laboratories with unique skills in wet-lab single-cell genomics, bioinformatics, and computation. However, with the increasing commercial availability of scRNA-seq platforms, and the rapid ongoing maturation of bioinformatics approaches, a point has been reached where any biomedical researcher or clinician can use scRNA-seq to make exciting discoveries. In this review, we present a practical guide to help researchers design their first scRNA-seq studies, including introductory information on experimental hardware, protocol choice, quality control, data analysis and biological interpretation.
Affiliation(s)
- Ashraful Haque
- QIMR Berghofer Medical Research Institute, Herston, Brisbane, Queensland, 4006, Australia.
| | - Jessica Engel
- QIMR Berghofer Medical Research Institute, Herston, Brisbane, Queensland, 4006, Australia
| | - Sarah A Teichmann
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
| | - Tapio Lönnberg
- Turku Centre for Biotechnology, University of Turku and Åbo Akademi University, FI-20520, Turku, Finland.
| |
|