1. Sumanaweera D, Suo C, Cujba AM, Muraro D, Dann E, Polanski K, Steemers AS, Lee W, Oliver AJ, Park JE, Meyer KB, Dumitrascu B, Teichmann SA. Gene-level alignment of single-cell trajectories. Nat Methods 2025; 22:68-81. [PMID: 39300283 PMCID: PMC11725504 DOI: 10.1038/s41592-024-02378-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2023] [Accepted: 07/12/2024] [Indexed: 09/22/2024]
Abstract
Single-cell data analysis can infer dynamic changes in cell populations, for example across time, space or in response to perturbation, thus deriving pseudotime trajectories. Current approaches comparing trajectories often use dynamic programming but are limited by assumptions such as the existence of a definitive match. Here we describe Genes2Genes, a Bayesian information-theoretic dynamic programming framework for aligning single-cell trajectories. It is able to capture sequential matches and mismatches of individual genes between a reference and query trajectory, highlighting distinct clusters of alignment patterns. Across both real world and simulated datasets, it accurately inferred alignments and demonstrated its utility in disease cell-state trajectory analysis. In a proof-of-concept application, Genes2Genes revealed that T cells differentiated in vitro match an immature in vivo state while lacking expression of genes associated with TNF signaling. This demonstrates that precise trajectory alignment can pinpoint divergence from the in vivo system, thus guiding the optimization of in vitro culture conditions.
Affiliation(s)
- Dinithi Sumanaweera
  - Wellcome Sanger Institute; Wellcome Genome Campus, Hinxton, Cambridge, UK
  - Theory of Condensed Matter, Cavendish Laboratory, Department of Physics, University of Cambridge, Cambridge, UK
- Chenqu Suo
  - Wellcome Sanger Institute; Wellcome Genome Campus, Hinxton, Cambridge, UK
  - Department of Paediatrics, Cambridge University Hospitals; Hills Road, Cambridge, UK
- Ana-Maria Cujba
  - Wellcome Sanger Institute; Wellcome Genome Campus, Hinxton, Cambridge, UK
- Daniele Muraro
  - Wellcome Sanger Institute; Wellcome Genome Campus, Hinxton, Cambridge, UK
- Emma Dann
  - Wellcome Sanger Institute; Wellcome Genome Campus, Hinxton, Cambridge, UK
- Krzysztof Polanski
  - Wellcome Sanger Institute; Wellcome Genome Campus, Hinxton, Cambridge, UK
- Alexander S Steemers
  - Wellcome Sanger Institute; Wellcome Genome Campus, Hinxton, Cambridge, UK
  - Princess Máxima Center for Pediatric Oncology, Utrecht, Netherlands
- Woochan Lee
  - Wellcome Sanger Institute; Wellcome Genome Campus, Hinxton, Cambridge, UK
  - Department of Biomedical Sciences, Seoul National University, Seoul, Korea
- Amanda J Oliver
  - Wellcome Sanger Institute; Wellcome Genome Campus, Hinxton, Cambridge, UK
- Jong-Eun Park
  - Wellcome Sanger Institute; Wellcome Genome Campus, Hinxton, Cambridge, UK
  - Graduate School of Medical Science and Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea
- Kerstin B Meyer
  - Wellcome Sanger Institute; Wellcome Genome Campus, Hinxton, Cambridge, UK
- Bianca Dumitrascu
  - Department of Statistics, Columbia University, New York, NY, USA
  - Irving Institute for Cancer Dynamics, Columbia University, New York, NY, USA
- Sarah A Teichmann
  - Wellcome Sanger Institute; Wellcome Genome Campus, Hinxton, Cambridge, UK
  - Cambridge Stem Cell Institute, Jeffrey Cheah Biomedical Centre, Cambridge Biomedical Campus, University of Cambridge, Cambridge, UK
  - Department of Medicine, University of Cambridge, Cambridge, UK
  - Co-director of CIFAR Macmillan Research Program, Toronto, Ontario, Canada
2. Kumasaka N. Genetic association mapping leveraging Gaussian processes. J Hum Genet 2024; 69:505-510. [PMID: 38834722 PMCID: PMC11422164 DOI: 10.1038/s10038-024-01259-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2024] [Revised: 05/16/2024] [Accepted: 05/20/2024] [Indexed: 06/06/2024]
Abstract
Gaussian processes (GPs) are a powerful and useful approach for modelling nonlinear phenomena in various scientific fields, including genomics and genetics. This review focuses on the application of GPs in genetic association mapping. The aim is to identify genetic variants that alter gene regulation along continuous cellular states at the molecular level, as well as disease susceptibility over time and space at the population level. The challenges and opportunities in this field are also addressed.
Affiliation(s)
- Natsuhiko Kumasaka
  - Division of Digital Genomics, Human Genome Center, Institute of Medical Science, The University of Tokyo, Tokyo, Japan
3. Lee CY, Clatworthy MR, Withers DR. Decoding changes in tumor-infiltrating leukocytes through dynamic experimental models and single-cell technologies. Immunol Cell Biol 2024; 102:665-679. [PMID: 38853634 DOI: 10.1111/imcb.12787] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2024] [Revised: 05/13/2024] [Accepted: 05/13/2024] [Indexed: 06/11/2024]
Abstract
The ability to characterize immune cells and explore the molecular interactions that govern their functions has never been greater, fueled in recent years by the revolutionary advance of single-cell analysis platforms. However, precisely how immune cells respond to different stimuli and where differentiation processes and effector functions operate remain incompletely understood. Inferring cellular fate within single-cell transcriptomic analyses is now omnipresent, despite the assumptions typically required in such analyses. Recently developed experimental models support dynamic analyses of the immune response, providing insights into the temporal changes that occur within cells and the tissues in which such transitions occur. Here we will review these approaches and discuss how these can be combined with single-cell technologies to develop a deeper understanding of the immune responses that should support the development of better therapeutic options for patients.
Affiliation(s)
- Colin YC Lee
  - Cambridge Institute of Therapeutic Immunology and Infectious Disease, University of Cambridge, Cambridge, UK
- Menna R Clatworthy
  - Cambridge Institute of Therapeutic Immunology and Infectious Disease, University of Cambridge, Cambridge, UK
- David R Withers
  - Institute of Immunology and Immunotherapy, College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK
4. Van Der Byl W, Nüssing S, Peters TJ, Ahn A, Li H, Ledergor G, David E, Koh AS, Wagle MV, Deguit CDT, de Menezes MN, Travers A, Sampurno S, Ramsbottom KM, Li R, Kallies A, Beavis PA, Jungmann R, Bastings MMC, Belz GT, Goel S, Trapani JA, Crabtree GR, Chang HY, Amit I, Goodnow CC, Luciani F, Parish IA. The CD8+ T cell tolerance checkpoint triggers a distinct differentiation state defined by protein translation defects. Immunity 2024; 57:1324-1344.e8. [PMID: 38776918 PMCID: PMC11807353 DOI: 10.1016/j.immuni.2024.04.026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2023] [Revised: 02/01/2024] [Accepted: 04/30/2024] [Indexed: 05/25/2024]
Abstract
Peripheral CD8+ T cell tolerance is a checkpoint in both autoimmune disease and anti-cancer immunity. Despite its importance, the relationship between tolerance-induced states and other CD8+ T cell differentiation states remains unclear. Using flow cytometric phenotyping, single-cell RNA sequencing (scRNA-seq), and chromatin accessibility profiling, we demonstrated that in vivo peripheral tolerance to a self-antigen triggered a fundamentally distinct differentiation state separate from exhaustion, memory, and functional effector cells but analogous to cells defectively primed against tumors. Tolerant cells diverged early and progressively from effector cells, adopting a transcriptionally and epigenetically distinct state within 60 h of antigen encounter. Breaching tolerance required the synergistic actions of strong T cell receptor (TCR) signaling and inflammation, which cooperatively induced gene modules that enhanced protein translation. Weak TCR signaling during bystander infection failed to breach tolerance due to the uncoupling of effector gene expression from protein translation. Thus, tolerance engages a distinct differentiation trajectory enforced by protein translation defects.
Affiliation(s)
- Willem Van Der Byl
  - The Kirby Institute for Infection and Immunity, UNSW, Sydney, NSW, Australia; School of Medical Sciences, Faculty of Medicine, UNSW, Sydney, NSW, Australia
- Simone Nüssing
  - Cancer Immunology Program, Peter MacCallum Cancer Centre, Melbourne, VIC, Australia; Sir Peter MacCallum Department of Oncology, The University of Melbourne, Parkville, VIC, Australia
- Timothy J Peters
  - Garvan Institute of Medical Research, Sydney, NSW, Australia; University of New South Wales Sydney, Sydney, NSW, Australia
- Antonio Ahn
  - Cancer Immunology Program, Peter MacCallum Cancer Centre, Melbourne, VIC, Australia; Sir Peter MacCallum Department of Oncology, The University of Melbourne, Parkville, VIC, Australia
- Hanjie Li
  - Department of Immunology, Weizmann Institute of Science, Rehovot, Israel
- Guy Ledergor
  - Department of Immunology, Weizmann Institute of Science, Rehovot, Israel
- Eyal David
  - Department of Immunology, Weizmann Institute of Science, Rehovot, Israel
- Andrew S Koh
  - Department of Pathology, University of Chicago, Chicago, IL, USA
- Mayura V Wagle
  - Garvan Institute of Medical Research, Sydney, NSW, Australia; John Curtin School of Medical Research, ANU, Canberra, ACT, Australia
- Maria N de Menezes
  - Cancer Immunology Program, Peter MacCallum Cancer Centre, Melbourne, VIC, Australia; Sir Peter MacCallum Department of Oncology, The University of Melbourne, Parkville, VIC, Australia
- Avraham Travers
  - Cancer Immunology Program, Peter MacCallum Cancer Centre, Melbourne, VIC, Australia
- Shienny Sampurno
  - Cancer Immunology Program, Peter MacCallum Cancer Centre, Melbourne, VIC, Australia
- Kelly M Ramsbottom
  - Cancer Immunology Program, Peter MacCallum Cancer Centre, Melbourne, VIC, Australia
- Rui Li
  - Center for Personal Dynamic Regulomes, Stanford University School of Medicine, Stanford, CA, USA
- Axel Kallies
  - The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, Melbourne, VIC, Australia; Department of Microbiology and Immunology, The University of Melbourne, Melbourne, VIC, Australia
- Paul A Beavis
  - Cancer Immunology Program, Peter MacCallum Cancer Centre, Melbourne, VIC, Australia; Sir Peter MacCallum Department of Oncology, The University of Melbourne, Parkville, VIC, Australia
- Ralf Jungmann
  - Faculty of Physics and Center for Nanoscience, Ludwig Maximilian University, Munich, Germany; Max Planck Institute of Biochemistry, Martinsried, Germany
- Maartje M C Bastings
  - Institute of Materials, School of Engineering, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland; Interfaculty Bioengineering Institute, School of Engineering, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Gabrielle T Belz
  - The Frazer Institute, The University of Queensland, Brisbane, QLD, Australia; Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, Australia
- Shom Goel
  - Cancer Immunology Program, Peter MacCallum Cancer Centre, Melbourne, VIC, Australia; Sir Peter MacCallum Department of Oncology, The University of Melbourne, Parkville, VIC, Australia
- Joseph A Trapani
  - Cancer Immunology Program, Peter MacCallum Cancer Centre, Melbourne, VIC, Australia; Sir Peter MacCallum Department of Oncology, The University of Melbourne, Parkville, VIC, Australia
- Gerald R Crabtree
  - Center for Personal Dynamic Regulomes, Stanford University School of Medicine, Stanford, CA, USA; Departments of Pathology and Developmental Biology, Stanford University School of Medicine, Stanford, CA, USA
- Howard Y Chang
  - Center for Personal Dynamic Regulomes, Stanford University School of Medicine, Stanford, CA, USA
- Ido Amit
  - Department of Immunology, Weizmann Institute of Science, Rehovot, Israel
- Chris C Goodnow
  - School of Medical Sciences, Faculty of Medicine, UNSW, Sydney, NSW, Australia; Garvan Institute of Medical Research, Sydney, NSW, Australia
- Fabio Luciani
  - The Kirby Institute for Infection and Immunity, UNSW, Sydney, NSW, Australia; School of Medical Sciences, Faculty of Medicine, UNSW, Sydney, NSW, Australia
- Ian A Parish
  - Cancer Immunology Program, Peter MacCallum Cancer Centre, Melbourne, VIC, Australia; Sir Peter MacCallum Department of Oncology, The University of Melbourne, Parkville, VIC, Australia; John Curtin School of Medical Research, ANU, Canberra, ACT, Australia
5. Maizels RJ. A dynamical perspective: moving towards mechanism in single-cell transcriptomics. Philos Trans R Soc Lond B Biol Sci 2024; 379:20230049. [PMID: 38432314 PMCID: PMC10909508 DOI: 10.1098/rstb.2023.0049] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/10/2023] [Accepted: 10/31/2023] [Indexed: 03/05/2024]
Abstract
As the field of single-cell transcriptomics matures, research is shifting focus from phenomenological descriptions of cellular phenotypes to a mechanistic understanding of the gene regulation underneath. This perspective considers the value of capturing dynamical information at single-cell resolution for gaining mechanistic insight; reviews the available technologies for recording and inferring temporal information in single cells; and explores whether better dynamical resolution is sufficient to adequately capture the causal relationships driving complex biological systems. This article is part of a discussion meeting issue 'Causes and consequences of stochastic processes in development and disease'.
Affiliation(s)
- Rory J. Maizels
  - The Francis Crick Institute, 1 Midland Road, London NW1 1AT, UK
  - University College London, London WC1E 6BT, UK
6. Vo HK, Dawes JHP, Kelsh RN. Oscillatory differentiation dynamics fundamentally restricts the resolution of pseudotime reconstruction algorithms. J R Soc Interface 2024; 21:20230537. [PMID: 38503342 PMCID: PMC10950464 DOI: 10.1098/rsif.2023.0537] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/15/2023] [Accepted: 02/20/2024] [Indexed: 03/21/2024]
Abstract
The challenge to understand differentiation and cell lineages in development has resulted in many bioinformatics software tools, notably those working with gene expression data obtained via single-cell RNA sequencing obtained at snapshots in time. Reconstruction methods for trajectories often proceed by dimension reduction, data clustering and then computation of a tree graph in which edges indicate closely related clusters. Cell lineages can then be deduced by following paths through the tree. In the case of multi-potent cells undergoing differentiation, this trajectory reconstruction involves the reconstruction of multiple distinct lineages corresponding to commitment to each of a set of distinct fates. Recent work suggests that there may be cases in which the cell differentiation process involves trajectories that explore, in a dynamic and oscillatory fashion, propensity to differentiate into a number of possible cell fates before commitment finally occurs. Here, we show theoretically that the presence of such oscillations provides intrinsic constraints on the quality and resolution of the trajectory reconstruction process, even for idealized noise-free data. These constraints point to inherent common limitations of current methodologies and serve both to provide additional challenge in the development of software tools and also may help to understand features observed in recent experiments.
Affiliation(s)
- Huy K. Vo
  - Department of Mathematical Sciences, University of Bath, BA2 7AY Bath, UK
- Robert N. Kelsh
  - Department of Life Sciences, University of Bath, BA2 7AY Bath, UK
7. Schuster V, Krogh A. The Deep Generative Decoder: MAP estimation of representations improves modelling of single-cell RNA data. Bioinformatics 2023; 39:btad497. [PMID: 37572301 PMCID: PMC10483129 DOI: 10.1093/bioinformatics/btad497] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2022] [Revised: 07/12/2023] [Accepted: 08/10/2023] [Indexed: 08/14/2023]
Abstract
MOTIVATION: Learning low-dimensional representations of single-cell transcriptomics has become instrumental to its downstream analysis. The state of the art is currently represented by neural network models, such as variational autoencoders, which use a variational approximation of the likelihood for inference. RESULTS: We here present the Deep Generative Decoder (DGD), a simple generative model that computes model parameters and representations directly via maximum a posteriori estimation. The DGD handles complex parameterized latent distributions naturally unlike variational autoencoders, which typically use a fixed Gaussian distribution, because of the complexity of adding other types. We first show its general functionality on a commonly used benchmark set, Fashion-MNIST. Secondly, we apply the model to multiple single-cell datasets. Here, the DGD learns low-dimensional, meaningful, and well-structured latent representations with sub-clustering beyond the provided labels. The advantages of this approach are its simplicity and its capability to provide representations of much smaller dimensionality than a comparable variational autoencoder. AVAILABILITY AND IMPLEMENTATION: scDGD is available as a python package at https://github.com/Center-for-Health-Data-Science/scDGD. The remaining code is made available here: https://github.com/Center-for-Health-Data-Science/dgd.
Affiliation(s)
- Viktoria Schuster
  - Center for Health Data Science, University of Copenhagen, 2200 Copenhagen, Denmark
- Anders Krogh
  - Center for Health Data Science, University of Copenhagen, 2200 Copenhagen, Denmark
  - Department of Computer Science, University of Copenhagen, 2100 Copenhagen, Denmark
8. Kang JB, Raveane A, Nathan A, Soranzo N, Raychaudhuri S. Methods and Insights from Single-Cell Expression Quantitative Trait Loci. Annu Rev Genomics Hum Genet 2023; 24:277-303. [PMID: 37196361 PMCID: PMC10784788 DOI: 10.1146/annurev-genom-101422-100437] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Indexed: 05/19/2023]
Abstract
Recent advancements in single-cell technologies have enabled expression quantitative trait locus (eQTL) analysis across many individuals at single-cell resolution. Compared with bulk RNA sequencing, which averages gene expression across cell types and cell states, single-cell assays capture the transcriptional states of individual cells, including fine-grained, transient, and difficult-to-isolate populations at unprecedented scale and resolution. Single-cell eQTL (sc-eQTL) mapping can identify context-dependent eQTLs that vary with cell states, including some that colocalize with disease variants identified in genome-wide association studies. By uncovering the precise contexts in which these eQTLs act, single-cell approaches can unveil previously hidden regulatory effects and pinpoint important cell states underlying molecular mechanisms of disease. Here, we present an overview of recently deployed experimental designs in sc-eQTL studies. In the process, we consider the influence of study design choices such as cohort, cell states, and ex vivo perturbations. We then discuss current methodologies, modeling approaches, and technical challenges as well as future opportunities and applications.
Affiliation(s)
- Joyce B Kang
  - Center for Data Sciences and Divisions of Genetics and Rheumatology, Department of Medicine, Brigham and Women's Hospital, Boston, Massachusetts, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Aparna Nathan
  - Center for Data Sciences and Divisions of Genetics and Rheumatology, Department of Medicine, Brigham and Women's Hospital, Boston, Massachusetts, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Nicole Soranzo
  - Human Technopole, Milan, Italy
  - Department of Human Genetics, Wellcome Sanger Institute, Hinxton, United Kingdom
  - British Heart Foundation Centre of Research Excellence and Department of Haematology, University of Cambridge, Cambridge, United Kingdom
- Soumya Raychaudhuri
  - Center for Data Sciences and Divisions of Genetics and Rheumatology, Department of Medicine, Brigham and Women's Hospital, Boston, Massachusetts, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
  - Centre for Genetics and Genomics Versus Arthritis, University of Manchester, Manchester, United Kingdom
9. Kumasaka N, Rostom R, Huang N, Polanski K, Meyer KB, Patel S, Boyd R, Gomez C, Barnett SN, Panousis NI, Schwartzentruber J, Ghoussaini M, Lyons PA, Calero-Nieto FJ, Göttgens B, Barnes JL, Worlock KB, Yoshida M, Nikolić MZ, Stephenson E, Reynolds G, Haniffa M, Marioni JC, Stegle O, Hagai T, Teichmann SA. Mapping interindividual dynamics of innate immune response at single-cell resolution. Nat Genet 2023; 55:1066-1075. [PMID: 37308670 PMCID: PMC10260404 DOI: 10.1038/s41588-023-01421-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2021] [Accepted: 04/27/2023] [Indexed: 06/14/2023]
Abstract
Common genetic variants across individuals modulate the cellular response to pathogens and are implicated in diverse immune pathologies, yet how they dynamically alter the response upon infection is not well understood. Here, we triggered antiviral responses in human fibroblasts from 68 healthy donors, and profiled tens of thousands of cells using single-cell RNA-sequencing. We developed GASPACHO (GAuSsian Processes for Association mapping leveraging Cell HeterOgeneity), a statistical approach designed to identify nonlinear dynamic genetic effects across transcriptional trajectories of cells. This approach identified 1,275 expression quantitative trait loci (local false discovery rate 10%) that manifested during the responses, many of which were colocalized with susceptibility loci identified by genome-wide association studies of infectious and autoimmune diseases, including the OAS1 splicing quantitative trait locus in a COVID-19 susceptibility locus. In summary, our analytical approach provides a unique framework for delineation of the genetic variants that shape a wide spectrum of transcriptional responses at single-cell resolution.
Affiliation(s)
- Natsuhiko Kumasaka
  - Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
  - Medical Support Center of Japan Environment and Children's Study (JECS), National Center for Child Health and Development, Tokyo, Japan
- Raghd Rostom
  - Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
- Ni Huang
  - Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
- Kerstin B Meyer
  - Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
- Sharad Patel
  - Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
- Rachel Boyd
  - Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
- Celine Gomez
  - Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
- Sam N Barnett
  - Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
- Jeremy Schwartzentruber
  - Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
  - Open Targets, Wellcome Genome Campus, Hinxton, UK
- Maya Ghoussaini
  - Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
  - Open Targets, Wellcome Genome Campus, Hinxton, UK
- Paul A Lyons
  - Cambridge Institute of Therapeutic Immunology and Infectious Disease, Jeffrey Cheah Biomedical Centre, Cambridge, UK
  - Department of Medicine, University of Cambridge, Cambridge, UK
- Berthold Göttgens
  - Wellcome-MRC Cambridge Stem Cell Institute, University of Cambridge, Cambridge, UK
- Josephine L Barnes
  - UCL Respiratory, Division of Medicine, University College London, London, UK
- Kaylee B Worlock
  - UCL Respiratory, Division of Medicine, University College London, London, UK
- Masahiro Yoshida
  - UCL Respiratory, Division of Medicine, University College London, London, UK
- Marko Z Nikolić
  - UCL Respiratory, Division of Medicine, University College London, London, UK
  - University College London Hospitals NHS Foundation Trust, London, UK
- Emily Stephenson
  - Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
  - Biosciences Institute, Newcastle University, Newcastle upon Tyne, UK
- Gary Reynolds
  - Biosciences Institute, Newcastle University, Newcastle upon Tyne, UK
- Muzlifah Haniffa
  - Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
  - Biosciences Institute, Newcastle University, Newcastle upon Tyne, UK
  - NIHR Newcastle Biomedical Research Centre, Newcastle Hospitals NHS Foundation Trust, Newcastle upon Tyne, UK
  - Department of Dermatology, Newcastle Hospitals NHS Foundation Trust, Newcastle upon Tyne, UK
- John C Marioni
  - Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
  - Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
- Oliver Stegle
  - Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
  - Division of Computational Genomics and Systems Genetics, German Cancer Research Center, Heidelberg, Germany
  - European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
- Tzachi Hagai
  - Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Sarah A Teichmann
  - Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
  - Theory of Condensed Matter Group, Cavendish Laboratory/Department of Physics, University of Cambridge, Cambridge, UK
10. Zhang Y, Sun H, Lian X, Tang J, Zhu F. ANPELA: Significantly Enhanced Quantification Tool for Cytometry-Based Single-Cell Proteomics. Adv Sci (Weinh) 2023; 10:e2207061. [PMID: 36950745 DOI: 10.1002/advs.202207061] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 11/30/2022] [Revised: 02/13/2023] [Indexed: 05/27/2023]
Abstract
ANPELA is widely used for quantifying traditional bulk proteomic data. Recently, there is a clear shift from bulk proteomics to the single-cell ones (SCP), for which powerful cytometry techniques demonstrate the fantastic capacity of capturing cellular heterogeneity that is completely overlooked by traditional bulk profiling. However, the in-depth and high-quality quantification of SCP data is still challenging and severely affected by the large numbers of quantification workflows and extreme performance dependence on the studied datasets. In other words, the proper selection of well-performing workflow(s) for any studied dataset is elusory, and it is urgently needed to have a significantly enhanced and accelerated tool to address this issue. However, no such tool is developed yet. Herein, ANPELA is therefore updated to its 2.0 version (https://idrblab.org/anpela/), which is unique in providing the most comprehensive set of quantification alternatives (>1000 workflows) among all existing tools, enabling systematic performance evaluation from multiple perspectives based on machine learning, and identifying the optimal workflow(s) using overall performance ranking together with the parallel computation. Extensive validation on different benchmark datasets and representative application scenarios suggest the great application potential of ANPELA in current SCP research for gaining more accurate and reliable biological insights.
Affiliation(s)
- Ying Zhang
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China
- Huaicheng Sun
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China
- Xichen Lian
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China
- Jing Tang
  - Department of Bioinformatics, Chongqing Medical University, Chongqing, 400016, China
- Feng Zhu
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou, 330110, China
11. Wu S, Schmitz U. Single-cell and long-read sequencing to enhance modelling of splicing and cell-fate determination. Comput Struct Biotechnol J 2023; 21:2373-2380. [PMID: 37066125 PMCID: PMC10091034 DOI: 10.1016/j.csbj.2023.03.023] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 11/08/2022] [Revised: 03/13/2023] [Accepted: 03/13/2023] [Indexed: 04/03/2023]
Abstract
Single-cell sequencing technologies have revolutionised the life sciences and biomedical research. Single-cell sequencing provides high-resolution data on cell heterogeneity, allowing high-fidelity cell type identification, and lineage tracking. Computational algorithms and mathematical models have been developed to make sense of the data, compensate for errors and simulate the biological processes, which has led to breakthroughs in our understanding of cell differentiation, cell-fate determination and tissue cell composition. The development of long-read (a.k.a. third-generation) sequencing technologies has produced powerful tools for investigating alternative splicing, isoform expression (at the RNA level), genome assembly and the detection of complex structural variants (at the DNA level). In this review, we provide an overview of the recent advancements in single-cell and long-read sequencing technologies, with a particular focus on the computational algorithms that help in correcting, analysing, and interpreting the resulting data. Additionally, we review some mathematical models that use single-cell and long-read sequencing data to study cell-fate determination and alternative splicing, respectively. Moreover, we highlight the emerging opportunities in modelling cell-fate determination that result from the combination of single-cell and long-read sequencing technologies.
Affiliation(s)
- Siyuan Wu
- Department of Molecular & Cell Biology, James Cook University, Townsville 4811, Queensland, Australia
- Centre for Tropical Bioinformatics and Molecular Biology, James Cook University, Cairns 4870, Queensland, Australia
- School of Mathematics, Monash University, Melbourne 3800, Victoria, Australia
- Ulf Schmitz
- Department of Molecular & Cell Biology, James Cook University, Townsville 4811, Queensland, Australia
- Centre for Tropical Bioinformatics and Molecular Biology, James Cook University, Cairns 4870, Queensland, Australia
|
12
|
Gorin G, Fang M, Chari T, Pachter L. RNA velocity unraveled. PLoS Comput Biol 2022; 18:e1010492. [PMID: 36094956 PMCID: PMC9499228 DOI: 10.1371/journal.pcbi.1010492] [Citation(s) in RCA: 74] [Impact Index Per Article: 24.7] [Received: 03/17/2022] [Revised: 09/22/2022] [Accepted: 08/14/2022] [Indexed: 11/24/2022] Open
Abstract
We perform a thorough analysis of RNA velocity methods, with a view towards understanding the suitability of the various assumptions underlying popular implementations. In addition to providing a self-contained exposition of the underlying mathematics, we undertake simulations and perform controlled experiments on biological datasets to assess workflow sensitivity to parameter choices and underlying biology. Finally, we argue for a more rigorous approach to RNA velocity, and present a framework for Markovian analysis that points to directions for improvement and mitigation of current problems.
Affiliation(s)
- Gennady Gorin
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California, United States of America
- Meichen Fang
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California, United States of America
- Tara Chari
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California, United States of America
- Lior Pachter
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California, United States of America
- Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, California, United States of America
|
13
|
Wang Y, Xu Y, Zang Z, Wu L, Li Z. Panoramic Manifold Projection (Panoramap) for Single-Cell Data Dimensionality Reduction and Visualization. Int J Mol Sci 2022; 23:7775. [PMID: 35887125 PMCID: PMC9316349 DOI: 10.3390/ijms23147775] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/06/2022] [Revised: 07/03/2022] [Accepted: 07/12/2022] [Indexed: 12/22/2022] Open
Abstract
Nonlinear dimensionality reduction (NLDR) methods such as t-Distributed Stochastic Neighbour Embedding (t-SNE) and Uniform Manifold Approximation and Projection (UMAP) have been widely used for biological data exploration, especially in single-cell analysis. However, the existing methods have drawbacks in preserving data's geometric and topological structures. A high-dimensional data analysis method, called Panoramic manifold projection (Panoramap), was developed as an enhanced deep learning framework for structure-preserving NLDR. Panoramap enhances deep neural networks by using cross-layer geometry-preserving constraints. The constraints constitute the loss for deep manifold learning and serve as geometric regularizers for NLDR network training. Therefore, Panoramap has better performance in preserving global structures of the original data. Here, we apply Panoramap to single-cell datasets and show that Panoramap excels at delineating the cell type lineage/hierarchy and can reveal rare cell types. Panoramap can facilitate trajectory inference and has the potential to aid in the early diagnosis of tumors. Panoramap gives improved and more biologically plausible visualization and interpretation of single-cell data. Panoramap can be readily used in single-cell research domains and other research fields that involve high dimensional data analysis.
Affiliation(s)
- Yajuan Wang
- College of Mathematical Medicine, Zhejiang Normal University, Jinhua 321004, China
- School of Engineering, Westlake University, Hangzhou 310024, China
- Yongjie Xu
- School of Engineering, Westlake University, Hangzhou 310024, China
- Zelin Zang
- School of Engineering, Westlake University, Hangzhou 310024, China
- Lirong Wu
- School of Engineering, Westlake University, Hangzhou 310024, China
- Ziqing Li
- School of Engineering, Westlake University, Hangzhou 310024, China
|
14
|
Chen Y, Siriwardena D, Penfold C, Pavlinek A, Boroviak TE. An integrated atlas of human placental development delineates essential regulators of trophoblast stem cells. Development 2022; 149:275917. [PMID: 35792865 PMCID: PMC9340556 DOI: 10.1242/dev.200171] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 09/09/2021] [Accepted: 05/12/2022] [Indexed: 12/21/2022]
Abstract
The trophoblast lineage safeguards fetal development by mediating embryo implantation, immune tolerance, nutritional supply and gas exchange. Human trophoblast stem cells (hTSCs) provide a platform to study lineage specification of placental tissues; however, the regulatory network controlling self-renewal remains elusive. Here, we present a single-cell atlas of human trophoblast development from zygote to mid-gestation together with single-cell profiling of hTSCs. We determine the transcriptional networks of trophoblast lineages in vivo and leverage probabilistic modelling to identify a role for MAPK signalling in trophoblast differentiation. Placenta- and blastoid-derived hTSCs consistently map between late trophectoderm and early cytotrophoblast, in contrast to blastoid-trophoblast, which correspond to trophectoderm. We functionally assess the requirement of the predicted cytotrophoblast network in an siRNA-screen and reveal 15 essential regulators for hTSC self-renewal, including MAZ, NFE2L3, TFAP2C, NR2F2 and CTNNB1. Our human trophoblast atlas provides a powerful analytical resource to delineate trophoblast cell fate acquisition, to elucidate transcription factors required for hTSC self-renewal and to gauge the developmental stage of in vitro cultured cells.
Affiliation(s)
- Yutong Chen
- Department of Physiology, Development and Neuroscience, University of Cambridge, Downing Site, Cambridge CB2 3EG, UK
- Centre for Trophoblast Research, University of Cambridge, Downing Site, Cambridge CB2 3EG, UK
- Wellcome Trust - Medical Research Council Stem Cell Institute, University of Cambridge, Jeffrey Cheah Biomedical Centre, Puddicombe Way, Cambridge CB2 0AW, UK
- Dylan Siriwardena
- Department of Physiology, Development and Neuroscience, University of Cambridge, Downing Site, Cambridge CB2 3EG, UK
- Centre for Trophoblast Research, University of Cambridge, Downing Site, Cambridge CB2 3EG, UK
- Wellcome Trust - Medical Research Council Stem Cell Institute, University of Cambridge, Jeffrey Cheah Biomedical Centre, Puddicombe Way, Cambridge CB2 0AW, UK
- Christopher Penfold
- Department of Physiology, Development and Neuroscience, University of Cambridge, Downing Site, Cambridge CB2 3EG, UK
- Centre for Trophoblast Research, University of Cambridge, Downing Site, Cambridge CB2 3EG, UK
- Wellcome Trust - Medical Research Council Stem Cell Institute, University of Cambridge, Jeffrey Cheah Biomedical Centre, Puddicombe Way, Cambridge CB2 0AW, UK
- Thorsten E Boroviak
- Department of Physiology, Development and Neuroscience, University of Cambridge, Downing Site, Cambridge CB2 3EG, UK
- Centre for Trophoblast Research, University of Cambridge, Downing Site, Cambridge CB2 3EG, UK
- Wellcome Trust - Medical Research Council Stem Cell Institute, University of Cambridge, Jeffrey Cheah Biomedical Centre, Puddicombe Way, Cambridge CB2 0AW, UK
|
15
|
BinTayyash N, Georgaka S, John ST, Ahmed S, Boukouvalas A, Hensman J, Rattray M. Non-parametric modelling of temporal and spatial counts data from RNA-seq experiments. Bioinformatics 2021; 37:3788-3795. [PMID: 34213536 PMCID: PMC10186154 DOI: 10.1093/bioinformatics/btab486] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Received: 07/29/2020] [Revised: 06/25/2021] [Accepted: 06/30/2021] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION The negative binomial distribution has been shown to be a good model for counts data from both bulk and single-cell RNA-sequencing (RNA-seq). Gaussian process (GP) regression provides a useful non-parametric approach for modelling temporal or spatial changes in gene expression. However, currently available GP regression methods that implement negative binomial likelihood models do not scale to the increasingly large datasets being produced by single-cell and spatial transcriptomics. RESULTS The GPcounts package implements GP regression methods for modelling counts data using a negative binomial likelihood function. Computational efficiency is achieved through the use of variational Bayesian inference. The GP function models changes in the mean of the negative binomial likelihood through a logarithmic link function and the dispersion parameter is fitted by maximum likelihood. We validate the method on simulated time course data, showing better performance to identify changes in over-dispersed counts data than methods based on Gaussian or Poisson likelihoods. To demonstrate temporal inference, we apply GPcounts to single-cell RNA-seq datasets after pseudotime and branching inference. To demonstrate spatial inference, we apply GPcounts to data from the mouse olfactory bulb to identify spatially variable genes and compare to two published GP methods. We also provide the option of modelling additional dropout using a zero-inflated negative binomial. Our results show that GPcounts can be used to model temporal and spatial counts data in cases where simpler Gaussian and Poisson likelihoods are unrealistic. AVAILABILITY AND IMPLEMENTATION GPcounts is implemented using the GPflow library in Python and is available at https://github.com/ManchesterBioinference/GPcounts along with the data, code and notebooks required to reproduce the results presented here. The version used for this paper is archived at https://doi.org/10.5281/zenodo.5027066. 
SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Nuha BinTayyash
- School of Computer Science, University of Manchester, Manchester M13 9PL, UK
- Sokratia Georgaka
- Division of Informatics, Imaging and Data Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester M13 9PL, UK
- S T John
- Secondmind, Cambridge CB2 1LA, UK
- Finnish Center for Artificial Intelligence, FCAI, Department of Computer Science, Aalto University, Finland
- Sumon Ahmed
- Division of Informatics, Imaging and Data Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester M13 9PL, UK
- Institute of Information Technology, University of Dhaka, Dhaka 1000, Bangladesh
- Magnus Rattray
- Division of Informatics, Imaging and Data Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester M13 9PL, UK
|
16
|
Single-cell Ribo-seq reveals cell cycle-dependent translational pausing. Nature 2021; 597:561-565. [PMID: 34497418 DOI: 10.1038/s41586-021-03887-4] [Citation(s) in RCA: 87] [Impact Index Per Article: 21.8] [Received: 11/26/2020] [Accepted: 08/06/2021] [Indexed: 12/21/2022]
Abstract
Single-cell sequencing methods have enabled in-depth analysis of the diversity of cell types and cell states in a wide range of organisms. These tools focus predominantly on sequencing the genomes, epigenomes and transcriptomes of single cells. However, despite recent progress in detecting proteins by mass spectrometry with single-cell resolution, it remains a major challenge to measure translation in individual cells. Here, building on existing protocols, we have substantially increased the sensitivity of these assays to enable ribosome profiling in single cells. Integrated with a machine learning approach, this technology achieves single-codon resolution. We validate this method by demonstrating that limitation for a particular amino acid causes ribosome pausing at a subset of the codons encoding the amino acid. Of note, this pausing is only observed in a sub-population of cells correlating to its cell cycle state. We further expand on this phenomenon in non-limiting conditions and detect pronounced GAA pausing during mitosis. Finally, we demonstrate the applicability of this technique to rare primary enteroendocrine cells. This technology provides a first step towards determining the contribution of the translational process to the remarkable diversity between seemingly identical cells.
|
17
|
Xiang R, Wang W, Yang L, Wang S, Xu C, Chen X. A Comparison for Dimensionality Reduction Methods of Single-Cell RNA-seq Data. Front Genet 2021; 12:646936. [PMID: 33833778 PMCID: PMC8021860 DOI: 10.3389/fgene.2021.646936] [Citation(s) in RCA: 55] [Impact Index Per Article: 13.8] [Received: 12/28/2020] [Accepted: 02/19/2021] [Indexed: 01/25/2023] Open
Abstract
Single-cell RNA sequencing (scRNA-seq) is a high-throughput sequencing technology performed at the level of an individual cell, which can have a potential to understand cellular heterogeneity. However, scRNA-seq data are high-dimensional, noisy, and sparse data. Dimension reduction is an important step in downstream analysis of scRNA-seq. Therefore, several dimension reduction methods have been developed. We developed a strategy to evaluate the stability, accuracy, and computing cost of 10 dimensionality reduction methods using 30 simulation datasets and five real datasets. Additionally, we investigated the sensitivity of all the methods to hyperparameter tuning and gave users appropriate suggestions. We found that t-distributed stochastic neighbor embedding (t-SNE) yielded the best overall performance with the highest accuracy and computing cost. Meanwhile, uniform manifold approximation and projection (UMAP) exhibited the highest stability, as well as moderate accuracy and the second highest computing cost. UMAP well preserves the original cohesion and separation of cell populations. In addition, it is worth noting that users need to set the hyperparameters according to the specific situation before using the dimensionality reduction methods based on non-linear model and neural network.
Affiliation(s)
- Ruizhi Xiang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Wencan Wang
- School of Optometry and Ophthalmology and Eye Hospital, Wenzhou Medical University, Wenzhou, China
- Lei Yang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Shiyuan Wang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Chaohan Xu
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Xiaowen Chen
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
|
18
|
Kopf A, Claassen M. Latent representation learning in biology and translational medicine. PATTERNS (NEW YORK, N.Y.) 2021; 2:100198. [PMID: 33748792 PMCID: PMC7961186 DOI: 10.1016/j.patter.2021.100198] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Indexed: 02/08/2023]
Abstract
Current data generation capabilities in the life sciences render scientists in an apparently contradicting situation. While it is possible to simultaneously measure an ever-increasing number of systems parameters, the resulting data are becoming increasingly difficult to interpret. Latent variable modeling allows for such interpretation by learning non-measurable hidden variables from observations. This review gives an overview over the different formal approaches to latent variable modeling, as well as applications at different scales of biological systems, such as molecular structures, intra- and intercellular regulatory up to physiological networks. The focus is on demonstrating how these approaches have enabled interpretable representations and ultimately insights in each of these domains. We anticipate that a wider dissemination of latent variable modeling in the life sciences will enable a more effective and productive interpretation of studies based on heterogeneous and high-dimensional data modalities.
Affiliation(s)
- Andreas Kopf
- Institute of Molecular Systems Biology, ETH Zürich, 8093 Zürich, Switzerland
- Manfred Claassen
- Division of Clinical Bioinformatics, Department of Internal Medicine I, University Hospital Tübingen, 72076 Tübingen, Germany
- Computer Science Department, Eberhard Karls University of Tübingen, 72076 Tübingen, Germany
- Cluster of Excellence Machine Learning (EXC 2064), Eberhard Karls University of Tübingen, 72076 Tübingen, Germany
|
19
|
Phillips NE, Hugues A, Yeung J, Durandau E, Nicolas D, Naef F. The circadian oscillator analysed at the single-transcript level. Mol Syst Biol 2021; 17:e10135. [PMID: 33719202 PMCID: PMC7957410 DOI: 10.15252/msb.202010135] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 11/20/2020] [Revised: 01/05/2021] [Accepted: 01/19/2021] [Indexed: 12/31/2022] Open
Abstract
The circadian clock is an endogenous and self-sustained oscillator that anticipates daily environmental cycles. While rhythmic gene expression of circadian genes is well-described in populations of cells, the single-cell mRNA dynamics of multiple core clock genes remain largely unknown. Here we use single-molecule fluorescence in situ hybridisation (smFISH) at multiple time points to measure pairs of core clock transcripts, Rev-erbα (Nr1d1), Cry1 and Bmal1, in mouse fibroblasts. The mean mRNA level oscillates over 24 h for all three genes, but mRNA numbers show considerable spread between cells. We develop a probabilistic model for multivariate mRNA counts using mixtures of negative binomials, which accounts for transcriptional bursting, circadian time and cell-to-cell heterogeneity, notably in cell size. Decomposing the mRNA variability into distinct noise sources shows that clock time contributes a small fraction of the total variability in mRNA number between cells. Thus, our results highlight the intrinsic biological challenges in estimating circadian phase from single-cell mRNA counts and suggest that circadian phase in single cells is encoded post-transcriptionally.
Affiliation(s)
- Nicholas E Phillips
- Institute of Bioengineering, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Alice Hugues
- Institute of Bioengineering, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Master de Biologie, École Normale Supérieure de Lyon, Université Claude Bernard Lyon I, Université de Lyon, Lyon, France
- Jake Yeung
- Institute of Bioengineering, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Eric Durandau
- Institute of Bioengineering, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Damien Nicolas
- Institute of Bioengineering, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Felix Naef
- Institute of Bioengineering, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
|
20
|
Abstract
The link between mind, brain, and behavior has mystified philosophers and scientists for millennia. Recent progress has been made by forming statistical associations between manifest variables of the brain (e.g., electroencephalogram [EEG], functional MRI [fMRI]) and manifest variables of behavior (e.g., response times, accuracy) through hierarchical latent variable models. Within this framework, one can make inferences about the mind in a statistically principled way, such that complex patterns of brain-behavior associations drive the inference procedure. However, previous approaches were limited in the flexibility of the linking function, which has proved prohibitive for understanding the complex dynamics exhibited by the brain. In this article, we propose a data-driven, nonparametric approach that allows complex linking functions to emerge from fitting a hierarchical latent representation of the mind to multivariate, multimodal data. Furthermore, to enforce biological plausibility, we impose both spatial and temporal structure so that the types of realizable system dynamics are constrained. To illustrate the benefits of our approach, we investigate the model's performance in a simulation study and apply it to experimental data. In the simulation study, we verify that the model can be accurately fitted to simulated data, and latent dynamics can be well recovered. In an experimental application, we simultaneously fit the model to fMRI and behavioral data from a continuous motion tracking task. We show that the model accurately recovers both neural and behavioral data and reveals interesting latent cognitive dynamics, the topology of which can be contrasted with several aspects of the experiment.
|
21
|
Lieberman B, Kusi M, Hung CN, Chou CW, He N, Ho YY, Taverna JA, Huang THM, Chen CL. Toward uncharted territory of cellular heterogeneity: advances and applications of single-cell RNA-seq. JOURNAL OF TRANSLATIONAL GENETICS AND GENOMICS 2021; 5:1-21. [PMID: 34322662 PMCID: PMC8315474 DOI: 10.20517/jtgg.2020.51] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/13/2022]
Abstract
Among single-cell analysis technologies, single-cell RNA-seq (scRNA-seq) has been one of the front runners in technical inventions. Since its induction, scRNA-seq has been well received and undergone many fast-paced technical improvements in cDNA synthesis and amplification, processing and alignment of next generation sequencing reads, differentially expressed gene calling, cell clustering, subpopulation identification, and developmental trajectory prediction. scRNA-seq has been exponentially applied to study global transcriptional profiles in all cell types in humans and animal models, healthy or with diseases, including cancer. Accumulative novel subtypes and rare subpopulations have been discovered as potential underlying mechanisms of stochasticity, differentiation, proliferation, tumorigenesis, and aging. scRNA-seq has gradually revealed the uncharted territory of cellular heterogeneity in transcriptomes and developed novel therapeutic approaches for biomedical applications. This review of the advancement of scRNA-seq methods provides an exploratory guide of the quickly evolving technical landscape and insights of focused features and strengths in each prominent area of progress.
Affiliation(s)
- Brandon Lieberman
- Department of Molecular Medicine, University of Texas Health Science Center at San Antonio, San Antonio, TX 78229, USA
- Meena Kusi
- Department of Molecular Medicine, University of Texas Health Science Center at San Antonio, San Antonio, TX 78229, USA
- Chia-Nung Hung
- Department of Molecular Medicine, University of Texas Health Science Center at San Antonio, San Antonio, TX 78229, USA
- Chih-Wei Chou
- Department of Molecular Medicine, University of Texas Health Science Center at San Antonio, San Antonio, TX 78229, USA
- Ning He
- Department of Nursing, University of Texas Health Science Center at San Antonio, San Antonio, TX 78229, USA
- Yen-Yi Ho
- Department of Statistics, University of South Carolina, Columbia, SC 29208, USA
- Josephine A. Taverna
- Department of Medicine, University of Texas Health Science Center at San Antonio, San Antonio, TX 78229, USA
- Mays Cancer Center, University of Texas Health Science Center at San Antonio, San Antonio, TX 78229, USA
- Tim H. M. Huang
- Department of Molecular Medicine, University of Texas Health Science Center at San Antonio, San Antonio, TX 78229, USA
- Mays Cancer Center, University of Texas Health Science Center at San Antonio, San Antonio, TX 78229, USA
- Chun-Liang Chen
- Department of Molecular Medicine, University of Texas Health Science Center at San Antonio, San Antonio, TX 78229, USA
- Mays Cancer Center, University of Texas Health Science Center at San Antonio, San Antonio, TX 78229, USA
|
22
|
Verma A, Engelhardt BE. A robust nonlinear low-dimensional manifold for single cell RNA-seq data. BMC Bioinformatics 2020; 21:324. [PMID: 32693778 PMCID: PMC7374962 DOI: 10.1186/s12859-020-03625-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 08/25/2019] [Accepted: 06/22/2020] [Indexed: 12/22/2022] Open
Abstract
BACKGROUND Modern developments in single-cell sequencing technologies enable broad insights into cellular state. Single-cell RNA sequencing (scRNA-seq) can be used to explore cell types, states, and developmental trajectories to broaden our understanding of cellular heterogeneity in tissues and organs. Analysis of these sparse, high-dimensional experimental results requires dimension reduction. Several methods have been developed to estimate low-dimensional embeddings for filtered and normalized single-cell data. However, methods have yet to be developed for unfiltered and unnormalized count data that estimate uncertainty in the low-dimensional space. We present a nonlinear latent variable model with robust, heavy-tailed error and adaptive kernel learning to estimate low-dimensional nonlinear structure in scRNA-seq data. RESULTS Gene expression in a single cell is modeled as a noisy draw from a Gaussian process in high dimensions from low-dimensional latent positions. This model is called the Gaussian process latent variable model (GPLVM). We model residual errors with a heavy-tailed Student's t-distribution to estimate a manifold that is robust to technical and biological noise found in normalized scRNA-seq data. We compare our approach to common dimension reduction tools across a diverse set of scRNA-seq data sets to highlight our model's ability to enable important downstream tasks such as clustering, inferring cell developmental trajectories, and visualizing high throughput experiments on available experimental data. CONCLUSION We show that our adaptive robust statistical approach to estimate a nonlinear manifold is well suited for raw, unfiltered gene counts from high-throughput sequencing technologies for visualization, exploration, and uncertainty estimation of cell states.
Affiliation(s)
- Archit Verma
- Chemical and Biological Engineering, Princeton University, 50-70 Olden Street, Princeton, 08540 NJ USA
- Barbara E. Engelhardt
- Computer Science, Center for Statistics and Machine Learning, 35 Olden Street, Princeton, 08540 NJ USA
|
23
|
Strauss ME, Kirk PDW, Reid JE, Wernisch L. GPseudoClust: deconvolution of shared pseudo-profiles at single-cell resolution. Bioinformatics 2020; 36:1484-1491. [PMID: 31608923 PMCID: PMC7703763 DOI: 10.1093/bioinformatics/btz778] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 03/08/2019] [Revised: 09/20/2019] [Accepted: 10/09/2019] [Indexed: 01/21/2023] Open
Abstract
MOTIVATION Many methods have been developed to cluster genes on the basis of their changes in mRNA expression over time, using bulk RNA-seq or microarray data. However, single-cell data may present a particular challenge for these algorithms, since the temporal ordering of cells is not directly observed. One way to address this is to first use pseudotime methods to order the cells, and then apply clustering techniques for time course data. However, pseudotime estimates are subject to high levels of uncertainty, and failing to account for this uncertainty is liable to lead to erroneous and/or over-confident gene clusters. RESULTS The proposed method, GPseudoClust, is a novel approach that jointly infers pseudotemporal ordering and gene clusters, and quantifies the uncertainty in both. GPseudoClust combines a recent method for pseudotime inference with non-parametric Bayesian clustering methods, efficient Markov Chain Monte Carlo sampling and novel subsampling strategies which aid computation. We consider a broad array of simulated and experimental datasets to demonstrate the effectiveness of GPseudoClust in a range of settings. AVAILABILITY AND IMPLEMENTATION An implementation is available on GitHub: https://github.com/magStra/nonparametricSummaryPSM and https://github.com/magStra/GPseudoClust. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Magdalena E Strauss
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- MRC Biostatistics Unit, School of Clinical Medicine, University of Cambridge, Cambridge CB2 0SR, UK
- Paul D W Kirk
- MRC Biostatistics Unit, School of Clinical Medicine, University of Cambridge, Cambridge CB2 0SR, UK
- Department of Medicine, University of Cambridge, Addenbrooke's Hospital, Cambridge CB2 0SP, UK
- John E Reid
- MRC Biostatistics Unit, School of Clinical Medicine, University of Cambridge, Cambridge CB2 0SR, UK
- Lorenz Wernisch
- MRC Biostatistics Unit, School of Clinical Medicine, University of Cambridge, Cambridge CB2 0SR, UK
|
24
|
Lähnemann D, Köster J, Szczurek E, McCarthy DJ, Hicks SC, Robinson MD, Vallejos CA, Campbell KR, Beerenwinkel N, Mahfouz A, Pinello L, Skums P, Stamatakis A, Attolini CSO, Aparicio S, Baaijens J, Balvert M, Barbanson BD, Cappuccio A, Corleone G, Dutilh BE, Florescu M, Guryev V, Holmer R, Jahn K, Lobo TJ, Keizer EM, Khatri I, Kielbasa SM, Korbel JO, Kozlov AM, Kuo TH, Lelieveldt BP, Mandoiu II, Marioni JC, Marschall T, Mölder F, Niknejad A, Rączkowska A, Reinders M, Ridder JD, Saliba AE, Somarakis A, Stegle O, Theis FJ, Yang H, Zelikovsky A, McHardy AC, Raphael BJ, Shah SP, Schönhuth A. Eleven grand challenges in single-cell data science. Genome Biol 2020; 21:31. [PMID: 32033589 PMCID: PMC7007675 DOI: 10.1186/s13059-020-1926-6] [Citation(s) in RCA: 681] [Impact Index Per Article: 136.2] [Received: 08/02/2019] [Accepted: 01/02/2020] [Indexed: 02/08/2023] Open
Abstract
The recent boom in microfluidics and combinatorial indexing strategies, combined with low sequencing costs, has empowered single-cell sequencing technology. Thousands, or even millions, of cells analyzed in a single experiment amount to a data revolution in single-cell biology and pose unique data science problems. Here, we outline eleven challenges that will be central to bringing this emerging field of single-cell data science forward. For each challenge, we highlight motivating research questions, review prior work, and formulate open problems. This compendium is for established researchers, newcomers, and students alike, highlighting interesting and rewarding problems for the coming years.
Affiliation(s)
- David Lähnemann
- Algorithms for Reproducible Bioinformatics, Genome Informatics, Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Essen, Germany
- Department of Paediatric Oncology, Haematology and Immunology, Medical Faculty, Heinrich Heine University, University Hospital, Düsseldorf, Germany
- Computational Biology of Infection Research Group, Helmholtz Centre for Infection Research, Braunschweig, Germany
| | - Johannes Köster
- Algorithms for Reproducible Bioinformatics, Genome Informatics, Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Essen, Germany
- Medical Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, USA
| | - Ewa Szczurek
- Institute of Informatics, Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warszawa, Poland
| | - Davis J. McCarthy
- Bioinformatics and Cellular Genomics, St Vincent’s Institute of Medical Research, Fitzroy, Australia
- Melbourne Integrative Genomics, School of BioSciences–School of Mathematics & Statistics, Faculty of Science, University of Melbourne, Melbourne, Australia
| | - Stephanie C. Hicks
- Department of Biostatistics, Johns Hopkins University, Baltimore, MD USA
| | - Mark D. Robinson
- Institute of Molecular Life Sciences and SIB Swiss Institute of Bioinformatics, University of Zürich, Zürich, Switzerland
| | - Catalina A. Vallejos
- MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Edinburgh, UK
- The Alan Turing Institute, British Library, London, UK
| | - Kieran R. Campbell
- Department of Statistics, University of British Columbia, Vancouver, Canada
- Department of Molecular Oncology, BC Cancer Agency, Vancouver, Canada
- Data Science Institute, University of British Columbia, Vancouver, Canada
| | - Niko Beerenwinkel
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Ahmed Mahfouz
- Leiden Computational Biology Center, Leiden University Medical Center, Leiden, The Netherlands
- Delft Bioinformatics Lab, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, Delft, The Netherlands
| | - Luca Pinello
- Molecular Pathology Unit and Center for Cancer Research, Massachusetts General Hospital Research Institute, Charlestown, USA
- Department of Pathology, Harvard Medical School, Boston, USA
- Broad Institute of Harvard and MIT, Cambridge, MA USA
| | - Pavel Skums
- Department of Computer Science, Georgia State University, Atlanta, USA
| | - Alexandros Stamatakis
- Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
- Institute for Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe, Germany
| | | | - Samuel Aparicio
- Department of Molecular Oncology, BC Cancer Agency, Vancouver, Canada
- Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, Canada
| | - Jasmijn Baaijens
- Life Sciences and Health, Centrum Wiskunde & Informatica, Amsterdam, The Netherlands
| | - Marleen Balvert
- Life Sciences and Health, Centrum Wiskunde & Informatica, Amsterdam, The Netherlands
- Theoretical Biology and Bioinformatics, Science for Life, Utrecht University, Utrecht, The Netherlands
| | - Buys de Barbanson
- Center for Molecular Medicine, University Medical Center Utrecht, Utrecht, The Netherlands
- Oncode Institute, Utrecht, The Netherlands
- Quantitative biology, Hubrecht Institute, Utrecht, The Netherlands
| | - Antonio Cappuccio
- Institute for Advanced Study, University of Amsterdam, Amsterdam, The Netherlands
| | - Giacomo Corleone
- Department of Surgery and Cancer, The Imperial Centre for Translational and Experimental Medicine, Imperial College London, London, UK
| | - Bas E. Dutilh
- Theoretical Biology and Bioinformatics, Science for Life, Utrecht University, Utrecht, The Netherlands
- Centre for Molecular and Biomolecular Informatics, Radboud University Medical Center, Nijmegen, The Netherlands
| | - Maria Florescu
- Center for Molecular Medicine, University Medical Center Utrecht, Utrecht, The Netherlands
- Oncode Institute, Utrecht, The Netherlands
- Quantitative biology, Hubrecht Institute, Utrecht, The Netherlands
| | - Victor Guryev
- European Research Institute for the Biology of Ageing, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
| | - Rens Holmer
- Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
| | - Katharina Jahn
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Thamar Jessurun Lobo
- European Research Institute for the Biology of Ageing, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
| | - Emma M. Keizer
- Biometris, Wageningen University & Research, Wageningen, The Netherlands
| | - Indu Khatri
- Department of Immunohematology and Blood Transfusion, Leiden University Medical Center, Leiden, The Netherlands
| | - Szymon M. Kielbasa
- Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, The Netherlands
| | - Jan O. Korbel
- Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
| | - Alexey M. Kozlov
- Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
| | - Tzu-Hao Kuo
- Computational Biology of Infection Research Group, Helmholtz Centre for Infection Research, Braunschweig, Germany
| | - Boudewijn P.F. Lelieveldt
- PRB lab, Delft University of Technology, Delft, The Netherlands
- Division of Image Processing, Department of Radiology, Leiden University Medical Center, Leiden, The Netherlands
| | - Ion I. Mandoiu
- Computer Science & Engineering Department, University of Connecticut, Storrs, USA
| | - John C. Marioni
- Cancer Research UK Cambridge Institute, Li Ka Shing Centre, University of Cambridge, Cambridge, UK
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
| | - Tobias Marschall
- Center for Bioinformatics, Saarland University, Saarbrücken, Germany
- Max Planck Institute for Informatics, Saarbrücken, Germany
| | - Felix Mölder
- Algorithms for Reproducible Bioinformatics, Genome Informatics, Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Essen, Germany
- Institute of Pathology, University Hospital Essen, University of Duisburg-Essen, Essen, Germany
| | - Amir Niknejad
- Computation molecular design, Zuse Institute Berlin, Berlin, Germany
- Mathematics Department, Mount Saint Vincent, New York, USA
| | - Alicja Rączkowska
- Institute of Informatics, Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warszawa, Poland
| | - Marcel Reinders
- Leiden Computational Biology Center, Leiden University Medical Center, Leiden, The Netherlands
- Delft Bioinformatics Lab, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, Delft, The Netherlands
| | - Jeroen de Ridder
- Center for Molecular Medicine, University Medical Center Utrecht, Utrecht, The Netherlands
- Oncode Institute, Utrecht, The Netherlands
| | - Antoine-Emmanuel Saliba
- Helmholtz Institute for RNA-based Infection Research, Helmholtz-Center for Infection Research, Würzburg, Germany
| | - Antonios Somarakis
- Division of Image Processing, Department of Radiology, Leiden University Medical Center, Leiden, The Netherlands
| | - Oliver Stegle
- Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Division of Computational Genomics and Systems Genetics, German Cancer Research Center–DKFZ, Heidelberg, Germany
| | - Fabian J. Theis
- Institute of Computational Biology, Helmholtz Zentrum München–German Research Center for Environmental Health, Neuherberg, Germany
| | - Huan Yang
- Division of Drug Discovery and Safety, Leiden Academic Center for Drug Research–LACDR–Leiden University, Leiden, The Netherlands
| | - Alex Zelikovsky
- Department of Computer Science, Georgia State University, Atlanta, USA
- The Laboratory of Bioinformatics, I.M. Sechenov First Moscow State Medical University, Moscow, Russia
| | - Alice C. McHardy
- Computational Biology of Infection Research Group, Helmholtz Centre for Infection Research, Braunschweig, Germany
| | | | - Sohrab P. Shah
- Computational Oncology, Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, USA
| | - Alexander Schönhuth
- Life Sciences and Health, Centrum Wiskunde & Informatica, Amsterdam, The Netherlands
- Theoretical Biology and Bioinformatics, Science for Life, Utrecht University, Utrecht, The Netherlands
|
25
|
Strauß ME, Reid JE, Wernisch L. GPseudoRank: a permutation sampler for single cell orderings. Bioinformatics 2019; 35:611-618. [PMID: 30052778 PMCID: PMC6230469 DOI: 10.1093/bioinformatics/bty664] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 02/15/2018] [Revised: 06/13/2018] [Accepted: 07/24/2018] [Indexed: 11/30/2022] Open
Abstract
Motivation: A number of pseudotime methods have provided point estimates of the ordering of cells for scRNA-seq data. A still limited number of methods also model the uncertainty of the pseudotime estimate. However, there is still a need for a method to sample from complicated and multi-modal distributions of orders, and to estimate changes in the amount of the uncertainty of the order during the course of a biological development, as this can support the selection of suitable cells for the clustering of genes or for network inference.
Results: In applications to scRNA-seq data we demonstrate the potential of GPseudoRank to sample from complex and multi-modal posterior distributions and to identify phases of lower and higher pseudotime uncertainty during a biological process. GPseudoRank also correctly identifies cells precocious in their antiviral response and links uncertainty in the ordering to metastable states. A variant of the method extends the advantages of Bayesian modelling and MCMC to large droplet-based scRNA-seq datasets.
Availability and implementation: Our method is available on GitHub: https://github.com/magStra/GPseudoRank.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Magdalena E Strauß
- MRC Biostatistics Unit, School of Clinical Medicine, University of Cambridge, Cambridge, UK
| | - John E Reid
- MRC Biostatistics Unit, School of Clinical Medicine, University of Cambridge, Cambridge, UK
- Alan Turing Institute, London, UK
| | - Lorenz Wernisch
- MRC Biostatistics Unit, School of Clinical Medicine, University of Cambridge, Cambridge, UK
|
26
|
Tritschler S, Büttner M, Fischer DS, Lange M, Bergen V, Lickert H, Theis FJ. Concepts and limitations for learning developmental trajectories from single cell genomics. Development 2019; 146. [DOI: 10.1242/dev.170506] [Citation(s) in RCA: 132] [Impact Index Per Article: 22.0] [Indexed: 08/30/2023]
Abstract
Single cell genomics has become a popular approach to uncover the cellular heterogeneity of progenitor and terminally differentiated cell types with great precision. This approach can also delineate lineage hierarchies and identify molecular programmes of cell-fate acquisition and segregation. Nowadays, tens of thousands of cells are routinely sequenced in single cell-based methods and even more are expected to be analysed in the future. However, interpretation of the resulting data is challenging and requires computational models at multiple levels of abstraction. In contrast to other applications of single cell sequencing, where clustering approaches dominate, developmental systems are generally modelled using continuous structures, trajectories and trees. These trajectory models carry the promise of elucidating mechanisms of development, disease and stimulation response at very high molecular resolution. However, their reliable analysis and biological interpretation requires an understanding of their underlying assumptions and limitations. Here, we review the basic concepts of such computational approaches and discuss the characteristics of developmental processes that can be learnt from trajectory models.
Affiliation(s)
- Sophie Tritschler
- Institute of Computational Biology, Helmholtz Zentrum München, 85764 Neuherberg, Germany
- Institute of Diabetes and Regeneration Research, Helmholtz Zentrum München, 85764 Neuherberg, Germany
- TUM School of Life Sciences Weihenstephan, Technical University of Munich, 85353 Freising, Germany
| | - Maren Büttner
- Institute of Computational Biology, Helmholtz Zentrum München, 85764 Neuherberg, Germany
- Department of Mathematics, Technische Universität München, 85748 Garching, Germany
| | - David S. Fischer
- Institute of Computational Biology, Helmholtz Zentrum München, 85764 Neuherberg, Germany
- TUM School of Life Sciences Weihenstephan, Technical University of Munich, 85353 Freising, Germany
| | - Marius Lange
- Institute of Computational Biology, Helmholtz Zentrum München, 85764 Neuherberg, Germany
- Department of Mathematics, Technische Universität München, 85748 Garching, Germany
| | - Volker Bergen
- Institute of Computational Biology, Helmholtz Zentrum München, 85764 Neuherberg, Germany
- Department of Mathematics, Technische Universität München, 85748 Garching, Germany
| | - Heiko Lickert
- Institute of Diabetes and Regeneration Research, Helmholtz Zentrum München, 85764 Neuherberg, Germany
- German Center for Diabetes Research, 85764 Neuherberg, Germany
- Institute of Stem Cell Research, Helmholtz Zentrum München, 85764 Neuherberg, Germany
| | - Fabian J. Theis
- Institute of Computational Biology, Helmholtz Zentrum München, 85764 Neuherberg, Germany
- Department of Mathematics, Technische Universität München, 85748 Garching, Germany
|
27
|
Chen G, Ning B, Shi T. Single-Cell RNA-Seq Technologies and Related Computational Data Analysis. Front Genet 2019; 10:317. [PMID: 31024627 PMCID: PMC6460256 DOI: 10.3389/fgene.2019.00317] [Citation(s) in RCA: 577] [Impact Index Per Article: 96.2] [Received: 12/05/2018] [Accepted: 03/21/2019] [Indexed: 12/15/2022] Open
Abstract
Single-cell RNA sequencing (scRNA-seq) technologies allow the dissection of gene expression at single-cell resolution, which greatly revolutionizes transcriptomic studies. A number of scRNA-seq protocols have been developed, and these methods possess their unique features with distinct advantages and disadvantages. Due to technical limitations and biological factors, scRNA-seq data are noisier and more complex than bulk RNA-seq data. The high variability of scRNA-seq data raises computational challenges in data analysis. Although an increasing number of bioinformatics methods are proposed for analyzing and interpreting scRNA-seq data, novel algorithms are required to ensure the accuracy and reproducibility of results. In this review, we provide an overview of currently available single-cell isolation protocols and scRNA-seq technologies, and discuss the methods for diverse scRNA-seq data analyses including quality control, read mapping, gene expression quantification, batch effect correction, normalization, imputation, dimensionality reduction, feature selection, cell clustering, trajectory inference, differential expression calling, alternative splicing, allelic expression, and gene regulatory network reconstruction. Further, we outline the prospective development and applications of scRNA-seq technologies.
Affiliation(s)
- Geng Chen
- Center for Bioinformatics and Computational Biology, and Shanghai Key Laboratory of Regulatory Biology, Institute of Biomedical Sciences, School of Life Sciences, East China Normal University, Shanghai, China
| | - Baitang Ning
- National Center for Toxicological Research, United States Food and Drug Administration, Jefferson, AR, United States
| | - Tieliu Shi
- Center for Bioinformatics and Computational Biology, and Shanghai Key Laboratory of Regulatory Biology, Institute of Biomedical Sciences, School of Life Sciences, East China Normal University, Shanghai, China
|
28
|
Yau C, Campbell K. Bayesian statistical learning for big data biology. Biophys Rev 2019; 11:95-102. [PMID: 30729409 PMCID: PMC6381359 DOI: 10.1007/s12551-019-00499-1] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 01/07/2019] [Accepted: 01/08/2019] [Indexed: 11/10/2022] Open
Abstract
Bayesian statistical learning provides a coherent probabilistic framework for modelling uncertainty in systems. This review describes the theoretical foundations underlying Bayesian statistics and outlines the computational frameworks for implementing Bayesian inference in practice. We then describe the use of Bayesian learning in single-cell biology for the analysis of high-dimensional, large data sets.
Affiliation(s)
- Christopher Yau
- Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, UK
- The Alan Turing Institute, London, UK
| | - Kieran Campbell
- Department of Statistics, University of British Columbia, Vancouver, Canada
- Department of Molecular Oncology, BC Cancer Agency, Vancouver, Canada
|