51
Campbell KR, Yau C. A descriptive marker gene approach to single-cell pseudotime inference. Bioinformatics 2019; 35:28-35. [PMID: 29939207 PMCID: PMC6298060 DOI: 10.1093/bioinformatics/bty498] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 11/17/2017] [Revised: 02/05/2018] [Accepted: 06/20/2018] [Indexed: 12/25/2022] Open
Abstract
Motivation: Pseudotime estimation from single-cell gene expression data allows the recovery of temporal information from otherwise static profiles of individual cells. Conventional pseudotime inference methods emphasize an unsupervised transcriptome-wide approach and use retrospective analysis to evaluate the behaviour of individual genes. However, the resulting trajectories can only be understood in terms of abstract geometric structures and not in terms of interpretable models of gene behaviour. Results: Here we introduce an orthogonal Bayesian approach termed 'Ouija' that learns pseudotimes from a small set of marker genes that might ordinarily be used to retrospectively confirm the accuracy of unsupervised pseudotime algorithms. Crucially, we model these genes in terms of switch-like or transient behaviour along the trajectory, allowing us to understand why the pseudotimes have been inferred and learn informative parameters about the behaviour of each gene. Since each gene is associated with a switch or peak time, the genes are effectively ordered along with the cells, allowing each part of the trajectory to be understood in terms of the behaviour of certain genes. We demonstrate that this small panel of marker genes can recover pseudotimes that are consistent with those obtained using the entire transcriptome. Furthermore, we show that our method can detect differences in the regulation timings between two genes and identify 'metastable' states (discrete cell types along the continuous trajectories) that recapitulate known cell types. Availability and implementation: An open source implementation is available as an R package at http://www.github.com/kieranrcampbell/ouija and as a Python/TensorFlow package at http://www.github.com/kieranrcampbell/ouijaflow. Supplementary information: Supplementary data are available at Bioinformatics online.
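As a rough illustration of the two gene behaviours named in this abstract, the sketch below defines a sigmoidal ("switch-like") and a bell-shaped ("transient") mean expression curve over pseudotime, then orders a few genes by their switch or peak time. The functional forms, the parameter names (`peak_expr`, `strength`, `switch_time`, `peak_time`, `width`), and the gene names and values are assumptions made for illustration. They are not taken from the Ouija package, which infers such parameters inside a Bayesian model.

```python
import math


def switch_mean(t, peak_expr, strength, switch_time):
    """Sigmoidal mean expression over pseudotime t.

    Rises around switch_time when strength > 0 and falls when strength < 0.
    At t == switch_time the value is exactly half of peak_expr.
    """
    return peak_expr / (1.0 + math.exp(-strength * (t - switch_time)))


def transient_mean(t, peak_expr, peak_time, width):
    """Bell-shaped mean expression that reaches peak_expr at peak_time."""
    return peak_expr * math.exp(-((t - peak_time) ** 2) / (2.0 * width ** 2))


# Hypothetical marker genes: (name, behaviour, switch or peak time in [0, 1]).
genes = [
    ("geneA", "switch", 0.7),
    ("geneB", "transient", 0.2),
    ("geneC", "switch", 0.4),
]

# Because every gene carries a characteristic time, the genes themselves
# can be ordered along the trajectory, alongside the cells.
gene_order = [name for name, _, _ in sorted(genes, key=lambda g: g[2])]
```

Under these assumed forms, each gene is summarised by a single interpretable time, which is what allows a segment of the trajectory to be described by the genes that switch or peak within it.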
Affiliation(s)
- Kieran R Campbell
  - Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford, UK
  - Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK
- Christopher Yau
  - Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK
  - Department of Statistics, University of Oxford, Oxford, UK
  - Institute of Cancer and Genomic Sciences, Centre for Computational Biology, University of Birmingham, Birmingham, UK
52
Lamere AT, Li J. Inference of Gene Co-expression Networks from Single-Cell RNA-Sequencing Data. Methods Mol Biol 2019; 1935:141-153. [PMID: 30758825 DOI: 10.1007/978-1-4939-9057-3_10] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 12/12/2022]
Abstract
Single-cell RNA-Sequencing is a pioneering extension of bulk-based RNA-Sequencing technology. The "guilt-by-association" heuristic has led to the use of gene co-expression networks to identify genes that are believed to be associated with a common cellular function. Many methods that were developed for bulk-based RNA-Sequencing data can continue to be applied to single-cell data, and several of the most widely used methods are explored. Several methods for leveraging the novel time information contained in single-cell data when constructing gene co-expression networks, which allows for the incorporation of directed associations, are also discussed.
Affiliation(s)
- Alicia T Lamere
  - Mathematics Department, Bryant University, Smithfield, RI, USA
- Jun Li
  - Applied and Computational Mathematics and Statistics Department, University of Notre Dame, Notre Dame, IN, USA
53
Ahmed S, Rattray M, Boukouvalas A. GrandPrix: scaling up the Bayesian GPLVM for single-cell data. Bioinformatics 2019; 35:47-54. [PMID: 30561544 PMCID: PMC6298059 DOI: 10.1093/bioinformatics/bty533] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Received: 12/07/2017] [Revised: 05/02/2018] [Accepted: 06/28/2018] [Indexed: 11/13/2022] Open
Abstract
Motivation: The Gaussian Process Latent Variable Model (GPLVM) is a popular approach for dimensionality reduction of single-cell data and has been used for pseudotime estimation with capture time information. However, current implementations are computationally intensive and will not scale up to modern droplet-based single-cell datasets, which routinely profile many tens of thousands of cells. Results: We provide an efficient implementation which allows scaling up this approach to modern single-cell datasets. We also generalize the application of pseudotime inference to cases where there are other sources of variation such as branching dynamics. We apply our method to microarray, nCounter, RNA-seq, qPCR and droplet-based datasets from different organisms. The model converges an order of magnitude faster compared to existing methods whilst achieving similar levels of estimation accuracy. Further, we demonstrate the flexibility of our approach by extending the model to higher-dimensional latent spaces that can be used to simultaneously infer pseudotime and other structure such as branching. Thus, the model has the capability of producing meaningful biological insights about cell ordering as well as cell fate regulation. Availability and implementation: Software available at github.com/ManchesterBioinference/GrandPrix. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Sumon Ahmed
  - Division of Informatics, Imaging and Data Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK
- Magnus Rattray
  - Division of Informatics, Imaging and Data Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK
- Alexis Boukouvalas
  - Division of Informatics, Imaging and Data Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK
54
Machine learning based classification of cells into chronological stages using single-cell transcriptomics. Sci Rep 2018; 8:17156. [PMID: 30464314 PMCID: PMC6249247 DOI: 10.1038/s41598-018-35218-5] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 02/09/2018] [Accepted: 10/30/2018] [Indexed: 12/11/2022] Open
Abstract
Age-associated deterioration of cellular physiology leads to pathological conditions. The ability to detect premature aging could provide a window for preventive therapies against age-related diseases. However, the techniques for determining cellular age are limited, as they rely on a limited set of histological markers and lack predictive power. Here, we implement GERAS (GEnetic Reference for Age of Single-cell), a machine learning based framework capable of assigning individual cells to chronological stages based on their transcriptomes. GERAS displays greater than 90% accuracy in classifying the chronological stage of zebrafish and human pancreatic cells. The framework demonstrates robustness against biological and technical noise, as evaluated by its performance on independent samplings of single-cells. Additionally, GERAS determines the impact of differences in calorie intake and BMI on the aging of zebrafish and human pancreatic cells, respectively. We further harness the classification ability of GERAS to identify molecular factors that are potentially associated with the aging of beta-cells. We show that one of these factors, junba, is necessary to maintain the proliferative state of juvenile beta-cells. Our results showcase the applicability of a machine learning framework to classify the chronological stage of heterogeneous cell populations, while enabling detection of candidate genes associated with aging.
55
Rath P, Allen JA, Schneider DS. Predicting position along a looping immune response trajectory. PLoS One 2018; 13:e0200147. [PMID: 30296270 PMCID: PMC6175499 DOI: 10.1371/journal.pone.0200147] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 01/11/2018] [Accepted: 06/20/2018] [Indexed: 12/13/2022] Open
Abstract
When we get sick, we want to be resilient and recover our original health. To measure resilience, we need to quantify a host's position along its disease trajectory. Here we present Looper, a computational method to analyze longitudinally gathered datasets and identify gene pairs that form looping trajectories when plotted in the space described by these phases. These loops enable us to track where patients lie on a typical trajectory back to health. We analyzed two publicly available, longitudinal human microarray datasets that describe self-resolving immune responses. Looper identified looping gene pairs expressed by human donor monocytes stimulated by immune elicitors, and in YF17D-vaccinated individuals. Using loops derived from training data, we found that we could predict the time of perturbation in withheld test samples with accuracies of 94% in the human monocyte data, and 65-83% within the same cohort and in two independent cohorts of YF17D vaccinated individuals. We suggest that Looper will be useful in building maps of resilient immune processes across organisms.
Affiliation(s)
- Poonam Rath
  - Department of Microbiology and Immunology, Stanford University, Stanford, CA, United States of America
- Jessica A. Allen
  - Department of Microbiology and Immunology, Stanford University, Stanford, CA, United States of America
- David S. Schneider
  - Department of Microbiology and Immunology, Stanford University, Stanford, CA, United States of America
56
Penfold CA, Sybirna A, Reid JE, Huang Y, Wernisch L, Ghahramani Z, Grant M, Surani MA. Branch-recombinant Gaussian processes for analysis of perturbations in biological time series. Bioinformatics 2018; 34:i1005-i1013. [PMID: 30423108 PMCID: PMC6129282 DOI: 10.1093/bioinformatics/bty603] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 12/26/2022] Open
Abstract
Motivation: A common class of behaviour encountered in the biological sciences involves branching and recombination. During branching, a statistical process bifurcates, resulting in two or more potentially correlated processes that may undergo further branching; the contrary is true during recombination, where two or more statistical processes converge. A key objective is to identify the time of this bifurcation (branch or recombination time) from time series measurements, e.g. by comparing a control time series with perturbed time series. Gaussian processes (GPs) represent an ideal framework for such analysis, allowing for nonlinear regression that includes a rigorous treatment of uncertainty. Currently, however, GP models only exist for two-branch systems. Here, we highlight how arbitrarily complex branching processes can be built using the correct composition of covariance functions within a GP framework, thus outlining a general framework for the treatment of branching and recombination in the form of branch-recombinant Gaussian processes (B-RGPs). Results: We first benchmark the performance of B-RGPs compared to a variety of existing regression approaches, and demonstrate robustness to model misspecification. B-RGPs are then used to investigate the branching patterns of Arabidopsis thaliana gene expression following inoculation with the hemibiotrophic bacterium Pseudomonas syringae DC3000 and a disarmed mutant strain, hrpA. By grouping genes according to the number of branches, we could naturally separate out genes involved in basal immune response from those subverted by the virulent strain, and show enrichment for targets of pathogen protein effectors. Finally, we identify two early branching genes, WRKY11 and WRKY17, and show that genes that branched at similar times to WRKY11/17 were enriched for W-box binding motifs and overrepresented for genes differentially expressed in WRKY11/17 knockouts, suggesting that branch time could be used for identifying direct and indirect binding targets of key transcription factors. Availability and implementation: https://github.com/cap76/BranchingGPs. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Christopher A Penfold
  - Wellcome Trust/Cancer Research UK Gurdon Institute, Henry Wellcome Building of Cancer and Developmental Biology, Cambridge, UK
  - Department of Statistics, University of Warwick, Coventry, UK
- Anastasiya Sybirna
  - Wellcome Trust/Cancer Research UK Gurdon Institute, Henry Wellcome Building of Cancer and Developmental Biology, Cambridge, UK
  - Wellcome/MRC Stem Cell Institute, University of Cambridge, UK
  - Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge
- John E Reid
  - MRC Biostatistics Unit, University of Cambridge, Cambridge Institute of Public Health, Cambridge Biomedical Campus, Cambridge, UK
  - The Alan Turing Institute, London, UK
- Yun Huang
  - Wellcome Trust/Cancer Research UK Gurdon Institute, Henry Wellcome Building of Cancer and Developmental Biology, Cambridge, UK
  - Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge
- Lorenz Wernisch
  - MRC Biostatistics Unit, University of Cambridge, Cambridge Institute of Public Health, Cambridge Biomedical Campus, Cambridge, UK
- Murray Grant
  - School of Life Sciences, Gibbet Hill Campus, The University of Warwick, Coventry, UK
- M Azim Surani
  - Wellcome Trust/Cancer Research UK Gurdon Institute, Henry Wellcome Building of Cancer and Developmental Biology, Cambridge, UK
  - Department of Statistics, University of Warwick, Coventry, UK
  - Wellcome/MRC Stem Cell Institute, University of Cambridge, UK
57
Dasgupta S, Bader GD, Goyal S. Single-Cell RNA Sequencing: A New Window into Cell Scale Dynamics. Biophys J 2018; 115:429-435. [PMID: 30033145 DOI: 10.1016/j.bpj.2018.07.003] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 02/01/2018] [Revised: 06/29/2018] [Accepted: 07/03/2018] [Indexed: 01/04/2023] Open
Abstract
Single-cell genomics has recently emerged as a powerful tool for observing multicellular systems at a much higher level of resolution and depth than previously possible. High-throughput single-cell RNA sequencing techniques are able to simultaneously quantify expression levels of several thousands of genes within individual cells for tens of thousands of cells within a complex tissue. This has led to development of novel computational methods to analyze this high-dimensional data, investigating longstanding and fundamental questions regarding the granularity of cell types, the definition of cell states, and transitions from one cell type to another along developmental trajectories. In this perspective, we outline this emerging field starting from the "input data" (e.g., quantifying transcription levels in single cells), which are analyzed to define "identities" (e.g., cell types, states, and key genes) and to build "interactions" using models that can infer relations and transitions between cells.
Affiliation(s)
- Gary D Bader
  - The Donnelly Centre, University of Toronto, Toronto, Ontario, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
- Sidhartha Goyal
  - Department of Physics, University of Toronto, Toronto, Ontario, Canada
  - Institute of Biomaterials and Biomedical Engineering, University of Toronto, Toronto, Ontario, Canada
58
RNA-seq analysis identifies different transcriptomic types and developmental trajectories of primary melanomas. Oncogene 2018; 37:6136-6151. [PMID: 29995873 DOI: 10.1038/s41388-018-0385-y] [Citation(s) in RCA: 77] [Impact Index Per Article: 11.0] [Received: 12/04/2017] [Revised: 05/30/2018] [Accepted: 05/31/2018] [Indexed: 12/13/2022]
Abstract
Recent studies revealed trajectories of mutational events in early melanomagenesis, but the accompanying changes in gene expression are far less understood. Therefore, we performed a comprehensive RNA-seq analysis of laser-microdissected melanocytic nevi (n = 23) and primary melanoma samples (n = 57) and characterized the molecular mechanisms of early melanoma development. Using self-organizing maps, unsupervised clustering, and analysis of pseudotime (PT) dynamics to identify evolutionary trajectories, we describe here two transcriptomic types of melanocytic nevi (N1 and N2) and primary melanomas (M1 and M2). N1/M1 lesions are characterized by pigmentation-type and MITF gene signatures, and a high prevalence of NRAS mutations in M1 melanomas. N2/M2 lesions are characterized by inflammatory-type and AXL gene signatures with an equal distribution of wild-type and mutated BRAF and low prevalence of NRAS mutations in M2 melanomas. Interestingly, N1 nevi cluster together with M1 melanomas, and N2 nevi with M2 melanomas, but there is no clustering in a stage-dependent manner. Transcriptional signatures of M1 melanomas harbor signatures of BRAF/MEK inhibitor resistance and M2 melanomas harbor signatures of anti-PD-1 antibody treatment resistance. Pseudotime dynamics of nevus and melanoma samples are suggestive of a switch-like immune-escape mechanism in melanoma development, with downregulation of immune genes paralleled by an increasing expression of a cell cycle signature in late-stage melanomas. Taken together, the transcriptome analysis identifies gene signatures and mechanisms underlying development of melanoma in early and late stages with relevance for diagnostics and therapy.
59
Hon CC, Shin JW, Carninci P, Stubbington MJT. The Human Cell Atlas: Technical approaches and challenges. Brief Funct Genomics 2018; 17:283-294. [PMID: 29092000 PMCID: PMC6063304 DOI: 10.1093/bfgp/elx029] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Indexed: 12/23/2022] Open
Abstract
The Human Cell Atlas is a large, international consortium that aims to identify and describe every cell type in the human body. The comprehensive cellular maps that arise from this ambitious effort have the potential to transform many aspects of fundamental biology and clinical practice. Here, we discuss the technical approaches that could be used today to generate such a resource and also the technical challenges that will be encountered.
Affiliation(s)
- Chung-Chau Hon
  - RIKEN Center for Life Science Technologies, Division of Genomic Technologies, Yokohama, Kanagawa, Japan
- Jay W Shin
  - RIKEN Center for Life Science Technologies, Division of Genomic Technologies, Yokohama, Kanagawa, Japan
- Piero Carninci
  - RIKEN Center for Life Science Technologies, Division of Genomic Technologies, Yokohama, Kanagawa, Japan
60
Kolodziejczyk AA, Lönnberg T. Global and targeted approaches to single-cell transcriptome characterization. Brief Funct Genomics 2018; 17:209-219. [PMID: 29028866 PMCID: PMC6063303 DOI: 10.1093/bfgp/elx025] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Indexed: 12/31/2022] Open
Abstract
Analysing transcriptomes of cell populations is a standard molecular biology approach to understand how cells function. Recent methodological development has allowed performing similar experiments on single cells. This has opened up the possibility to examine samples with limited cell number, such as cells of the early embryo, and to obtain an understanding of heterogeneity within populations such as blood cell types or neurons. There are two major approaches for single-cell transcriptome analysis: quantitative reverse transcription PCR (RT-qPCR) on a limited number of genes of interest, or more global approaches targeting entire transcriptomes using RNA sequencing. RT-qPCR is sensitive, fast and arguably more straightforward, while whole-transcriptome approaches offer an unbiased perspective on a cell's expression status.
Affiliation(s)
- Tapio Lönnberg
  - Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
  - EMBL-European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
61
Uncovering pseudotemporal trajectories with covariates from single cell and bulk expression data. Nat Commun 2018; 9:2442. [PMID: 29934517 PMCID: PMC6015076 DOI: 10.1038/s41467-018-04696-6] [Citation(s) in RCA: 73] [Impact Index Per Article: 10.4] [Received: 07/05/2017] [Accepted: 05/17/2018] [Indexed: 12/29/2022] Open
Abstract
Pseudotime algorithms can be employed to extract latent temporal information from cross-sectional data sets, allowing dynamic biological processes to be studied in situations where the collection of time series data is challenging or prohibitive. Computational techniques have arisen from single-cell 'omics and cancer modelling, where pseudotime can be used to learn about cellular differentiation or tumour progression. However, methods to date typically implicitly assume homogeneous genetic, phenotypic or environmental backgrounds, which becomes limiting as data sets grow in size and complexity. We describe a novel statistical framework that learns how pseudotime trajectories can be modulated through covariates that encode such factors. We apply this model to both single-cell and bulk gene expression data sets and show that the approach can recover known and novel covariate-pseudotime interaction effects. This hybrid regression-latent variable model framework extends pseudotemporal modelling from its most prevalent area of single cell genomics to wider applications. Cross-sectional omic data often have non-homogeneous genetic, phenotypic, or environmental backgrounds. Here, the authors develop a statistical framework to infer pseudotime trajectories in the presence of such factors as well as their interactions in both single-cell and bulk gene expression analysis.
62
Street K, Risso D, Fletcher RB, Das D, Ngai J, Yosef N, Purdom E, Dudoit S. Slingshot: cell lineage and pseudotime inference for single-cell transcriptomics. BMC Genomics 2018; 19:477. [PMID: 29914354 PMCID: PMC6007078 DOI: 10.1186/s12864-018-4772-0] [Citation(s) in RCA: 1585] [Impact Index Per Article: 226.4] [Received: 01/15/2018] [Accepted: 05/09/2018] [Indexed: 01/09/2023] Open
Abstract
BACKGROUND Single-cell transcriptomics allows researchers to investigate complex communities of heterogeneous cells. It can be applied to stem cells and their descendants in order to chart the progression from multipotent progenitors to fully differentiated cells. While a variety of statistical and computational methods have been proposed for inferring cell lineages, the problem of accurately characterizing multiple branching lineages remains difficult to solve. RESULTS We introduce Slingshot, a novel method for inferring cell lineages and pseudotimes from single-cell gene expression data. In previously published datasets, Slingshot correctly identifies the biological signal for one to three branching trajectories. Additionally, our simulation study shows that Slingshot infers more accurate pseudotimes than other leading methods. CONCLUSIONS Slingshot is a uniquely robust and flexible tool which combines the highly stable techniques necessary for noisy single-cell data with the ability to identify multiple trajectories. Accurate lineage inference is a critical step in the identification of dynamic temporal gene expression.
Affiliation(s)
- Kelly Street
  - Division of Biostatistics, School of Public Health, University of California, Berkeley, CA, USA
  - Center for Computational Biology, University of California, Berkeley, CA, USA
- Davide Risso
  - Division of Biostatistics and Epidemiology, Department of Healthcare Policy and Research, Weill Cornell Medicine, 407 E 61st St, New York, NY 10065, USA
- Russell B. Fletcher
  - Department of Molecular and Cell Biology, University of California, Berkeley, CA, USA
- Diya Das
  - Department of Molecular and Cell Biology, University of California, Berkeley, CA, USA
  - Berkeley Institute for Data Science, University of California, Berkeley, CA, USA
- John Ngai
  - Department of Molecular and Cell Biology, University of California, Berkeley, CA, USA
  - Helen Wills Neuroscience Institute, University of California, Berkeley, CA, USA
  - QB3 Berkeley Functional Genomics Laboratory, Berkeley, CA, USA
- Nir Yosef
  - Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA, USA
  - Center for Computational Biology, University of California, Berkeley, CA, USA
- Elizabeth Purdom
  - Department of Statistics, University of California, Berkeley, CA, USA
  - Center for Computational Biology, University of California, Berkeley, CA, USA
- Sandrine Dudoit
  - Division of Biostatistics, School of Public Health, University of California, Berkeley, CA, USA
  - Department of Statistics, University of California, Berkeley, CA, USA
  - Center for Computational Biology, University of California, Berkeley, CA, USA
  - Berkeley Institute for Data Science, University of California, Berkeley, CA, USA
63
Chen S, Mar JC. Evaluating methods of inferring gene regulatory networks highlights their lack of performance for single cell gene expression data. BMC Bioinformatics 2018; 19:232. [PMID: 29914350 PMCID: PMC6006753 DOI: 10.1186/s12859-018-2217-z] [Citation(s) in RCA: 134] [Impact Index Per Article: 19.1] [Received: 10/28/2017] [Accepted: 05/24/2018] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND A fundamental fact in biology states that genes do not operate in isolation, and yet, methods that infer regulatory networks for single cell gene expression data have been slow to emerge. With single cell sequencing methods now becoming accessible, general network inference algorithms that were initially developed for data collected from bulk samples may not be suitable for single cells. Meanwhile, although methods that are specific for single cell data are now emerging, whether they have improved performance over general methods is unknown. In this study, we evaluate the applicability of five general methods and three single cell methods for inferring gene regulatory networks from both experimental single cell gene expression data and in silico simulated data. RESULTS Standard evaluation metrics using ROC curves and Precision-Recall curves against reference sets sourced from the literature demonstrated that most of the methods performed poorly when they were applied to either experimental single cell data or simulated single cell data, which demonstrates their lack of performance for this task. Using default settings, network methods were applied to the same datasets. Comparisons of the learned networks highlighted the uniqueness of some predicted edges for each method. The fact that different methods infer networks that vary substantially reflects the underlying mathematical rationale and assumptions that distinguish network methods from each other. CONCLUSIONS This study provides a comprehensive evaluation of network modeling algorithms applied to experimental single cell gene expression data and in silico simulated datasets where the network structure is known. Comparisons demonstrate that most of these assessed network methods are not able to predict network structures from single cell expression data accurately, even if they are specifically developed for single cell data. Also, single cell methods, which usually depend on more elaborate algorithms, in general have less similarity to each other in the sets of edges detected. The results from this study emphasize the importance of developing more accurate optimized network modeling methods that are compatible with single cell data. Newly-developed single cell methods may uniquely capture particular features of potential gene-gene relationships, and caution should be taken when we interpret these results.
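The kind of evaluation this abstract describes, scoring predicted network edges against a reference set, can be sketched as a single precision and recall calculation. This is a minimal sketch and not the authors' pipeline: treating edges as undirected is an assumption, the gene names and edge lists are hypothetical, and a full Precision-Recall or ROC curve would repeat the calculation while sweeping a threshold over each method's edge scores.

```python
def edge_key(a, b):
    """Treat edges as undirected so (a, b) and (b, a) compare equal (an assumption)."""
    return frozenset((a, b))


def precision_recall(predicted_edges, reference_edges):
    """Return (precision, recall) of a predicted edge list against a reference edge list."""
    predicted = {edge_key(a, b) for a, b in predicted_edges}
    reference = {edge_key(a, b) for a, b in reference_edges}
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical reference network and one method's predicted edges.
reference = [("G1", "G2"), ("G2", "G3"), ("G3", "G4")]
predicted = [("G2", "G1"), ("G1", "G4"), ("G3", "G4"), ("G1", "G3")]

precision, recall = precision_recall(predicted, reference)
```

In this toy case two of the four predicted edges appear in the reference set, so precision is 0.5, and two of the three reference edges are recovered, so recall is about 0.67.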
Affiliation(s)
- Shuonan Chen
  - Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, New York, USA
- Jessica C Mar
  - Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, New York, USA
  - Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, New York, USA
  - Australian Institute for Bioengineering and Nanotechnology, University of Queensland, Brisbane, QLD, Australia
64
Gong W, Kwak IY, Pota P, Koyano-Nakagawa N, Garry DJ. DrImpute: imputing dropout events in single cell RNA sequencing data. BMC Bioinformatics 2018; 19:220. [PMID: 29884114 PMCID: PMC5994079 DOI: 10.1186/s12859-018-2226-y] [Citation(s) in RCA: 187] [Impact Index Per Article: 26.7] [Received: 11/14/2017] [Accepted: 05/30/2018] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The single cell RNA sequencing (scRNA-seq) technique began a new era by allowing the observation of gene expression at the single cell level. However, there is also a large amount of technical and biological noise. Because of the low number of RNA transcripts and the stochastic nature of the gene expression pattern, there is a high chance that nonzero entries are recorded as zero; these are called dropout events. RESULTS We develop DrImpute to impute dropout events in scRNA-seq data. We show that DrImpute has significantly better performance on the separation of the dropout zeros from true zeros than existing imputation algorithms. We also demonstrate that DrImpute can significantly improve the performance of existing tools for clustering, visualization and lineage reconstruction of nine published scRNA-seq datasets. CONCLUSIONS DrImpute can serve as a very useful addition to the currently existing statistical tools for single cell RNA-seq analysis. DrImpute is implemented in R and is available at https://github.com/gongx030/DrImpute.
Affiliation(s)
- Wuming Gong
- Lillehei Heart Institute, University of Minnesota, 2231 6th St S.E, 4-165 CCRB, Minneapolis, MN 55114 USA
- Il-Youp Kwak
- Lillehei Heart Institute, University of Minnesota, 2231 6th St S.E, 4-165 CCRB, Minneapolis, MN 55114 USA
- Pruthvi Pota
- Lillehei Heart Institute, University of Minnesota, 2231 6th St S.E, 4-165 CCRB, Minneapolis, MN 55114 USA
- Naoko Koyano-Nakagawa
- Lillehei Heart Institute, University of Minnesota, 2231 6th St S.E, 4-165 CCRB, Minneapolis, MN 55114 USA
- Daniel J. Garry
- Lillehei Heart Institute, University of Minnesota, 2231 6th St S.E, 4-165 CCRB, Minneapolis, MN 55114 USA
|
65
|
Boukouvalas A, Hensman J, Rattray M. BGP: identifying gene-specific branching dynamics from single-cell data with a branching Gaussian process. Genome Biol 2018; 19:65. [PMID: 29843817 PMCID: PMC5975664 DOI: 10.1186/s13059-018-1440-2] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 10/30/2017] [Accepted: 05/01/2018] [Indexed: 12/24/2022] Open
Abstract
High-throughput single-cell gene expression experiments can be used to uncover branching dynamics in cell populations undergoing differentiation through pseudotime methods. We develop the branching Gaussian process (BGP), a non-parametric model that is able to identify branching dynamics for individual genes and provide an estimate of branching times for each gene with an associated credible region. We demonstrate the effectiveness of our method on simulated data, a single-cell RNA-seq haematopoiesis study and mouse embryonic stem cells generated using droplet barcoding. The method is robust to high levels of technical variation and dropout, which are common in single-cell data.
Affiliation(s)
- Alexis Boukouvalas
- Division of Informatics, Imaging and Data Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Oxford Road, Manchester, UK
- Magnus Rattray
- Division of Informatics, Imaging and Data Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Oxford Road, Manchester, UK
|
66
|
Specht AT, Li J. LEAP: constructing gene co-expression networks for single-cell RNA-sequencing data using pseudotime ordering. Bioinformatics 2017; 33:764-766. [PMID: 27993778 DOI: 10.1093/bioinformatics/btw729] [Citation(s) in RCA: 95] [Impact Index Per Article: 13.6] [Received: 05/18/2016] [Accepted: 11/15/2016] [Indexed: 01/09/2023] Open
Abstract
Summary To construct gene co-expression networks based on single-cell RNA-Sequencing data, we present an algorithm called LEAP, which utilizes the estimated pseudotime of the cells to find gene co-expression that involves time delay. Availability and Implementation R package LEAP available on CRAN. Contact jun.li@nd.edu. Supplementary information Supplementary data are available at Bioinformatics online.
|
67
|
Chen J, Rénia L, Ginhoux F. Constructing cell lineages from single-cell transcriptomes. Mol Aspects Med 2017; 59:95-113. [PMID: 29107741 DOI: 10.1016/j.mam.2017.10.004] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 06/01/2017] [Revised: 10/23/2017] [Accepted: 10/25/2017] [Indexed: 12/25/2022]
Abstract
Advances in single-cell RNA-sequencing have helped reveal the previously underappreciated level of cellular heterogeneity present during cellular differentiation. A static snapshot of single-cell transcriptomes provides a good representation of the various stages of differentiation as differentiation is rarely synchronized between cells. Data from numerous single-cell analyses has suggested that cellular differentiation and development can be conceptualized as continuous processes. Consequently, computational algorithms have been developed to infer lineage relationships between cell types and construct developmental trajectories along which cells are re-ordered such that similarity between successive cell pairs is maximized. Here, we compare and contrast the existing computational methods, and illustrate how they may be applied to build mouse myeloid progenitor lineages from massively parallel RNA single-cell sequencing data.
Affiliation(s)
- Jinmiao Chen
- Singapore Immunology Network (SIgN), A*STAR, 8A Biomedical Grove, Immunos Building, Level 4, Singapore 138648, Singapore
- Laurent Rénia
- Singapore Immunology Network (SIgN), A*STAR, 8A Biomedical Grove, Immunos Building, Level 4, Singapore 138648, Singapore
- Florent Ginhoux
- Singapore Immunology Network (SIgN), A*STAR, 8A Biomedical Grove, Immunos Building, Level 4, Singapore 138648, Singapore
|
68
|
|
69
|
Systematic Discovery of Archaeal Transcription Factor Functions in Regulatory Networks through Quantitative Phenotyping Analysis. mSystems 2017; 2:mSystems00032-17. [PMID: 28951888 PMCID: PMC5605881 DOI: 10.1128/msystems.00032-17] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Received: 05/03/2017] [Accepted: 08/03/2017] [Indexed: 11/26/2022] Open
Abstract
To ensure survival in the face of stress, microorganisms employ inducible damage repair pathways regulated by extensive and complex gene networks. Many archaea, microorganisms of the third domain of life, persist under extremes of temperature, salinity, and pH and under other conditions. In order to understand the cause-effect relationships between the dynamic function of the stress network and ultimate physiological consequences, this study characterized the physiological role of nearly one-third of all regulatory proteins known as transcription factors (TFs) in an archaeal organism. Using a unique quantitative phenotyping approach, we discovered functions for many novel TFs and revealed important secondary functions for known TFs. Surprisingly, many TFs are required for resisting multiple stressors, suggesting cross-regulation of stress responses. Through extensive validation experiments, we map the physiological roles of these novel TFs in stress response back to their position in the regulatory network wiring. This study advances understanding of the mechanisms underlying how microorganisms resist extreme stress. Given the generality of the methods employed, we expect that this study will enable future studies on how regulatory networks adjust cellular physiology in a diversity of organisms. Gene regulatory networks (GRNs) are critical for dynamic transcriptional responses to environmental stress. However, the mechanisms by which GRN regulation adjusts physiology to enable stress survival remain unclear. Here we investigate the functions of transcription factors (TFs) within the global GRN of the stress-tolerant archaeal microorganism Halobacterium salinarum. We measured growth phenotypes of a panel of TF deletion mutants in high temporal resolution under heat shock, oxidative stress, and low-salinity conditions. 
To quantitate the noncanonical functional forms of the growth trajectories observed for these mutants, we developed a novel modeling framework based on Gaussian process regression and functional analysis of variance (FANOVA). We employ unique statistical tests to determine the significance of differential growth relative to the growth of the control strain. This analysis recapitulated known TF functions, revealed novel functions, and identified surprising secondary functions for characterized TFs. Strikingly, we observed that the majority of the TFs studied were required for growth under multiple stress conditions, pinpointing regulatory connections between the conditions tested. Correlations between quantitative phenotype trajectories of mutants are predictive of TF-TF connections within the GRN. These phenotypes are strongly concordant with predictions from statistical GRN models inferred from gene expression data alone. With genome-wide and targeted data sets, we provide detailed functional validation of novel TFs required for extreme oxidative stress and heat shock survival. Together, results presented in this study suggest that many TFs function under multiple conditions, thereby revealing high interconnectivity within the GRN and identifying the specific TFs required for communication between networks responding to disparate stressors.
|
70
|
Nguyen LH, Holmes S. Bayesian Unidimensional Scaling for visualizing uncertainty in high dimensional datasets with latent ordering of observations. BMC Bioinformatics 2017; 18:394. [PMID: 28929970 PMCID: PMC5606221 DOI: 10.1186/s12859-017-1790-x] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Indexed: 02/03/2023] Open
Abstract
BACKGROUND Detecting patterns in high-dimensional multivariate datasets is non-trivial. Clustering and dimensionality reduction techniques often help in discerning inherent structures. In biological datasets such as microbial community composition or gene expression data, observations can be generated from a continuous process, often unknown. Estimating data points' 'natural ordering' and their corresponding uncertainties can help researchers draw insights about the mechanisms involved. RESULTS We introduce a Bayesian Unidimensional Scaling (BUDS) technique which extracts dominant sources of variation in high dimensional datasets and produces their visual data summaries, facilitating the exploration of a hidden continuum. The method maps multivariate data points to latent one-dimensional coordinates along their underlying trajectory, and provides estimated uncertainty bounds. By statistically modeling dissimilarities and applying a DiSTATIS registration method to their posterior samples, we are able to incorporate visualizations of uncertainties in the estimated data trajectory across different regions using confidence contours for individual data points. We also illustrate the estimated overall data density across different areas by including density clouds. One-dimensional coordinates recovered by BUDS help researchers discover sample attributes or covariates that are factors driving the main variability in a dataset. We demonstrated usefulness and accuracy of BUDS on a set of published microbiome 16S and RNA-seq and roll call data. CONCLUSIONS Our method effectively recovers and visualizes natural orderings present in datasets. Automatic visualization tools for data exploration and analysis are available at: https://nlhuong.shinyapps.io/visTrajectory/ .
Affiliation(s)
- Lan Huong Nguyen
- Institute for Computational and Mathematical Engineering, Stanford University, Stanford, 94305 USA
- Susan Holmes
- Department of Statistics, Stanford University, Stanford, 94305 USA
|
71
|
Welch JD, Hartemink AJ, Prins JF. MATCHER: manifold alignment reveals correspondence between single cell transcriptome and epigenome dynamics. Genome Biol 2017; 18:138. [PMID: 28738873 PMCID: PMC5525279 DOI: 10.1186/s13059-017-1269-0] [Citation(s) in RCA: 97] [Impact Index Per Article: 12.1] [Received: 03/17/2017] [Accepted: 07/05/2017] [Indexed: 12/30/2022] Open
Abstract
Single cell experimental techniques reveal transcriptomic and epigenetic heterogeneity among cells, but how these are related is unclear. We present MATCHER, an approach for integrating multiple types of single cell measurements. MATCHER uses manifold alignment to infer single cell multi-omic profiles from transcriptomic and epigenetic measurements performed on different cells of the same type. Using scM&T-seq and sc-GEM data, we confirm that MATCHER accurately predicts true single cell correlations between DNA methylation and gene expression without using known cell correspondences. MATCHER also reveals new insights into the dynamic interplay between the transcriptome and epigenome in single embryonic stem cells and induced pluripotent stem cells.
Affiliation(s)
- Joshua D Welch
- Department of Computer Science, The University of North Carolina at Chapel Hill, Chapel Hill, NC, USA; Curriculum in Bioinformatics and Computational Biology, The University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Jan F Prins
- Department of Computer Science, The University of North Carolina at Chapel Hill, Chapel Hill, NC, USA; Curriculum in Bioinformatics and Computational Biology, The University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
|
72
|
Campbell KR, Yau C. switchde: inference of switch-like differential expression along single-cell trajectories. Bioinformatics 2017; 33:1241-1242. [PMID: 28011787 PMCID: PMC5408844 DOI: 10.1093/bioinformatics/btw798] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 07/22/2016] [Accepted: 12/13/2016] [Indexed: 11/23/2022] Open
Abstract
Motivation Pseudotime analyses of single-cell RNA-seq data have become increasingly common. Typically, a latent trajectory corresponding to a biological process of interest—such as differentiation or cell cycle—is discovered. However, relatively little attention has been paid to modelling the differential expression of genes along such trajectories. Results We present switchde, a statistical framework and accompanying R package for identifying switch-like differential expression of genes along pseudotemporal trajectories. Our method includes fast model fitting that provides interpretable parameter estimates corresponding to how quickly a gene is up or down regulated as well as where in the trajectory such regulation occurs. It also reports a P-value in favour of rejecting a constant-expression model for switch-like differential expression and optionally models the zero-inflation prevalent in single-cell data. Availability and Implementation The R package switchde is available through the Bioconductor project at https://bioconductor.org/packages/switchde. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Kieran R Campbell
- Department of Physiology, Anatomy and Genetics; Wellcome Trust Centre for Human Genetics
- Christopher Yau
- Wellcome Trust Centre for Human Genetics; Department of Statistics, University of Oxford, Oxford, UK
|
73
|
Zeng C, Mulas F, Sui Y, Guan T, Miller N, Tan Y, Liu F, Jin W, Carrano AC, Huising MO, Shirihai OS, Yeo GW, Sander M. Pseudotemporal Ordering of Single Cells Reveals Metabolic Control of Postnatal β Cell Proliferation. Cell Metab 2017; 25:1160-1175.e11. [PMID: 28467932 PMCID: PMC5501713 DOI: 10.1016/j.cmet.2017.04.014] [Citation(s) in RCA: 102] [Impact Index Per Article: 12.8] [Received: 09/30/2016] [Revised: 02/28/2017] [Accepted: 04/13/2017] [Indexed: 01/28/2023]
Abstract
Pancreatic β cell mass for appropriate blood glucose control is established during early postnatal life. β cell proliferative capacity declines postnatally, but the extrinsic cues and intracellular signals that cause this decline remain unknown. To obtain a high-resolution map of β cell transcriptome dynamics after birth, we generated single-cell RNA-seq data of β cells from multiple postnatal time points and ordered cells based on transcriptional similarity using a new analytical tool. This analysis captured signatures of immature, proliferative β cells and established high expression of amino acid metabolic, mitochondrial, and Srf/Jun/Fos transcription factor genes as their hallmark feature. Experimental validation revealed high metabolic activity in immature β cells and a role for reactive oxygen species and Srf/Jun/Fos transcription factors in driving postnatal β cell proliferation and mass expansion. Our work provides the first high-resolution molecular characterization of state changes in postnatal β cells and paves the way for the identification of novel therapeutic targets to stimulate β cell regeneration.
Affiliation(s)
- Chun Zeng
- Departments of Pediatrics and Cellular & Molecular Medicine, Pediatric Diabetes Research Center and Institute for Genomic Medicine, University of California, San Diego, La Jolla, CA 92093, USA
- Francesca Mulas
- Departments of Pediatrics and Cellular & Molecular Medicine, Pediatric Diabetes Research Center and Institute for Genomic Medicine, University of California, San Diego, La Jolla, CA 92093, USA
- Yinghui Sui
- Departments of Pediatrics and Cellular & Molecular Medicine, Pediatric Diabetes Research Center and Institute for Genomic Medicine, University of California, San Diego, La Jolla, CA 92093, USA
- Tiffany Guan
- Departments of Pediatrics and Cellular & Molecular Medicine, Pediatric Diabetes Research Center and Institute for Genomic Medicine, University of California, San Diego, La Jolla, CA 92093, USA
- Nathanael Miller
- Departments of Medicine and Molecular & Medical Pharmacology, David Geffen School of Medicine, University of California, Los Angeles, CA 90095, USA; Department of Medicine, Boston University, School of Medicine, Boston, MA 02118, USA
- Yuliang Tan
- Howard Hughes Medical Institute, Department of Medicine, University of California, San Diego, La Jolla, CA 92093, USA
- Fenfen Liu
- Departments of Pediatrics and Cellular & Molecular Medicine, Pediatric Diabetes Research Center and Institute for Genomic Medicine, University of California, San Diego, La Jolla, CA 92093, USA
- Wen Jin
- Departments of Pediatrics and Cellular & Molecular Medicine, Pediatric Diabetes Research Center and Institute for Genomic Medicine, University of California, San Diego, La Jolla, CA 92093, USA
- Andrea C Carrano
- Departments of Pediatrics and Cellular & Molecular Medicine, Pediatric Diabetes Research Center and Institute for Genomic Medicine, University of California, San Diego, La Jolla, CA 92093, USA
- Mark O Huising
- Department of Neurobiology, Physiology & Behavior, College of Biological Sciences, University of California, Davis, CA 95616, USA
- Orian S Shirihai
- Departments of Medicine and Molecular & Medical Pharmacology, David Geffen School of Medicine, University of California, Los Angeles, CA 90095, USA; Department of Medicine, Boston University, School of Medicine, Boston, MA 02118, USA
- Gene W Yeo
- Department of Cellular & Molecular Medicine and Institute for Genomic Medicine, University of California, San Diego, La Jolla, CA 92093, USA
- Maike Sander
- Departments of Pediatrics and Cellular & Molecular Medicine, Pediatric Diabetes Research Center and Institute for Genomic Medicine, University of California, San Diego, La Jolla, CA 92093, USA
|
74
|
Campbell KR, Yau C. Probabilistic modeling of bifurcations in single-cell gene expression data using a Bayesian mixture of factor analyzers. Wellcome Open Res 2017; 2:19. [PMID: 28503665 PMCID: PMC5428745 DOI: 10.12688/wellcomeopenres.11087.1] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Indexed: 12/23/2022] Open
Abstract
Modeling bifurcations in single-cell transcriptomics data has become an increasingly popular field of research. Several methods have been proposed to infer bifurcation structure from such data, but all rely on heuristic non-probabilistic inference. Here we propose the first generative, fully probabilistic model for such inference based on a Bayesian hierarchical mixture of factor analyzers. Our model exhibits competitive performance on large datasets despite implementing full Markov-Chain Monte Carlo sampling, and its unique hierarchical prior structure enables automatic determination of genes driving the bifurcation process. We additionally propose an Empirical-Bayes like extension that deals with the high levels of zero-inflation in single-cell RNA-seq data and quantify when such models are useful. We apply our model to both real and simulated single-cell gene expression data and compare the results to existing pseudotime methods. Finally, we discuss both the merits and weaknesses of such a unified, probabilistic approach in the context of practical bioinformatics analyses.
Affiliation(s)
- Kieran R Campbell
- Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford, OX1 3QX, UK; Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, OX3 7BN, UK
- Christopher Yau
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, OX3 7BN, UK; Centre for Computational Biology, Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, B15 2TT, UK
|
75
|
Lönnberg T, Svensson V, James KR, Fernandez-Ruiz D, Sebina I, Montandon R, Soon MSF, Fogg LG, Nair AS, Liligeto U, Stubbington MJT, Ly LH, Bagger FO, Zwiessele M, Lawrence ND, Souza-Fonseca-Guimaraes F, Bunn PT, Engwerda CR, Heath WR, Billker O, Stegle O, Haque A, Teichmann SA. Single-cell RNA-seq and computational analysis using temporal mixture modelling resolves Th1/Tfh fate bifurcation in malaria. Sci Immunol 2017; 2:eaal2192. [PMID: 28345074 PMCID: PMC5365145 DOI: 10.1126/sciimmunol.aal2192] [Citation(s) in RCA: 206] [Impact Index Per Article: 25.8] [Indexed: 12/26/2022]
Abstract
Differentiation of naïve CD4+ T cells into functionally distinct T helper subsets is crucial for the orchestration of immune responses. Due to extensive heterogeneity and multiple overlapping transcriptional programs in differentiating T cell populations, this process has remained a challenge for systematic dissection in vivo. By using single-cell transcriptomics and computational analysis using a temporal mixture of Gaussian processes model, termed GPfates, we reconstructed the developmental trajectories of Th1 and Tfh cells during blood-stage Plasmodium infection in mice. By tracking clonality using endogenous TCR sequences, we first demonstrated that Th1/Tfh bifurcation had occurred at both population and single-clone levels. Next, we identified genes whose expression was associated with Th1 or Tfh fates, and demonstrated a T-cell intrinsic role for Galectin-1 in supporting Th1 differentiation. We also revealed the close molecular relationship between Th1 and IL-10-producing Tr1 cells in this infection. Th1 and Tfh fates emerged from a highly proliferative precursor that upregulated aerobic glycolysis and accelerated cell cycling as cytokine expression began. Dynamic gene expression of chemokine receptors around bifurcation predicted roles for cell-cell interactions in driving Th1/Tfh fates. In particular, we found that precursor Th cells were coached towards a Th1 but not a Tfh fate by inflammatory monocytes. Thus, by integrating genomic and computational approaches, our study has provided two unique resources: a database, www.PlasmoTH.org, which facilitates discovery of novel factors controlling Th1/Tfh fate commitment, and, more generally, GPfates, a modelling framework for characterizing cell differentiation towards multiple fates.
Affiliation(s)
- Tapio Lönnberg
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, UK
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Valentine Svensson
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, UK
- Kylie R. James
- QIMR Berghofer Medical Research Institute, Herston, Brisbane, Queensland, Australia
- Daniel Fernandez-Ruiz
- Department of Microbiology and Immunology, The Peter Doherty Institute, University of Melbourne, Parkville, Victoria, Australia
- Ismail Sebina
- QIMR Berghofer Medical Research Institute, Herston, Brisbane, Queensland, Australia
- Ruddy Montandon
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Megan S. F. Soon
- QIMR Berghofer Medical Research Institute, Herston, Brisbane, Queensland, Australia
- Lily G. Fogg
- QIMR Berghofer Medical Research Institute, Herston, Brisbane, Queensland, Australia
- Arya Sheela Nair
- QIMR Berghofer Medical Research Institute, Herston, Brisbane, Queensland, Australia
- Urijah Liligeto
- QIMR Berghofer Medical Research Institute, Herston, Brisbane, Queensland, Australia
- Michael J. T. Stubbington
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, UK
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Lam-Ha Ly
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Frederik Otzen Bagger
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, UK
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Department of Haematology, University of Cambridge, Cambridge Biomedical Campus, Cambridge, UK
- National Health Service (NHS) Blood and Transplant, Cambridge Biomedical Campus, Long Road, Cambridge, UK
- Max Zwiessele
- Department of Computer Science, University of Sheffield, Sheffield, UK
- Neil D. Lawrence
- Department of Computer Science, University of Sheffield, Sheffield, UK
- Patrick T. Bunn
- QIMR Berghofer Medical Research Institute, Herston, Brisbane, Queensland, Australia
- William R. Heath
- Department of Microbiology and Immunology, The Peter Doherty Institute, University of Melbourne, Parkville, Victoria, Australia
- The Australian Research Council Centre of Excellence in Advanced Molecular Imaging, The University of Melbourne, Parkville, Victoria, Australia
- Oliver Billker
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Oliver Stegle
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, UK
- Ashraful Haque
- QIMR Berghofer Medical Research Institute, Herston, Brisbane, Queensland, Australia
- Sarah A. Teichmann
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, UK
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
|
76
|
Yizhar-Barnea O, Avraham KB. Single cell analysis of the inner ear sensory organs. Int J Dev Biol 2017; 61:205-213. [PMID: 28621418 PMCID: PMC5709810 DOI: 10.1387/ijdb.160453ka] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 01/09/2023]
Abstract
The inner ear is composed of a complex mixture of cells, which together allow organisms to hear and maintain balance. The cells in the inner ear, which undergo an extraordinary process of development, have only recently begun to be studied on an individual level. As it has recently become clear that individual cells, previously considered to be of uniform character, may differ dramatically from each other, the need to study cell-to-cell variation, along with distinct transcriptional and regulatory signatures, has taken hold in the scientific community. In conjunction with high-throughput technologies, attempts are underway to dissect the inter- and intra-cellular variability of different cell types and developmental states of the inner ear from a novel perspective. Single cell analysis of the inner ear sensory organs holds the promise of providing a significant boost in building an omics network that translates into a comprehensive understanding of the mechanisms of hearing and balance. These networks may uncover critical elements for trans-differentiation, regeneration and/or reprogramming, providing entry points for therapeutics of deafness and vestibular pathologies.
Affiliation(s)
- Ofer Yizhar-Barnea
- Department of Human Molecular Genetics and Biochemistry, Sackler Faculty of Medicine and Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, Israel
- Karen B. Avraham
- Department of Human Molecular Genetics and Biochemistry, Sackler Faculty of Medicine and Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, Israel
|
77
|
Campbell KR, Yau C. Order Under Uncertainty: Robust Differential Expression Analysis Using Probabilistic Models for Pseudotime Inference. PLoS Comput Biol 2016; 12:e1005212. [PMID: 27870852 PMCID: PMC5117567 DOI: 10.1371/journal.pcbi.1005212] [Citation(s) in RCA: 52] [Impact Index Per Article: 5.8] [Received: 04/21/2016] [Accepted: 10/13/2016] [Indexed: 11/18/2022] Open
Abstract
Single cell gene expression profiling can be used to quantify transcriptional dynamics in temporal processes, such as cell differentiation, using computational methods to label each cell with a ‘pseudotime’ where true time series experimentation is too difficult to perform. However, owing to the high variability in gene expression between individual cells, there is an inherent uncertainty in the precise temporal ordering of the cells. Pre-existing methods for pseudotime estimation have predominantly given point estimates, precluding a rigorous analysis of the implications of uncertainty. We use probabilistic modelling techniques to quantify pseudotime uncertainty and propagate this into downstream differential expression analysis. We demonstrate that reliance on a point estimate of pseudotime can lead to inflated false discovery rates and that probabilistic approaches provide greater robustness and measures of the temporal resolution that can be obtained from pseudotime inference. Understanding the “cellular programming” that controls fundamental, dynamic biological processes is important for determining normal cellular function and potential perturbations that might give rise to physiological disorders. Ideally, investigations would employ time series experiments to periodically measure the properties of each cell. This would allow us to understand the sequence of gene (in)activations that constitute the program being followed. In practice, such experiments can be difficult to perform as cellular activity may be asynchronous, with each cell occupying a different phase of the process of interest. Furthermore, the unbiased measurement of all transcripts or proteins requires the cells to be captured and lysed, precluding the continued monitoring of that cell. 
In the absence of the ability to conduct true time series experiments, pseudotime algorithms exploit the asynchronous cellular nature of these systems to mathematically assign a “pseudotime” to each cell based on its molecular profile allowing the cells to be aligned and the sequence of gene activation events retrospectively inferred. Existing approaches predominantly use deterministic methods that ignore the statistical uncertainties associated with the problem. This paper demonstrates that this statistical uncertainty limits the temporal resolution that can be extracted from static snapshots of cell expression profiles and can also detrimentally affect downstream analysis.
Affiliation(s)
- Kieran R. Campbell
- Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford, United Kingdom
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
- Christopher Yau
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
- Department of Statistics, University of Oxford, Oxford, United Kingdom
|
78
|
Mo ZQ, Yang M, Wang HQ, Xu Y, Huang MZ, Lao GF, Li YW, Li AX, Luo XC, Dan XM. Grouper (Epinephelus coioides) BCR signaling pathway was involved in response against Cryptocaryon irritans infection. Fish & Shellfish Immunology 2016; 57:198-205. [PMID: 27514788 DOI: 10.1016/j.fsi.2016.08.011] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 03/30/2016] [Revised: 07/23/2016] [Accepted: 08/07/2016] [Indexed: 06/06/2023]
Abstract
B cell antigen receptor (BCR) plays a crucial role in B cell development and antibody production. It comprises membrane immunoglobulin non-covalently associated with the CD79a/CD79b heterodimer. After B cell activation, initial extracellular signals are transduced by the BCR complex and amplified by two protein tyrosine kinases, LYN and SYK, which then trigger various pathways. In the present study, we cloned grouper genes for the BCR accessory molecules, EcCD79a (669 bp) and EcCD79b (639 bp), as well as two protein tyrosine kinases, EcLYN (1482 bp) and EcSYK (1854 bp). Homology analysis showed that all four molecules had relatively high amino acid identity with their counterparts in other animals, and all shared the highest identity with Takifugu rubripes (EcCD79a 49%, EcCD79b 52%, EcLYN 82% and EcSYK 77%). The conserved features and important functional residues were analyzed. Tissue distribution analysis showed that all six molecules (these four together with IgM and IgT) were mainly expressed in immune organs, particularly systemic immune organs. In groupers infected with Cryptocaryon irritans, up-regulation of EcCD79a, EcCD79b, EcIgM and EcIgT was not seen in the skin and gill in the early stage, only at 14-21 days. Up-regulation of EcCD79a was seen in head kidney at most time points, while in the spleen EcCD79a and EcCD79b were significantly up-regulated only on day 14. Significant up-regulation of EcIgT was seen in head kidney on day 21 and in spleen on days 1 and 14. Significant up-regulation of EcIgM was seen in head kidney on day 1 and in spleen at 12 h. In addition, two protein kinase genes, EcLYN and EcSYK, were up-regulated in the skin at most time points, which suggests that B cells may be activated at the local skin infection site.
Affiliation(s)
- Ze-Quan Mo
- College of Marine Sciences, South China Agricultural University, Guangzhou 510642, Guangdong Province, PR China
- Man Yang
- College of Marine Sciences, South China Agricultural University, Guangzhou 510642, Guangdong Province, PR China
- Hai-Qing Wang
- College of Marine Sciences, South China Agricultural University, Guangzhou 510642, Guangdong Province, PR China
- Yang Xu
- School of Bioscience and Biotechnology, South China University of Technology, Guangzhou 510006, PR China
- Mian-Zhi Huang
- College of Marine Sciences, South China Agricultural University, Guangzhou 510642, Guangdong Province, PR China
- Guo-Feng Lao
- College of Marine Sciences, South China Agricultural University, Guangzhou 510642, Guangdong Province, PR China
- Yan-Wei Li
- College of Marine Sciences, South China Agricultural University, Guangzhou 510642, Guangdong Province, PR China; Guangdong Provincial Key Laboratory of Import and Export Technical Measures of Animal, Plant and Food, Technical Center of Guangdong Entry-Exit Inspection and Quarantine Bureau, Guangzhou 510623, Guangdong Province, PR China
- An-Xing Li
- State Key Laboratory of Biocontrol/Key Laboratory of Aquatic Product Safety (Sun Yat-Sen University), Ministry of Education, The School of Life Sciences, Sun Yat-sen University, Guangzhou 510275, Guangdong Province, PR China
- Xiao-Chun Luo
- School of Bioscience and Biotechnology, South China University of Technology, Guangzhou 510006, PR China
- Xue-Ming Dan
- College of Marine Sciences, South China Agricultural University, Guangzhou 510642, Guangdong Province, PR China
|
79
|
duVerle DA, Yotsukura S, Nomura S, Aburatani H, Tsuda K. CellTree: an R/bioconductor package to infer the hierarchical structure of cell populations from single-cell RNA-seq data. BMC Bioinformatics 2016; 17:363. [PMID: 27620863 PMCID: PMC5020541 DOI: 10.1186/s12859-016-1175-6] [Citation(s) in RCA: 59] [Impact Index Per Article: 6.6] [Received: 12/30/2015] [Accepted: 08/11/2016] [Indexed: 12/22/2022] Open
Abstract
BACKGROUND Single-cell RNA sequencing is fast becoming one of the standard methods for gene expression measurement, providing unique insights into cellular processes. A number of methods, based on general dimensionality reduction techniques, have been suggested to help infer and visualise the underlying structure of cell populations from single-cell expression levels, yet their models generally lack proper biological grounding and struggle to identify complex differentiation paths. RESULTS Here we introduce cellTree: an R/Bioconductor package that uses a novel statistical approach, based on document analysis techniques, to produce tree structures outlining the hierarchical relationship between single-cell samples, while identifying latent groups of genes that can provide biological insights. CONCLUSIONS With cellTree, we provide experimentalists with an easy-to-use tool, based on statistically and biologically sound algorithms, to efficiently explore and visualise single-cell RNA data. The cellTree package is publicly available in the online Bioconductor repository at: http://bioconductor.org/packages/cellTree/ .
Affiliation(s)
- David A duVerle
- Graduate School of Frontier Sciences at the University of Tokyo, 5-1-5 Kashiwa-no-ha, Kashiwa, Japan; Biotechnology Research Institute for Drug Discovery, National Institute of Advanced Industrial Science and Technology, 2-4-7 Aomi, Koto-ku, Tokyo, Japan
- Sohiya Yotsukura
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Uji, Japan
- Seitaro Nomura
- Genome Science Division, Laboratory of Systems Biology and Medicine, Research Center for Advanced Science and Technology, The University of Tokyo, 4-6-1 Komaba, Tokyo, Japan
- Hiroyuki Aburatani
- Genome Science Division, Laboratory of Systems Biology and Medicine, Research Center for Advanced Science and Technology, The University of Tokyo, 4-6-1 Komaba, Tokyo, Japan
- Koji Tsuda
- Graduate School of Frontier Sciences at the University of Tokyo, 5-1-5 Kashiwa-no-ha, Kashiwa, Japan; Center for Materials Research by Information Integration, National Institute for Materials Science, 1-2-1 Sengen, Tsukuba, Japan; Biotechnology Research Institute for Drug Discovery, National Institute of Advanced Industrial Science and Technology, 2-4-7 Aomi, Koto-ku, Tokyo, Japan
|
80
|
|