1
Guo Y, Zou H, Alam MS, Luo S. Integrative Multi-Omics and Multivariate Longitudinal Data Analysis for Dynamic Risk Estimation in Alzheimer's Disease. Stat Med 2025; 44:e70105. [PMID: 40387018 DOI: 10.1002/sim.70105] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2024] [Revised: 03/19/2025] [Accepted: 04/11/2025] [Indexed: 05/20/2025]
Abstract
Alzheimer's disease (AD) is a complex and progressive neurodegenerative disorder, characterized by diverse cognitive and functional impairments that manifest heterogeneously across individuals, domains, and time. The accurate assessment of AD's severity and progression requires integrating a variety of data modalities, including multivariate longitudinal neuropsychological tests and multi-omics datasets such as metabolomics and lipidomics. These data sources provide valuable insights into risk factors associated with dementia onset. However, effectively utilizing omics data in dynamic risk estimation for AD progression is challenging due to issues including high dimensionality, heterogeneity, and complex intercorrelations. To address these challenges, we develop a novel joint-modeling framework that effectively combines multi-omics factor analysis (MOFA) for dimension reduction and feature extraction with a multivariate functional mixed model (MFMM) for modeling longitudinal outcomes. This integrative joint modeling approach enables dynamic evaluation of dementia risk by leveraging both omics and longitudinal data. We validate the efficacy of our integrative model through extensive simulation studies and its practical application to the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset.
Affiliation(s)
- Yuanyuan Guo
  - Department of Biostatistics and Bioinformatics, Duke University, Durham, North Carolina, USA
- Haotian Zou
  - Department of Biostatistics and Bioinformatics, Duke University, Durham, North Carolina, USA
- Mohammad Samsul Alam
  - Department of Biostatistics and Bioinformatics, Duke University, Durham, North Carolina, USA
- Sheng Luo
  - Department of Biostatistics and Bioinformatics, Duke University, Durham, North Carolina, USA
2
Lee CY, Clatworthy MR, Withers DR. Decoding changes in tumor-infiltrating leukocytes through dynamic experimental models and single-cell technologies. Immunol Cell Biol 2024; 102:665-679. [PMID: 38853634 DOI: 10.1111/imcb.12787] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2024] [Revised: 05/13/2024] [Accepted: 05/13/2024] [Indexed: 06/11/2024]
Abstract
The ability to characterize immune cells and explore the molecular interactions that govern their functions has never been greater, fueled in recent years by the revolutionary advance of single-cell analysis platforms. However, precisely how immune cells respond to different stimuli and where differentiation processes and effector functions operate remain incompletely understood. Inferring cellular fate within single-cell transcriptomic analyses is now omnipresent, despite the assumptions typically required in such analyses. Recently developed experimental models support dynamic analyses of the immune response, providing insights into the temporal changes that occur within cells and the tissues in which such transitions occur. Here we will review these approaches and discuss how these can be combined with single-cell technologies to develop a deeper understanding of the immune responses that should support the development of better therapeutic options for patients.
Affiliation(s)
- Colin YC Lee
  - Cambridge Institute of Therapeutic Immunology and Infectious Disease, University of Cambridge, Cambridge, UK
- Menna R Clatworthy
  - Cambridge Institute of Therapeutic Immunology and Infectious Disease, University of Cambridge, Cambridge, UK
- David R Withers
  - Institute of Immunology and Immunotherapy, College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK
3
Erfanian N, Heydari AA, Feriz AM, Iañez P, Derakhshani A, Ghasemigol M, Farahpour M, Razavi SM, Nasseri S, Safarpour H, Sahebkar A. Deep learning applications in single-cell genomics and transcriptomics data analysis. Biomed Pharmacother 2023; 165:115077. [PMID: 37393865 DOI: 10.1016/j.biopha.2023.115077] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 04/24/2023] [Revised: 06/22/2023] [Accepted: 06/23/2023] [Indexed: 07/04/2023] Open
Abstract
Traditional bulk sequencing methods are limited to measuring the average signal in a group of cells, potentially masking heterogeneity, and rare populations. The single-cell resolution, however, enhances our understanding of complex biological systems and diseases, such as cancer, the immune system, and chronic diseases. However, the single-cell technologies generate massive amounts of data that are often high-dimensional, sparse, and complex, thus making analysis with traditional computational approaches difficult and unfeasible. To tackle these challenges, many are turning to deep learning (DL) methods as potential alternatives to the conventional machine learning (ML) algorithms for single-cell studies. DL is a branch of ML capable of extracting high-level features from raw inputs in multiple stages. Compared to traditional ML, DL models have provided significant improvements across many domains and applications. In this work, we examine DL applications in genomics, transcriptomics, spatial transcriptomics, and multi-omics integration, and address whether DL techniques will prove to be advantageous or if the single-cell omics domain poses unique challenges. Through a systematic literature review, we have found that DL has not yet revolutionized the most pressing challenges of the single-cell omics field. However, using DL models for single-cell omics has shown promising results (in many cases outperforming the previous state-of-the-art models) in data preprocessing and downstream analysis. Although developments of DL algorithms for single-cell omics have generally been gradual, recent advances reveal that DL can offer valuable resources in fast-tracking and advancing research in single-cell.
Affiliation(s)
- Nafiseh Erfanian
  - Student Research Committee, Birjand University of Medical Sciences, Birjand, Iran
- A Ali Heydari
  - Department of Applied Mathematics, University of California, Merced, CA, USA
  - Health Sciences Research Institute, University of California, Merced, CA, USA
- Adib Miraki Feriz
  - Student Research Committee, Birjand University of Medical Sciences, Birjand, Iran
- Pablo Iañez
  - Cellular Systems Genomics Group, Josep Carreras Research Institute, Barcelona, Spain
- Afshin Derakhshani
  - Department of Biochemistry and Molecular Biology, University of Calgary, Calgary, AB, Canada
- Mohsen Farahpour
  - Department of Electronics, Faculty of Electrical and Computer Engineering, University of Birjand, Birjand, Iran
- Seyyed Mohammad Razavi
  - Department of Electronics, Faculty of Electrical and Computer Engineering, University of Birjand, Birjand, Iran
- Saeed Nasseri
  - Cellular and Molecular Research Center, Birjand University of Medical Sciences, Birjand, Iran
- Hossein Safarpour
  - Cellular and Molecular Research Center, Birjand University of Medical Sciences, Birjand, Iran
- Amirhossein Sahebkar
  - Biotechnology Research Center, Pharmaceutical Technology Institute, Mashhad University of Medical Sciences, Mashhad, Iran
  - Applied Biomedical Research Center, Mashhad University of Medical Sciences, Mashhad, Iran
  - Department of Biotechnology, School of Pharmacy, Mashhad University of Medical Sciences, Mashhad, Iran
4
Gunawardena R, Sarrigiannis PG, Blackburn DJ, He F. Kernel-based Nonlinear Manifold Learning for EEG-based Functional Connectivity Analysis and Channel Selection with Application to Alzheimer's Disease. Neuroscience 2023:S0306-4522(23)00253-1. [PMID: 37301505 DOI: 10.1016/j.neuroscience.2023.05.033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2022] [Revised: 05/15/2023] [Accepted: 05/29/2023] [Indexed: 06/12/2023]
Abstract
Dynamical, causal, and cross-frequency coupling analysis using the electroencephalogram (EEG) has gained significant attention for diagnosing and characterizing neurological disorders. Selecting important EEG channels is crucial for reducing computational complexity in implementing these methods and improving classification accuracy. In neuroscience, measures of (dis)similarity between EEG channels are often used as functional connectivity (FC) features, and important channels are selected via feature selection. Developing a generic measure of (dis)similarity is important for FC analysis and channel selection. In this study, learning of (dis)similarity information within the EEG is achieved using kernel-based nonlinear manifold learning. The focus is on FC changes and, thereby, EEG channel selection. Isomap and Gaussian Process Latent Variable Model (Isomap-GPLVM) are employed for this purpose. The resulting kernel (dis)similarity matrix is used as a novel measure of linear and nonlinear FC between EEG channels. The analysis of EEG from healthy controls (HC) and patients with mild to moderate Alzheimer's disease (AD) are presented as a case study. Classification results are compared with other commonly used FC measures. Our analysis shows significant differences in FC between bipolar channels of the occipital region and other regions (i.e. parietal, centro-parietal, and fronto-central) between AD and HC groups. Furthermore, our results indicate that FC changes between channels along the fronto-parietal region and the rest of the EEG are important in diagnosing AD. Our results and its relation to functional networks are consistent with those obtained from previous studies using fMRI, resting-state fMRI and EEG.
Affiliation(s)
- Rajintha Gunawardena
  - Centre for Computational Science and Mathematical Modelling, Coventry University, Coventry, CV1 5FB, UK
- Daniel J Blackburn
  - Department of Neuroscience, The University of Sheffield, Sheffield, S10 2HQ, UK
- Fei He
  - Centre for Computational Science and Mathematical Modelling, Coventry University, Coventry, CV1 5FB, UK
5
Palou-Márquez G, Subirana I, Nonell L, Fernández-Sanlés A, Elosua R. DNA methylation and gene expression integration in cardiovascular disease. Clin Epigenetics 2021; 13:75. [PMID: 33836805 PMCID: PMC8034168 DOI: 10.1186/s13148-021-01064-y] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 01/08/2021] [Accepted: 03/29/2021] [Indexed: 12/15/2022] Open
Abstract
BACKGROUND The integration of different layers of omics information is an opportunity to tackle the complexity of cardiovascular diseases (CVD) and to identify new predictive biomarkers and potential therapeutic targets. Our aim was to integrate DNA methylation and gene expression data in an effort to identify biomarkers related to cardiovascular disease risk in a community-based population. We accessed data from the Framingham Offspring Study, a cohort study with data on DNA methylation (Infinium HumanMethylation450 BeadChip; Illumina) and gene expression (Human Exon 1.0 ST Array; Affymetrix). Using the MOFA2 R package, we integrated these data to identify biomarkers related to the risk of presenting a cardiovascular event. RESULTS Four independent latent factors (9, 19, 21 [only in women] and 27), driven by DNA methylation, were associated with cardiovascular disease independently of classical risk factors and cell-type counts. In a sensitivity analysis, we also identified factor 21 as associated with CVD in women. Factors 9, 21 and 27 were also associated with coronary heart disease risk. Moreover, in a replication effort in an independent study three of the genes included in factor 27 were also present in a factor identified to be associated with myocardial infarction (CDC42BPB, MAN2A2 and RPTOR). Factor 9 was related to age and cell-type proportions; factor 19 was related to age and B cells count; factor 21 pointed to human immunodeficiency virus infection-related pathways and inflammation; and factor 27 was related to lifestyle factors such as alcohol consumption, smoking and body mass index. Inclusion of factor 21 (only in women) improved the discriminative and reclassification capacity of the Framingham classical risk function and factor 27 improved its discrimination. CONCLUSIONS Unsupervised multi-omics data integration methods have the potential to provide insights into the pathogenesis of cardiovascular diseases. We identified four independent factors (one only in women) pointing to inflammation, endothelium homeostasis, visceral fat, cardiac remodeling and lifestyles as key players in the determination of cardiovascular risk. Moreover, two of these factors improved the predictive capacity of a classical risk function.
Affiliation(s)
- Guillermo Palou-Márquez
  - Cardiovascular Epidemiology and Genetics Research Group, Hospital del Mar Medical Research Institute (IMIM), Dr Aiguader 88, 08003, Barcelona, Spain
  - Pompeu Fabra University (UPF), Barcelona, Spain
  - Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Isaac Subirana
  - Cardiovascular Epidemiology and Genetics Research Group, Hospital del Mar Medical Research Institute (IMIM), Dr Aiguader 88, 08003, Barcelona, Spain
  - CIBER Epidemiology and Public Health (CIBERESP), Barcelona, Spain
- Lara Nonell
  - MARGenomics, Hospital del Mar Medical Research Institute (IMIM), Barcelona, Spain
- Alba Fernández-Sanlés
  - Cardiovascular Epidemiology and Genetics Research Group, Hospital del Mar Medical Research Institute (IMIM), Dr Aiguader 88, 08003, Barcelona, Spain
  - MRC Integrative Epidemiology Unit at the University of Bristol, Bristol, UK
- Roberto Elosua
  - Cardiovascular Epidemiology and Genetics Research Group, Hospital del Mar Medical Research Institute (IMIM), Dr Aiguader 88, 08003, Barcelona, Spain
  - CIBER Cardiovascular Diseases (CIBERCV), Barcelona, Spain
  - Medicine Department, Faculty of Medicine, University of Vic-Central University of Catalonia (UVic-UCC), Vic, Spain
6
Kopf A, Claassen M. Latent representation learning in biology and translational medicine. Patterns (New York, N.Y.) 2021; 2:100198. [PMID: 33748792 PMCID: PMC7961186 DOI: 10.1016/j.patter.2021.100198] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Indexed: 02/08/2023]
Abstract
Current data generation capabilities in the life sciences render scientists in an apparently contradicting situation. While it is possible to simultaneously measure an ever-increasing number of systems parameters, the resulting data are becoming increasingly difficult to interpret. Latent variable modeling allows for such interpretation by learning non-measurable hidden variables from observations. This review gives an overview over the different formal approaches to latent variable modeling, as well as applications at different scales of biological systems, such as molecular structures, intra- and intercellular regulatory up to physiological networks. The focus is on demonstrating how these approaches have enabled interpretable representations and ultimately insights in each of these domains. We anticipate that a wider dissemination of latent variable modeling in the life sciences will enable a more effective and productive interpretation of studies based on heterogeneous and high-dimensional data modalities.
Affiliation(s)
- Andreas Kopf
  - Institute of Molecular Systems Biology, ETH Zürich, 8093 Zürich, Switzerland
- Manfred Claassen
  - Division of Clinical Bioinformatics, Department of Internal Medicine I, University Hospital Tübingen, 72076 Tübingen, Germany
  - Computer Science Department, Eberhard Karls University of Tübingen, 72076 Tübingen, Germany
  - Cluster of Excellence Machine Learning (EXC 2064), Eberhard Karls University of Tübingen, 72076 Tübingen, Germany
7
Abstract
The analysis of single-cell RNA sequencing data is of great importance in health research. It challenges data scientists, but has enormous potential in the context of personalized medicine. The clustering of single cells aims to detect different subgroups of cell populations within a patient in a data-driven manner. Some comparison studies denote single-cell consensus clustering (SC3), proposed by Kiselev et al. (Nat Methods 14(5):483–486, 2017), as the best method for classifying single-cell RNA sequencing data. SC3 includes Laplacian eigenmaps and a principal component analysis (PCA). Our proposal of unsupervised adapted single-cell consensus clustering (adaSC3) suggests to replace the linear PCA by diffusion maps, a non-linear method that takes the transition of single cells into account. We investigate the performance of adaSC3 in terms of accuracy on the data sets of the original source of SC3 as well as in a simulation study. A comparison of adaSC3 with SC3 as well as with related algorithms based on further alternative dimension reduction techniques shows a quite convincing behavior of adaSC3.
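For readers unfamiliar with the diffusion maps mentioned in this abstract, the following is a minimal, generic sketch of the technique in Python with NumPy. It is not the adaSC3 implementation, which the abstract does not describe in detail; the function name `diffusion_map`, the Gaussian kernel, and the median-based bandwidth are illustrative assumptions.

```python
import numpy as np

def diffusion_map(X, n_components=2, epsilon=None, t=1):
    """Generic diffusion-map embedding (illustrative; not the adaSC3 code).

    X: (n_samples, n_features) array, e.g. cells by genes.
    Returns (coords, eigenvalues), with eigenvalues sorted in descending order.
    """
    X = np.asarray(X, dtype=float)
    # Pairwise squared Euclidean distances between samples.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    np.fill_diagonal(d2, 0.0)
    # Assumed bandwidth heuristic: median of the nonzero squared distances.
    if epsilon is None:
        epsilon = np.median(d2[d2 > 0])
    K = np.exp(-d2 / epsilon)            # Gaussian affinity between samples
    deg = K.sum(axis=1)
    # S = D^{-1/2} K D^{-1/2} is symmetric and shares its eigenvalues with
    # the row-stochastic transition matrix P = D^{-1} K.
    S = K / np.sqrt(np.outer(deg, deg))
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    # Right eigenvectors of P are D^{-1/2} times the eigenvectors of S.
    psi = vecs / np.sqrt(deg)[:, None]
    # Drop the trivial constant eigenvector (eigenvalue 1) and scale the
    # remaining ones by eigenvalue**t to obtain diffusion coordinates.
    idx = slice(1, n_components + 1)
    return psi[:, idx] * vals[idx] ** t, vals
```

On data with two well-separated groups of samples, the first diffusion coordinate takes opposite signs in the two groups, which is the kind of non-linear structure a downstream clustering step can exploit.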
8
Bergenstråhle J, Bergenstråhle L, Lundeberg J. SpatialCPie: an R/Bioconductor package for spatial transcriptomics cluster evaluation. BMC Bioinformatics 2020; 21:161. [PMID: 32349652 PMCID: PMC7191678 DOI: 10.1186/s12859-020-3489-7] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 04/16/2019] [Accepted: 04/13/2020] [Indexed: 11/28/2022] Open
Abstract
Background Technological developments in the emerging field of spatial transcriptomics have opened up an unexplored landscape where transcript information is put in a spatial context. Clustering commonly constitutes a central component in analyzing this type of data. However, deciding on the number of clusters to use and interpreting their relationships can be difficult. Results We introduce SpatialCPie, an R package designed to facilitate cluster evaluation for spatial transcriptomics data. SpatialCPie clusters the data at multiple resolutions. The results are visualized with pie charts that indicate the similarity between spatial regions and clusters and a cluster graph that shows the relationships between clusters at different resolutions. We demonstrate SpatialCPie on several publicly available datasets. Conclusions SpatialCPie provides intuitive visualizations of cluster relationships when dealing with Spatial Transcriptomics data.
Affiliation(s)
- Joseph Bergenstråhle
  - Science for Life Laboratory, KTH Royal Institute of Technology, Stockholm, Sweden
- Ludvig Bergenstråhle
  - Science for Life Laboratory, KTH Royal Institute of Technology, Stockholm, Sweden
- Joakim Lundeberg
  - Science for Life Laboratory, KTH Royal Institute of Technology, Stockholm, Sweden
  - Department of Bioengineering, Stanford University, California, USA
9
Ko ME, Williams CM, Fread KI, Goggin SM, Rustagi RS, Fragiadakis GK, Nolan GP, Zunder ER. FLOW-MAP: a graph-based, force-directed layout algorithm for trajectory mapping in single-cell time course datasets. Nat Protoc 2020; 15:398-420. [PMID: 31932774 PMCID: PMC7897424 DOI: 10.1038/s41596-019-0246-3] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 04/23/2018] [Accepted: 08/29/2019] [Indexed: 12/12/2022]
Abstract
High-dimensional single-cell technologies present new opportunities for biological discovery, but the complex nature of the resulting datasets makes it challenging to perform comprehensive analysis. One particular challenge is the analysis of single-cell time course datasets: how to identify unique cell populations and track how they change across time points. To facilitate this analysis, we developed FLOW-MAP, a graphical user interface (GUI)-based software tool that uses graph layout analysis with sequential time ordering to visualize cellular trajectories in high-dimensional single-cell datasets obtained from flow cytometry, mass cytometry or single-cell RNA sequencing (scRNAseq) experiments. Here we provide a detailed description of the FLOW-MAP algorithm and how to use the open-source R package FLOWMAPR via its GUI or with text-based commands. This approach can be applied to many dynamic processes, including in vitro stem cell differentiation, in vivo development, oncogenesis, the emergence of drug resistance and cell signaling dynamics. To demonstrate our approach, we perform a step-by-step analysis of a single-cell mass cytometry time course dataset from mouse embryonic stem cells differentiating into the three germ layers: endoderm, mesoderm and ectoderm. In addition, we demonstrate FLOW-MAP analysis of a previously published scRNAseq dataset. Using both synthetic and experimental datasets for comparison, we perform FLOW-MAP analysis side by side with other single-cell analysis methods, to illustrate when it is advantageous to use the FLOW-MAP approach. The protocol takes between 30 min and 1.5 h to complete.
Affiliation(s)
- Melissa E Ko
  - Cancer Biology Program, Stanford School of Medicine, Stanford, CA, USA
- Corey M Williams
  - Department of Biomedical Engineering, University of Virginia, Charlottesville, VA, USA
  - Robert M. Berne Cardiovascular Research Center, University of Virginia, Charlottesville, VA, USA
- Kristen I Fread
  - Department of Biomedical Engineering, University of Virginia, Charlottesville, VA, USA
- Sarah M Goggin
  - Neuroscience Graduate Program, University of Virginia, Charlottesville, VA, USA
- Rohit S Rustagi
  - Department of Biomedical Engineering, University of Virginia, Charlottesville, VA, USA
- Garry P Nolan
  - Department of Microbiology and Immunology, Stanford University, Stanford, CA, USA
- Eli R Zunder
  - Department of Biomedical Engineering, University of Virginia, Charlottesville, VA, USA
10
Lange C, Rost F, Machate A, Reinhardt S, Lesche M, Weber A, Kuscha V, Dahl A, Rulands S, Brand M. Single cell sequencing of radial glia progeny reveals the diversity of newborn neurons in the adult zebrafish brain. Development 2020; 147:dev.185595. [PMID: 31908317 PMCID: PMC6983714 DOI: 10.1242/dev.185595] [Citation(s) in RCA: 42] [Impact Index Per Article: 8.4] [Received: 10/12/2019] [Accepted: 11/11/2019] [Indexed: 01/16/2023]
Abstract
Zebrafish display widespread and pronounced adult neurogenesis, which is fundamental for their regeneration capability after central nervous system injury. However, the cellular identity and the biological properties of adult newborn neurons are elusive for most brain areas. Here, we have used short-term lineage tracing of radial glia progeny to prospectively isolate newborn neurons from the her4.1+ radial glia lineage in the homeostatic adult forebrain. Transcriptome analysis of radial glia, newborn neurons and mature neurons using single cell sequencing identified distinct transcriptional profiles, including novel markers for each population. Specifically, we detected two separate newborn neuron types, which showed diversity of cell fate commitment and location. Further analyses showed that these cell types are homologous to neurogenic cells in the mammalian brain, identified neurogenic commitment in proliferating radial glia and indicated that glutamatergic projection neurons are generated in the adult zebrafish telencephalon. Thus, we prospectively isolated adult newborn neurons from the adult zebrafish forebrain, identified markers for newborn and mature neurons in the adult brain, and revealed intrinsic heterogeneity among adult newborn neurons and their homology with mammalian adult neurogenic cell types.
Affiliation(s)
- Christian Lange
  - Center for Regenerative Therapies Dresden (CRTD), CMCB, Technische Universität Dresden, Fetscherstrasse 105, 01307 Dresden, Germany
- Fabian Rost
  - Max Planck Institute for the Physics of Complex Systems, Noethnitzer Strasse 38, 01187 Dresden, Germany
  - Center for Systems Biology Dresden (CSBD), Pfotenhauer Strasse 108, 01307 Dresden, Germany
- Anja Machate
  - Center for Regenerative Therapies Dresden (CRTD), CMCB, Technische Universität Dresden, Fetscherstrasse 105, 01307 Dresden, Germany
- Susanne Reinhardt
  - Center for Regenerative Therapies Dresden (CRTD), CMCB, Technische Universität Dresden, Fetscherstrasse 105, 01307 Dresden, Germany
  - DRESDEN-concept Genome Center, c/o Center for Molecular and Cellular Bioengineering (CMCB), Technische Universität Dresden, Fetscherstrasse 105, 01307, Dresden, Germany
- Matthias Lesche
  - Center for Regenerative Therapies Dresden (CRTD), CMCB, Technische Universität Dresden, Fetscherstrasse 105, 01307 Dresden, Germany
  - DRESDEN-concept Genome Center, c/o Center for Molecular and Cellular Bioengineering (CMCB), Technische Universität Dresden, Fetscherstrasse 105, 01307, Dresden, Germany
- Anke Weber
  - Center for Regenerative Therapies Dresden (CRTD), CMCB, Technische Universität Dresden, Fetscherstrasse 105, 01307 Dresden, Germany
- Veronika Kuscha
  - Center for Regenerative Therapies Dresden (CRTD), CMCB, Technische Universität Dresden, Fetscherstrasse 105, 01307 Dresden, Germany
- Andreas Dahl
  - Center for Regenerative Therapies Dresden (CRTD), CMCB, Technische Universität Dresden, Fetscherstrasse 105, 01307 Dresden, Germany
  - DRESDEN-concept Genome Center, c/o Center for Molecular and Cellular Bioengineering (CMCB), Technische Universität Dresden, Fetscherstrasse 105, 01307, Dresden, Germany
- Steffen Rulands
  - Max Planck Institute for the Physics of Complex Systems, Noethnitzer Strasse 38, 01187 Dresden, Germany
  - Center for Systems Biology Dresden (CSBD), Pfotenhauer Strasse 108, 01307 Dresden, Germany
- Michael Brand
  - Center for Regenerative Therapies Dresden (CRTD), CMCB, Technische Universität Dresden, Fetscherstrasse 105, 01307 Dresden, Germany
11
12
Tritschler S, Büttner M, Fischer DS, Lange M, Bergen V, Lickert H, Theis FJ. Concepts and limitations for learning developmental trajectories from single cell genomics. Development 2019; 146. [DOI: 10.1242/dev.170506] [Citation(s) in RCA: 132] [Impact Index Per Article: 22.0] [Indexed: 08/30/2023]
Abstract
Single cell genomics has become a popular approach to uncover the cellular heterogeneity of progenitor and terminally differentiated cell types with great precision. This approach can also delineate lineage hierarchies and identify molecular programmes of cell-fate acquisition and segregation. Nowadays, tens of thousands of cells are routinely sequenced in single cell-based methods and even more are expected to be analysed in the future. However, interpretation of the resulting data is challenging and requires computational models at multiple levels of abstraction. In contrast to other applications of single cell sequencing, where clustering approaches dominate, developmental systems are generally modelled using continuous structures, trajectories and trees. These trajectory models carry the promise of elucidating mechanisms of development, disease and stimulation response at very high molecular resolution. However, their reliable analysis and biological interpretation requires an understanding of their underlying assumptions and limitations. Here, we review the basic concepts of such computational approaches and discuss the characteristics of developmental processes that can be learnt from trajectory models.
Affiliation(s)
- Sophie Tritschler
  - Institute of Computational Biology, Helmholtz Zentrum München, 85764 Neuherberg, Germany
  - Institute of Diabetes and Regeneration Research, Helmholtz Zentrum München, 85764 Neuherberg, Germany
  - TUM School of Life Sciences Weihenstephan, Technical University of Munich, 85353 Freising, Germany
- Maren Büttner
  - Institute of Computational Biology, Helmholtz Zentrum München, 85764 Neuherberg, Germany
  - Department of Mathematics, Technische Universität München, 85748 Garching, Germany
- David S. Fischer
  - Institute of Computational Biology, Helmholtz Zentrum München, 85764 Neuherberg, Germany
  - TUM School of Life Sciences Weihenstephan, Technical University of Munich, 85353 Freising, Germany
- Marius Lange
  - Institute of Computational Biology, Helmholtz Zentrum München, 85764 Neuherberg, Germany
  - Department of Mathematics, Technische Universität München, 85748 Garching, Germany
- Volker Bergen
  - Institute of Computational Biology, Helmholtz Zentrum München, 85764 Neuherberg, Germany
  - Department of Mathematics, Technische Universität München, 85748 Garching, Germany
- Heiko Lickert
  - Institute of Diabetes and Regeneration Research, Helmholtz Zentrum München, 85764 Neuherberg, Germany
  - German Center for Diabetes Research, 85764 Neuherberg, Germany
  - Institute of Stem Cell Research, Helmholtz Zentrum München, 85764 Neuherberg, Germany
- Fabian J. Theis
  - Institute of Computational Biology, Helmholtz Zentrum München, 85764 Neuherberg, Germany
  - Department of Mathematics, Technische Universität München, 85748 Garching, Germany
13
Understanding the hidden relations between pro- and anti-inflammatory cytokine genes in bovine oviduct epithelium using a multilayer response surface method. Sci Rep 2019; 9:3189. [PMID: 30816156 PMCID: PMC6395797 DOI: 10.1038/s41598-019-39081-w] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 05/30/2018] [Accepted: 01/18/2019] [Indexed: 02/06/2023] Open
Abstract
Understanding gene-gene interactions helps users to design subsequent experiments efficiently and (if applicable) to make better decisions about drug application based on the different biological conditions of the patients. This study aimed to identify changes in the hidden relationships between pro- and anti-inflammatory cytokine genes in the bovine oviduct epithelial cells (BOECs) under various experimental conditions using a multilayer response surface method. It was noted that under physiological conditions (BOECs with sperm or sex hormones, such as ovarian sex steroids and LH), the mRNA expressions of IL10, IL1B, TNFA, TLR4, and TNFA were associated with IL1B, TNFA, TLR4, IL4, and IL10, respectively. Under pathophysiological + physiological conditions (BOECs with lipopolysaccharide + hormones, alpha-1-acid glycoprotein + hormones, zearalenone + hormones, or urea + hormones), the relationship among genes was changed. For example, the expression of IL10 and TNFA was associated with (IL1B, TNFA, or IL4) and TLR4 expression, respectively. Furthermore, under physiological conditions, the co-expression of IL10 + TNFA, TLR4 + IL4, TNFA + IL4, TNFA + IL4, or IL10 + IL1B and under pathophysiological + physiological conditions, the co-expression of IL10 + IL4, IL4 + IL10, TNFA + IL10, TNFA + TLR4, or IL10 + IL1B were associated with IL1B, TNFA, TLR4, IL10, or IL4 expression, respectively. Collectively, the relationships between pro- and anti-inflammatory cytokine genes can be changed with respect to the presence/absence of toxins, sex hormones, sperm, and co-expression of other gene pairs in BOECs, suggesting that considerable caution is needed in interpreting the results obtained from such narrowly focused in vitro studies.
|
14
|
Tischler J, Gruhn WH, Reid J, Allgeyer E, Buettner F, Marr C, Theis F, Simons BD, Wernisch L, Surani MA. Metabolic regulation of pluripotency and germ cell fate through α-ketoglutarate. EMBO J 2019; 38:e99518. [PMID: 30257965 PMCID: PMC6315289 DOI: 10.15252/embj.201899518] [Citation(s) in RCA: 66] [Impact Index Per Article: 11.0] [Received: 03/27/2018] [Revised: 08/24/2018] [Accepted: 08/27/2018] [Indexed: 12/16/2022] Open
Abstract
An intricate link is becoming apparent between metabolism and cellular identities. Here, we explore the basis for such a link in an in vitro model for early mouse embryonic development: from naïve pluripotency to the specification of primordial germ cells (PGCs). Using single-cell RNA-seq with statistical modelling and modulation of energy metabolism, we demonstrate a functional role for oxidative mitochondrial metabolism in naïve pluripotency. We link mitochondrial tricarboxylic acid cycle activity to IDH2-mediated production of alpha-ketoglutarate and through it, the activity of key epigenetic regulators. Accordingly, this metabolite has a role in the maintenance of naïve pluripotency as well as in PGC differentiation, likely through preserving a particular histone methylation status underlying the transient state of developmental competence for the PGC fate. We reveal a link between energy metabolism and epigenetic control of cell state transitions during a developmental trajectory towards germ cell specification, and establish a paradigm for stabilizing fleeting cellular states through metabolic modulation.
Affiliation(s)
- Julia Tischler
- Wellcome Trust/Cancer Research UK Gurdon Institute, University of Cambridge, Cambridge, UK
| | - Wolfram H Gruhn
- Wellcome Trust/Cancer Research UK Gurdon Institute, University of Cambridge, Cambridge, UK
| | - John Reid
- MRC Biostatistics Unit, Cambridge Institute of Public Health, University of Cambridge, Cambridge Biomedical Campus, Cambridge, UK
- The Alan Turing Institute, British Library, London, UK
| | - Edward Allgeyer
- Wellcome Trust/Cancer Research UK Gurdon Institute, University of Cambridge, Cambridge, UK
| | - Florian Buettner
- Institute of Computational Biology, Helmholtz Zentrum München-German Research Center for Environmental Health, Neuherberg, Germany
| | - Carsten Marr
- Institute of Computational Biology, Helmholtz Zentrum München-German Research Center for Environmental Health, Neuherberg, Germany
| | - Fabian Theis
- Institute of Computational Biology, Helmholtz Zentrum München-German Research Center for Environmental Health, Neuherberg, Germany
- Department of Mathematics, Chair of Mathematical Modeling of Biological Systems Technische Universität München, Garching, Germany
| | - Ben D Simons
- Wellcome Trust/Cancer Research UK Gurdon Institute, University of Cambridge, Cambridge, UK
| | - Lorenz Wernisch
- MRC Biostatistics Unit, Cambridge Institute of Public Health, University of Cambridge, Cambridge Biomedical Campus, Cambridge, UK
| | - M Azim Surani
- Wellcome Trust/Cancer Research UK Gurdon Institute, University of Cambridge, Cambridge, UK
| |
|
15
|
Ahmed S, Rattray M, Boukouvalas A. GrandPrix: scaling up the Bayesian GPLVM for single-cell data. Bioinformatics 2019; 35:47-54. [PMID: 30561544 PMCID: PMC6298059 DOI: 10.1093/bioinformatics/bty533] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Received: 12/07/2017] [Revised: 05/02/2018] [Accepted: 06/28/2018] [Indexed: 11/13/2022] Open
Abstract
Motivation: The Gaussian Process Latent Variable Model (GPLVM) is a popular approach for dimensionality reduction of single-cell data and has been used for pseudotime estimation with capture time information. However, current implementations are computationally intensive and will not scale up to modern droplet-based single-cell datasets which routinely profile many tens of thousands of cells. Results: We provide an efficient implementation which allows scaling up this approach to modern single-cell datasets. We also generalize the application of pseudotime inference to cases where there are other sources of variation such as branching dynamics. We apply our method on microarray, nCounter, RNA-seq, qPCR and droplet-based datasets from different organisms. The model converges an order of magnitude faster compared to existing methods whilst achieving similar levels of estimation accuracy. Further, we demonstrate the flexibility of our approach by extending the model to higher-dimensional latent spaces that can be used to simultaneously infer pseudotime and other structure such as branching. Thus, the model has the capability of producing meaningful biological insights about cell ordering as well as cell fate regulation. Availability and implementation: Software available at github.com/ManchesterBioinference/GrandPrix. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Sumon Ahmed
- Division of Informatics, Imaging and Data Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK
| | - Magnus Rattray
- Division of Informatics, Imaging and Data Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK
| | - Alexis Boukouvalas
- Division of Informatics, Imaging and Data Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK
| |
|
16
|
Kolodziejczyk AA, Lönnberg T. Global and targeted approaches to single-cell transcriptome characterization. Brief Funct Genomics 2018; 17:209-219. [PMID: 29028866 PMCID: PMC6063303 DOI: 10.1093/bfgp/elx025] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Indexed: 12/31/2022] Open
Abstract
Analysing transcriptomes of cell populations is a standard molecular biology approach to understand how cells function. Recent methodological development has allowed performing similar experiments on single cells. This has opened up the possibility to examine samples with limited cell number, such as cells of the early embryo, and to obtain an understanding of heterogeneity within populations such as blood cell types or neurons. There are two major approaches for single-cell transcriptome analysis: quantitative reverse transcription PCR (RT-qPCR) on a limited number of genes of interest, or more global approaches targeting entire transcriptomes using RNA sequencing. RT-qPCR is sensitive, fast and arguably more straightforward, while whole-transcriptome approaches offer an unbiased perspective on a cell's expression status.
Affiliation(s)
| | - Tapio Lönnberg
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- EMBL-European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| |
|
17
|
Argelaguet R, Velten B, Arnol D, Dietrich S, Zenz T, Marioni JC, Buettner F, Huber W, Stegle O. Multi-Omics Factor Analysis-a framework for unsupervised integration of multi-omics data sets. Mol Syst Biol 2018; 14:e8124. [PMID: 29925568 PMCID: PMC6010767 DOI: 10.15252/msb.20178124] [Citation(s) in RCA: 604] [Impact Index Per Article: 86.3] [Received: 11/27/2017] [Revised: 05/28/2018] [Accepted: 05/29/2018] [Indexed: 12/19/2022] Open
Abstract
Multi-omics studies promise the improved characterization of biological processes across molecular layers. However, methods for the unsupervised integration of the resulting heterogeneous data sets are lacking. We present Multi-Omics Factor Analysis (MOFA), a computational method for discovering the principal sources of variation in multi-omics data sets. MOFA infers a set of (hidden) factors that capture biological and technical sources of variability. It disentangles axes of heterogeneity that are shared across multiple modalities and those specific to individual data modalities. The learnt factors enable a variety of downstream analyses, including identification of sample subgroups, data imputation and the detection of outlier samples. We applied MOFA to a cohort of 200 patient samples of chronic lymphocytic leukaemia, profiled for somatic mutations, RNA expression, DNA methylation and ex vivo drug responses. MOFA identified major dimensions of disease heterogeneity, including immunoglobulin heavy-chain variable region status, trisomy of chromosome 12 and previously underappreciated drivers, such as response to oxidative stress. In a second application, we used MOFA to analyse single-cell multi-omics data, identifying coordinated transcriptional and epigenetic changes along cell differentiation.
Affiliation(s)
- Ricard Argelaguet
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge, UK
| | - Britta Velten
- European Molecular Biology Laboratory (EMBL), Heidelberg, Germany
| | - Damien Arnol
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge, UK
| | | | - Thorsten Zenz
- Heidelberg University Hospital, Heidelberg, Germany
- German Cancer Research Center (dkfz) and National Center for Tumor Diseases (NCT), Heidelberg, Germany
- Department of Medical Oncology & Hematology, University Hospital Zurich and University of Zurich, Zurich, Switzerland
| | - John C Marioni
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge, UK
- Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
- Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK
| | - Florian Buettner
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge, UK
- Helmholtz Zentrum München-German Research Center for Environmental Health, Institute of Computational Biology, Neuherberg, Germany
| | - Wolfgang Huber
- European Molecular Biology Laboratory (EMBL), Heidelberg, Germany
| | - Oliver Stegle
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge, UK
- European Molecular Biology Laboratory (EMBL), Heidelberg, Germany
| |
|
18
|
Ellwanger DC, Scheibinger M, Dumont RA, Barr-Gillespie PG, Heller S. Transcriptional Dynamics of Hair-Bundle Morphogenesis Revealed with CellTrails. Cell Rep 2018; 23:2901-2914.e13. [PMID: 29874578 PMCID: PMC6089258 DOI: 10.1016/j.celrep.2018.05.002] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.9] [Received: 10/19/2017] [Revised: 03/19/2018] [Accepted: 05/01/2018] [Indexed: 11/30/2022] Open
Abstract
Protruding from the apical surface of inner ear sensory cells, hair bundles carry out mechanotransduction. Bundle growth involves sequential and overlapping cellular processes, which are concealed within gene expression profiles of individual cells. To dissect such processes, we developed CellTrails, a tool for uncovering, analyzing, and visualizing single-cell gene-expression dynamics. Utilizing quantitative gene-expression data for key bundle proteins from single cells of the developing chick utricle, we reconstructed de novo a bifurcating trajectory that spanned from progenitor cells to mature striolar and extrastriolar hair cells. Extraction and alignment of developmental trails and association of pseudotime with bundle length measurements linked expression dynamics of individual genes with bundle growth stages. Differential trail analysis revealed high-resolution dynamics of transcripts that control striolar and extrastriolar bundle development, including those that encode proteins that regulate [Ca2+]i or mediate crosslinking and lengthening of actin filaments.
Affiliation(s)
- Daniel C Ellwanger
- Department of Otolaryngology, Head & Neck Surgery and Institute for Stem Cell Biology and Regenerative Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Mirko Scheibinger
- Department of Otolaryngology, Head & Neck Surgery and Institute for Stem Cell Biology and Regenerative Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Rachel A Dumont
- Oregon Hearing Research Center and Vollum Institute, Oregon Health & Science University, Portland, OR 97239, USA
| | - Peter G Barr-Gillespie
- Oregon Hearing Research Center and Vollum Institute, Oregon Health & Science University, Portland, OR 97239, USA.
| | - Stefan Heller
- Department of Otolaryngology, Head & Neck Surgery and Institute for Stem Cell Biology and Regenerative Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA.
| |
|
19
|
Boukouvalas A, Hensman J, Rattray M. BGP: identifying gene-specific branching dynamics from single-cell data with a branching Gaussian process. Genome Biol 2018; 19:65. [PMID: 29843817 PMCID: PMC5975664 DOI: 10.1186/s13059-018-1440-2] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 10/30/2017] [Accepted: 05/01/2018] [Indexed: 12/24/2022] Open
Abstract
High-throughput single-cell gene expression experiments can be used to uncover branching dynamics in cell populations undergoing differentiation through pseudotime methods. We develop the branching Gaussian process (BGP), a non-parametric model that is able to identify branching dynamics for individual genes and provide an estimate of branching times for each gene with an associated credible region. We demonstrate the effectiveness of our method on simulated data, a single-cell RNA-seq haematopoiesis study and mouse embryonic stem cells generated using droplet barcoding. The method is robust to high levels of technical variation and dropout, which are common in single-cell data.
Affiliation(s)
- Alexis Boukouvalas
- Division of Informatics, Imaging and Data Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Oxford Road, Manchester, UK
| | | | - Magnus Rattray
- Division of Informatics, Imaging and Data Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Oxford Road, Manchester, UK
| |
|
20
|
Chen J, Rénia L, Ginhoux F. Constructing cell lineages from single-cell transcriptomes. Mol Aspects Med 2017; 59:95-113. [PMID: 29107741 DOI: 10.1016/j.mam.2017.10.004] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 06/01/2017] [Revised: 10/23/2017] [Accepted: 10/25/2017] [Indexed: 12/25/2022]
Abstract
Advances in single-cell RNA-sequencing have helped reveal the previously underappreciated level of cellular heterogeneity present during cellular differentiation. A static snapshot of single-cell transcriptomes provides a good representation of the various stages of differentiation as differentiation is rarely synchronized between cells. Data from numerous single-cell analyses has suggested that cellular differentiation and development can be conceptualized as continuous processes. Consequently, computational algorithms have been developed to infer lineage relationships between cell types and construct developmental trajectories along which cells are re-ordered such that similarity between successive cell pairs is maximized. Here, we compare and contrast the existing computational methods, and illustrate how they may be applied to build mouse myeloid progenitor lineages from massively parallel RNA single-cell sequencing data.
Affiliation(s)
- Jinmiao Chen
- Singapore Immunology Network (SIgN), A*STAR, 8A Biomedical Grove, Immunos Building, Level 4, Singapore 138648, Singapore.
| | - Laurent Rénia
- Singapore Immunology Network (SIgN), A*STAR, 8A Biomedical Grove, Immunos Building, Level 4, Singapore 138648, Singapore
| | - Florent Ginhoux
- Singapore Immunology Network (SIgN), A*STAR, 8A Biomedical Grove, Immunos Building, Level 4, Singapore 138648, Singapore
| |
|
21
|
Stubbington MJT, Rozenblatt-Rosen O, Regev A, Teichmann SA. Single-cell transcriptomics to explore the immune system in health and disease. Science 2017; 358:58-63. [PMID: 28983043 PMCID: PMC5654495 DOI: 10.1126/science.aan6828] [Citation(s) in RCA: 363] [Impact Index Per Article: 45.4] [Indexed: 12/21/2022]
Abstract
The immune system varies in cell types, states, and locations. The complex networks, interactions, and responses of immune cells produce diverse cellular ecosystems composed of multiple cell types, accompanied by genetic diversity in antigen receptors. Within this ecosystem, innate and adaptive immune cells maintain and protect tissue function, integrity, and homeostasis upon changes in functional demands and diverse insults. Characterizing this inherent complexity requires studies at single-cell resolution. Recent advances such as massively parallel single-cell RNA sequencing and sophisticated computational methods are catalyzing a revolution in our understanding of immunology. Here we provide an overview of the state of single-cell genomics methods and an outlook on the use of single-cell techniques to decipher the adaptive and innate components of immunity.
Affiliation(s)
| | - Orit Rozenblatt-Rosen
- Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Aviv Regev
- Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA.
- Howard Hughes Medical Institute, Koch Institute for Integrative Cancer Research, Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| | - Sarah A Teichmann
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
- Theory of Condensed Matter, Cavendish Laboratory, 19 JJ Thomson Ave, Cambridge CB3 0HE, UK
| |
|
22
|
Sanchez-Castillo M, Blanco D, Tienda-Luna IM, Carrion MC, Huang Y. A Bayesian framework for the inference of gene regulatory networks from time and pseudo-time series data. Bioinformatics 2017; 34:964-970. [DOI: 10.1093/bioinformatics/btx605] [Citation(s) in RCA: 62] [Impact Index Per Article: 7.8] [Received: 07/03/2016] [Accepted: 09/22/2017] [Indexed: 11/12/2022] Open
Affiliation(s)
| | - D Blanco
- Department of Applied Physics, University of Granada, Granada, Spain
| | - I M Tienda-Luna
- Department of Electrical Engineering, University of Granada, Granada, Spain
| | - M C Carrion
- Department of Applied Physics, University of Granada, Granada, Spain
| | - Yufei Huang
- Department of Electrical Engineering, University of Texas at San Antonio, San Antonio, TX, USA
| |
|
23
|
|
24
|
Blasi T, Buettner F, Strasser MK, Marr C, Theis FJ. cgCorrect: a method to correct for confounding cell-cell variation due to cell growth in single-cell transcriptomics. Phys Biol 2017; 14:036001. [PMID: 28198357 DOI: 10.1088/1478-3975/aa609a] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Indexed: 12/11/2022]
Abstract
Accessing gene expression at a single-cell level has unraveled often large heterogeneity among seemingly homogeneous cells, which remains obscured when using traditional population-based approaches. The computational analysis of single-cell transcriptomics data, however, still imposes unresolved challenges with respect to normalization, visualization and modeling the data. One such issue is differences in cell size, which introduce additional variability into the data and for which appropriate normalization techniques are needed. Otherwise, these differences in cell size may obscure genuine heterogeneities among cell populations and lead to overdispersed steady-state distributions of mRNA transcript numbers. We present cgCorrect, a statistical framework to correct for differences in cell size that are due to cell growth in single-cell transcriptomics data. We derive the probability for the cell-growth-corrected mRNA transcript number given the measured, cell size-dependent mRNA transcript number, based on the assumption that the average number of transcripts in a cell increases proportionally to the cell's volume during the cell cycle. cgCorrect can be used for both data normalization and to analyze the steady-state distributions used to infer the gene expression mechanism. We demonstrate its applicability on both simulated data and single-cell quantitative real-time polymerase chain reaction (PCR) data from mouse blood stem and progenitor cells (and to quantitative single-cell RNA-sequencing data obtained from mouse embryonic stem cells). We show that correcting for differences in cell size affects the interpretation of the data obtained by typically performed computational analysis.
Affiliation(s)
- Thomas Blasi
- Institute of Computational Biology, Helmholtz Zentrum München-German Research Center for Environmental Health, Neuherberg, Germany. Department of Mathematics, Technische Universität München, Garching, Germany
| | | | | | | | | |
|
25
|
Reid JE, Wernisch L. Pseudotime estimation: deconfounding single cell time series. Bioinformatics 2016; 32:2973-80. [PMID: 27318198 PMCID: PMC5039927 DOI: 10.1093/bioinformatics/btw372] [Citation(s) in RCA: 78] [Impact Index Per Article: 8.7] [Received: 10/08/2015] [Revised: 05/20/2016] [Accepted: 06/07/2016] [Indexed: 11/21/2022] Open
Abstract
MOTIVATION: Repeated cross-sectional time series single cell data confound several sources of variation, with contributions from measurement noise, stochastic cell-to-cell variation and cell progression at different rates. Time series from single cell assays are particularly susceptible to confounding as the measurements are not averaged over populations of cells. When several genes are assayed in parallel these effects can be estimated and corrected for under certain smoothness assumptions on cell progression. RESULTS: We present a principled probabilistic model with a Bayesian inference scheme to analyse such data. We demonstrate our method's utility on public microarray, nCounter and RNA-seq datasets from three organisms. Our method almost perfectly recovers withheld capture times in an Arabidopsis dataset, it accurately estimates cell cycle peak times in a human prostate cancer cell line and it correctly identifies two precocious cells in a study of paracrine signalling in mouse dendritic cells. Furthermore, our method compares favourably with Monocle, a state-of-the-art technique. We also show using held-out data that uncertainty in the temporal dimension is a common confounder and should be accounted for in analyses of repeated cross-sectional time series. AVAILABILITY AND IMPLEMENTATION: Our method is available on CRAN in the DeLorean package. CONTACT: john.reid@mrc-bsu.cam.ac.uk SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- John E Reid
- MRC Biostatistics Unit, Cambridge CB2 0SR, UK
| | | |
|
26
|
Poirion OB, Zhu X, Ching T, Garmire L. Single-Cell Transcriptomics Bioinformatics and Computational Challenges. Front Genet 2016; 7:163. [PMID: 27708664 PMCID: PMC5030210 DOI: 10.3389/fgene.2016.00163] [Citation(s) in RCA: 71] [Impact Index Per Article: 7.9] [Received: 05/01/2016] [Accepted: 09/02/2016] [Indexed: 12/21/2022] Open
Abstract
The emerging single-cell RNA-Seq (scRNA-Seq) technology holds the promise to revolutionize our understanding of diseases and associated biological processes at an unprecedented resolution. It opens the door to revealing intercellular heterogeneity and has been applied in a variety of settings, ranging from characterizing cancer cell subpopulations to elucidating tumor resistance mechanisms. Parallel to improving experimental protocols to deal with technological issues, deriving new analytical methods to interpret the complexity in scRNA-Seq data is just as challenging. Here, we review current state-of-the-art bioinformatics tools and methods for scRNA-Seq analysis and address some critical analytical challenges that the field faces.
Affiliation(s)
- Olivier B Poirion
- Epidemiology Program, University of Hawaii Cancer Center, Honolulu, HI, USA
| | - Xun Zhu
- Epidemiology Program, University of Hawaii Cancer Center, Honolulu, HI, USA; Molecular Biosciences and Bioengineering Graduate Program, University of Hawaii at Manoa, Honolulu, HI, USA
| | - Travers Ching
- Epidemiology Program, University of Hawaii Cancer Center, Honolulu, HI, USA; Molecular Biosciences and Bioengineering Graduate Program, University of Hawaii at Manoa, Honolulu, HI, USA
| | - Lana Garmire
- Epidemiology Program, University of Hawaii Cancer Center, Honolulu, HI, USA
| |
|
27
|
Single-cell gene expression profiling and cell state dynamics: collecting data, correlating data points and connecting the dots. Curr Opin Biotechnol 2016; 39:207-214. [PMID: 27152696 DOI: 10.1016/j.copbio.2016.04.015] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Received: 03/24/2016] [Revised: 04/13/2016] [Accepted: 04/14/2016] [Indexed: 01/28/2023]
Abstract
Single-cell analyses of transcript and protein expression profiles (more precisely, single-cell resolution analysis of molecular profiles of cell populations) have now entered the center stage with widespread applications of single-cell qPCR, single-cell RNA-Seq and CyTOF. These high-dimensional population snapshot techniques are complemented by low-dimensional time-resolved, microscopy-based monitoring methods. Both fronts of advance have exposed a rich heterogeneity of cell states within uniform cell populations in many biological contexts, producing a new kind of data that has triggered computational analysis methods for data visualization, dimensionality reduction, and cluster (subpopulation) identification. The next step is now to go beyond collecting data and correlating data points: to connect the dots, that is, to understand what actually underlies the identified data patterns. This entails interpreting the 'clouds of points' in state space as a manifestation of the underlying molecular regulatory network. In that way control of cell state dynamics can be formalized as a quasi-potential landscape, as first proposed by Waddington. We summarize key methods of data acquisition and computational analysis and explain the principles that link the single-cell resolution measurements to dynamical systems theory.
|
28
|
Saadatpour A, Lai S, Guo G, Yuan GC. Single-Cell Analysis in Cancer Genomics. Trends Genet 2016; 31:576-586. [PMID: 26450340 DOI: 10.1016/j.tig.2015.07.003] [Citation(s) in RCA: 119] [Impact Index Per Article: 13.2] [Received: 04/30/2015] [Revised: 06/26/2015] [Accepted: 07/20/2015] [Indexed: 02/04/2023]
Abstract
Genetic changes and environmental differences result in cellular heterogeneity among cancer cells within the same tumor, thereby complicating treatment outcomes. Recent advances in single-cell technologies have opened new avenues to characterize the intra-tumor cellular heterogeneity, identify rare cell types, measure mutation rates, and, ultimately, guide diagnosis and treatment. In this paper we review the recent single-cell technological and computational advances at the genomic, transcriptomic, and proteomic levels, and discuss their applications in cancer research.
Affiliation(s)
- Assieh Saadatpour
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
| | - Shujing Lai
- Center for Stem Cell and Regenerative Medicine, Zhejiang University School of Medicine, Hangzhou 310058, China
| | - Guoji Guo
- Center for Stem Cell and Regenerative Medicine, Zhejiang University School of Medicine, Hangzhou 310058, China.
| | - Guo-Cheng Yuan
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA; Harvard Stem Cell Institute, Cambridge, MA 02138, USA.
| |
|
29
|
Boiani M, Cibelli JB. What we can learn from single-cell analysis in development. Mol Hum Reprod 2016; 22:160-71. [DOI: 10.1093/molehr/gaw014] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 12/20/2022] Open
|
30
|
Fan J, Salathia N, Liu R, Kaeser GE, Yung YC, Herman JL, Kaper F, Fan JB, Zhang K, Chun J, Kharchenko PV. Characterizing transcriptional heterogeneity through pathway and gene set overdispersion analysis. Nat Methods 2016; 13:241-4. [PMID: 26780092 PMCID: PMC4772672 DOI: 10.1038/nmeth.3734] [Citation(s) in RCA: 315] [Impact Index Per Article: 35.0] [Received: 05/26/2015] [Accepted: 12/16/2015] [Indexed: 12/23/2022]
Abstract
The transcriptional state of a cell reflects a variety of biological factors, from persistent cell-type specific features to transient processes such as cell cycle. Depending on biological context, all such aspects of transcriptional heterogeneity may be of interest, but detecting them from noisy single-cell RNA-seq data remains challenging. We developed PAGODA to resolve multiple, potentially overlapping aspects of transcriptional heterogeneity by testing gene sets for coordinated variability amongst measured cells.
Affiliation(s)
- Jean Fan
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
| | | | - Rui Liu
- Department of Bioengineering, University of California, San Diego, California, USA
| | - Gwendolyn E Kaeser
- Department of Molecular and Cellular Neuroscience, Dorris Neuroscience Center, The Scripps Research Institute, La Jolla, California, USA
| | - Yun C Yung
- Department of Molecular and Cellular Neuroscience, Dorris Neuroscience Center, The Scripps Research Institute, La Jolla, California, USA
| | - Joseph L Herman
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
| | | | - Jian-Bing Fan
- Illumina Inc., San Diego, California, USA. Present address: AnchorDx Corporation, International Biotech Island, Guangzhou, Guangdong, China
| | - Kun Zhang
- Department of Bioengineering, University of California, San Diego, California, USA
| | - Jerold Chun
- Department of Molecular and Cellular Neuroscience, Dorris Neuroscience Center, The Scripps Research Institute, La Jolla, California, USA
| | - Peter V Kharchenko
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
| |
Collapse
|
31
|
Angerer P, Haghverdi L, Büttner M, Theis FJ, Marr C, Buettner F. destiny: diffusion maps for large-scale single-cell data in R. Bioinformatics 2015; 32:1241-3. [PMID: 26668002 DOI: 10.1093/bioinformatics/btv715] [Citation(s) in RCA: 411] [Impact Index Per Article: 41.1] [Received: 08/05/2015] [Accepted: 12/01/2015] [Indexed: 11/14/2022] Open
Abstract
Diffusion maps are a spectral method for non-linear dimension reduction and have recently been adapted for the visualization of single-cell expression data. Here we present destiny, an efficient R implementation of the diffusion map algorithm. Our package includes a single-cell specific noise model allowing for missing and censored values. In contrast to previous implementations, we further present an efficient nearest-neighbour approximation that allows for the processing of hundreds of thousands of cells and a functionality for projecting new data on existing diffusion maps. As an example, we apply destiny to a recent time-resolved mass cytometry dataset of cellular reprogramming. AVAILABILITY AND IMPLEMENTATION: destiny is an open-source R/Bioconductor package (bioconductor.org/packages/destiny), also available at www.helmholtz-muenchen.de/icb/destiny. A detailed vignette describing functions and workflows is provided with the package. CONTACT: carsten.marr@helmholtz-muenchen.de or f.buettner@helmholtz-muenchen.de SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Philipp Angerer
- Helmholtz Zentrum München - German Research Center for Environmental Health, Institute of Computational Biology, Ingolstädter Landstr. 1, 85764 Neuherberg, Germany
- Laleh Haghverdi
- Helmholtz Zentrum München - German Research Center for Environmental Health, Institute of Computational Biology, Ingolstädter Landstr. 1, 85764 Neuherberg, Germany
- Maren Büttner
- Helmholtz Zentrum München - German Research Center for Environmental Health, Institute of Computational Biology, Ingolstädter Landstr. 1, 85764 Neuherberg, Germany
- Fabian J Theis
- Helmholtz Zentrum München - German Research Center for Environmental Health, Institute of Computational Biology, Ingolstädter Landstr. 1, 85764 Neuherberg, Germany and Technische Universität München, Center for Mathematics, Chair of Mathematical Modeling of Biological Systems, Boltzmannstr. 3, 85748 Garching, Germany
- Carsten Marr
- Helmholtz Zentrum München - German Research Center for Environmental Health, Institute of Computational Biology, Ingolstädter Landstr. 1, 85764 Neuherberg, Germany
- Florian Buettner
- Helmholtz Zentrum München - German Research Center for Environmental Health, Institute of Computational Biology, Ingolstädter Landstr. 1, 85764 Neuherberg, Germany
|
32
|
Yalcin D, Hakguder ZM, Otu HH. Bioinformatics approaches to single-cell analysis in developmental biology. Mol Hum Reprod 2015; 22:182-92. [PMID: 26358759 DOI: 10.1093/molehr/gav050] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 05/12/2015] [Accepted: 09/04/2015] [Indexed: 12/17/2022] Open
Abstract
Individual cells within the same population show various degrees of heterogeneity, which may be better handled with single-cell analysis to address biological and clinical questions. Single-cell analysis is especially important in developmental biology as subtle spatial and temporal differences in cells have significant associations with cell fate decisions during differentiation and with the description of a particular state of a cell exhibiting an aberrant phenotype. Biotechnological advances, especially in the area of microfluidics, have led to a robust, massively parallel and multi-dimensional capturing, sorting, and lysis of single-cells and amplification of related macromolecules, which have enabled the use of imaging and omics techniques on single cells. There have been improvements in computational single-cell image analysis in developmental biology regarding feature extraction, segmentation, image enhancement and machine learning, handling limitations of optical resolution to gain new perspectives from the raw microscopy images. Omics approaches, such as transcriptomics, genomics and epigenomics, targeting gene and small RNA expression, single nucleotide and structural variations and methylation and histone modifications, rely heavily on high-throughput sequencing technologies. Although there are well-established bioinformatics methods for analysis of sequence data, there are limited bioinformatics approaches which address experimental design, sample size considerations, amplification bias, normalization, differential expression, coverage, clustering and classification issues, specifically applied at the single-cell level. In this review, we summarize biological and technological advancements, discuss challenges faced in the aforementioned data acquisition and analysis issues and present future prospects for application of single-cell analyses to developmental biology.
Affiliation(s)
- Dicle Yalcin
- Department of Electrical and Computer Engineering, University of Nebraska-Lincoln, Lincoln, NE 68588-0511, USA
- Zeynep M Hakguder
- Department of Electrical and Computer Engineering, University of Nebraska-Lincoln, Lincoln, NE 68588-0511, USA
- Hasan H Otu
- Department of Electrical and Computer Engineering, University of Nebraska-Lincoln, Lincoln, NE 68588-0511, USA
|
33
|
Computational and experimental single cell biology techniques for the definition of cell type heterogeneity, interplay and intracellular dynamics. Curr Opin Biotechnol 2015; 34:9-15. [DOI: 10.1016/j.copbio.2014.10.010] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Received: 09/16/2014] [Revised: 10/21/2014] [Accepted: 10/22/2014] [Indexed: 12/31/2022]
|
34
|
Haghverdi L, Buettner F, Theis FJ. Diffusion maps for high-dimensional single-cell analysis of differentiation data. Bioinformatics 2015; 31:2989-98. [PMID: 26002886 DOI: 10.1093/bioinformatics/btv325] [Citation(s) in RCA: 407] [Impact Index Per Article: 40.7] [Received: 07/31/2014] [Accepted: 05/18/2015] [Indexed: 01/10/2023] Open
Abstract
MOTIVATION: Single-cell technologies have recently gained popularity in cellular differentiation studies regarding their ability to resolve potential heterogeneities in cell populations. Analyzing such high-dimensional single-cell data has its own statistical and computational challenges. Popular multivariate approaches are based on data normalization, followed by dimension reduction and clustering to identify subgroups. However, in the case of cellular differentiation, we would not expect clear clusters to be present but instead expect the cells to follow continuous branching lineages. RESULTS: Here, we propose the use of diffusion maps to deal with the problem of defining differentiation trajectories. We adapt this method to single-cell data by adequate choice of kernel width and inclusion of uncertainties or missing measurement values, which enables the establishment of a pseudotemporal ordering of single cells in a high-dimensional gene expression space. We expect this output to reflect cell differentiation trajectories, where the data originates from intrinsic diffusion-like dynamics. Starting from a pluripotent stage, cells move smoothly within the transcriptional landscape towards more differentiated states with some stochasticity along their path. We demonstrate the robustness of our method with respect to extrinsic noise (e.g. measurement noise) and sampling density heterogeneities on simulated toy data as well as two single-cell quantitative polymerase chain reaction datasets (i.e. mouse haematopoietic stem cells and mouse embryonic stem cells) and an RNA-Seq dataset of human pre-implantation embryos. We show that diffusion maps perform considerably better than Principal Component Analysis and are advantageous over other techniques for non-linear dimension reduction such as t-distributed Stochastic Neighbour Embedding for preserving the global structures and pseudotemporal ordering of cells.
AVAILABILITY AND IMPLEMENTATION: The Matlab implementation of diffusion maps for single-cell data is available at https://www.helmholtz-muenchen.de/icb/single-cell-diffusion-map. CONTACT: fbuettner.phys@gmail.com, fabian.theis@helmholtz-muenchen.de SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Laleh Haghverdi
- Institute of Computational Biology, Helmholtz Zentrum München 85764 Neuherberg, Germany and Department of Mathematics, Technische Universität München 85748 Garching, Germany
- Florian Buettner
- Institute of Computational Biology, Helmholtz Zentrum München 85764 Neuherberg, Germany and Department of Mathematics, Technische Universität München 85748 Garching, Germany
- Fabian J Theis
- Institute of Computational Biology, Helmholtz Zentrum München 85764 Neuherberg, Germany and Department of Mathematics, Technische Universität München 85748 Garching, Germany
|
35
|
Taher L, Pfeiffer MJ, Fuellen G. Bioinformatics approaches to single-blastomere transcriptomics. Mol Hum Reprod 2015; 21:115-125. [PMID: 25239944 DOI: 10.1093/molehr/gau083] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 01/03/2025] Open
Abstract
The totipotent zygote gives rise to cells with differing identities during mouse preimplantation development. Many studies have focused on analyzing the spatio-temporal dependencies during these lineage decision processes and much has been learnt by tracing transgenic marker gene expression up to the blastocyst stage and by analyzing the effects of genetic manipulations (knockout/overexpression) on embryo development. However, until recently, it has not been possible to get broader overviews on the gene expression networks that distinguish one cell from the other within the same embryo. With the advent of whole genome amplification methodology and microfluidics-based quantitative RT-PCR, it became possible to generate transcriptomes of single cells. Here we review the current state of the art of single-cell transcriptomics applied to mouse preimplantation embryo blastomeres and summarize findings made by pioneering studies in recent years. Furthermore, we use the PluriNetWork and ExprEssence to investigate cell transitions based on published data.
Affiliation(s)
- Leila Taher
- Institute for Biostatistics and Informatics in Medicine and Ageing Research, Rostock University Medical Center, Rostock, Germany
- Martin J Pfeiffer
- Institute for Biostatistics and Informatics in Medicine and Ageing Research, Rostock University Medical Center, Rostock, Germany; Max Planck Institute for Molecular Biomedicine, Münster, Germany
- Georg Fuellen
- Institute for Biostatistics and Informatics in Medicine and Ageing Research, Rostock University Medical Center, Rostock, Germany
|
36
|
Computational analysis of cell-to-cell heterogeneity in single-cell RNA-sequencing data reveals hidden subpopulations of cells. Nat Biotechnol 2015; 33:155-60. [DOI: 10.1038/nbt.3102] [Citation(s) in RCA: 854] [Impact Index Per Article: 85.4] [Received: 01/26/2014] [Accepted: 11/05/2014] [Indexed: 12/26/2022]
|
37
|
Feigelman J, Theis FJ, Marr C. MCA: Multiresolution Correlation Analysis, a graphical tool for subpopulation identification in single-cell gene expression data. BMC Bioinformatics 2014; 15:240. [PMID: 25015590 PMCID: PMC4227291 DOI: 10.1186/1471-2105-15-240] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 04/23/2014] [Accepted: 07/04/2014] [Indexed: 01/09/2023] Open
Abstract
Background: Biological data often originate from samples containing mixtures of subpopulations, corresponding e.g. to distinct cellular phenotypes. However, identification of distinct subpopulations may be difficult if biological measurements yield distributions that are not easily separable. Results: We present Multiresolution Correlation Analysis (MCA), a method for visually identifying subpopulations based on the local pairwise correlation between covariates, without needing to define an a priori interaction scale. We demonstrate that MCA facilitates the identification of differentially regulated subpopulations in simulated data from a small gene regulatory network, followed by application to previously published single-cell qPCR data from mouse embryonic stem cells. We show that MCA recovers previously identified subpopulations, provides additional insight into the underlying correlation structure, reveals potentially spurious compartmentalizations, and provides insight into novel subpopulations. Conclusions: MCA is a useful method for the identification of subpopulations in low-dimensional expression data, as emerging from qPCR or FACS measurements. With MCA it is possible to investigate the robustness of covariate correlations with respect to subpopulations, graphically identify outliers, and identify factors contributing to differential regulation between pairs of covariates. MCA thus provides a framework for investigation of expression correlations for genes of interest and biological hypothesis generation.
Affiliation(s)
- Carsten Marr
- Institute of Computational Biology, Helmholtz Zentrum München, Ingolstädter Landstrasse 1, 85764 Neuherberg, Germany.
|
38
|
Buettner F, Moignard V, Göttgens B, Theis FJ. Probabilistic PCA of censored data: accounting for uncertainties in the visualization of high-throughput single-cell qPCR data. Bioinformatics 2014; 30:1867-75. [PMID: 24618470 PMCID: PMC4071202 DOI: 10.1093/bioinformatics/btu134] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Indexed: 12/25/2022]
Abstract
Motivation: High-throughput single-cell quantitative real-time polymerase chain reaction (qPCR) is a promising technique allowing for new insights in complex cellular processes. However, the PCR reaction can be detected only up to a certain detection limit, whereas failed reactions could be due to low or absent expression, and the true expression level is unknown. Because this censoring can occur for high proportions of the data, it is one of the main challenges when dealing with single-cell qPCR data. Principal component analysis (PCA) is an important tool for visualizing the structure of high-dimensional data as well as for identifying subpopulations of cells. However, to date it is not clear how to perform a PCA of censored data. We present a probabilistic approach that accounts for the censoring and evaluate it for two typical datasets containing single-cell qPCR data. Results: We use the Gaussian process latent variable model framework to account for censoring by introducing an appropriate noise model and allowing a different kernel for each dimension. We evaluate this new approach for two typical qPCR datasets (of mouse embryonic stem cells and blood stem/progenitor cells, respectively) by performing linear and non-linear probabilistic PCA. Taking the censoring into account results in a 2D representation of the data, which better reflects its known structure: in both datasets, our new approach results in a better separation of known cell types and is able to reveal subpopulations in one dataset that could not be resolved using standard PCA. Availability and implementation: The implementation was based on the existing Gaussian process latent variable model toolbox (https://github.com/SheffieldML/GPmat); extensions for noise models and kernels accounting for censoring are available at http://icb.helmholtz-muenchen.de/censgplvm. Contact: fbuettner.phys@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Florian Buettner
- Institute of Computational Biology, Helmholtz-Zentrum München, 85764 Neuherberg, Germany, Department of Haematology, University of Cambridge, Cambridge Institute for Medical Research and Wellcome Trust & MRC Cambridge Stem Cell Institute, Cambridge CB2 0XY, UK and Department of Mathematics, TU München, 85748 Garching, Germany
- Victoria Moignard
- Institute of Computational Biology, Helmholtz-Zentrum München, 85764 Neuherberg, Germany, Department of Haematology, University of Cambridge, Cambridge Institute for Medical Research and Wellcome Trust & MRC Cambridge Stem Cell Institute, Cambridge CB2 0XY, UK and Department of Mathematics, TU München, 85748 Garching, Germany
- Berthold Göttgens
- Institute of Computational Biology, Helmholtz-Zentrum München, 85764 Neuherberg, Germany, Department of Haematology, University of Cambridge, Cambridge Institute for Medical Research and Wellcome Trust & MRC Cambridge Stem Cell Institute, Cambridge CB2 0XY, UK and Department of Mathematics, TU München, 85748 Garching, Germany
- Fabian J Theis
- Institute of Computational Biology, Helmholtz-Zentrum München, 85764 Neuherberg, Germany, Department of Haematology, University of Cambridge, Cambridge Institute for Medical Research and Wellcome Trust & MRC Cambridge Stem Cell Institute, Cambridge CB2 0XY, UK and Department of Mathematics, TU München, 85748 Garching, Germany
|
39
|
Moignard V, Göttgens B. Transcriptional mechanisms of cell fate decisions revealed by single cell expression profiling. Bioessays 2014; 36:419-26. [PMID: 24470343 PMCID: PMC3992849 DOI: 10.1002/bies.201300102] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Indexed: 12/22/2022]
Abstract
Transcriptional networks regulate cell fate decisions, which occur at the level of individual cells. However, much of what we know about their structure and function comes from studies averaging measurements over large populations of cells, many of which are functionally heterogeneous. Such studies conceal the variability between cells and so prevent us from determining the nature of heterogeneity at the molecular level. In recent years, many protocols and platforms have been developed that allow the high throughput analysis of gene expression in single cells, opening the door to a new era of biology. Here, we discuss the need for single cell gene expression analysis to gain deeper insights into the transcriptional control of cell fate decisions, and consider the insights it has provided so far into transcriptional regulatory networks in development.
Affiliation(s)
- Victoria Moignard
- Department of Haematology, University of Cambridge, Cambridge, UK; Wellcome Trust - Medical Research Council, Cambridge Stem Cell Institute, University of Cambridge, Cambridge, UK; Cambridge Institute for Medical Research, University of Cambridge, Cambridge, UK
|
40
|
Moignard V, Macaulay IC, Swiers G, Buettner F, Schütte J, Calero-Nieto FJ, Kinston S, Joshi A, Hannah R, Theis FJ, Jacobsen SE, de Bruijn M, Göttgens B. Characterization of transcriptional networks in blood stem and progenitor cells using high-throughput single-cell gene expression analysis. Nat Cell Biol 2013; 15:363-72. [PMID: 23524953 PMCID: PMC3796878 DOI: 10.1038/ncb2709] [Citation(s) in RCA: 205] [Impact Index Per Article: 17.1] [Received: 06/24/2012] [Accepted: 02/08/2013] [Indexed: 12/15/2022]
Abstract
Cellular decision-making is mediated by a complex interplay of external stimuli with the intracellular environment, in particular transcription factor regulatory networks. Here we have determined the expression of a network of 18 key haematopoietic transcription factors in 597 single primary blood stem and progenitor cells isolated from mouse bone marrow. We demonstrate that different stem/progenitor populations are characterized by distinctive transcription factor expression states, and through comprehensive bioinformatic analysis reveal positively and negatively correlated transcription factor pairings, including previously unrecognized relationships between Gata2, Gfi1 and Gfi1b. Validation using transcriptional and transgenic assays confirmed direct regulatory interactions consistent with a regulatory triad in immature blood stem cells, where Gata2 may function to modulate cross-inhibition between Gfi1 and Gfi1b. Single-cell expression profiling therefore identifies network states and allows reconstruction of network hierarchies involved in controlling stem cell fate choices, and provides a blueprint for studying both normal development and human disease.
Affiliation(s)
- Victoria Moignard
- University of Cambridge, Department of Haematology, Wellcome Trust and MRC Cambridge Stem Cell Institute & Cambridge Institute for Medical, Cambridge, CB2 0XY, United Kingdom
- Iain C. Macaulay
- Haematopoietic Stem Cell Laboratory, Weatherall Institute of Molecular Medicine, University of Oxford, Oxford, OX3 9DS, United Kingdom
- Gemma Swiers
- MRC Molecular Haematology Unit, Weatherall Institute of Molecular Medicine, University of Oxford, Oxford, OX3 9DS, United Kingdom
- Florian Buettner
- Institute of Bioinformatics and Systems Biology, Helmholtz Zentrum München, Ingolstadter Landstraße 1, 85764 Neuherberg, Germany
- Judith Schütte
- University of Cambridge, Department of Haematology, Wellcome Trust and MRC Cambridge Stem Cell Institute & Cambridge Institute for Medical, Cambridge, CB2 0XY, United Kingdom
- Fernando J. Calero-Nieto
- University of Cambridge, Department of Haematology, Wellcome Trust and MRC Cambridge Stem Cell Institute & Cambridge Institute for Medical, Cambridge, CB2 0XY, United Kingdom
- Sarah Kinston
- University of Cambridge, Department of Haematology, Wellcome Trust and MRC Cambridge Stem Cell Institute & Cambridge Institute for Medical, Cambridge, CB2 0XY, United Kingdom
- Anagha Joshi
- University of Cambridge, Department of Haematology, Wellcome Trust and MRC Cambridge Stem Cell Institute & Cambridge Institute for Medical, Cambridge, CB2 0XY, United Kingdom
- Rebecca Hannah
- University of Cambridge, Department of Haematology, Wellcome Trust and MRC Cambridge Stem Cell Institute & Cambridge Institute for Medical, Cambridge, CB2 0XY, United Kingdom
- Fabian J. Theis
- Institute of Bioinformatics and Systems Biology, Helmholtz Zentrum München, Ingolstadter Landstraße 1, 85764 Neuherberg, Germany
- Sten Eirik Jacobsen
- Haematopoietic Stem Cell Laboratory, Weatherall Institute of Molecular Medicine, University of Oxford, Oxford, OX3 9DS, United Kingdom
- Marella de Bruijn
- MRC Molecular Haematology Unit, Weatherall Institute of Molecular Medicine, University of Oxford, Oxford, OX3 9DS, United Kingdom
- Berthold Göttgens
- University of Cambridge, Department of Haematology, Wellcome Trust and MRC Cambridge Stem Cell Institute & Cambridge Institute for Medical, Cambridge, CB2 0XY, United Kingdom
|