1
Rana V, Peng J, Pan C, Lyu H, Cheng A, Kim M, Milenkovic O. Interpretable online network dictionary learning for inferring long-range chromatin interactions. PLoS Comput Biol 2024; 20:e1012095. [PMID: 38753877 DOI: 10.1371/journal.pcbi.1012095] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2023] [Revised: 05/29/2024] [Accepted: 04/20/2024] [Indexed: 05/18/2024] Open
Abstract
Dictionary learning (DL), implemented via matrix factorization (MF), is commonly used in computational biology to tackle ubiquitous clustering problems. The method is favored due to its conceptual simplicity and relatively low computational complexity. However, DL algorithms produce results that lack interpretability in terms of real biological data. Additionally, they are not optimized for graph-structured data and hence often fail to handle such data in a scalable manner. In order to address these limitations, we propose a novel DL algorithm called online convex network dictionary learning (online cvxNDL). Unlike classical DL algorithms, online cvxNDL is implemented via MF and designed to handle extremely large datasets by virtue of its online nature. Importantly, it enables the interpretation of dictionary elements, which serve as cluster representatives, through convex combinations of real measurements. Moreover, the algorithm can be applied to data with a network structure by incorporating specialized subnetwork sampling techniques. To demonstrate the utility of our approach, we apply cvxNDL to 3D-genome RNAPII ChIA-Drop data with the goal of identifying important long-range interaction patterns (long-range dictionary elements). ChIA-Drop probes higher-order interactions and produces data in the form of hypergraphs whose nodes represent genomic fragments. The hyperedges represent observed physical contacts. Our hypergraph model analysis has the objective of creating an interpretable dictionary of long-range interaction patterns that accurately represent global chromatin physical contact maps. Through the use of dictionary information, one can also associate the contact maps with RNA transcripts and infer cellular functions. To accomplish the task at hand, we focus on RNAPII-enriched ChIA-Drop data from Drosophila melanogaster S2 cell lines. Our results offer two key insights. First, we demonstrate that online cvxNDL retains the accuracy of classical DL (MF) methods while simultaneously ensuring unique interpretability and scalability. Second, we identify distinct collections of proximal and distal interaction patterns involving chromatin elements shared by related processes across different chromosomes, as well as patterns unique to specific chromosomes. To associate the dictionary elements with biological properties of the corresponding chromatin regions, we employ Gene Ontology (GO) enrichment analysis and perform multiple RNA coexpression studies.
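The interpretability claim in this abstract rests on each dictionary element being a convex combination of real measurements. A minimal sketch of that single idea follows, assuming hypothetical names (`project_to_simplex`, `convex_dictionary_element`), toy data, and a standard sort-based Euclidean projection onto the probability simplex as one possible way to keep the weights convex. It is an illustration only and not the authors' online cvxNDL implementation, whose update rules and subnetwork sampling are not described here.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {w : w_i >= 0, sum(w) = 1}."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for j, u_j in enumerate(u, start=1):
        cumulative += u_j
        candidate = (cumulative - 1.0) / j
        if u_j - candidate > 0:  # true for the leading sorted entries only
            theta = candidate
    return [max(x - theta, 0.0) for x in v]


def convex_dictionary_element(samples, raw_weights):
    """Build one dictionary element as a convex combination of samples."""
    w = project_to_simplex(raw_weights)
    dim = len(samples[0])
    return [sum(w_i * s[d] for w_i, s in zip(w, samples)) for d in range(dim)]


# Toy data (hypothetical): three observed samples with two features each.
samples = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
raw = [0.5, 1.2, -0.3]              # unconstrained weights before projection
weights = project_to_simplex(raw)    # nonnegative and summing to one
element = convex_dictionary_element(samples, raw)
```

Because the weights are nonnegative and sum to one, the resulting element stays inside the convex hull of the observed samples, which is what allows it to be read back in terms of real data points.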
Affiliation(s)
- Vishal Rana
  - Department of Electrical and Computer Engineering, University of Illinois, Urbana-Champaign, Illinois, United States of America
- Jianhao Peng
  - Department of Electrical and Computer Engineering, University of Illinois, Urbana-Champaign, Illinois, United States of America
- Chao Pan
  - Department of Electrical and Computer Engineering, University of Illinois, Urbana-Champaign, Illinois, United States of America
- Hanbaek Lyu
  - Department of Mathematics, University of Wisconsin - Madison, Madison, Wisconsin, United States of America
- Albert Cheng
  - School of Biological and Health Systems Engineering, Arizona State University, Phoenix, Arizona, United States of America
- Minji Kim
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
- Olgica Milenkovic
  - Department of Electrical and Computer Engineering, University of Illinois, Urbana-Champaign, Illinois, United States of America
2
Feng H, Cottrell S, Hozumi Y, Wei GW. Multiscale differential geometry learning of networks with applications to single-cell RNA sequencing data. Comput Biol Med 2024; 171:108211. [PMID: 38422960 PMCID: PMC10965033 DOI: 10.1016/j.compbiomed.2024.108211] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/02/2024] [Revised: 02/02/2024] [Accepted: 02/25/2024] [Indexed: 03/02/2024]
Abstract
Single-cell RNA sequencing (scRNA-seq) has emerged as a transformative technology, offering unparalleled insights into the intricate landscape of cellular diversity and gene expression dynamics. scRNA-seq analysis represents a challenging and cutting-edge frontier within the field of biological research. Differential geometry serves as a powerful mathematical tool in various applications of scientific research. In this study, we introduce, for the first time, a multiscale differential geometry (MDG) strategy for addressing the challenges encountered in scRNA-seq data analysis. We assume that intrinsic properties of cells lie on a family of low-dimensional manifolds embedded in the high-dimensional space of scRNA-seq data. Multiscale cell-cell interactive manifolds are constructed to reveal complex relationships in the cell-cell network, where curvature-based features for cells can decipher the intricate structural and biological information. We showcase the utility of our novel approach by demonstrating its effectiveness in classifying cell types. This innovative application of differential geometry in scRNA-seq analysis opens new avenues for understanding the intricacies of biological networks and holds great potential for network analysis in other fields.
Affiliation(s)
- Hongsong Feng
  - Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA
- Sean Cottrell
  - Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA
- Yuta Hozumi
  - Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA
- Guo-Wei Wei
  - Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA; Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI 48824, USA; Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824, USA
3
Johnson JAI, Tsang AP, Mitchell JT, Zhou DL, Bowden J, Davis-Marcisak E, Sherman T, Liefeld T, Loth M, Goff LA, Zimmerman JW, Kinny-Köster B, Jaffee EM, Tamayo P, Mesirov JP, Reich M, Fertig EJ, Stein-O'Brien GL. Inferring cellular and molecular processes in single-cell data with non-negative matrix factorization using Python, R and GenePattern Notebook implementations of CoGAPS. Nat Protoc 2023; 18:3690-3731. [PMID: 37989764 PMCID: PMC10961825 DOI: 10.1038/s41596-023-00892-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2022] [Accepted: 07/21/2023] [Indexed: 11/23/2023]
Abstract
Non-negative matrix factorization (NMF) is an unsupervised learning method well suited to high-throughput biology. However, inferring biological processes from an NMF result still requires additional post hoc statistics and annotation for interpretation of learned features. Here, we introduce a suite of computational tools that implement NMF and provide methods for accurate and clear biological interpretation and analysis. A generalized discussion of NMF covering its benefits, limitations and open questions is followed by four procedures for the Bayesian NMF algorithm Coordinated Gene Activity across Pattern Subsets (CoGAPS). Each procedure demonstrates NMF analysis to quantify cell state transitions in a public domain single-cell RNA-sequencing dataset. The first demonstrates PyCoGAPS, our new Python implementation that enhances runtime for large datasets, and the second allows its deployment in Docker. The third procedure steps through the same single-cell NMF analysis using our R CoGAPS interface. The fourth introduces a beginner-friendly CoGAPS platform using GenePattern Notebook, aimed at users with a working conceptual knowledge of data analysis but without a basic proficiency in the R or Python programming language. We also constructed a user-facing website to serve as a central repository for information and instructional materials about CoGAPS and its application programming interfaces. The expected time to set up the packages and conduct a test run is around 15 min, with an additional 30 min to conduct analyses on a precomputed result. The expected runtime on the user's desired dataset can vary from hours to days depending on factors such as dataset size or input parameters.
Affiliation(s)
- Jeanette A I Johnson
  - Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD, USA
  - Convergence Institute, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD, USA
- Ashley P Tsang
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Jacob T Mitchell
  - Convergence Institute, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD, USA
  - Department of Genetic Medicine, Johns Hopkins University, Baltimore, MD, USA
- David L Zhou
  - Department of Neuroscience, Johns Hopkins University, Baltimore, MD, USA
- Julia Bowden
  - Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD, USA
  - Convergence Institute, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD, USA
- Emily Davis-Marcisak
  - Convergence Institute, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD, USA
  - Department of Genetic Medicine, Johns Hopkins University, Baltimore, MD, USA
- Thomas Sherman
  - Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD, USA
- Ted Liefeld
  - Department of Medicine, Moores Cancer Center, University of California San Diego, San Diego, CA, USA
- Melanie Loth
  - Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD, USA
  - Convergence Institute, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD, USA
- Loyal A Goff
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
  - Department of Neuroscience, Johns Hopkins University, Baltimore, MD, USA
  - Kavli Neurodiscovery Institute, Johns Hopkins University, Baltimore, MD, USA
  - Single Cell Training and Analysis Center, Johns Hopkins University, Baltimore, MD, USA
- Jacquelyn W Zimmerman
  - Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD, USA
  - Convergence Institute, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD, USA
- Ben Kinny-Köster
  - Department of Surgery, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Elizabeth M Jaffee
  - Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD, USA
  - Convergence Institute, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD, USA
- Pablo Tamayo
  - Department of Medicine, Moores Cancer Center, University of California San Diego, San Diego, CA, USA
- Jill P Mesirov
  - Department of Medicine, Moores Cancer Center, University of California San Diego, San Diego, CA, USA
- Michael Reich
  - Department of Medicine, Moores Cancer Center, University of California San Diego, San Diego, CA, USA
- Elana J Fertig
  - Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD, USA
  - Convergence Institute, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD, USA
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
  - Single Cell Training and Analysis Center, Johns Hopkins University, Baltimore, MD, USA
  - Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, MD, USA
- Genevieve L Stein-O'Brien
  - Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD, USA
  - Convergence Institute, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD, USA
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
  - Department of Neuroscience, Johns Hopkins University, Baltimore, MD, USA
  - Kavli Neurodiscovery Institute, Johns Hopkins University, Baltimore, MD, USA
  - Single Cell Training and Analysis Center, Johns Hopkins University, Baltimore, MD, USA
4
Zhou Y, Luo K, Liang L, Chen M, He X. A new Bayesian factor analysis method improves detection of genes and biological processes affected by perturbations in single-cell CRISPR screening. Nat Methods 2023; 20:1693-1703. [PMID: 37770710 PMCID: PMC10630124 DOI: 10.1038/s41592-023-02017-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/11/2022] [Accepted: 08/18/2023] [Indexed: 09/30/2023]
Abstract
Clustered regularly interspaced short palindromic repeats (CRISPR) screening coupled with single-cell RNA sequencing has emerged as a powerful tool to characterize the effects of genetic perturbations on the whole transcriptome at a single-cell level. However, due to its sparsity and complex structure, analysis of single-cell CRISPR screening data is challenging. In particular, standard differential expression analysis methods are often underpowered to detect genes affected by CRISPR perturbations. We developed a statistical method for such data, called guided sparse factor analysis (GSFA). GSFA infers latent factors that represent coregulated genes or gene modules; by borrowing information from these factors, it infers the effects of genetic perturbations on individual genes. We demonstrated through extensive simulation studies that GSFA detects perturbation effects with much higher power than state-of-the-art methods. Using single-cell CRISPR data from human CD8+ T cells and neural progenitor cells, we showed that GSFA identified biologically relevant gene modules and specific genes affected by CRISPR perturbations, many of which were missed by existing methods, providing new insights into the functions of genes involved in T cell activation and neurodevelopment.
Affiliation(s)
- Yifan Zhou
  - Graduate Program of Biophysical Sciences, University of Chicago, Chicago, IL, USA
  - Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Kaixuan Luo
  - Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Lifan Liang
  - Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Mengjie Chen
  - Department of Human Genetics, University of Chicago, Chicago, IL, USA
  - Department of Medicine, University of Chicago, Chicago, IL, USA
- Xin He
  - Department of Human Genetics, University of Chicago, Chicago, IL, USA
5
Kumar N, Skubleny D, Parkes M, Verma R, Davis S, Kumar L, Aissiou A, Greiner R. Learning Individual Survival Models from PanCancer Whole Transcriptome Data. Clin Cancer Res 2023; 29:3924-3936. [PMID: 37463063 PMCID: PMC10543961 DOI: 10.1158/1078-0432.ccr-22-3493] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/19/2022] [Revised: 02/11/2023] [Accepted: 07/11/2023] [Indexed: 07/20/2023]
Abstract
PURPOSE Personalized medicine attempts to predict survival time for each patient, based on their individual tumor molecular profile. We investigate whether our survival learner in combination with a dimension reduction method can produce useful survival estimates for a variety of patients with cancer. EXPERIMENTAL DESIGN This article provides a method that learns a model for predicting the survival time for individual patients with cancer from the PanCancer Atlas: given the (16,335 dimensional) gene expression profiles from 10,173 patients, each having one of 33 cancers, this method uses unsupervised nonnegative matrix factorization (NMF) to reexpress the gene expression data for each patient in terms of 100 learned NMF factors. It then feeds these 100 factors into the Multi-Task Logistic Regression (MTLR) learner to produce cancer-specific models for each of 20 cancers (with >50 uncensored instances); this produces "individual survival distributions" (ISD), which provide survival probabilities at each future time for each individual patient, which provides a patient's risk score and estimated survival time. RESULTS Our NMF-MTLR concordance indices outperformed the VAECox benchmark by 14.9% overall. We achieved optimal survival prediction using pan-cancer NMF in combination with cancer-specific MTLR models. We provide biological interpretation of the NMF model and clinical implications of ISDs for prognosis and therapeutic response prediction. CONCLUSIONS NMF-MTLR provides many benefits over other models: superior model discrimination, superior calibration, meaningful survival time estimates, and accurate probabilistic estimates of survival over time for each individual patient. We advocate for the adoption of these cancer survival models in clinical and research settings.
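The dimension-reduction step described in this abstract re-expresses each patient's gene expression profile in terms of a small number of nonnegative factors. A minimal sketch of that step on toy data follows, assuming Lee-Seung multiplicative updates for the Frobenius objective; the abstract does not say which NMF solver the authors used, so the solver, the function names (`nmf`, `frob_error`), the matrix sizes and the toy values are all illustrative assumptions. The downstream MTLR survival learner is not shown.

```python
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    Bt = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def frob_error(X, W, H):
    """Frobenius norm of the reconstruction residual X - W H."""
    WH = matmul(W, H)
    return sum((x - y) ** 2
               for rx, ry in zip(X, WH) for x, y in zip(rx, ry)) ** 0.5


def nmf(X, k, n_iter=200, seed=0, eps=1e-9):
    """Factor X (patients x genes) as W (patients x k) times H (k x genes)."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(n_iter):
        # H <- H * (W^T X) / (W^T W H), elementwise
        Wt = transpose(W)
        num, den = matmul(Wt, X), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(rh, ra, rb)]
             for rh, ra, rb in zip(H, num, den)]
        # W <- W * (X H^T) / (W H H^T), elementwise
        Ht = transpose(H)
        num, den = matmul(X, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(rw, ra, rb)]
             for rw, ra, rb in zip(W, num, den)]
    return W, H


# Toy expression matrix (hypothetical): 5 patients x 4 genes, 2 factors.
X = [[5, 4, 0, 1], [4, 5, 1, 0], [0, 1, 5, 4], [1, 0, 4, 5], [3, 3, 3, 3]]
W, H = nmf(X, k=2)
```

Each row of `W` is the reduced representation of one patient; in the pipeline the abstract describes, rows of this kind (with 100 factors rather than 2) would be the input features for the survival model.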
Affiliation(s)
- Neeraj Kumar
  - Alberta Machine Intelligence Institute, Edmonton, Alberta, Canada
- Daniel Skubleny
  - Department of Surgery, University of Alberta, Edmonton, Alberta, Canada
- Michael Parkes
  - Computing Science Department, University of Alberta, Edmonton, Alberta, Canada
- Ruchika Verma
  - Alberta Machine Intelligence Institute, Edmonton, Alberta, Canada
- Sacha Davis
  - Alberta Machine Intelligence Institute, Edmonton, Alberta, Canada
- Luke Kumar
  - Microsoft, Vancouver, British Columbia, Canada
- Russell Greiner
  - Alberta Machine Intelligence Institute, Edmonton, Alberta, Canada
  - Computing Science Department, University of Alberta, Edmonton, Alberta, Canada
6
Zhang H, Lu X, Lu B, Chen L. scGEM: Unveiling the Nested Tree-Structured Gene Co-Expressing Modules in Single Cell Transcriptome Data. Cancers (Basel) 2023; 15:4277. [PMID: 37686554 PMCID: PMC10486867 DOI: 10.3390/cancers15174277] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2023] [Revised: 08/22/2023] [Accepted: 08/25/2023] [Indexed: 09/10/2023] Open
Abstract
BACKGROUND Single-cell transcriptome analysis has fundamentally changed biological research by allowing higher-resolution computational analysis of individual cells and subsets of cell types. However, few methods have met the need to recognize and quantify the underlying cellular programs that determine the specialization and differentiation of the cell types. METHODS In this study, we present scGEM, a nested tree-structured nonparametric Bayesian model, to reveal the gene co-expression modules (GEMs) reflecting transcriptome processes in single cells. RESULTS We show that scGEM can discover shared and specialized transcriptome signals across different cell types using peripheral blood mononuclear single cells and early brain development single cells. scGEM outperformed other methods in perplexity and topic coherence (p < 0.001) on our simulation data. Larger datasets, deeper trees and pre-trained models are shown to be positively associated with better scGEM performance. The GEMs obtained from triple-negative breast cancer single cells exhibited better correlations with lymphocyte infiltration (p = 0.009) and the cell cycle (p < 0.001) than other methods in additional validation on the bulk RNAseq dataset. CONCLUSIONS Altogether, we demonstrate that scGEM can be used to model the hidden cellular functions of single cells, thereby unveiling the specialization and generalization of transcriptomic programs across different types of cells.
Affiliation(s)
- Han Zhang
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA 15206, USA
- Xinghua Lu
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA 15206, USA
  - UPMC Hillman Cancer Center, Pittsburgh, PA 15232, USA
- Binfeng Lu
  - Center for Discovery and Innovation, Hackensack Meridian Health, Nutley, NJ 07110, USA
- Lujia Chen
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA 15206, USA
7
Ozturk K, Panwala R, Sheen J, Ford K, Payne N, Zhang DE, Hutter S, Haferlach T, Ideker T, Mali P, Carter H. Interface-guided phenotyping of coding variants in the transcription factor RUNX1 with SEUSS. bioRxiv 2023:2023.08.03.551876. [PMID: 37577681 PMCID: PMC10418284 DOI: 10.1101/2023.08.03.551876] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/15/2023]
Abstract
Understanding the consequences of single amino acid substitutions in cancer driver genes remains an unmet need. Perturb-seq provides a tool to investigate the effects of individual mutations on cellular programs. Here we deploy SEUSS, a Perturb-seq-like approach, to generate and assay mutations at physical interfaces of the RUNX1 Runt domain. We measured the impact of 115 mutations on RNA profiles in single myelogenous leukemia cells and used the profiles to categorize mutations into three functionally distinct groups: wild-type (WT)-like, loss-of-function (LOF)-like and hypomorphic. Notably, the largest concentration of functional mutations (non-WT-like) clustered at the DNA binding site and contained many of the more frequently observed mutations in human cancers. Hypomorphic variants shared characteristics with loss-of-function variants but had gene expression profiles indicative of response to neural growth factor and cytokine recruitment of neutrophils. Additionally, DNA accessibility changes upon perturbations were enriched for RUNX1 binding motifs, particularly near differentially expressed genes. Overall, our work demonstrates the potential of targeting protein interaction interfaces to better define the landscape of prospective phenotypes reachable by amino acid substitutions.
8
ASGARD is A Single-cell Guided Pipeline to Aid Repurposing of Drugs. Nat Commun 2023; 14:993. [PMID: 36813801 PMCID: PMC9945835 DOI: 10.1038/s41467-023-36637-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 04/15/2021] [Accepted: 02/10/2023] [Indexed: 02/24/2023] Open
Abstract
Single-cell RNA sequencing technology has enabled in-depth analysis of intercellular heterogeneity in various diseases. However, its full potential for precision medicine has yet to be reached. Towards this, we propose A Single-cell Guided Pipeline to Aid Repurposing of Drugs (ASGARD) that defines a drug score to recommend drugs by considering all cell clusters to address the intercellular heterogeneity within each patient. ASGARD shows significantly better average accuracy on single-drug therapy compared to two bulk-cell-based drug repurposing methods. We also demonstrated that it performs considerably better than other cell cluster-level predicting methods. In addition, we validate ASGARD using the drug response prediction method TRANSACT with Triple-Negative-Breast-Cancer patient samples. We find that many top-ranked drugs are either approved by the Food and Drug Administration or in clinical trials treating corresponding diseases. In conclusion, ASGARD is a promising drug repurposing recommendation tool guided by single-cell RNA-seq for personalized medicine. ASGARD is free for educational use at https://github.com/lanagarmire/ASGARD.
9
Pandey D, Onkara PP. Improved downstream functional analysis of single-cell RNA-sequence data using DGAN. Sci Rep 2023; 13:1618. [PMID: 36709340 PMCID: PMC9884242 DOI: 10.1038/s41598-023-28952-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2022] [Accepted: 01/27/2023] [Indexed: 01/29/2023] Open
Abstract
The dramatic increase in the number of single-cell RNA-sequence (scRNA-seq) investigations reflects the ability of next-generation sequencing technologies to accurately measure tens of thousands of RNA expression levels at cellular resolution. Nevertheless, missing values arising from RNA amplification remain a significant computational challenge, as these data omissions introduce further noise into the cellular data and ultimately impede downstream functional analysis of scRNA-seq data. Consequently, it is imperative to develop robust and efficient scRNA-seq data imputation methods for improved downstream functional analysis. To this end, we have designed an imputation framework, the deep generative autoencoder network (DGAN). In essence, DGAN is an evolved variational autoencoder designed to robustly impute data dropouts in scRNA-seq data, which manifest as a sparse gene expression matrix. DGAN principally accounts for the count distribution and data sparsity using a Gaussian model, whereby cell dependencies are exploited to detect and exclude outlier cells via imputation. When tested on five publicly available scRNA-seq datasets, DGAN outperformed every baseline method it was compared against with respect to downstream functional analysis, including cell data visualization, clustering, classification and differential expression analysis. DGAN is implemented in Python and is accessible at https://github.com/dikshap11/DGAN.
Affiliation(s)
- Diksha Pandey
  - Department of Biotechnology, National Institute of Technology, Warangal, India
- Perumal P Onkara
  - Department of Biotechnology, National Institute of Technology, Warangal, India
10
Wang H, Ma X. Learning discriminative and structural samples for rare cell types with deep generative model. Brief Bioinform 2022; 23:6652812. [PMID: 35914950 DOI: 10.1093/bib/bbac317] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/19/2022] [Revised: 07/11/2022] [Accepted: 07/13/2022] [Indexed: 02/02/2023] Open
Abstract
Cell types (subpopulations) serve as biomarkers for the diagnosis and therapy of complex diseases, and single-cell RNA-sequencing (scRNA-seq) measures the expression of genes at the cell level, paving the way for the identification of cell types. Although great efforts have been devoted to this issue, it remains challenging to identify rare cell types in scRNA-seq data because of the few-shot problem, the lack of interpretability, and the separation of sample generation from cell clustering. To address these issues, a novel deep generative model for leveraging small samples of cells (scLDS2) is proposed, which precisely estimates the distribution of different cells and discriminates the rare and non-rare cell types with adversarial learning. Specifically, to enhance the interpretability of samples, scLDS2 generates sparse fake samples of cells with the $\ell_1$-norm, where the relations among cells are learned, facilitating the identification of cell types. Furthermore, scLDS2 directly obtains cell types from the generated samples by learning the block structure with the nuclear norm, such that cells belonging to the same type are similar to each other. scLDS2 joins the generation of samples, the classification of generated and true samples of cells, and feature extraction into a unified generative framework, which transforms the rare cell type detection problem into a classification problem, paving the way for the identification of cell types with joint learning. The experimental results on 20 datasets demonstrate that scLDS2 significantly outperforms 17 state-of-the-art methods in terms of various measurements, with a 25.12% improvement in adjusted Rand index on average, providing an effective strategy for scRNA-seq data with rare cell types. (The software is coded in Python and is freely available for academic use at https://github.com/xkmaxidian/scLDS2.)
Affiliation(s)
- Haiyue Wang
  - School of Computer Science and Technology, Xidian University, Xi'an, 710071, China
- Xiaoke Ma
  - School of Computer Science and Technology, Xidian University, Xi'an, 710071, China
11
Mao W, Pouyan MB, Kostka D, Chikina M. Non-negative Independent Factor Analysis disentangles discrete and continuous sources of variation in scRNA-seq data. Bioinformatics 2022; 38:2749-2756. [PMID: 35561207 PMCID: PMC9113312 DOI: 10.1093/bioinformatics/btac136] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2021] [Revised: 02/25/2022] [Accepted: 03/17/2022] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION Single-cell RNA-seq analysis has emerged as a powerful tool for understanding inter-cellular heterogeneity. Due to the inherent noise of the data, computational techniques often rely on dimensionality reduction (DR) as both a pre-processing step and an analysis tool. Ideally, DR should preserve the biological information while discarding the noise. However, if the DR is to be used directly to gain biological insight, it must also be interpretable; that is, the individual dimensions of the reduction should correspond to specific biological variables such as cell-type identity or pathway activity. Maximizing biological interpretability necessitates making assumptions about the data structures, and the choice of the model is critical. RESULTS We present a new probabilistic single-cell factor analysis model, Non-negative Independent Factor Analysis (NIFA), that incorporates different interpretability-inducing assumptions into a single modeling framework. The key advantage of our NIFA model is that it simultaneously models uni- and multi-modal latent factors, and thus isolates discrete cell-type identity and continuous pathway activity into separate components. We apply our approach to a range of datasets where cell-type identity is known, and we show that NIFA-derived factors outperform results from ICA, PCA, NMF and scCoGAPS (an NMF method designed for single-cell data) in terms of disentangling biological sources of variation. Studying an immunotherapy dataset in detail, we show that NIFA is able to reproduce and refine previous findings in a single analysis framework and enables the discovery of new clinically relevant cell states. AVAILABILITY AND IMPLEMENTATION NIFA is an R package which is freely available at GitHub (https://github.com/wgmao/NIFA). The test dataset is archived at https://zenodo.org/record/6286646. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Weiguang Mao
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15260, USA
- Joint Carnegie Mellon-University of Pittsburgh Ph.D. Program in Computational Biology, Pittsburgh, PA 15260, USA
- Dennis Kostka
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15260, USA
- Joint Carnegie Mellon-University of Pittsburgh Ph.D. Program in Computational Biology, Pittsburgh, PA 15260, USA
- Department of Developmental Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15260, USA
- Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15260, USA
|
12
|
Zeira R, Land M, Strzalkowski A, Raphael BJ. Alignment and integration of spatial transcriptomics data. Nat Methods 2022; 19:567-575. [PMID: 35577957 PMCID: PMC9334025 DOI: 10.1038/s41592-022-01459-6] [Citation(s) in RCA: 45] [Impact Index Per Article: 22.5] [Received: 07/16/2021] [Accepted: 03/17/2022] [Indexed: 01/05/2023]
Abstract
Spatial transcriptomics (ST) measures mRNA expression across thousands of spots from a tissue slice while recording the two-dimensional (2D) coordinates of each spot. We introduce probabilistic alignment of ST experiments (PASTE), a method to align and integrate ST data from multiple adjacent tissue slices. PASTE computes pairwise alignments of slices using an optimal transport formulation that models both transcriptional similarity and physical distances between spots. PASTE further combines pairwise alignments to construct a stacked 3D alignment of a tissue. Alternatively, PASTE can integrate multiple ST slices into a single consensus slice. We show that PASTE accurately aligns spots across adjacent slices in both simulated and real ST data, demonstrating the advantages of using both transcriptional similarity and spatial information. We further show that the PASTE integrated slice improves the identification of cell types and differentially expressed genes compared with existing approaches that either analyze single ST slices or ignore spatial information.
Affiliation(s)
- Ron Zeira
- Department of Computer Science, Princeton University, Princeton, NJ 08540
- Max Land
- Department of Computer Science, Princeton University, Princeton, NJ 08540
- Benjamin J. Raphael
- Department of Computer Science, Princeton University, Princeton, NJ 08540
|
13
|
Gan S, Deng H, Qiu Y, Alshahrani M, Liu S. DSAE-Impute: Learning Discriminative Stacked Autoencoders for Imputing Single-cell RNA-seq Data. Curr Bioinform 2022. [DOI: 10.2174/1574893617666220330151024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
Abstract
Aims:
In this research, we aim to propose an accurate deep learning method to impute the missing values in scRNA-seq data. DSAE-Impute employs stacked autoencoders to capture gene expression characteristics in the original missing data and combines the discriminative correlation matrix between cells to capture global expression features during the training process, so as to accurately predict missing values.
Background:
Due to the limited amount of mRNA in a single cell, there are always many missing values in scRNA-seq data, which makes it impossible to accurately quantify single-cell RNA expression. The dropout phenomenon makes it impossible to detect truly expressed genes in some cells, which greatly affects downstream analysis of scRNA-seq data, such as cell cluster analysis and the inference of cell development trajectories.
Method:
We propose a novel deep learning model based on the discriminative stacked autoencoders to impute the missing values in scRNA-seq data, named DSAE-Impute. DSAE-Impute embeds the discriminative cell similarity to perfect the feature representation of stacked autoencoders, and comprehensively learns the scRNA-seq data expression pattern through layer-by-layer training to achieve accurate imputation.
Result:
We have systematically evaluated the performance of DSAE-Impute in the simulation and real datasets. The experimental results demonstrate that DSAE-Impute significantly improves downstream analysis, and its imputation results are more accurate compared with other state-of-the-art imputation methods.
Conclusion:
Extensive experiments show that compared with other state-of-the-art methods, the imputation results of DSAE-Impute on simulated and real datasets are more accurate and helpful for downstream analysis.
Affiliation(s)
- Shengfeng Gan
- College of Computer, Hubei University of Education, Wuhan, China
- Huan Deng
- College of Informatics, Huazhong Agricultural University, Wuhan, China
- Yang Qiu
- College of Informatics, Huazhong Agricultural University, Wuhan, China
- Shichao Liu
- College of Informatics, Huazhong Agricultural University, Wuhan, China
|
14
|
Simultaneous Learning the Dimension and Parameter of a Statistical Model with Big Data. STATISTICS IN BIOSCIENCES 2021. [DOI: 10.1007/s12561-021-09324-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/27/2022]
|
15
|
Gene Expression Analysis through Parallel Non-Negative Matrix Factorization. COMPUTATION 2021. [DOI: 10.3390/computation9100106] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Gene expression analysis is a principal tool for explaining the behavior of genes in an organism exposed to different experimental conditions. Many clustering algorithms have been proposed for this task. However, the volume of biological data is overwhelming, and its high-dimensional structure exceeds the capacity of most current computational architectures, so computational time and memory consumption become decisive factors in choosing a clustering algorithm. We propose a clustering algorithm based on Non-negative Matrix Factorization and K-means that reduces data dimensionality while preserving the biological context and prioritizing gene selection; it is implemented in a parallel GPU-based environment through the CUDA library. A well-known dataset is used in our tests, and the quality of the results is measured through the Rand and Accuracy indices. The results show a speedup of 6.22× compared to the sequential version. The algorithm is competitive in the analysis of biological datasets, and it is invariant with respect to the number of classes and the size of the gene expression matrix.
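The two-stage idea in this abstract (NMF for dimensionality reduction, then K-means on the reduced representation) can be illustrated with a minimal CPU-only sketch. It assumes NumPy rather than the authors' CUDA implementation; the function names `nmf` and `kmeans`, the multiplicative-update rule, and the choice to cluster samples (columns) rather than genes are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def nmf(X, rank, n_iter=300, seed=0, eps=1e-9):
    """Factor a non-negative matrix X (genes x samples) as W @ H."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], rank))
    H = rng.random((rank, X.shape[1]))
    for _ in range(n_iter):
        # Multiplicative updates keep W and H non-negative.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ (H @ H.T) + eps)
    return W, H

def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per row of points."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center.
        d = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = points[labels == j].mean(axis=0)
    return labels

# Hypothetical toy data: 30 genes x 12 samples with rank-2 structure.
toy = np.random.default_rng(1).random((30, 2)) @ np.random.default_rng(2).random((2, 12))
W, H = nmf(toy, rank=2)
sample_labels = kmeans(H.T, k=2)  # cluster samples in the reduced space
```

Each matrix product in the update loop is independent across rows and columns, which is the kind of work a GPU implementation would parallelize.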
|
16
|
Shiga M, Seno S, Onizuka M, Matsuda H. SC-JNMF: single-cell clustering integrating multiple quantification methods based on joint non-negative matrix factorization. PeerJ 2021; 9:e12087. [PMID: 34532161 PMCID: PMC8404576 DOI: 10.7717/peerj.12087] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/10/2021] [Accepted: 08/07/2021] [Indexed: 11/20/2022] Open
Abstract
Single-cell RNA-sequencing is a rapidly evolving technology that enables us to understand biological processes at unprecedented resolution. Single-cell expression analysis requires a complex data processing pipeline, and the pipeline is divided into two main parts: The quantification part, which converts the sequence information into gene-cell matrix data; the analysis part, which analyzes the matrix data using statistics and/or machine learning techniques. In the analysis part, unsupervised cell clustering plays an important role in identifying cell types and discovering cell diversity and subpopulations. Identified cell clusters are also used for subsequent analysis, such as finding differentially expressed genes and inferring cell trajectories. However, single-cell clustering using gene expression profiles shows different results depending on the quantification methods. Clustering results are greatly affected by the quantification method used in the upstream process. In other words, even if the original RNA-sequence data is the same, gene expression profiles processed by different quantification methods will produce different clusters. In this article, we propose a robust and highly accurate clustering method based on joint non-negative matrix factorization (joint-NMF) by utilizing the information from multiple gene expression profiles quantified using different methods from the same RNA-sequence data. Our joint-NMF can extract common factors among multiple gene expression profiles by applying each NMF under the constraint that one of the factorized matrices is shared among multiple NMFs. The joint-NMF determines more robust and accurate cell clustering results by leveraging multiple quantification methods compared to conventional clustering methods, which use only a single gene expression profile. Additionally, we showed the usefulness of discovering marker genes with the extracted features using our method.
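The shared-factor constraint described in this abstract can be sketched as follows. This is a generic joint NMF under stated assumptions, not the SC-JNMF algorithm itself: each expression profile X_k (genes × cells) is approximated by W_k H with a single H shared across profiles, fitted by multiplicative updates on the summed squared error. The function name `joint_nmf`, the unweighted objective, and the argmax cluster assignment are illustrative choices; any regularization or weighting used in the paper is not reproduced here.

```python
import numpy as np

def joint_nmf(mats, rank, n_iter=200, seed=0, eps=1e-9):
    """Approximate each X_k (genes_k x cells) by W_k @ H with one shared H."""
    rng = np.random.default_rng(seed)
    n_cells = mats[0].shape[1]
    Ws = [rng.random((X.shape[0], rank)) for X in mats]
    H = rng.random((rank, n_cells))
    losses = []
    for _ in range(n_iter):
        # Update each profile-specific basis W_k with H held fixed.
        for k, X in enumerate(mats):
            Ws[k] *= (X @ H.T) / (Ws[k] @ (H @ H.T) + eps)
        # Update the shared H using evidence pooled over all profiles.
        num = sum(W.T @ X for W, X in zip(Ws, mats))
        den = sum(W.T @ W for W in Ws) @ H + eps
        H *= num / den
        losses.append(sum(np.linalg.norm(X - W @ H) ** 2 for W, X in zip(Ws, mats)))
    return Ws, H, losses

# Hypothetical toy data: two "quantifications" of the same 40 cells.
rng = np.random.default_rng(1)
H_true = rng.random((3, 40))
X1 = rng.random((50, 3)) @ H_true
X2 = rng.random((60, 3)) @ H_true
Ws, H, losses = joint_nmf([X1, X2], rank=3)
cell_clusters = H.argmax(axis=0)  # crude assignment: dominant shared factor
```

Because H is the only matrix common to both factorizations, cell structure that appears under only one quantification method contributes less to it than structure supported by both.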
Affiliation(s)
- Mikio Shiga
- Graduate School of Information Science and Technology, Osaka University, Osaka, Japan
- Shigeto Seno
- Graduate School of Information Science and Technology, Osaka University, Osaka, Japan
- Makoto Onizuka
- Graduate School of Information Science and Technology, Osaka University, Osaka, Japan
- Hideo Matsuda
- Graduate School of Information Science and Technology, Osaka University, Osaka, Japan
|
17
|
He B, Xiao Y, Liang H, Huang Q, Du Y, Li Y, Garmire D, Sun D, Garmire LX. ASGARD: A Single-cell Guided pipeline to Aid Repurposing of Drugs. ARXIV 2021:2109.06377. [PMID: 34545335 PMCID: PMC8452105] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Revised: 12/22/2022] [Indexed: 01/04/2023]
Abstract
Intercellular heterogeneity is a major obstacle to successful precision medicine. Single-cell RNA sequencing (scRNA-seq) technology has enabled in-depth analysis of intercellular heterogeneity in various diseases. However, its full potential for precision medicine has yet to be reached. Towards this, we propose a new drug recommendation system called: A Single-cell Guided Pipeline to Aid Repurposing of Drugs (ASGARD). ASGARD defines a novel drug score predicting drugs by considering all cell clusters to address the intercellular heterogeneity within each patient. We tested ASGARD on multiple diseases, including breast cancer, acute lymphoblastic leukemia, and coronavirus disease 2019 (COVID-19). On single-drug therapy, ASGARD shows significantly better average accuracy (AUC of 0.92) compared to two other bulk-cell-based drug repurposing methods (AUC of 0.80 and 0.76). It is also considerably better (AUC of 0.82) than other cell cluster level predicting methods (AUC of 0.67 and 0.55). In addition, ASGARD is also validated by the drug response prediction method TRANSACT with Triple-Negative-Breast-Cancer patient samples. Many top-ranked drugs are either approved by FDA or in clinical trials treating corresponding diseases. In silico cell-type specific drop-out experiments using triple-negative breast cancers show the importance of T cells in the tumor microenvironment in affecting drug predictions. In conclusion, ASGARD is a promising drug repurposing recommendation tool guided by single-cell RNA-seq for personalized medicine. ASGARD is free for educational use at https://github.com/lanagarmire/ASGARD.
Affiliation(s)
- Bing He
- Department of Computational Medicine and Bioinformatics, Medical School, University of Michigan, Ann Arbor, MI, USA
- Yao Xiao
- Department of Computational Medicine and Bioinformatics, Medical School, University of Michigan, Ann Arbor, MI, USA
- Haodong Liang
- Department of Statistics, College of Literature, Science, and the Arts, University of Michigan, Ann Arbor, MI, USA
- Qianhui Huang
- Department of Computational Medicine and Bioinformatics, Medical School, University of Michigan, Ann Arbor, MI, USA
- Yuheng Du
- Department of Computational Medicine and Bioinformatics, Medical School, University of Michigan, Ann Arbor, MI, USA
- Yijun Li
- Department of Computational Medicine and Bioinformatics, Medical School, University of Michigan, Ann Arbor, MI, USA
- David Garmire
- Department of Electrical Engineering and Computer Science, College of Engineering, University of Michigan, Ann Arbor, MI, USA
- Duxin Sun
- Department of Pharmaceutical Sciences, College of Pharmacy, University of Michigan, Ann Arbor, MI, USA
- Lana X Garmire
- Department of Computational Medicine and Bioinformatics, Medical School, University of Michigan, Ann Arbor, MI, USA
|
18
|
Jiao CN, Liu JX, Wang J, Shang J, Zheng CH. Visualization and Analysis of Single cell RNA-seq Data by Maximizing Correntropy based Non-negative Low Rank Representation. IEEE J Biomed Health Inform 2021; 26:1872-1882. [PMID: 34495855 DOI: 10.1109/jbhi.2021.3110766] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/01/2023]
Abstract
The exploration of single cell RNA-sequencing (scRNA-seq) technology provides a new perspective for analyzing biological problems. One of the major applications of scRNA-seq data is to discover subtypes of cells by cell clustering. Nevertheless, it is challenging for traditional methods to handle scRNA-seq data with a high level of technical noise and notorious dropouts. To better analyze single cell data, a novel scRNA-seq data analysis model called Maximum correntropy criterion based Non-negative and Low Rank Representation (MccNLRR) is introduced. Specifically, the maximum correntropy criterion, as an effective loss function, is more robust to the high noise and large outliers present in the data. Moreover, the low rank representation is proven to be a powerful tool for capturing the global and local structures of data. Therefore, some important information, such as the similarity of cells in the subspace, is also extracted by it. Then, an iterative algorithm based on half-quadratic optimization and the alternating direction method is developed to solve the complex optimization problem. Before the experiment, we also analyze the convergence and robustness of MccNLRR. Finally, the results of cell clustering, visualization analysis, and gene marker selection on scRNA-seq data reveal that the MccNLRR method can distinguish cell subtypes accurately and robustly.
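Why the maximum correntropy criterion is robust to large outliers, and how half-quadratic optimization handles it, can be seen in a deliberately small setting. The sketch below is not MccNLRR: it only estimates a single location parameter m by maximizing the sum of Gaussian kernels exp(-(x_i - m)^2 / (2σ^2)). The function name `correntropy_location`, the kernel width `sigma`, and the toy data are assumptions made for illustration.

```python
import math

def correntropy_location(values, sigma=1.0, n_iter=20):
    """Location estimate maximizing sum_i exp(-(x_i - m)^2 / (2 sigma^2))."""
    ordered = sorted(values)
    m = ordered[len(ordered) // 2]  # start from a median-like data point
    for _ in range(n_iter):
        # Half-quadratic step 1: with m fixed, compute the auxiliary weights.
        weights = [math.exp(-((x - m) ** 2) / (2 * sigma ** 2)) for x in values]
        # Half-quadratic step 2: with weights fixed, solve weighted least squares.
        m = sum(w * x for w, x in zip(weights, values)) / sum(weights)
    return m

# Hypothetical measurements with one gross outlier.
data = [1.0, 1.1, 0.9, 1.05, 0.95, 100.0]
robust = correntropy_location(data, sigma=1.0)
plain_mean = sum(data) / len(data)
```

The outlier receives a weight that is numerically zero, so it barely moves the estimate, whereas the squared-error solution (the plain mean) is pulled far toward it. The same reweighting idea, applied to residuals of a low-rank representation, is what makes a correntropy loss tolerant of noisy entries.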
|
19
|
Davis-Marcisak EF, Deshpande A, Stein-O'Brien GL, Ho WJ, Laheru D, Jaffee EM, Fertig EJ, Kagohara LT. From bench to bedside: Single-cell analysis for cancer immunotherapy. Cancer Cell 2021; 39:1062-1080. [PMID: 34329587 PMCID: PMC8406623 DOI: 10.1016/j.ccell.2021.07.004] [Citation(s) in RCA: 54] [Impact Index Per Article: 18.0] [Received: 03/02/2021] [Revised: 06/16/2021] [Accepted: 07/02/2021] [Indexed: 01/04/2023]
Abstract
Single-cell technologies are emerging as powerful tools for cancer research. These technologies characterize the molecular state of each cell within a tumor, enabling new exploration of tumor heterogeneity, microenvironment cell-type composition, and cell state transitions that affect therapeutic response, particularly in the context of immunotherapy. Analyzing clinical samples has great promise for precision medicine but is technically challenging. Successfully identifying predictors of response requires well-coordinated, multi-disciplinary teams to ensure adequate sample processing for high-quality data generation and computational analysis for data interpretation. Here, we review current approaches to sample processing and computational analysis regarding their application to translational cancer immunotherapy research.
Affiliation(s)
- Emily F Davis-Marcisak
- McKusick-Nathans Institute of the Department of Genetic Medicine, Johns Hopkins School of Medicine, 550 N Broadway, Suite 1101E, Baltimore, MD 21205, USA; Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, 1650 Orleans Street, Room 485, Baltimore, MD 21287, USA; Convergence Institute, Johns Hopkins University, Baltimore, MD, USA; Bloomberg-Kimmel Immunotherapy Institute, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Atul Deshpande
- Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, 1650 Orleans Street, Room 485, Baltimore, MD 21287, USA; Convergence Institute, Johns Hopkins University, Baltimore, MD, USA; Bloomberg-Kimmel Immunotherapy Institute, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Genevieve L Stein-O'Brien
- McKusick-Nathans Institute of the Department of Genetic Medicine, Johns Hopkins School of Medicine, 550 N Broadway, Suite 1101E, Baltimore, MD 21205, USA; Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, 1650 Orleans Street, Room 485, Baltimore, MD 21287, USA; Convergence Institute, Johns Hopkins University, Baltimore, MD, USA; Bloomberg-Kimmel Immunotherapy Institute, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Won J Ho
- Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, 1650 Orleans Street, Room 485, Baltimore, MD 21287, USA; Convergence Institute, Johns Hopkins University, Baltimore, MD, USA; Bloomberg-Kimmel Immunotherapy Institute, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Daniel Laheru
- Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, 1650 Orleans Street, Room 485, Baltimore, MD 21287, USA; Convergence Institute, Johns Hopkins University, Baltimore, MD, USA; Bloomberg-Kimmel Immunotherapy Institute, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Elizabeth M Jaffee
- Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, 1650 Orleans Street, Room 485, Baltimore, MD 21287, USA; Convergence Institute, Johns Hopkins University, Baltimore, MD, USA; Bloomberg-Kimmel Immunotherapy Institute, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Elana J Fertig
- McKusick-Nathans Institute of the Department of Genetic Medicine, Johns Hopkins School of Medicine, 550 N Broadway, Suite 1101E, Baltimore, MD 21205, USA; Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, 1650 Orleans Street, Room 485, Baltimore, MD 21287, USA; Convergence Institute, Johns Hopkins University, Baltimore, MD, USA; Bloomberg-Kimmel Immunotherapy Institute, Johns Hopkins University School of Medicine, Baltimore, MD, USA; Department of Applied Mathematics and Statistics, Johns Hopkins University Whiting School of Engineering, Baltimore, MD, USA; Department of Biomedical Engineering, Johns Hopkins University School of Medicine, Baltimore, MD, USA.
- Luciane T Kagohara
- Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, 1650 Orleans Street, Room 485, Baltimore, MD 21287, USA; Convergence Institute, Johns Hopkins University, Baltimore, MD, USA; Bloomberg-Kimmel Immunotherapy Institute, Johns Hopkins University School of Medicine, Baltimore, MD, USA.
|
20
|
Song D, Li K, Hemminger Z, Wollman R, Li JJ. scPNMF: sparse gene encoding of single cells to facilitate gene selection for targeted gene profiling. Bioinformatics 2021; 37:i358-i366. [PMID: 34252925 PMCID: PMC8275345 DOI: 10.1093/bioinformatics/btab273] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/15/2022] Open
Abstract
Motivation Single-cell RNA sequencing (scRNA-seq) captures whole transcriptome information of individual cells. While scRNA-seq measures thousands of genes, researchers are often interested in only dozens to hundreds of genes for a closer study. Then, a question is how to select those informative genes from scRNA-seq data. Moreover, single-cell targeted gene profiling technologies are gaining popularity for their low costs, high sensitivity and extra (e.g. spatial) information; however, they typically can only measure up to a few hundred genes. Then another challenging question is how to select genes for targeted gene profiling based on existing scRNA-seq data. Results Here, we develop the single-cell Projective Non-negative Matrix Factorization (scPNMF) method to select informative genes from scRNA-seq data in an unsupervised way. Compared with existing gene selection methods, scPNMF has two advantages. First, its selected informative genes can better distinguish cell types. Second, it enables the alignment of new targeted gene profiling data with reference data in a low-dimensional space to facilitate the prediction of cell types in the new data. Technically, scPNMF modifies the PNMF algorithm for gene selection by changing the initialization and adding a basis selection step, which selects informative bases to distinguish cell types. We demonstrate that scPNMF outperforms the state-of-the-art gene selection methods on diverse scRNA-seq datasets. Moreover, we show that scPNMF can guide the design of targeted gene profiling experiments and the cell-type annotation on targeted gene profiling data. Availability and implementation The R package is open-access and available at https://github.com/JSB-UCLA/scPNMF. The data used in this work are available at Zenodo: https://doi.org/10.5281/zenodo.4797997. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Dongyuan Song
- Bioinformatics Interdepartmental Ph.D. Program, University of California, Los Angeles, CA 90095-7246, USA
- Kexin Li
- Department of Statistics, University of California, Los Angeles, CA 90095-1554, USA
- Zachary Hemminger
- Institute for Quantitative and Computational Biosciences, University of California, Los Angeles, CA 90095, USA; Department of Integrative Biology and Physiology, University of California, Los Angeles, CA 90095-7239, USA
- Roy Wollman
- Institute for Quantitative and Computational Biosciences, University of California, Los Angeles, CA 90095, USA; Department of Integrative Biology and Physiology, University of California, Los Angeles, CA 90095-7239, USA; Department of Chemistry and Biochemistry, University of California, Los Angeles, CA 90095-1569, USA
- Jingyi Jessica Li
- Department of Statistics, University of California, Los Angeles, CA 90095-1554, USA; Department of Human Genetics, University of California, Los Angeles, CA 90095-7088, USA; Department of Computational Medicine, University of California, Los Angeles, CA 90095-1766, USA; Department of Biostatistics, University of California Los Angeles, CA 90095-1772, USA
|
21
|
Zhu YL, Yuan SS, Liu JX. Similarity and Dissimilarity Regularized Nonnegative Matrix Factorization for Single-Cell RNA-seq Analysis. Interdiscip Sci 2021; 14:45-54. [PMID: 34231183 DOI: 10.1007/s12539-021-00457-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/17/2021] [Revised: 06/24/2021] [Accepted: 06/27/2021] [Indexed: 10/20/2022]
Abstract
In traditional sequencing techniques, the different functions of cells and the different roles they play in differentiation are often ignored. With the advancement of single-cell RNA sequencing (scRNA-seq) techniques, scientists can measure gene expression values at the single-cell level, which helps to reveal the heterogeneity hidden in cells. One of the most powerful ways to find heterogeneity is to use unsupervised clustering to obtain separate subpopulations. In this paper, we propose a novel clustering method, Similarity and Dissimilarity Regularized Nonnegative Matrix Factorization (SDCNMF), that simultaneously imposes similarity and dissimilarity constraints on low-dimensional representations. SDCNMF considers both the similarity of closer cells and the dissimilarity of cells that are farther away. It not only keeps similar cells close together in the low-dimensional space, but also pushes dissimilar cells away from each other. We test the validity of our proposed method on five scRNA-seq datasets. Clustering results show that SDCNMF is better than other comparative methods, and the gene markers we find are also consistent with previous studies. Therefore, we can conclude that SDCNMF is effective in scRNA-seq data analysis.
Affiliation(s)
- Ya-Li Zhu
- School of Computer Science, Qufu Normal University, Rizhao, China
- Sha-Sha Yuan
- School of Computer Science, Qufu Normal University, Rizhao, China.
- Jin-Xing Liu
- School of Computer Science, Qufu Normal University, Rizhao, China; Rizhao Huilian Zhongchuang Institute of Intelligent Technology, Rizhao, 276826, China
|
22
|
Kharchenko PV. The triumphs and limitations of computational methods for scRNA-seq. Nat Methods 2021; 18:723-732. [PMID: 34155396 DOI: 10.1038/s41592-021-01171-x] [Citation(s) in RCA: 95] [Impact Index Per Article: 31.7] [Received: 11/08/2018] [Accepted: 04/29/2021] [Indexed: 02/05/2023]
Abstract
The rapid progress of protocols for sequencing single-cell transcriptomes over the past decade has been accompanied by equally impressive advances in the computational methods for analysis of such data. As capacity and accuracy of the experimental techniques grew, the emerging algorithm developments revealed increasingly complex facets of the underlying biology, from cell type composition to gene regulation to developmental dynamics. At the same time, rapid growth has forced continuous reevaluation of the underlying statistical models, experimental aims, and sheer volumes of data processing that are handled by these computational tools. Here, I review key computational steps of single-cell RNA sequencing (scRNA-seq) analysis, examine assumptions made by different approaches, and highlight successes, remaining ambiguities, and limitations that are important to keep in mind as scRNA-seq becomes a mainstream technique for studying biology.
Affiliation(s)
- Peter V Kharchenko
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
|
23
|
Huang Q, Liu Y, Du Y, Garmire LX. Evaluation of Cell Type Annotation R Packages on Single-cell RNA-seq Data. GENOMICS, PROTEOMICS & BIOINFORMATICS 2021; 19:267-281. [PMID: 33359678 PMCID: PMC8602772 DOI: 10.1016/j.gpb.2020.07.004] [Citation(s) in RCA: 50] [Impact Index Per Article: 16.7] [Received: 10/30/2019] [Revised: 07/16/2020] [Accepted: 10/27/2020] [Indexed: 01/13/2023]
Abstract
Annotating cell types is a critical step in single-cell RNA sequencing (scRNA-seq) data analysis. Some supervised or semi-supervised classification methods have recently emerged to enable automated cell type identification. However, comprehensive evaluations of these methods are lacking. Moreover, it is not clear whether some classification methods originally designed for analyzing other bulk omics data are adaptable to scRNA-seq analysis. In this study, we evaluated ten cell type annotation methods publicly available as R packages. Eight of them are popular methods developed specifically for single-cell research, including Seurat, scmap, SingleR, CHETAH, SingleCellNet, scID, Garnett, and SCINA. The other two methods were repurposed from deconvoluting DNA methylation data, i.e., linear constrained projection (CP) and robust partial correlations (RPC). We conducted systematic comparisons on a wide variety of public scRNA-seq datasets as well as simulation data. We assessed the accuracy through intra-dataset and inter-dataset predictions; the robustness over practical challenges such as gene filtering, high similarity among cell types, and increased cell type classes; as well as the detection of rare and unknown cell types. Overall, methods such as Seurat, SingleR, CP, RPC, and SingleCellNet performed well, with Seurat being the best at annotating major cell types. Additionally, Seurat, SingleR, CP, and RPC were more robust against downsampling. However, Seurat did have a major drawback at predicting rare cell populations, and it was suboptimal at differentiating cell types highly similar to each other, compared to SingleR and RPC. All the code and data are available from https://github.com/qianhuiSenn/scRNA_cell_deconv_benchmark.
Affiliation(s)
- Qianhui Huang
- Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
- Yu Liu
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48105, USA
- Yuheng Du
- Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
- Lana X Garmire
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48105, USA.
|
24
|
Chen F, Ding K, Priedigkeit N, Elangovan A, Levine KM, Carleton N, Savariau L, Atkinson JM, Oesterreich S, Lee AV. Single-Cell Transcriptomic Heterogeneity in Invasive Ductal and Lobular Breast Cancer Cells. Cancer Res 2021; 81:268-281. [PMID: 33148662 PMCID: PMC7856056 DOI: 10.1158/0008-5472.can-20-0696] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 03/01/2020] [Revised: 07/14/2020] [Accepted: 10/29/2020] [Indexed: 11/16/2022]
Abstract
Invasive lobular breast carcinoma (ILC), one of the major breast cancer histologic subtypes, exhibits unique features compared with the well-studied ductal cancer subtype (IDC). The pathognomonic feature of ILC is loss of E-cadherin, mainly caused by inactivating mutations, but the contribution of this genetic alteration to ILC-specific molecular characteristics remains largely understudied. To profile these features transcriptionally, we conducted single-cell RNA sequencing on a panel of IDC and ILC cell lines, and an IDC cell line (T47D) with CRISPR-Cas9-mediated E-cadherin knockout (KO). Inspection of intracell line heterogeneity illustrated genetically and transcriptionally distinct subpopulations in multiple cell lines and highlighted rare populations of MCF7 cells highly expressing an apoptosis-related signature, positively correlated with a preadaptation signature to estrogen deprivation. Investigation of E-cadherin KO-induced alterations showed transcriptomic membranous systems remodeling, elevated resemblance to ILCs in regulon activation, and increased sensitivity to IFNγ-mediated growth inhibition via activation of IRF1. This study reveals single-cell transcriptional heterogeneity in breast cancer cell lines and provides a resource to identify drivers of cancer progression and drug resistance. SIGNIFICANCE: This study represents a key step towards understanding heterogeneity in cancer cell lines and the role of E-cadherin depletion in contributing to the molecular features of invasive lobular breast carcinoma.
MESH Headings
- Antigens, CD/genetics
- Antigens, CD/metabolism
- Biomarkers, Tumor/genetics
- Breast Neoplasms/genetics
- Breast Neoplasms/pathology
- Cadherins/antagonists & inhibitors
- Cadherins/genetics
- Cadherins/metabolism
- Carcinoma, Ductal, Breast/genetics
- Carcinoma, Ductal, Breast/pathology
- Carcinoma, Lobular/genetics
- Carcinoma, Lobular/pathology
- Female
- Gene Expression Regulation, Neoplastic
- Humans
- Mutation
- Prognosis
- Single-Cell Analysis/methods
- Transcriptome
- Tumor Cells, Cultured
Affiliation(s)
- Fangyuan Chen
- Women's Cancer Research Center, UPMC Hillman Cancer Center, Magee-Womens Research Institute, Pittsburgh, Pennsylvania
- School of Medicine, Tsinghua University, Beijing, China
- Kai Ding
- Women's Cancer Research Center, UPMC Hillman Cancer Center, Magee-Womens Research Institute, Pittsburgh, Pennsylvania
- Integrative Systems Biology Program, University of Pittsburgh, Pittsburgh, Pennsylvania
- Nolan Priedigkeit
- Women's Cancer Research Center, UPMC Hillman Cancer Center, Magee-Womens Research Institute, Pittsburgh, Pennsylvania
- Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts
- Ashuvinee Elangovan
- Women's Cancer Research Center, UPMC Hillman Cancer Center, Magee-Womens Research Institute, Pittsburgh, Pennsylvania
- Department of Pharmacology and Chemical Biology, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania
- Kevin M Levine
- Women's Cancer Research Center, UPMC Hillman Cancer Center, Magee-Womens Research Institute, Pittsburgh, Pennsylvania
- Medical Scientist Training Program, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania
- Department of Pathology, University of Pittsburgh, Pittsburgh, Pennsylvania
- Neil Carleton
- Women's Cancer Research Center, UPMC Hillman Cancer Center, Magee-Womens Research Institute, Pittsburgh, Pennsylvania
- Medical Scientist Training Program, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania
- Laura Savariau
- Women's Cancer Research Center, UPMC Hillman Cancer Center, Magee-Womens Research Institute, Pittsburgh, Pennsylvania
- Department of Human Genetics, University of Pittsburgh Graduate School of Public Health, Pittsburgh, Pennsylvania
- Jennifer M Atkinson
- Women's Cancer Research Center, UPMC Hillman Cancer Center, Magee-Womens Research Institute, Pittsburgh, Pennsylvania
- Steffi Oesterreich
- Women's Cancer Research Center, UPMC Hillman Cancer Center, Magee-Womens Research Institute, Pittsburgh, Pennsylvania
- Department of Pharmacology and Chemical Biology, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania
- Adrian V Lee
- Women's Cancer Research Center, UPMC Hillman Cancer Center, Magee-Womens Research Institute, Pittsburgh, Pennsylvania
- Department of Pharmacology and Chemical Biology, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania
25
Wu W, Ma X. Joint learning dimension reduction and clustering of single-cell RNA-sequencing data. Bioinformatics 2020; 36:3825-3832. [PMID: 32246821 DOI: 10.1093/bioinformatics/btaa231] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 12/30/2019] [Revised: 03/08/2020] [Accepted: 03/31/2020] [Indexed: 02/02/2023] Open
Abstract
MOTIVATION Single-cell RNA-sequencing (scRNA-seq) profiles the transcriptome of individual cells, which enables the discovery of cell types or subtypes by using unsupervised clustering. Current algorithms perform dimension reduction before cell clustering because of the noise, high dimensionality and linear inseparability of scRNA-seq data. However, performing dimension reduction and clustering independently fails to fully characterize patterns in the data, resulting in undesirable performance. RESULTS In this study, we propose a flexible and accurate algorithm for scRNA-seq data that jointly learns dimension reduction and cell clustering (aka DRjCC), where dimension reduction is performed by projected matrix decomposition and cell type clustering by non-negative matrix factorization. We first formulate joint learning of dimension reduction and cell clustering as a constrained optimization problem and then derive the optimization rules. The advantage of DRjCC is that feature selection in dimension reduction is guided by cell clustering, significantly improving the performance of cell type discovery. Eleven scRNA-seq datasets are adopted to validate the performance of the algorithms, where the number of single cells varies from 49 to 68 579 and the number of cell types ranges from 3 to 14. The experimental results demonstrate that DRjCC significantly outperforms 13 state-of-the-art methods in terms of various measurements on cell type clustering (by 17.44% on average). Furthermore, DRjCC is efficient and robust across different scRNA-seq datasets from various tissues. The proposed model and methods provide an effective strategy to analyze scRNA-seq data. AVAILABILITY AND IMPLEMENTATION The software is written in MATLAB and is freely available for academic use at https://github.com/xkmaxidian/DRjCC. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Wenming Wu
- School of Computer Science and Technology, Xidian University, Xi'an, China
- Xiaoke Ma
- School of Computer Science and Technology, Xidian University, Xi'an, China
26
Liang L, Zhu K, Lu S. BEM: Mining Coregulation Patterns in Transcriptomics via Boolean Matrix Factorization. Bioinformatics 2020; 36:4030-4037. [PMID: 31913438 DOI: 10.1093/bioinformatics/btz977] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 07/15/2019] [Revised: 11/21/2019] [Accepted: 01/02/2020] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION Matrix factorization is an important way to analyze coregulation patterns in transcriptomic data, which can reveal the tumor signal perturbation status and subtype classification. However, current matrix factorization methods do not provide a clear bicluster structure. Furthermore, these algorithms are based on the assumption of linear combination, which may not be sufficient to capture the coregulation patterns. RESULTS We present a new algorithm for Boolean matrix factorization (BMF) via expectation maximization (BEM). BEM is more aligned with the molecular mechanism of transcriptomic coregulation and can scale to matrices with over 100 million data points. Synthetic experiments showed that BEM outperformed other BMF methods in terms of reconstruction error. Real-world applications demonstrated that BEM is applicable to all kinds of transcriptomic data, including bulk RNA-seq, single-cell RNA-seq and spatial transcriptomic datasets. Given appropriate binarization, BEM was able to extract coregulation patterns consistent with disease subtypes, cell types or spatial anatomy. AVAILABILITY AND IMPLEMENTATION Python source code of BEM is available at https://github.com/LifanLiang/EM_BMF. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Lifan Liang
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA 15206-3701, USA
- Kunju Zhu
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA 15206-3701, USA; Department of Central Lab., Clinical Medicine Research Institute, Jinan University, Guangzhou, Guangdong 51063, China
- Songjian Lu
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA 15206-3701, USA
27
Hess M, Hackenberg M, Binder H. Exploring generative deep learning for omics data using log-linear models. Bioinformatics 2020; 36:5045-5053. [PMID: 32647888 PMCID: PMC7755415 DOI: 10.1093/bioinformatics/btaa623] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 10/06/2019] [Revised: 06/28/2020] [Accepted: 07/02/2020] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION Following many successful applications to image data, deep learning is now also increasingly considered for omics data. In particular, generative deep learning not only provides competitive prediction performance, but also allows for uncovering structure by generating synthetic samples. However, exploration and visualization are not as straightforward as with image applications. RESULTS We demonstrate how log-linear models, fitted to the generated synthetic data, can be used to extract patterns that deep generative techniques have learned from omics data. Specifically, interactions between latent representations learned by the approaches and generated synthetic data are used to determine sets of joint patterns. Distances of patterns with respect to the distribution of latent representations are then visualized in low-dimensional coordinate systems, e.g. for monitoring training progress. This is illustrated with simulated data and subsequently with cortical single-cell gene expression data. Using different kinds of deep generative techniques, specifically variational autoencoders and deep Boltzmann machines, the proposed approach highlights how the techniques uncover underlying structure. It facilitates the real-world use of such generative deep learning techniques to gain biological insights from omics data. AVAILABILITY AND IMPLEMENTATION The code for the approach, as well as an accompanying Jupyter notebook that illustrates its application, is available via the GitHub repository: https://github.com/ssehztirom/Exploring-generative-deep-learning-for-omics-data-by-using-log-linear-models. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Moritz Hess
- Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center - University of Freiburg, 79104 Freiburg, Germany
- Maren Hackenberg
- Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center - University of Freiburg, 79104 Freiburg, Germany
- Harald Binder
- Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center - University of Freiburg, 79104 Freiburg, Germany
28
Dong B, Miao J, Wang Y, Luo W, Ji Z, Lai H, Zhang M, Cheng X, Wang J, Fang Y, Zhu HH, Chua CW, Fan L, Zhu Y, Pan J, Wang J, Xue W, Gao WQ. Single-cell analysis supports a luminal-neuroendocrine transdifferentiation in human prostate cancer. Commun Biol 2020; 3:778. [PMID: 33328604 PMCID: PMC7745034 DOI: 10.1038/s42003-020-01476-1] [Citation(s) in RCA: 61] [Impact Index Per Article: 15.3] [Received: 05/22/2020] [Accepted: 10/28/2020] [Indexed: 12/11/2022] Open
Abstract
Neuroendocrine prostate cancer is one of the most aggressive subtypes of prostate tumor. Although much progress has been made in understanding the development of neuroendocrine prostate cancer, the cellular architecture associated with neuroendocrine differentiation in human prostate cancer remains incompletely understood. Here, we use single-cell RNA sequencing to profile the transcriptomes of 21,292 cells from needle biopsies of 6 castration-resistant prostate cancers. Our analyses reveal that all neuroendocrine tumor cells display a luminal-like epithelial phenotype. In particular, lineage trajectory analysis suggests that focal neuroendocrine differentiation originates exclusively from luminal-like malignant cells rather than the basal compartment. Further tissue microarray analysis validates the generality of the luminal phenotype of neuroendocrine cells. Moreover, we uncover neuroendocrine differentiation-associated gene signatures that may help us to further explore other intrinsic molecular mechanisms deriving neuroendocrine prostate cancer. In summary, our single-cell study provides direct evidence into the cellular states underlying neuroendocrine transdifferentiation in human prostate cancer.
Affiliation(s)
- Baijun Dong
- Department of Urology, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, 200127, China
- Juju Miao
- State Key Laboratory of Oncogenes and Related Genes, Renji-Med-X Stem Cell Research Center, Department of Urology, Ren Ji Hospital, School of Medicine and School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, 200127, China; School of Biomedical Engineering and Med-X Research Institute, Shanghai Jiao Tong University, Shanghai, 200030, China
- Yanqing Wang
- Department of Urology, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, 200127, China
- Wenqin Luo
- State Key Laboratory of Oncogenes and Related Genes, Renji-Med-X Stem Cell Research Center, Department of Urology, Ren Ji Hospital, School of Medicine and School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, 200127, China
- Zhongzhong Ji
- State Key Laboratory of Oncogenes and Related Genes, Renji-Med-X Stem Cell Research Center, Department of Urology, Ren Ji Hospital, School of Medicine and School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, 200127, China
- Huadong Lai
- State Key Laboratory of Oncogenes and Related Genes, Renji-Med-X Stem Cell Research Center, Department of Urology, Ren Ji Hospital, School of Medicine and School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, 200127, China; School of Biomedical Engineering and Med-X Research Institute, Shanghai Jiao Tong University, Shanghai, 200030, China
- Man Zhang
- State Key Laboratory of Oncogenes and Related Genes, Renji-Med-X Stem Cell Research Center, Department of Urology, Ren Ji Hospital, School of Medicine and School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, 200127, China; School of Biomedical Engineering and Med-X Research Institute, Shanghai Jiao Tong University, Shanghai, 200030, China
- Xiaomu Cheng
- State Key Laboratory of Oncogenes and Related Genes, Renji-Med-X Stem Cell Research Center, Department of Urology, Ren Ji Hospital, School of Medicine and School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, 200127, China; School of Biomedical Engineering and Med-X Research Institute, Shanghai Jiao Tong University, Shanghai, 200030, China
- Jinming Wang
- Department of Urology, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, 200127, China
- Yuxiang Fang
- Department of Urology, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, 200127, China; State Key Laboratory of Oncogenes and Related Genes, Renji-Med-X Stem Cell Research Center, Department of Urology, Ren Ji Hospital, School of Medicine and School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, 200127, China
- Helen He Zhu
- Department of Urology, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, 200127, China; State Key Laboratory of Oncogenes and Related Genes, Renji-Med-X Stem Cell Research Center, Department of Urology, Ren Ji Hospital, School of Medicine and School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, 200127, China
- Chee Wai Chua
- Department of Urology, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, 200127, China; State Key Laboratory of Oncogenes and Related Genes, Renji-Med-X Stem Cell Research Center, Department of Urology, Ren Ji Hospital, School of Medicine and School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, 200127, China
- Liancheng Fan
- Department of Urology, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, 200127, China
- Yinjie Zhu
- Department of Urology, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, 200127, China
- Jiahua Pan
- Department of Urology, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, 200127, China
- Jia Wang
- Department of Urology, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, 200127, China; State Key Laboratory of Oncogenes and Related Genes, Renji-Med-X Stem Cell Research Center, Department of Urology, Ren Ji Hospital, School of Medicine and School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, 200127, China
- Wei Xue
- Department of Urology, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, 200127, China
- Wei-Qiang Gao
- State Key Laboratory of Oncogenes and Related Genes, Renji-Med-X Stem Cell Research Center, Department of Urology, Ren Ji Hospital, School of Medicine and School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, 200127, China; School of Biomedical Engineering and Med-X Research Institute, Shanghai Jiao Tong University, Shanghai, 200030, China
29
Svensson V, Gayoso A, Yosef N, Pachter L. Interpretable factor models of single-cell RNA-seq via variational autoencoders. Bioinformatics 2020; 36:3418-3421. [PMID: 32176273 PMCID: PMC7267837 DOI: 10.1093/bioinformatics/btaa169] [Citation(s) in RCA: 63] [Impact Index Per Article: 15.8] [Received: 09/13/2019] [Revised: 02/03/2020] [Accepted: 03/13/2020] [Indexed: 12/20/2022] Open
Abstract
MOTIVATION Single-cell RNA-seq makes possible the investigation of variability in gene expression among cells, and dependence of variation on cell type. Statistical inference methods for such analyses must be scalable, and ideally interpretable. RESULTS We present an approach based on a modification of a recently published highly scalable variational autoencoder framework that provides interpretability without sacrificing much accuracy. We demonstrate that our approach enables identification of gene programs in massive datasets. Our strategy, namely the learning of factor models with the auto-encoding variational Bayes framework, is not domain specific and may be useful for other applications. AVAILABILITY AND IMPLEMENTATION The factor model is available in the scVI package hosted at https://github.com/YosefLab/scVI/. CONTACT v@nxn.se SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Valentine Svensson
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA
- Nir Yosef
- Center for Computational Biology; Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA 91125, USA; Chan Zuckerberg Biohub, San Francisco, CA 94158, USA
- Lior Pachter
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA; Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA 91125, USA
30
Yang KY, Ku M, Lui KO. Single-cell transcriptomics uncover distinct innate and adaptive cell subsets during tissue homeostasis and regeneration. J Leukoc Biol 2020; 108:1593-1602. [PMID: 33070367 DOI: 10.1002/jlb.6mr0720-131r] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 02/23/2020] [Revised: 07/30/2020] [Accepted: 08/10/2020] [Indexed: 02/06/2023] Open
Abstract
Recently, immune cell-mediated tissue repair and regeneration has become an emerging paradigm of regenerative medicine. Immune cells form an essential part of the wound, as induction of inflammation is a necessary step to elicit tissue healing. Rapid progress in transcriptomic analyses by high-throughput next-generation sequencing has made it possible to study gene regulatory networks and establish molecular signatures of immune cells that could potentially predict their functional roles in tissue repair and regeneration. However, the identification of cellular heterogeneity, especially among rare cell subsets, has been limited in transcriptomic analyses of bulk cell populations. Genome-wide, single-cell RNA sequencing (scRNA-Seq) therefore offers an unprecedented approach to unravel cellular diversity and to study novel immune cell populations involved in tissue repair and regeneration through unsupervised sampling of individual cells, without the need to rely on prior knowledge about cell-specific markers. The analysis of gene expression patterns at single-cell resolution also holds promise for uncovering the underlying mechanisms and thereby supporting the development of therapeutic strategies that promote immunoregenerative medicine. In this review, we discuss how scRNA-Seq facilitates the characterization of immune cells, including macrophages, innate lymphoid cells and T and B lymphocytes, the discovery of immune cell heterogeneity, the identification of novel subsets, and the tracking of developmental trajectories of distinct immune cells during tissue homeostasis, repair, and regeneration.
Affiliation(s)
- Kevin Y Yang
- Department of Chemical Pathology, Prince of Wales Hospital, The Chinese University of Hong Kong, Hong Kong, China
- Manching Ku
- Division of Pediatric Hematology and Oncology, Department of Pediatrics and Adolescent Medicine, Medical Center - University of Freiburg, Faculty of Medicine, University of Freiburg, Freiburg, Germany
- Kathy O Lui
- Department of Chemical Pathology, Prince of Wales Hospital, The Chinese University of Hong Kong, Hong Kong, China; Li Ka Shing Institute of Health Sciences, Prince of Wales Hospital, The Chinese University of Hong Kong, Hong Kong, China
31
Sherman TD, Gao T, Fertig EJ. CoGAPS 3: Bayesian non-negative matrix factorization for single-cell analysis with asynchronous updates and sparse data structures. BMC Bioinformatics 2020; 21:453. [PMID: 33054706 PMCID: PMC7556974 DOI: 10.1186/s12859-020-03796-9] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 11/30/2019] [Accepted: 10/01/2020] [Indexed: 01/29/2023] Open
Abstract
BACKGROUND Bayesian factorization methods, including Coordinated Gene Activity in Pattern Sets (CoGAPS), are emerging as powerful analysis tools for single-cell data. However, these methods have greater computational costs than their gradient-based counterparts. These costs are often prohibitive for analysis of large single-cell datasets. Many such methods can be run in parallel, which enables this limitation to be overcome by running on more powerful hardware. However, the constraints imposed by the prior distributions in CoGAPS limit the applicability of parallelization methods to enhance computational efficiency for single-cell analysis. RESULTS We developed a new software framework for parallel matrix factorization in Version 3 of the CoGAPS R/Bioconductor package to overcome the computational limitations of Bayesian matrix factorization for single-cell data analysis. This parallelization framework provides asynchronous updates for sequential updating steps of the algorithm to enhance computational efficiency. These algorithmic advances were coupled with new software architecture and sparse data structures to reduce the memory overhead for single-cell data. CONCLUSIONS Altogether, our new software enhances the efficiency of the CoGAPS Bayesian matrix factorization algorithm so that it can analyze 1000 times more cells, enabling factorization of large single-cell datasets.
Affiliation(s)
- Thomas D Sherman
- Department of Oncology, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Tiger Gao
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Elana J Fertig
- Department of Oncology, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, MD, USA
- Department of Biomedical Engineering, Johns Hopkins University School of Medicine, Baltimore, MD, USA
32
Li X, Wong KC. Single-Cell RNA Sequencing Data Interpretation by Evolutionary Multiobjective Clustering. IEEE/ACM Trans Comput Biol Bioinform 2020; 17:1773-1784. [PMID: 30908236 DOI: 10.1109/tcbb.2019.2906601] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 06/09/2023]
Abstract
In recent years, single-cell RNA sequencing has revealed diverse cell genetics at unprecedented resolution. Such technological advances enable researchers to uncover functionally distinct cell subtypes, for example by identifying hematopoietic stem cell subpopulations. However, most of the related algorithms have been hindered by the high dimensionality and sparse nature of single-cell RNA sequencing (RNA-seq) data. To address those problems, we propose a multiobjective evolutionary clustering method based on adaptive non-negative matrix factorization (MCANMF) for single-cell RNA-seq data clustering. First, adaptive non-negative matrix factorization is proposed to decompose data for feature extraction. After that, a multiobjective clustering algorithm based on learning vector quantization is proposed to analyze single-cell RNA-seq data. To validate the effectiveness of MCANMF, we comprehensively benchmark it against 15 state-of-the-art methods, including seven feature extraction methods, seven clustering methods, and a kernel-based similarity learning method, on six published single-cell RNA sequencing datasets. MCANMF performs better than those 15 methods on these datasets according to multiple evaluation metrics. Moreover, component analysis, time complexity analysis, and parameter analysis are conducted to demonstrate various properties of the proposed algorithm.
33
Stein-O'Brien GL, Clark BS, Sherman T, Zibetti C, Hu Q, Sealfon R, Liu S, Qian J, Colantuoni C, Blackshaw S, Goff LA, Fertig EJ. Decomposing Cell Identity for Transfer Learning across Cellular Measurements, Platforms, Tissues, and Species. Cell Syst 2020; 8:395-411.e8. [PMID: 31121116 DOI: 10.1016/j.cels.2019.04.004] [Citation(s) in RCA: 80] [Impact Index Per Article: 20.0] [Received: 08/28/2018] [Revised: 01/24/2019] [Accepted: 04/17/2019] [Indexed: 02/07/2023]
Abstract
Analysis of gene expression in single cells allows for decomposition of cellular states as low-dimensional latent spaces. However, the interpretation and validation of these spaces remain a challenge. Here, we present scCoGAPS, which defines latent spaces from a source single-cell RNA-sequencing (scRNA-seq) dataset, and projectR, which evaluates these latent spaces in independent target datasets via transfer learning. Application to scRNA-seq data from the developing mouse retina reveals intrinsic relationships across biological contexts and assays while avoiding batch effects and other technical features. We compare the dimensions learned in this source dataset to adult mouse retina, a time course of human retinal development, select scRNA-seq datasets from developing brain, chromatin accessibility data, and a murine cell-type atlas to identify shared biological features. These tools lay the groundwork for exploratory analysis of scRNA-seq data via latent space representations, enabling a shift in how we compare and identify cells beyond reliance on marker genes or ensemble molecular identity.
Affiliation(s)
- Genevieve L Stein-O'Brien
- Department of Oncology, Division of Biostatistics and Bioinformatics, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD, USA; Solomon H. Snyder Department of Neuroscience, Johns Hopkins University, Baltimore, MD, USA; McKusick-Nathans Institute for Genetic Medicine, Johns Hopkins University, Baltimore, MD, USA; Institute for Data Intensive Engineering and Science, Johns Hopkins University, Baltimore, MD, USA
- Brian S Clark
- Solomon H. Snyder Department of Neuroscience, Johns Hopkins University, Baltimore, MD, USA
- Thomas Sherman
- Department of Oncology, Division of Biostatistics and Bioinformatics, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD, USA
- Cristina Zibetti
- Solomon H. Snyder Department of Neuroscience, Johns Hopkins University, Baltimore, MD, USA
- Qiwen Hu
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA, USA
- Sheng Liu
- Department of Ophthalmology, Johns Hopkins University, Baltimore, MD, USA
- Jiang Qian
- Department of Ophthalmology, Johns Hopkins University, Baltimore, MD, USA
- Carlo Colantuoni
- Solomon H. Snyder Department of Neuroscience, Johns Hopkins University, Baltimore, MD, USA; Department of Neurology, Johns Hopkins University, Baltimore, MD, USA
- Seth Blackshaw
- Solomon H. Snyder Department of Neuroscience, Johns Hopkins University, Baltimore, MD, USA; Kavli Neurodiscovery Institute, Johns Hopkins University, Baltimore, MD, USA; Department of Neurology, Johns Hopkins University, Baltimore, MD, USA; Department of Ophthalmology, Johns Hopkins University, Baltimore, MD, USA; Center for Human Systems Biology, Johns Hopkins University, Baltimore, MD, USA
- Loyal A Goff
- Solomon H. Snyder Department of Neuroscience, Johns Hopkins University, Baltimore, MD, USA; Kavli Neurodiscovery Institute, Johns Hopkins University, Baltimore, MD, USA; McKusick-Nathans Institute for Genetic Medicine, Johns Hopkins University, Baltimore, MD, USA
| | - Elana J Fertig
- Department of Oncology, Division of Biostatistics and Bioinformatics, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD, USA; McKusick-Nathans Institute for Genetic Medicine, Johns Hopkins University, Baltimore, MD, USA; Institute for Data Intensive Engineering and Science, Johns Hopkins University, Baltimore, MD, USA; Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD, USA; Mathematical Institute for Data Science, Johns Hopkins University, Baltimore, MD, USA; Institute for Cell Engineering, Johns Hopkins University, Baltimore, MD, USA; Department of Biomedical Engineering and Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, MD, USA.
34
Zheng R, Liang Z, Chen X, Tian Y, Cao C, Li M. An Adaptive Sparse Subspace Clustering for Cell Type Identification. Front Genet 2020; 11:407. [PMID: 32425984 PMCID: PMC7212354 DOI: 10.3389/fgene.2020.00407] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 10/23/2019] [Accepted: 03/31/2020] [Indexed: 01/04/2023] Open
Abstract
The rapid development of single-cell transcriptome sequencing technology has provided a cell-level perspective for studying biological problems. Identification of cell types is one of the fundamental issues in computational analysis of single-cell data. Because single-cell technologies are noisy and expression profiles are high-dimensional, traditional clustering methods are poorly suited to this task. To address the problem, we have designed an adaptive sparse subspace clustering method, called AdaptiveSSC, to identify cell types. AdaptiveSSC is based on the assumption that the expression of cells with the same type lies in the same subspace; one cell can be expressed as a linear combination of the other cells. Moreover, it uses a data-driven adaptive sparse constraint to construct the similarity matrix. Comparisons on 10 scRNA-seq datasets show that AdaptiveSSC outperforms original subspace clustering and other state-of-the-art methods in most cases. Moreover, the learned similarity matrix can also be integrated with a modified t-SNE to obtain an improved visualization result.
Affiliation(s)
- Ruiqing Zheng
- School of Computer Science and Engineering, Central South University, Changsha, China
- Zhenlan Liang
- School of Computer Science and Engineering, Central South University, Changsha, China
- Xiang Chen
- School of Computer Science and Engineering, Central South University, Changsha, China
- Yu Tian
- School of Computer Science and Engineering, Central South University, Changsha, China
- Chen Cao
- Departments of Biochemistry & Molecular Biology and Medical Genetics, Alberta Children's Hospital Research Institute, University of Calgary, Calgary, AB, Canada
- Min Li
- School of Computer Science and Engineering, Central South University, Changsha, China
35
Ibarra A, Zhuang J, Zhao Y, Salathia NS, Huang V, Acosta AD, Aballi J, Toden S, Karns AP, Purnajo I, Parks JR, Guo L, Mason J, Sigal D, Nova TS, Quake SR, Nerenberg M. Non-invasive characterization of human bone marrow stimulation and reconstitution by cell-free messenger RNA sequencing. Nat Commun 2020; 11:400. [PMID: 31964864 PMCID: PMC6972916 DOI: 10.1038/s41467-019-14253-4] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Received: 01/09/2019] [Accepted: 12/17/2019] [Indexed: 01/13/2023] Open
Abstract
Circulating cell-free mRNA (cf-mRNA) holds great promise as a non-invasive diagnostic biomarker. However, cf-mRNA composition and its potential clinical applications remain largely unexplored. Here we show, using Next Generation Sequencing-based profiling, that cf-mRNA is enriched in transcripts derived from the bone marrow compared to circulating cells. Further, longitudinal studies involving bone marrow ablation followed by hematopoietic stem cell transplantation in multiple myeloma and acute myeloid leukemia patients indicate that cf-mRNA levels reflect the transcriptional activity of bone marrow-resident hematopoietic lineages during bone marrow reconstitution. Mechanistically, stimulation of specific bone marrow cell populations in vivo using growth factor pharmacotherapy shows that cf-mRNA reflects dynamic functional changes over time associated with cellular activity. Our results shed light on the biology of the circulating transcriptome and highlight the potential utility of cf-mRNA to non-invasively monitor bone marrow-involved pathologies. Here the authors show that cell-free mRNA captures transcripts from the bone marrow and can be used to non-invasively monitor dynamic changes in bone marrow physiology.
Affiliation(s)
- Arkaitz Ibarra
- Molecular Stethoscope, Inc., 3210 Merryfield Row, San Diego, CA, 92121, USA.
- Jiali Zhuang
- Molecular Stethoscope, Inc., 3210 Merryfield Row, San Diego, CA, 92121, USA
- Yue Zhao
- Molecular Stethoscope, Inc., 3210 Merryfield Row, San Diego, CA, 92121, USA
- Neeraj S Salathia
- Molecular Stethoscope, Inc., 3210 Merryfield Row, San Diego, CA, 92121, USA
- Vera Huang
- Molecular Stethoscope, Inc., 3210 Merryfield Row, San Diego, CA, 92121, USA
- Alexander D Acosta
- Molecular Stethoscope, Inc., 3210 Merryfield Row, San Diego, CA, 92121, USA
- Jonathan Aballi
- Molecular Stethoscope, Inc., 3210 Merryfield Row, San Diego, CA, 92121, USA
- Shusuke Toden
- Molecular Stethoscope, Inc., 3210 Merryfield Row, San Diego, CA, 92121, USA
- Amy P Karns
- Molecular Stethoscope, Inc., 3210 Merryfield Row, San Diego, CA, 92121, USA
- Intan Purnajo
- Molecular Stethoscope, Inc., 3210 Merryfield Row, San Diego, CA, 92121, USA
- Julianna R Parks
- Molecular Stethoscope, Inc., 3210 Merryfield Row, San Diego, CA, 92121, USA
- Lucy Guo
- Molecular Stethoscope, Inc., 3210 Merryfield Row, San Diego, CA, 92121, USA
- James Mason
- Scripps Clinic Medical Group, Scripps Green Hospital, 10666 N Torrey Pines Road, La Jolla, CA, 92037, USA
- Darren Sigal
- Scripps Clinic Medical Group, Scripps Green Hospital, 10666 N Torrey Pines Road, La Jolla, CA, 92037, USA
- Tina S Nova
- Molecular Stethoscope, Inc., 3210 Merryfield Row, San Diego, CA, 92121, USA
- Stephen R Quake
- Department of Bioengineering and Department of Applied Physics, Stanford University and Chan Zuckerberg Biohub, 318 Campus Drive, Stanford, CA, 94305, USA
- Michael Nerenberg
- Molecular Stethoscope, Inc., 3210 Merryfield Row, San Diego, CA, 92121, USA.
36
Arisdakessian C, Poirion O, Yunits B, Zhu X, Garmire LX. DeepImpute: an accurate, fast, and scalable deep neural network method to impute single-cell RNA-seq data. Genome Biol 2019; 20:211. [PMID: 31627739 PMCID: PMC6798445 DOI: 10.1186/s13059-019-1837-6] [Citation(s) in RCA: 126] [Impact Index Per Article: 25.2] [Received: 07/06/2019] [Accepted: 09/26/2019] [Indexed: 12/12/2022] Open
Abstract
Single-cell RNA sequencing (scRNA-seq) offers new opportunities to study gene expression of tens of thousands of single cells simultaneously. We present DeepImpute, a deep neural network-based imputation algorithm that uses dropout layers and loss functions to learn patterns in the data, allowing for accurate imputation. Overall, DeepImpute yields better accuracy than six other publicly available scRNA-seq imputation methods on experimental data, as measured by the mean squared error or Pearson's correlation coefficient. DeepImpute is an accurate, fast, and scalable imputation tool that is suited to handle the ever-increasing volume of scRNA-seq data, and is freely available at https://github.com/lanagarmire/DeepImpute.
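The accuracy measures named in this abstract (mean squared error and Pearson's correlation coefficient) can be illustrated with a small mask-and-score sketch. This is not the DeepImpute evaluation code: the helper names (`mask_entries`, `gene_mean_impute`), the toy matrix, and the masking fraction are assumptions made for illustration, and a simple per-gene mean stands in for the neural network imputer.

```python
import math
import random


def mse(truth, pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(truth, pred)) / len(truth)


def pearson(x, y):
    """Pearson correlation; returns NaN if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else float("nan")


def mask_entries(matrix, fraction, seed=0):
    """Hide a random fraction of the nonzero entries (hypothetical helper).

    Returns a masked copy of the matrix and the list of hidden (row, col) pairs.
    """
    rng = random.Random(seed)
    observed = [(i, j) for i, row in enumerate(matrix)
                for j, v in enumerate(row) if v > 0]
    hidden = rng.sample(observed, max(1, int(fraction * len(observed))))
    masked = [row[:] for row in matrix]
    for i, j in hidden:
        masked[i][j] = 0.0
    return masked, hidden


def gene_mean_impute(masked):
    """Stand-in imputer (an assumption, not DeepImpute): fill each zero
    with the mean of the nonzero values of the same gene (column)."""
    imputed = [row[:] for row in masked]
    for j in range(len(masked[0])):
        nonzero = [row[j] for row in masked if row[j] > 0]
        fill = sum(nonzero) / len(nonzero) if nonzero else 0.0
        for row in imputed:
            if row[j] == 0:
                row[j] = fill
    return imputed


# Toy cells-by-genes matrix (rows: cells, columns: genes); values are made up.
counts = [
    [5.0, 0.0, 3.0],
    [4.0, 1.0, 0.0],
    [6.0, 2.0, 2.0],
    [5.0, 1.0, 4.0],
]
masked, hidden = mask_entries(counts, fraction=0.3)
imputed = gene_mean_impute(masked)

# Score only the entries that were deliberately hidden.
truth = [counts[i][j] for i, j in hidden]
pred = [imputed[i][j] for i, j in hidden]
print("MSE:", mse(truth, pred))
print("Pearson:", pearson(truth, pred))
```

Scoring only the hidden entries keeps true zeros, whose ground truth is unknown, out of the comparison.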
Affiliation(s)
- Cédric Arisdakessian
- Department of Information and Computer Science, University of Hawaii at Manoa, Honolulu, HI, 96816, USA
- Olivier Poirion
- Department of Epidemiology, University of Hawaii Cancer Center, 701 Ilalo Street, Honolulu, HI, 96813, USA
- Breck Yunits
- Department of Epidemiology, University of Hawaii Cancer Center, 701 Ilalo Street, Honolulu, HI, 96813, USA
- Xun Zhu
- Department of Epidemiology, University of Hawaii Cancer Center, 701 Ilalo Street, Honolulu, HI, 96813, USA
- Department of Molecular Biology and Bioengineering, University of Hawaii at Manoa, Honolulu, HI, 96816, USA
- Lana X Garmire
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, 48105, USA.
37
Potter SS. Single-cell RNA sequencing for the study of development, physiology and disease. Nat Rev Nephrol 2019; 14:479-492. [PMID: 29789704 DOI: 10.1038/s41581-018-0021-7] [Citation(s) in RCA: 299] [Impact Index Per Article: 59.8] [Indexed: 12/12/2022]
Abstract
An ongoing technological revolution is continually improving our ability to carry out very high-resolution studies of gene expression patterns. Current technology enables the global gene expression profiles of single cells to be defined, facilitating dissection of heterogeneity in cell populations that was previously hidden. In contrast to gene expression studies that use bulk RNA samples and provide only a virtual average of the diverse constituent cells, single-cell studies enable the molecular distinction of all cell types within a complex population mix, such as a tumour or developing organ. For instance, single-cell gene expression profiling has contributed to improved understanding of how histologically identical, adjacent cells make different differentiation decisions during development. Beyond development, single-cell gene expression studies have enabled the characteristics of previously known cell types to be more fully defined and facilitated the identification of novel categories of cells, contributing to improvements in our understanding of both normal and disease-related physiological processes and leading to the identification of new treatment approaches. Although limitations remain to be overcome, technology for the analysis of single-cell gene expression patterns is improving rapidly and beginning to provide a detailed atlas of the gene expression patterns of all cell types in the human body.
Affiliation(s)
- S Steven Potter
- Division of Developmental Biology, Cincinnati Children's Medical Center, Cincinnati, OH, USA.
38
Woo J, Winterhoff BJ, Starr TK, Aliferis C, Wang J. De novo prediction of cell-type complexity in single-cell RNA-seq and tumor microenvironments. Life Sci Alliance 2019; 2:e201900443. [PMID: 31266885 PMCID: PMC6607449 DOI: 10.26508/lsa.201900443] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 06/24/2019] [Accepted: 06/24/2019] [Indexed: 12/30/2022] Open
Abstract
This study describes a computational method for determining statistical support to varying levels of heterogeneity provided by single-cell RNA-sequencing data with applications to tumor samples. Recent single-cell transcriptomic studies revealed new insights into cell-type heterogeneities in cellular microenvironments unavailable from bulk studies. A significant drawback of currently available algorithms is the need to use empirical parameters or rely on indirect quality measures to estimate the degree of complexity, i.e., the number of subgroups present in the sample. We fill this gap with a single-cell data analysis procedure allowing for unambiguous assessments of the depth of heterogeneity in subclonal compositions supported by data. Our approach combines nonnegative matrix factorization, which takes advantage of the sparse and nonnegative nature of single-cell RNA count data, with Bayesian model comparison enabling de novo prediction of the depth of heterogeneity. We show that the method predicts the correct number of subgroups using simulated data, primary blood mononuclear cell, and pancreatic cell data. We applied our approach to a collection of single-cell tumor samples and found two qualitatively distinct classes of cell-type heterogeneity in cancer microenvironments.
Affiliation(s)
- Jun Woo
- Institute for Health Informatics, University of Minnesota, Minneapolis, MN, USA; Masonic Cancer Center, University of Minnesota, Minneapolis, MN, USA
- Boris J Winterhoff
- Masonic Cancer Center, University of Minnesota, Minneapolis, MN, USA; Department of Obstetrics, Gynecology and Women's Health, University of Minnesota, Minneapolis, MN, USA
- Timothy K Starr
- Masonic Cancer Center, University of Minnesota, Minneapolis, MN, USA; Department of Obstetrics, Gynecology and Women's Health, University of Minnesota, Minneapolis, MN, USA
- Constantin Aliferis
- Institute for Health Informatics, University of Minnesota, Minneapolis, MN, USA
- Jinhua Wang
- Institute for Health Informatics, University of Minnesota, Minneapolis, MN, USA; Masonic Cancer Center, University of Minnesota, Minneapolis, MN, USA
39
Jung M, Wells D, Rusch J, Ahmad S, Marchini J, Myers SR, Conrad DF. Unified single-cell analysis of testis gene regulation and pathology in five mouse strains. eLife 2019; 8:e43966. [PMID: 31237565 PMCID: PMC6615865 DOI: 10.7554/elife.43966] [Citation(s) in RCA: 74] [Impact Index Per Article: 14.8] [Received: 11/28/2018] [Accepted: 06/17/2019] [Indexed: 12/13/2022] Open
Abstract
To fully exploit the potential of single-cell functional genomics in the study of development and disease, robust methods are needed to simplify the analysis of data across samples, time-points and individuals. Here we introduce a model-based factor analysis method, SDA, to analyze a novel 57,600 cell dataset from the testes of wild-type mice and mice with gonadal defects due to disruption of the genes Mlh3, Hormad1, Cul4a or Cnp. By jointly analyzing mutant and wild-type cells we decomposed our data into 46 components that identify novel meiotic gene-regulatory programs, mutant-specific pathological processes, and technical effects, and provide a framework for imputation. We identify, de novo, DNA sequence motifs associated with individual components that define temporally varying modes of gene expression control. Analysis of SDA components also led us to identify a rare population of macrophages within the seminiferous tubules of Mlh3-/- and Hormad1-/- mice, an area typically associated with immune privilege.
Affiliation(s)
- Min Jung
- Department of Genetics, Washington University School of Medicine, St. Louis, United States
- Daniel Wells
- The Wellcome Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
- Department of Statistics, University of Oxford, Oxford, United Kingdom
- Jannette Rusch
- Department of Genetics, Washington University School of Medicine, St. Louis, United States
- Suhaira Ahmad
- Department of Genetics, Washington University School of Medicine, St. Louis, United States
- Jonathan Marchini
- The Wellcome Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
- Department of Statistics, University of Oxford, Oxford, United Kingdom
- Simon R Myers
- The Wellcome Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
- Department of Statistics, University of Oxford, Oxford, United Kingdom
- Donald F Conrad
- Department of Genetics, Washington University School of Medicine, St. Louis, United States
- Division of Genetics, Oregon National Primate Research Center, Oregon Health & Science University, Portland, United States
40
Sun S, Chen Y, Liu Y, Shang X. A fast and efficient count-based matrix factorization method for detecting cell types from single-cell RNAseq data. BMC Syst Biol 2019; 13:28. [PMID: 30953530 PMCID: PMC6449882 DOI: 10.1186/s12918-019-0699-6] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 12/15/2022]
Abstract
Background: Single-cell RNA sequencing (scRNAseq) data always involve various unwanted variables, which can mask the true signal needed to identify cell types. A more efficient way of dealing with this issue is to extract low-dimensional information from high-dimensional gene expression data to represent cell-type structure. In the past two years, several powerful matrix factorization tools were developed for scRNAseq data, such as NMF, ZIFA, pCMF and ZINB-WaVE. However, the existing approaches either cannot directly model the raw counts of scRNAseq data or are very time-consuming when handling a large number of cells (e.g. n>500). Results: In this paper, we developed a fast and efficient count-based matrix factorization method (single-cell negative binomial matrix factorization, scNBMF) based on the TensorFlow framework to infer the low-dimensional structure of cell types. To make our method scalable, we conducted a series of experiments on three public scRNAseq data sets: brain, embryonic stem, and pancreatic islet. The experimental results show that scNBMF is more powerful at detecting cell types and 10-100 fold faster than the scRNAseq bespoke tools. Conclusions: In this paper, we proposed a fast and efficient count-based matrix factorization method, scNBMF, which is more powerful for detecting cell types. A series of experiments were performed on three public scRNAseq data sets. The results show that scNBMF is a more powerful tool in large-scale scRNAseq data analysis. scNBMF was implemented in R and Python, and the source code is freely available at https://github.com/sqsun.
Affiliation(s)
- Shiquan Sun
- School of Computer Science, Northwestern Polytechnical University, Xi'an, Shaanxi, 710129, People's Republic of China; Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, Shaanxi, 710129, People's Republic of China; Centre for Multidisciplinary Convergence Computing (CMCC), School of Computer Science, Northwestern Polytechnical University, Xi'an, Shaanxi, 710129, People's Republic of China; Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
- Yabo Chen
- School of Computer Science, Northwestern Polytechnical University, Xi'an, Shaanxi, 710129, People's Republic of China
- Yang Liu
- School of Computer Science, Northwestern Polytechnical University, Xi'an, Shaanxi, 710129, People's Republic of China
- Xuequn Shang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, Shaanxi, 710129, People's Republic of China; Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, Shaanxi, 710129, People's Republic of China
41
Li X, Wong KC. Elucidating Genome-Wide Protein-RNA Interactions Using Differential Evolution. IEEE/ACM Trans Comput Biol Bioinform 2019; 16:272-282. [PMID: 29990254 DOI: 10.1109/tcbb.2017.2776224] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 06/08/2023]
Abstract
RNA-binding proteins (RBPs) play an important role in the post-transcriptional control of RNAs, such as splicing, polyadenylation, mRNA stabilization, mRNA localization, and translation. Thanks to recent breakthroughs, non-negative matrix factorization (NMF) has been developed to combine multiple data sources to discover non-overlapping and class-specific RNA binding patterns. However, several challenges still exist in determining the number of latent dimensions in the factorization steps. In most circumstances, it is assumed that the number of latent dimensions (or components) is given. Such trial-and-error procedures can be tedious in practice. To address this problem, a differential evolution algorithm is proposed as the model selection method to choose a suitable rank, which can adaptively decompose the input protein-RNA data matrix into different nonnegative components. Experimental results demonstrate that the proposed algorithms can improve the factorization quality over recent state-of-the-art methods. The effectiveness of the proposed algorithms is supported by comprehensive performance benchmarking on 31 genome-wide cross-linking immunoprecipitation (CLIP) coupled with high-throughput sequencing (CLIP-seq) datasets. In addition, time complexity analysis and parameter analysis are conducted to demonstrate the robustness of the proposed methods.
42
Li X, Zhang S, Wong KC. Single-cell RNA-seq interpretations using evolutionary multiobjective ensemble pruning. Bioinformatics 2018; 35:2809-2817. [DOI: 10.1093/bioinformatics/bty1056] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 07/26/2018] [Revised: 10/31/2018] [Accepted: 12/21/2018] [Indexed: 11/14/2022] Open
Abstract
Motivation
In recent years, single-cell RNA sequencing has enabled us to discover cell types or even subtypes. Its increasing availability provides opportunities to identify cell populations from single-cell RNA-seq data. Computational methods have been employed to reveal the gene expression variations among multiple cell populations. Unfortunately, the existing ones can suffer from realistic restrictions such as experimental noise, numerical instability, high dimensionality and limited computational scalability.
Results
We propose an evolutionary multiobjective ensemble pruning algorithm (EMEP) that addresses those realistic restrictions. Our EMEP algorithm first applies unsupervised dimensionality reduction to project data from the original high dimensions to low-dimensional subspaces; basic clustering algorithms are applied in those new subspaces to generate different clustering results to form cluster ensembles. However, most of those cluster ensembles are unnecessarily bulky, at the expense of extra time costs and memory consumption. To overcome that problem, EMEP is designed to dynamically select the suitable clustering results from the ensembles. Moreover, to guide the multiobjective ensemble evolution, three cluster validity indices including the overall cluster deviation, the within-cluster compactness and the number of basic partition clusters are formulated as the objective functions to unleash its cell type discovery performance using evolutionary multiobjective optimization. We applied EMEP to 55 simulated datasets and seven real single-cell RNA-seq datasets, including six single-cell RNA-seq datasets and one large-scale dataset with 3005 cells and 4412 genes. Two case studies are also conducted to reveal mechanistic insights into the biological relevance of EMEP. We found that EMEP can achieve superior performance over the other clustering algorithms, demonstrating that EMEP can identify cell populations clearly.
Availability and implementation
EMEP is written in Matlab and available at https://github.com/lixt314/EMEP
Supplementary information
Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Xiangtao Li
- School of Computer Science and Information Technology, Northeast Normal University, Changchun, Jilin, China
- Department of Computer Science, City University of Hong Kong, Hong Kong SAR
- Shixiong Zhang
- Department of Computer Science, City University of Hong Kong, Hong Kong SAR
- Ka-Chun Wong
- Department of Computer Science, City University of Hong Kong, Hong Kong SAR
43
Lee D, Cheng A, Lawlor N, Bolisetty M, Ucar D. Detection of correlated hidden factors from single cell transcriptomes using Iteratively Adjusted-SVA (IA-SVA). Sci Rep 2018; 8:17040. [PMID: 30451954 PMCID: PMC6242813 DOI: 10.1038/s41598-018-35365-9] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 12/04/2017] [Accepted: 11/01/2018] [Indexed: 01/01/2023] Open
Abstract
Single cell RNA-sequencing (scRNA-seq) precisely characterizes gene expression levels and dissects variation in expression associated with the state (technical or biological) and the type of the cell, which is averaged out in bulk measurements. Multiple and correlated sources contribute to gene expression variation in single cells, which makes their estimation difficult with the existing methods developed for batch correction (e.g., surrogate variable analysis (SVA)) that estimate orthogonal transformations of these sources. We developed iteratively adjusted surrogate variable analysis (IA-SVA) that can estimate hidden factors even when they are correlated with other sources of variation by identifying a set of genes associated with each hidden factor in an iterative manner. Analysis of scRNA-seq data from human cells showed that IA-SVA could accurately capture hidden variation arising from technical (e.g., stacked doublet cells) or biological sources (e.g., cell type or cell-cycle stage). Furthermore, IA-SVA delivers a set of genes associated with the detected hidden source to be used in downstream data analyses. As a proof of concept, IA-SVA recapitulated known marker genes for islet cell subsets (e.g., alpha, beta), which improved the grouping of subsets into distinct clusters. Taken together, IA-SVA is an effective and novel method to dissect multiple and correlated sources of variation in scRNA-seq data.
Affiliation(s)
- Donghyung Lee
- The Jackson Laboratory for Genomic Medicine, Farmington, 06032, CT, USA.
- Anthony Cheng
- The Jackson Laboratory for Genomic Medicine, Farmington, 06032, CT, USA
- Department of Genetics and Genome Sciences, University of Connecticut Health Center, Farmington, 06030, CT, USA
- Nathan Lawlor
- The Jackson Laboratory for Genomic Medicine, Farmington, 06032, CT, USA
- Duygu Ucar
- The Jackson Laboratory for Genomic Medicine, Farmington, 06032, CT, USA.
- Department of Genetics and Genome Sciences, University of Connecticut Health Center, Farmington, 06030, CT, USA.
- Institute of Systems Genomics, University of Connecticut Health Center, Farmington, 06030, CT, USA.
44
Stein-O'Brien GL, Arora R, Culhane AC, Favorov AV, Garmire LX, Greene CS, Goff LA, Li Y, Ngom A, Ochs MF, Xu Y, Fertig EJ. Enter the Matrix: Factorization Uncovers Knowledge from Omics. Trends Genet 2018; 34:790-805. [PMID: 30143323 PMCID: PMC6309559 DOI: 10.1016/j.tig.2018.07.003] [Citation(s) in RCA: 100] [Impact Index Per Article: 16.7] [Received: 03/23/2018] [Revised: 06/01/2018] [Accepted: 07/16/2018] [Indexed: 12/20/2022]
Abstract
Omics data contain signals from the molecular, physical, and kinetic inter- and intracellular interactions that control biological systems. Matrix factorization (MF) techniques can reveal low-dimensional structure from high-dimensional data that reflect these interactions. These techniques can uncover new biological knowledge from diverse high-throughput omics data in applications ranging from pathway discovery to timecourse analysis. We review exemplary applications of MF for systems-level analyses. We discuss appropriate applications of these methods, their limitations, and focus on the analysis of results to facilitate optimal biological interpretation. The inference of biologically relevant features with MF enables discovery from high-throughput data beyond the limits of current biological knowledge - answering questions from high-dimensional data that we have not yet thought to ask.
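As a concrete illustration of how a matrix factorization exposes low-dimensional structure, the following minimal sketch factors a small nonnegative matrix V into W and H using the standard multiplicative updates for the squared Frobenius error. It is not taken from the review: the function names (`nmf`, `frobenius_error`), the toy genes-by-samples matrix, the rank, and the iteration count are all assumptions chosen for illustration.

```python
import random


def transpose(A):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    """Plain-Python matrix product of A (n x k) and B (k x m)."""
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def frobenius_error(V, W, H):
    """Squared Frobenius norm of V - W H."""
    WH = matmul(W, H)
    return sum((v - r) ** 2 for rv, rr in zip(V, WH) for v, r in zip(rv, rr))


def nmf(V, rank, n_iter=200, seed=0, eps=1e-9):
    """Nonnegative factorization V ~ W H via multiplicative updates.

    W (n x rank) holds feature weights per pattern; H (rank x m) holds
    pattern activity per sample. eps guards against division by zero.
    """
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), elementwise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), elementwise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return W, H


# Toy genes-by-samples matrix with two block patterns; values are made up.
V = [
    [5.0, 5.0, 0.0, 0.0],
    [4.0, 4.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 3.0],
    [0.0, 0.0, 2.0, 2.0],
]
W, H = nmf(V, rank=2)
print("reconstruction error:", frobenius_error(V, W, H))
```

The multiplicative form keeps W and H nonnegative without an explicit projection step, which is one reason it is a common first choice for count-like omics data. Other MF methods covered by reviews of this kind (for example PCA or ICA) use different constraints and solvers.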
Affiliation(s)
- Genevieve L Stein-O'Brien
- Department of Oncology, Division of Biostatistics and Bioinformatics, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins School of Medicine, Baltimore, MD, USA; Department of Neuroscience, Johns Hopkins School of Medicine, Baltimore, MD, USA; McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD, USA
- Raman Arora
- Department of Computer Science, Institute for Data Intensive Engineering and Science, Johns Hopkins University, Baltimore, MD, USA
- Aedin C Culhane
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA, USA; Department of Biostatistics, Harvard TH Chan School of Public Health, Boston, MA, USA
- Alexander V Favorov
- Department of Oncology, Division of Biostatistics and Bioinformatics, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins School of Medicine, Baltimore, MD, USA; Vavilov Institute of General Genetics, Moscow, Russia
- Casey S Greene
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, PA, USA; Childhood Cancer Data Lab, Alex's Lemonade Stand Foundation, PA, USA
- Loyal A Goff
- Department of Neuroscience, Johns Hopkins School of Medicine, Baltimore, MD, USA; McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD, USA
- Yifeng Li
- Digital Technologies Research Centre, National Research Council of Canada, Ottawa, ON, Canada
- Aloune Ngom
- School of Computer Science, University of Windsor, Windsor, ON, Canada
- Michael F Ochs
- Department of Mathematics and Statistics, The College of New Jersey, Ewing, NJ, USA
- Yanxun Xu
- Department of Applied Mathematics and Statistics, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD, USA
- Elana J Fertig
- Department of Oncology, Division of Biostatistics and Bioinformatics, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins School of Medicine, Baltimore, MD, USA.
45
Wei R, Ross AB, Su M, Wang J, Guiraud SP, Draper CF, Beaumont M, Jia W, Martin FP. Metabotypes Related to Meat and Vegetable Intake Reflect Microbial, Lipid and Amino Acid Metabolism in Healthy People. Mol Nutr Food Res 2018; 62:e1800583. [PMID: 30098305 DOI: 10.1002/mnfr.201800583] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 06/13/2018] [Revised: 07/25/2018] [Indexed: 01/05/2023]
Abstract
SCOPE: The objective of this study is to develop a new methodology to identify the relationship between dietary patterns and metabolites indicative of food intake and metabolism. METHODS AND RESULTS: Plasma and urine samples from healthy Swiss subjects (n = 89) collected over two time points are analyzed for a panel of host-microbial metabolites using GC- and LC-MS. Dietary intake is evaluated using a validated food frequency questionnaire. Dietary pattern clusters and relationships with metabolites are determined using Non-Negative Matrix Factorization (NNMF) and Sparse Generalized Canonical Correlation Analysis (SGCCA). Use of NNMF allows detection of latent diet clusters in this population, which describe a high intake of meat or vegetables. SGCCA associates these clusters with i) diet-host microbial and lipid-associated bile acid metabolism, and ii) essential amino acid metabolism. CONCLUSION: This novel application of NNMF and SGCCA allows detection of distinct metabotypes for meat and vegetable dietary patterns in a heterogeneous population. As many of the metabolites associated with meat or vegetable intake are the result of host-microbiota interactions, the findings support a role for microbiota mediating the metabolic imprinting of different dietary choices.
Affiliation(s)
- Runmin Wei
  - University of Hawaii Cancer Center (UHCC), Honolulu, HI, 96813, USA; Molecular Biosciences and Bioengineering, University of Hawaii at Manoa, Honolulu, HI, 96822, USA
- Alastair B Ross
  - Analytical Science Department, Nestlé Research Center, Lausanne, Switzerland; Division of Food and Nutrition Science, Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
- MingMing Su
  - University of Hawaii Cancer Center (UHCC), Honolulu, HI, 96813, USA
- Jingye Wang
  - University of Hawaii Cancer Center (UHCC), Honolulu, HI, 96813, USA
- Seu-Ping Guiraud
  - Nutrition and Metabolic health Department, Nestle Institute of Health Sciences (NIHS), Lausanne, Switzerland
- Colleen Fogarty Draper
  - Nutrition and Metabolic health Department, Nestle Institute of Health Sciences (NIHS), Lausanne, Switzerland
- Maurice Beaumont
  - Clinical Development Unit, Nestlé Research Center, Lausanne, Switzerland
- Wei Jia
  - University of Hawaii Cancer Center (UHCC), Honolulu, HI, 96813, USA
- Francois-Pierre Martin
  - Nutrition and Metabolic health Department, Nestle Institute of Health Sciences (NIHS), Lausanne, Switzerland
|
46
|
Hon CC, Shin JW, Carninci P, Stubbington MJT. The Human Cell Atlas: Technical approaches and challenges. Brief Funct Genomics 2018; 17:283-294. [PMID: 29092000 PMCID: PMC6063304 DOI: 10.1093/bfgp/elx029] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Indexed: 12/23/2022] Open
Abstract
The Human Cell Atlas is a large, international consortium that aims to identify and describe every cell type in the human body. The comprehensive cellular maps that arise from this ambitious effort have the potential to transform many aspects of fundamental biology and clinical practice. Here, we discuss the technical approaches that could be used today to generate such a resource and also the technical challenges that will be encountered.
Affiliation(s)
- Chung-Chau Hon
  - RIKEN Center for Life Science Technologies, Division of Genomic Technologies, Yokohama, Kanagawa, Japan
- Jay W Shin
  - RIKEN Center for Life Science Technologies, Division of Genomic Technologies, Yokohama, Kanagawa, Japan
- Piero Carninci
  - RIKEN Center for Life Science Technologies, Division of Genomic Technologies, Yokohama, Kanagawa, Japan
|
47
|
Stein-O'Brien G, Kagohara LT, Li S, Thakar M, Ranaweera R, Ozawa H, Cheng H, Considine M, Schmitz S, Favorov AV, Danilova LV, Califano JA, Izumchenko E, Gaykalova DA, Chung CH, Fertig EJ. Integrated time course omics analysis distinguishes immediate therapeutic response from acquired resistance. Genome Med 2018; 10:37. [PMID: 29792227 PMCID: PMC5966898 DOI: 10.1186/s13073-018-0545-2] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 01/23/2018] [Accepted: 05/01/2018] [Indexed: 02/06/2023] Open
Abstract
Background: Targeted therapies specifically act by blocking the activity of proteins that are encoded by genes critical for tumorigenesis. However, most cancers acquire resistance and long-term disease remission is rarely observed. Understanding the time course of molecular changes responsible for the development of acquired resistance could enable optimization of patients’ treatment options. Clinically, acquired therapeutic resistance can only be studied at a single time point in resistant tumors. Methods: To determine the dynamics of these molecular changes, we obtained high-throughput omics data (RNA-sequencing and DNA methylation) weekly during the development of cetuximab resistance in a head and neck cancer in vitro model. The CoGAPS unsupervised algorithm was used to determine the dynamics of the molecular changes associated with resistance during the time course of resistance development. Results: CoGAPS was used to quantify the evolving transcriptional and epigenetic changes. Applying a PatternMarker statistic to the results from CoGAPS enabled novel heatmap-based visualization of the dynamics in these time course omics data. We demonstrate that transcriptional changes result from immediate therapeutic response or resistance, whereas epigenetic alterations only occur with resistance. Integrated analysis demonstrates delayed onset of changes in DNA methylation relative to transcription, suggesting that resistance is stabilized epigenetically. Conclusions: Genes with epigenetic alterations associated with resistance that have concordant expression changes are hypothesized to stabilize the resistant phenotype. These genes include FGFR1, which was previously associated with resistance to EGFR inhibitors. Thus, integrated omics analysis distinguishes the timing of molecular drivers of resistance. This understanding of the time course progression of molecular changes in acquired resistance is important for the development of alternative treatment strategies that would introduce appropriate selection of new drugs to treat cancer before the resistant phenotype develops.
Affiliation(s)
- Genevieve Stein-O'Brien
  - Institute of Genetic Medicine, Johns Hopkins University, Baltimore, MD, USA; Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD, USA
- Luciane T Kagohara
  - Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD, USA
- Sijia Li
  - Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD, USA
- Manjusha Thakar
  - Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD, USA
- Ruchira Ranaweera
  - Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD, USA; Department of Head and Neck-Endocrine Oncology, Moffitt Cancer Center, Tampa, FL, USA
- Hiroyuki Ozawa
  - Department of Otorhinolaryngology-Head and Neck Surgery, Keio University School of Medicine, Tokyo, Japan
- Haixia Cheng
  - Department of Surgery - Otolaryngology-Head and Neck Surgery, University of Utah, Salt Lake City, UT, USA
- Michael Considine
  - Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD, USA
- Sandra Schmitz
  - Head and Neck Surgery Unit, St Luc University Hospital, Brussels, Belgium
- Alexander V Favorov
  - Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD, USA; Laboratory of Systems Biology and Computational Genetics, Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia
- Ludmila V Danilova
  - Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD, USA; Laboratory of Systems Biology and Computational Genetics, Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia
- Joseph A Califano
  - Department of Surgery, UC San Diego Moores Cancer Center, La Jolla, CA, USA
- Evgeny Izumchenko
  - Department of Otolaryngology-Head and Neck Surgery, Johns Hopkins University, Baltimore, MD, USA
- Daria A Gaykalova
  - Department of Otolaryngology-Head and Neck Surgery, Johns Hopkins University, Baltimore, MD, USA
- Christine H Chung
  - Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD, USA; Department of Head and Neck-Endocrine Oncology, Moffitt Cancer Center, Tampa, FL, USA.
- Elana J Fertig
  - Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD, USA.
|
48
|
Ortega MA, Poirion O, Zhu X, Huang S, Wolfgruber TK, Sebra R, Garmire LX. Using single-cell multiple omics approaches to resolve tumor heterogeneity. Clin Transl Med 2017; 6:46. [PMID: 29285690 PMCID: PMC5746494 DOI: 10.1186/s40169-017-0177-y] [Citation(s) in RCA: 57] [Impact Index Per Article: 8.1] [Received: 09/21/2017] [Accepted: 12/06/2017] [Indexed: 12/31/2022] Open
Abstract
It has become increasingly clear that both normal and cancer tissues are composed of heterogeneous populations. Genetic variation can be attributed to the downstream effects of inherited mutations, environmental factors, or inaccurately resolved errors in transcription and replication. When lesions occur in regions that confer a proliferative advantage, they can support clonal expansion, subclonal variation, and neoplastic progression. In this manner, the complex heterogeneous microenvironment of a tumour promotes the likelihood of angiogenesis and metastasis. Recent advances in next-generation sequencing and computational biology have utilized single-cell applications to build deep profiles of individual cells that are otherwise masked in bulk profiling. In addition, the development of new techniques for combining single-cell multi-omic strategies is providing a more precise understanding of factors contributing to cellular identity, function, and growth. Continuing advancements in single-cell technology and computational deconvolution of data will be critical for reconstructing patient-specific intra-tumour features and developing more personalized cancer treatments.
Affiliation(s)
- Michael A. Ortega
  - Cancer Epidemiology Program, University of Hawaii Cancer Center, Honolulu, HI USA
- Olivier Poirion
  - Cancer Epidemiology Program, University of Hawaii Cancer Center, Honolulu, HI USA
- Xun Zhu
  - Cancer Epidemiology Program, University of Hawaii Cancer Center, Honolulu, HI USA
  - Department of Molecular Biosciences and Bioengineering, Honolulu, HI USA
- Sijia Huang
  - Cancer Epidemiology Program, University of Hawaii Cancer Center, Honolulu, HI USA
  - Department of Molecular Biosciences and Bioengineering, Honolulu, HI USA
- Thomas K. Wolfgruber
  - Cancer Epidemiology Program, University of Hawaii Cancer Center, Honolulu, HI USA
- Robert Sebra
  - Icahn Institute and Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY USA
- Lana X. Garmire
  - Cancer Epidemiology Program, University of Hawaii Cancer Center, Honolulu, HI USA
  - Department of Molecular Biosciences and Bioengineering, Honolulu, HI USA
|
49
|
Cho DS, Doles JD. Single cell transcriptome analysis of muscle satellite cells reveals widespread transcriptional heterogeneity. Gene 2017; 636:54-63. [PMID: 28893664 DOI: 10.1016/j.gene.2017.09.014] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.0] [Received: 06/28/2017] [Revised: 08/03/2017] [Accepted: 09/07/2017] [Indexed: 02/03/2023]
Abstract
Tissue-specific stem cells are indispensable contributors to adult tissue maintenance, repair, and regeneration. In skeletal muscle, satellite cells (SCs) are the resident muscle stem cell population and are required to maintain skeletal muscle homeostasis throughout life. Increasing evidence suggests that SCs are a heterogeneous cell population with substantial biochemical and functional diversity. A major limitation in the field is an incomplete understanding of the nature and extent of this cellular heterogeneity. Single cell analyses are well suited to addressing this issue, especially when coupled to unbiased profiling paradigms such as high-throughput RNA sequencing. We performed single cell RNA sequencing (scRNA-seq) on freshly isolated muscle satellite cells and found a surprising degree of heterogeneity at multiple levels, from muscle-specific transcripts to the broader SC transcriptome. We leveraged several comparative bioinformatics techniques and found that individual SCs enrich for unique transcript clusters. We propose that these gene expression "fingerprints" may contribute to observed functional SC diversity. Overall, these studies underscore the importance of several established SC signaling pathways/processes on a single cell level, implicate novel regulators of SC heterogeneity, and lay the groundwork for further investigation into SC heterogeneity in health and disease.
Affiliation(s)
- Dong Seong Cho
  - Department of Biochemistry and Molecular Biology, Mayo Clinic, Rochester, MN 55905, USA.
- Jason D Doles
  - Department of Biochemistry and Molecular Biology, Mayo Clinic, Rochester, MN 55905, USA.
|