1
Jansma A, Yao Y, Wolfe J, Del Debbio L, Beentjes SV, Ponting CP, Khamseh A. High order expression dependencies finely resolve cryptic states and subtypes in single cell data. Mol Syst Biol 2025; 21:173-207. [PMID: 39748128 PMCID: PMC11790937 DOI: 10.1038/s44320-024-00074-1] [Received: 01/09/2024] [Revised: 10/24/2024] [Accepted: 10/31/2024] [Indexed: 01/04/2025]
Abstract
Single cells are typically typed by clustering into discrete locations in reduced dimensional transcriptome space. Here we introduce Stator, a data-driven method that identifies cell (sub)types and states without relying on cells' local proximity in transcriptome space. Stator labels the same single cell multiply, not just by type and subtype, but also by state such as activation, maturity or cell cycle sub-phase, through deriving higher-order gene expression dependencies from a sparse gene-by-cell expression matrix. Stator's finer resolution is clear from analyses of mouse embryonic brain, and human healthy or diseased liver. Rather than only coarse-scale labels of cell type, Stator further resolves cell types into subtypes, and these subtypes into stages of maturity and/or cell cycle phases, and yet further into portions of these phases. Among cryptically homogeneous embryonic cells, for example, Stator finds 34 distinct radial glia states whose gene expression forecasts their future GABAergic or glutamatergic neuronal fate. Further, Stator's fine resolution of liver cancer states reveals expression programmes that predict patient survival. We provide Stator as a Nextflow pipeline and Shiny App.
Affiliation(s)
- Abel Jansma
  - MRC Human Genetics Unit, Institute of Genetics & Cancer, University of Edinburgh, Edinburgh, EH4 2XU, UK
  - Higgs Centre for Theoretical Physics, School of Physics & Astronomy, University of Edinburgh, Edinburgh, EH9 3FD, UK
- Yuelin Yao
  - MRC Human Genetics Unit, Institute of Genetics & Cancer, University of Edinburgh, Edinburgh, EH4 2XU, UK
  - School of Informatics, University of Edinburgh, Edinburgh, EH8 9AB, UK
- Jareth Wolfe
  - MRC Human Genetics Unit, Institute of Genetics & Cancer, University of Edinburgh, Edinburgh, EH4 2XU, UK
- Luigi Del Debbio
  - Higgs Centre for Theoretical Physics, School of Physics & Astronomy, University of Edinburgh, Edinburgh, EH9 3FD, UK
- Sjoerd V Beentjes
  - MRC Human Genetics Unit, Institute of Genetics & Cancer, University of Edinburgh, Edinburgh, EH4 2XU, UK
  - School of Mathematics, University of Edinburgh, Edinburgh, EH9 3FD, UK
- Chris P Ponting
  - MRC Human Genetics Unit, Institute of Genetics & Cancer, University of Edinburgh, Edinburgh, EH4 2XU, UK
- Ava Khamseh
  - MRC Human Genetics Unit, Institute of Genetics & Cancer, University of Edinburgh, Edinburgh, EH4 2XU, UK
  - Higgs Centre for Theoretical Physics, School of Physics & Astronomy, University of Edinburgh, Edinburgh, EH9 3FD, UK
  - School of Informatics, University of Edinburgh, Edinburgh, EH8 9AB, UK
2
Sharifi MN, Shi Y, Chrostek MR, Callahan SC, Shang T, Berg TJ, Helzer KT, Bootsma ML, Sjöström M, Josefsson A, Feng FY, Huffman LB, Schulte C, Blitzer GC, Sodji QH, Morris ZS, Ma VT, Meimetis L, Kosoff D, Taylor AK, LeBeau AM, Lang JM, Zhao SG. Clinical cell-surface targets in metastatic and primary solid cancers. JCI Insight 2024; 9:e183674. [PMID: 39315546 PMCID: PMC11457844 DOI: 10.1172/jci.insight.183674] [Indexed: 09/25/2024]
Abstract
Therapies against cell-surface targets (CSTs) represent an emerging treatment class in solid malignancies. However, high-throughput investigations of CST expression across cancer types have been reliant on data sets of mostly primary tumors, despite therapeutic use most commonly in metastatic disease. We identified a total of 818 clinical trials of CST therapies with 78 CSTs. We assembled a data set spanning RNA-seq and microarrays in 7,927 benign samples, 16,866 primary tumor samples, and 6,124 metastatic tumor samples. We also utilized single-cell RNA-seq data from 36 benign tissues and 558 primary and metastatic tumor samples, and matched RNA versus protein expression in 29 benign tissue samples, 1,075 tumor samples, and 942 cell lines. High RNA expression accurately predicted high protein expression across CST therapies in benign tissues, tumor samples, and cell lines. We compared metastatic versus primary tumor expression, identified potential opportunities for repositioning, and matched cell lines to tumor types based on CST and global RNA expression. We evaluated single-cell heterogeneity across tumors, and identified rare normal cell subpopulations that may contribute to toxicity. Finally, we identified combinations of CST therapies for which bispecific approaches could improve tumor specificity. This study helps better define the landscape of CST expression in metastatic and primary cancers.
Affiliation(s)
- Yue Shi
  - Department of Human Oncology, University of Wisconsin, Madison, Wisconsin, USA
- Matthew R. Chrostek
  - Department of Human Oncology, University of Wisconsin, Madison, Wisconsin, USA
- S. Carson Callahan
  - Department of Human Oncology, University of Wisconsin, Madison, Wisconsin, USA
- Tianfu Shang
  - Department of Human Oncology, University of Wisconsin, Madison, Wisconsin, USA
- Tracy J. Berg
  - Department of Human Oncology, University of Wisconsin, Madison, Wisconsin, USA
- Kyle T. Helzer
  - Department of Human Oncology, University of Wisconsin, Madison, Wisconsin, USA
- Matthew L. Bootsma
  - Department of Human Oncology, University of Wisconsin, Madison, Wisconsin, USA
- Martin Sjöström
  - Department of Clinical Sciences Lund, Division of Oncology, Lund University, Lund, Sweden
  - Department of Hematology, Oncology and Radiation Physics, Skåne University Hospital, Lund, Sweden
- Andreas Josefsson
  - Wallenberg Center for Molecular Medicine, Urology, Department of Diagnostics and Intervention, Umeå University, Umea, Sweden
- Felix Y. Feng
  - Departments of Radiation Oncology, Urology, and Medicine, UCSF, San Francisco, California, USA
- Chris Schulte
  - Department of Human Oncology, University of Wisconsin, Madison, Wisconsin, USA
- Grace C. Blitzer
  - Carbone Cancer Center, and
  - Department of Human Oncology, University of Wisconsin, Madison, Wisconsin, USA
- Quaovi H. Sodji
  - Carbone Cancer Center, and
  - Department of Human Oncology, University of Wisconsin, Madison, Wisconsin, USA
- Zachary S. Morris
  - Carbone Cancer Center, and
  - Department of Human Oncology, University of Wisconsin, Madison, Wisconsin, USA
- Labros Meimetis
  - Carbone Cancer Center, and
  - Department of Radiology, University of Wisconsin, Madison, Wisconsin, USA
- David Kosoff
  - Department of Medicine
  - Carbone Cancer Center, and
  - William S. Middleton Memorial Veterans’ Hospital, Madison, Wisconsin, USA
- Aaron M. LeBeau
  - Carbone Cancer Center, and
  - Department of Radiology, University of Wisconsin, Madison, Wisconsin, USA
  - Department of Pathology and Laboratory Medicine, University of Wisconsin, Madison, Wisconsin, USA
- Shuang G. Zhao
  - Carbone Cancer Center, and
  - Department of Human Oncology, University of Wisconsin, Madison, Wisconsin, USA
3
Yu D, Li M, Linghu G, Hu Y, Hajdarovic KH, Wang A, Singh R, Webb AE. CellBiAge: Improved single-cell age classification using data binarization. Cell Rep 2023; 42:113500. [PMID: 38032797 PMCID: PMC10791072 DOI: 10.1016/j.celrep.2023.113500] [Received: 05/04/2023] [Revised: 10/20/2023] [Accepted: 11/13/2023] [Indexed: 12/02/2023]
Abstract
Aging is a major risk factor for many diseases. Accurate methods for predicting age in specific cell types are essential to understand the heterogeneity of aging and to assess rejuvenation strategies. However, classifying organismal age at single-cell resolution using transcriptomics is challenging due to sparsity and noise. Here, we developed CellBiAge, a robust and easy-to-implement machine learning pipeline, to classify the age of single cells in the mouse brain using single-cell transcriptomics. We show that binarization of gene expression values for the top highly variable genes significantly improved test performance across different models, techniques, sexes, and brain regions, with potential age-related genes identified for model prediction. Additionally, we demonstrate CellBiAge's ability to capture exercise-induced rejuvenation in neural stem cells. This study provides a broadly applicable approach for robust classification of organismal age of single cells in the mouse brain, which may aid in understanding the aging process and evaluating rejuvenation methods.
Affiliation(s)
- Doudou Yu
  - Molecular Biology, Cell Biology, and Biochemistry Graduate Program, Brown University, Providence, RI 02912, USA
  - Data Science Institute, Brown University, Providence, RI 02912, USA
- Manlin Li
  - Data Science Institute, Brown University, Providence, RI 02912, USA
- Guanjie Linghu
  - Data Science Institute, Brown University, Providence, RI 02912, USA
- Yihuan Hu
  - Data Science Institute, Brown University, Providence, RI 02912, USA
- An Wang
  - Department of Applied Mathematics & Statistics, Johns Hopkins University, Baltimore, MD 21218, USA
- Ritambhara Singh
  - Department of Computer Science, Brown University, Providence, RI 02912, USA
  - Center for Computational Molecular Biology, Brown University, Providence, RI 02912, USA
- Ashley E Webb
  - Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, RI 02912, USA
  - Center on the Biology of Aging, Brown University, Providence, RI 02912, USA
  - Carney Institute for Brain Science, Brown University, Providence, RI 02912, USA
  - Center for Translational Neuroscience, Brown University, Providence, RI 02912, USA
4
Gilis J, Perin L, Malfait M, Van den Berge K, Takele Assefa A, Verbist B, Risso D, Clement L. Differential detection workflows for multi-sample single-cell RNA-seq data. bioRxiv 2023:2023.12.17.572043. [PMID: 38187695 PMCID: PMC10769270 DOI: 10.1101/2023.12.17.572043] [Indexed: 01/09/2024]
Abstract
In single-cell transcriptomics, differential gene expression (DE) analyses typically focus on testing differences in the average expression of genes between cell types or conditions of interest. Single-cell transcriptomics, however, also has the promise to prioritise genes for which the expression differs in other aspects of the distribution. Here we develop a workflow for assessing differential detection (DD), which tests for differences in the average fraction of samples or cells in which a gene is detected. After benchmarking eight different DD data analysis strategies, we provide a unified workflow for jointly assessing DE and DD. Using simulations and two case studies, we show that DE and DD analysis provide complementary information, both in terms of the individual genes they report and in the functional interpretation of those genes.
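The quantity that a differential detection analysis compares can be illustrated with a minimal sketch. It assumes a small cells-by-genes count table held as nested Python lists and treats any nonzero count as "detected". The function name `detection_fraction` and the toy data are hypothetical. The code computes only the per-sample detection fractions; it does not implement any of the eight benchmarked strategies or the statistical test itself.

```python
from collections import defaultdict


def detection_fraction(counts, sample_ids):
    """Per-sample fraction of cells in which each gene has a nonzero count.

    counts: list of per-cell lists of gene counts (cells x genes).
    sample_ids: list giving the sample each cell came from.
    """
    n_genes = len(counts[0])
    detected = defaultdict(lambda: [0] * n_genes)  # cells detecting each gene
    n_cells = defaultdict(int)                     # cells seen per sample
    for row, sample in zip(counts, sample_ids):
        n_cells[sample] += 1
        for g, value in enumerate(row):
            if value > 0:
                detected[sample][g] += 1
    return {s: [d / n_cells[s] for d in detected[s]] for s in n_cells}


# Toy data (assumed): four cells, two genes, two samples.
counts = [[0, 3], [2, 0], [0, 1], [0, 0]]
sample_ids = ["s1", "s1", "s2", "s2"]
fractions = detection_fraction(counts, sample_ids)
```

In this toy table the first gene is detected in half of the cells of sample `s1` and in none of the cells of `s2`, which is the kind of contrast a DD test would then assess across conditions.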
Affiliation(s)
- Jeroen Gilis
  - These authors contributed equally
  - Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, 9000, Belgium
  - Bioinformatics Institute, Ghent University, Ghent, 9000, Belgium
  - Data Mining and Modeling for Biomedicine, VIB Flemish Institute for Biotechnology, Ghent, 9000, Belgium
- Laura Perin
  - These authors contributed equally
  - Department of Statistical Sciences, University of Padova, Padova, Italy
- Milan Malfait
  - Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, 9000, Belgium
- Koen Van den Berge
  - Statistics and Decision Sciences, Johnson and Johnson Innovative Medicine, Beerse, Belgium
- Alemu Takele Assefa
  - Statistics and Decision Sciences, Johnson and Johnson Innovative Medicine, Beerse, Belgium
- Bie Verbist
  - Statistics and Decision Sciences, Johnson and Johnson Innovative Medicine, Beerse, Belgium
- Davide Risso
  - Department of Statistical Sciences, University of Padova, Padova, Italy
  - Padua Center for Network Medicine, University of Padova, Padova, Italy
- Lieven Clement
  - Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, 9000, Belgium
  - Bioinformatics Institute, Ghent University, Ghent, 9000, Belgium
5
Bouland GA, Mahfouz A, Reinders MJT. Consequences and opportunities arising due to sparser single-cell RNA-seq datasets. Genome Biol 2023; 24:86. [PMID: 37085823 PMCID: PMC10120229 DOI: 10.1186/s13059-023-02933-w] [Received: 06/02/2022] [Accepted: 04/10/2023] [Indexed: 04/23/2023]
Abstract
With the number of cells measured in single-cell RNA sequencing (scRNA-seq) datasets increasing exponentially and concurrent increased sparsity due to more zero counts being measured for many genes, we demonstrate here that downstream analyses on binary-based gene expression give similar results as count-based analyses. Moreover, a binary representation scales up to ~50-fold more cells that can be analyzed using the same computational resources. We also highlight the possibilities provided by binarized scRNA-seq data. Development of specialized tools for bit-aware implementations of downstream analytical tasks will enable a more fine-grained resolution of biological heterogeneity.
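A minimal sketch can show what a binary, bit-aware representation looks like in practice. It assumes that any nonzero count is treated as "expressed" and packs each cell into a single Python integer. The function names `binarize` and `jaccard` and the toy counts are hypothetical and are not taken from the paper. The sketch makes no claim about the reported memory or scaling gains.

```python
def binarize(row):
    """Pack one cell's counts into an int: bit g is 1 if gene g has a nonzero count."""
    bits = 0
    for g, value in enumerate(row):
        if value > 0:
            bits |= 1 << g
    return bits


def jaccard(a, b):
    """Jaccard similarity of two bit-packed cells, computed with bitwise operations."""
    union = bin(a | b).count("1")
    if union == 0:
        return 1.0  # convention chosen here: two all-zero cells count as identical
    return bin(a & b).count("1") / union


# Toy data (assumed): two cells measured over four genes.
cell_a = binarize([0, 5, 2, 0])  # genes 1 and 2 detected
cell_b = binarize([1, 3, 0, 0])  # genes 0 and 1 detected
similarity = jaccard(cell_a, cell_b)
```

Once cells are stored as bit patterns, a comparison such as the Jaccard similarity reduces to bitwise AND/OR plus a population count, which is the sense in which a downstream task can be made bit-aware.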
Affiliation(s)
- Gerard A Bouland
  - Delft Bioinformatics Lab, Delft University of Technology, Delft, The Netherlands
  - Department of Human Genetics, Leiden University Medical Center, Leiden, 2333ZC, The Netherlands
- Ahmed Mahfouz
  - Delft Bioinformatics Lab, Delft University of Technology, Delft, The Netherlands
  - Department of Human Genetics, Leiden University Medical Center, Leiden, 2333ZC, The Netherlands
  - Leiden Computational Biology Center, Leiden University Medical Center, Leiden, 2333ZC, The Netherlands
- Marcel J T Reinders
  - Delft Bioinformatics Lab, Delft University of Technology, Delft, The Netherlands
  - Department of Human Genetics, Leiden University Medical Center, Leiden, 2333ZC, The Netherlands
  - Leiden Computational Biology Center, Leiden University Medical Center, Leiden, 2333ZC, The Netherlands
6
Doyle JJ. Cell types as species: Exploring a metaphor. Front Plant Sci 2022; 13:868565. [PMID: 36072310 PMCID: PMC9444152 DOI: 10.3389/fpls.2022.868565] [Received: 02/02/2022] [Accepted: 07/29/2022] [Indexed: 06/05/2023]
Abstract
The concept of "cell type," though fundamental to cell biology, is controversial. Cells have historically been classified into types based on morphology, physiology, or location. More recently, single cell transcriptomic studies have revealed fine-scale differences among cells with similar gross phenotypes. Transcriptomic snapshots of cells at various stages of differentiation, and of cells under different physiological conditions, have shown that in many cases variation is more continuous than discrete, raising questions about the relationship between cell type and cell state. Some researchers have rejected the notion of fixed types altogether. Throughout the history of discussions on cell type, cell biologists have compared the problem of defining cell type with the interminable and often contentious debate over the definition of arguably the most important concept in systematics and evolutionary biology, "species." In the last decades, systematics, like cell biology, has been transformed by the increasing availability of molecular data, and the fine-grained resolution of genetic relationships has generated new ideas about how that variation should be classified. There are numerous parallels between the two fields that make exploration of the "cell types as species" metaphor timely. These parallels begin with philosophy, with discussion of both cell types and species as being either individuals, groups, or something in between (e.g., homeostatic property clusters). In each field there are various different types of lineages that form trees or networks that can (and in some cases do) provide criteria for grouping. Developing and refining models for evolutionary divergence of species and for cell type differentiation are parallel goals of the two fields. The goal of this essay is to highlight such parallels with the hope of inspiring biologists in both fields to look for new solutions to similar problems outside of their own field.
Affiliation(s)
- Jeff J. Doyle
  - Section of Plant Biology and Section of Plant Breeding and Genetics, School of Integrative Plant Science, Cornell University, Ithaca, NY, United States
7
Gagnon J, Pi L, Ryals M, Wan Q, Hu W, Ouyang Z, Zhang B, Li K. Recommendations of scRNA-seq Differential Gene Expression Analysis Based on Comprehensive Benchmarking. Life (Basel) 2022; 12:850. [PMID: 35743881 PMCID: PMC9225332 DOI: 10.3390/life12060850] [Received: 04/23/2022] [Revised: 05/31/2022] [Accepted: 06/04/2022] [Indexed: 12/13/2022]
Abstract
To guide analysts to select the right tool and parameters in differential gene expression analyses of single-cell RNA sequencing (scRNA-seq) data, we developed a novel simulator that recapitulates the data characteristics of real scRNA-seq datasets while accounting for all the relevant sources of variation in a multi-subject, multi-condition scRNA-seq experiment: the cell-to-cell variation within a subject, the variation across subjects, the variability across cell types, the mean/variance relationship of gene expression across genes, library size effects, group effects, and covariate effects. By applying it to benchmark 12 differential gene expression analysis methods (including cell-level and pseudo-bulk methods) on simulated multi-condition, multi-subject data of the 10x Genomics platform, we demonstrated that methods originating from the negative binomial mixed model such as glmmTMB and NEBULA-HL outperformed other methods. Utilizing NEBULA-HL in a statistical analysis pipeline for single-cell analysis will enable scientists to better understand the cell-type-specific transcriptomic response to disease or treatment effects and to discover new drug targets. Further, application to two real datasets showed the outperformance of our differential expression (DE) pipeline, with unified findings of differentially expressed genes (DEG) and a pseudo-time trajectory transcriptomic result. In the end, we made recommendations for filtering strategies of cells and genes based on simulation results to achieve optimal experimental goals.
Affiliation(s)
- Jake Gagnon
  - Analytics and Data Sciences, Biogen, Inc., 225 Binney St., Cambridge, MA 02142, USA
- Lira Pi
  - PharmaLex, 1700 District Ave., Burlington, MA 01803, USA
- Matthew Ryals
  - PharmaLex, 1700 District Ave., Burlington, MA 01803, USA
- Qingwen Wan
  - PharmaLex, 1700 District Ave., Burlington, MA 01803, USA
- Wenxing Hu
  - Research Department, Biogen, Inc., 225 Binney St., Cambridge, MA 02142, USA
- Zhengyu Ouyang
  - BioInfoRx, Inc., 510 Charmany Dr., Suite 275A, Madison, WI 53719, USA
- Baohong Zhang
  - Research Department, Biogen, Inc., 225 Binney St., Cambridge, MA 02142, USA
  - Correspondence: (B.Z.); (K.L.)
- Kejie Li
  - Research Department, Biogen, Inc., 225 Binney St., Cambridge, MA 02142, USA
  - Correspondence: (B.Z.); (K.L.)