1
Yu D, Li M, Linghu G, Hu Y, Hajdarovic KH, Wang A, Singh R, Webb AE. CellBiAge: Improved single-cell age classification using data binarization. Cell Rep 2023; 42:113500. [PMID: 38032797 PMCID: PMC10791072 DOI: 10.1016/j.celrep.2023.113500] [Received: 05/04/2023] [Revised: 10/20/2023] [Accepted: 11/13/2023] [Indexed: 12/02/2023]
Abstract
Aging is a major risk factor for many diseases. Accurate methods for predicting age in specific cell types are essential to understand the heterogeneity of aging and to assess rejuvenation strategies. However, classifying organismal age at single-cell resolution using transcriptomics is challenging due to sparsity and noise. Here, we developed CellBiAge, a robust and easy-to-implement machine learning pipeline, to classify the age of single cells in the mouse brain using single-cell transcriptomics. We show that binarization of gene expression values for the top highly variable genes significantly improved test performance across different models, techniques, sexes, and brain regions, with potential age-related genes identified for model prediction. Additionally, we demonstrate CellBiAge's ability to capture exercise-induced rejuvenation in neural stem cells. This study provides a broadly applicable approach for robust classification of organismal age of single cells in the mouse brain, which may aid in understanding the aging process and evaluating rejuvenation methods.
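The binarization step summarized in this abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the CellBiAge implementation. It ranks genes by plain variance across cells, whereas real pipelines usually select highly variable genes on normalized data with dispersion-based methods. It also treats any count above a hypothetical `threshold` of 0 as "expressed". The paper's exact binarization rule and its downstream classifiers are not reproduced, and the function names `top_variable_genes` and `binarize` are illustrative.

```python
from statistics import pvariance


def top_variable_genes(matrix, genes, k):
    """Return the k gene names with the highest variance across cells.

    `matrix` is a list of per-cell expression lists (cells x genes).
    Plain variance is a simplification of HVG selection (an assumption).
    """
    scored = []
    for j, gene in enumerate(genes):
        column = [cell[j] for cell in matrix]
        scored.append((pvariance(column), gene))
    # Highest variance first; ties broken by gene name for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [gene for _, gene in scored[:k]]


def binarize(matrix, genes, selected, threshold=0):
    """Keep only `selected` genes; 1 if expression > threshold, else 0."""
    columns = [genes.index(g) for g in selected]
    return [[1 if cell[j] > threshold else 0 for j in columns] for cell in matrix]


# Toy example: 4 cells x 3 genes of raw counts (made-up numbers).
genes = ["GeneA", "GeneB", "GeneC"]
counts = [
    [0, 5, 1],
    [3, 0, 1],
    [0, 7, 1],
    [4, 0, 1],
]
hvgs = top_variable_genes(counts, genes, k=2)
features = binarize(counts, genes, hvgs)
print(hvgs)      # ['GeneB', 'GeneA']
print(features)  # [[1, 0], [0, 1], [1, 0], [0, 1]]
```

GeneC is constant across cells, so it is dropped before binarization. Only the presence or absence of the remaining genes is passed on as features.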
Affiliation(s)
- Doudou Yu
  - Molecular Biology, Cell Biology, and Biochemistry Graduate Program, Brown University, Providence, RI 02912, USA; Data Science Institute, Brown University, Providence, RI 02912, USA
- Manlin Li
  - Data Science Institute, Brown University, Providence, RI 02912, USA
- Guanjie Linghu
  - Data Science Institute, Brown University, Providence, RI 02912, USA
- Yihuan Hu
  - Data Science Institute, Brown University, Providence, RI 02912, USA
- An Wang
  - Department of Applied Mathematics & Statistics, Johns Hopkins University, Baltimore, MD 21218, USA
- Ritambhara Singh
  - Department of Computer Science, Brown University, Providence, RI 02912, USA; Center for Computational Molecular Biology, Brown University, Providence, RI 02912, USA
- Ashley E Webb
  - Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, RI 02912, USA; Center on the Biology of Aging, Brown University, Providence, RI 02912, USA; Carney Institute for Brain Science, Brown University, Providence, RI 02912, USA; Center for Translational Neuroscience, Brown University, Providence, RI 02912, USA
2
Gilis J, Perin L, Malfait M, Van den Berge K, Takele Assefa A, Verbist B, Risso D, Clement L. Differential detection workflows for multi-sample single-cell RNA-seq data. bioRxiv 2023:2023.12.17.572043. [PMID: 38187695 PMCID: PMC10769270 DOI: 10.1101/2023.12.17.572043] [Indexed: 01/09/2024]
Abstract
In single-cell transcriptomics, differential gene expression (DE) analyses typically focus on testing differences in the average expression of genes between cell types or conditions of interest. Single-cell transcriptomics, however, also holds the promise of prioritising genes whose expression differs in other aspects of the distribution. Here we develop a workflow for assessing differential detection (DD), which tests for differences in the average fraction of samples or cells in which a gene is detected. After benchmarking eight different DD data analysis strategies, we provide a unified workflow for jointly assessing DE and DD. Using simulations and two case studies, we show that DE and DD analyses provide complementary information, both in terms of the individual genes they report and in the functional interpretation of those genes.
Affiliation(s)
- Jeroen Gilis
  - These authors contributed equally
  - Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, 9000, Belgium
  - Bioinformatics Institute, Ghent University, Ghent, 9000, Belgium
  - Data Mining and Modeling for Biomedicine, VIB Flemish Institute for Biotechnology, Ghent, 9000, Belgium
- Laura Perin
  - These authors contributed equally
  - Department of Statistical Sciences, University of Padova, Padova, Italy
- Milan Malfait
  - Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, 9000, Belgium
- Koen Van den Berge
  - Statistics and Decision Sciences, Johnson and Johnson Innovative Medicine, Beerse, Belgium
- Alemu Takele Assefa
  - Statistics and Decision Sciences, Johnson and Johnson Innovative Medicine, Beerse, Belgium
- Bie Verbist
  - Statistics and Decision Sciences, Johnson and Johnson Innovative Medicine, Beerse, Belgium
- Davide Risso
  - Department of Statistical Sciences, University of Padova, Padova, Italy
  - Padua Center for Network Medicine, University of Padova, Padova, Italy
- Lieven Clement
  - Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, 9000, Belgium
  - Bioinformatics Institute, Ghent University, Ghent, 9000, Belgium
3
Bouland GA, Mahfouz A, Reinders MJT. Consequences and opportunities arising due to sparser single-cell RNA-seq datasets. Genome Biol 2023; 24:86. [PMID: 37085823 PMCID: PMC10120229 DOI: 10.1186/s13059-023-02933-w] [Received: 06/02/2022] [Accepted: 04/10/2023] [Indexed: 04/23/2023]
Abstract
With the number of cells measured in single-cell RNA sequencing (scRNA-seq) datasets increasing exponentially, and sparsity increasing concurrently as more zero counts are measured for many genes, we demonstrate here that downstream analyses on binary-based gene expression give similar results to count-based analyses. Moreover, a binary representation allows up to ~50-fold more cells to be analyzed using the same computational resources. We also highlight the possibilities provided by binarized scRNA-seq data. Development of specialized tools for bit-aware implementations of downstream analytical tasks will enable a more fine-grained resolution of biological heterogeneity.
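To make the memory argument concrete, the sketch below packs one gene's binarized detection pattern across cells into a Python integer used as a bitset, with one bit per cell. It then counts co-detected cells with a bitwise AND plus a population count. This is an illustrative assumption about what a "bit-aware" implementation might look like, not the authors' code, and the toy sizes do not reproduce the reported ~50-fold figure. The helpers `pack_bits`, `popcount`, and `to_bytes` are hypothetical names.

```python
def pack_bits(binary):
    """Pack a 0/1 list into an int bitset: bit i is set when element i is 1."""
    value = 0
    for i, bit in enumerate(binary):
        if bit:
            value |= 1 << i
    return value


def popcount(value):
    """Number of set bits (works on Python versions without int.bit_count)."""
    return bin(value).count("1")


def to_bytes(value, n_cells):
    """Serialize the bitset using one bit per cell, rounded up to whole bytes."""
    return value.to_bytes((n_cells + 7) // 8, "little")


# Toy binarized detection patterns for two genes across 10 cells.
gene_x = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
gene_y = [1, 1, 0, 1, 0, 0, 1, 0, 0, 1]
bx, by = pack_bits(gene_x), pack_bits(gene_y)

co_detected = popcount(bx & by)  # cells where both genes are detected
either = popcount(bx | by)       # cells where at least one is detected
print(co_detected, either)       # 4 7
print(co_detected / either)      # Jaccard similarity of the two patterns
print(len(to_bytes(bx, len(gene_x))))  # 2
```

Here the 10-cell pattern fits in 2 bytes, whereas storing the same 10 values as dense 32-bit integer counts would take 40 bytes. Real savings depend on the sparse count format being compared against.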
Affiliation(s)
- Gerard A Bouland
  - Delft Bioinformatics Lab, Delft University of Technology, Delft, The Netherlands
  - Department of Human Genetics, Leiden University Medical Center, Leiden, 2333ZC, The Netherlands
- Ahmed Mahfouz
  - Delft Bioinformatics Lab, Delft University of Technology, Delft, The Netherlands
  - Department of Human Genetics, Leiden University Medical Center, Leiden, 2333ZC, The Netherlands
  - Leiden Computational Biology Center, Leiden University Medical Center, Leiden, 2333ZC, The Netherlands
- Marcel J T Reinders
  - Delft Bioinformatics Lab, Delft University of Technology, Delft, The Netherlands
  - Department of Human Genetics, Leiden University Medical Center, Leiden, 2333ZC, The Netherlands
  - Leiden Computational Biology Center, Leiden University Medical Center, Leiden, 2333ZC, The Netherlands
4
Doyle JJ. Cell types as species: Exploring a metaphor. Front Plant Sci 2022; 13:868565. [PMID: 36072310 PMCID: PMC9444152 DOI: 10.3389/fpls.2022.868565] [Received: 02/02/2022] [Accepted: 07/29/2022] [Indexed: 06/05/2023]
Abstract
The concept of "cell type," though fundamental to cell biology, is controversial. Cells have historically been classified into types based on morphology, physiology, or location. More recently, single cell transcriptomic studies have revealed fine-scale differences among cells with similar gross phenotypes. Transcriptomic snapshots of cells at various stages of differentiation, and of cells under different physiological conditions, have shown that in many cases variation is more continuous than discrete, raising questions about the relationship between cell type and cell state. Some researchers have rejected the notion of fixed types altogether. Throughout the history of discussions on cell type, cell biologists have compared the problem of defining cell type with the interminable and often contentious debate over the definition of arguably the most important concept in systematics and evolutionary biology, "species." In recent decades, systematics, like cell biology, has been transformed by the increasing availability of molecular data, and the fine-grained resolution of genetic relationships has generated new ideas about how that variation should be classified. There are numerous parallels between the two fields that make exploration of the "cell types as species" metaphor timely. These parallels begin with philosophy, with discussion of both cell types and species as being either individuals, groups, or something in between (e.g., homeostatic property clusters). In each field there are various types of lineages that form trees or networks that can (and in some cases do) provide criteria for grouping. Developing and refining models for evolutionary divergence of species and for cell type differentiation are parallel goals of the two fields. The goal of this essay is to highlight such parallels with the hope of inspiring biologists in both fields to look for new solutions to similar problems outside of their own field.
5
Gagnon J, Pi L, Ryals M, Wan Q, Hu W, Ouyang Z, Zhang B, Li K. Recommendations of scRNA-seq Differential Gene Expression Analysis Based on Comprehensive Benchmarking. Life (Basel) 2022; 12:life12060850. [PMID: 35743881 PMCID: PMC9225332 DOI: 10.3390/life12060850] [Received: 04/23/2022] [Revised: 05/31/2022] [Accepted: 06/04/2022] [Indexed: 12/13/2022]
Abstract
To guide analysts in selecting the right tool and parameters for differential gene expression analyses of single-cell RNA sequencing (scRNA-seq) data, we developed a novel simulator that recapitulates the data characteristics of real scRNA-seq datasets while accounting for all the relevant sources of variation in a multi-subject, multi-condition scRNA-seq experiment: the cell-to-cell variation within a subject, the variation across subjects, the variability across cell types, the mean/variance relationship of gene expression across genes, library size effects, group effects, and covariate effects. By applying it to benchmark 12 differential gene expression analysis methods (including cell-level and pseudo-bulk methods) on simulated multi-condition, multi-subject data of the 10x Genomics platform, we demonstrated that methods based on negative binomial mixed models, such as glmmTMB and NEBULA-HL, outperformed other methods. Utilizing NEBULA-HL in a statistical analysis pipeline for single-cell analysis will enable scientists to better understand the cell-type-specific transcriptomic response to disease or treatment effects and to discover new drug targets. Further, application to two real datasets demonstrated the superior performance of our differential expression (DE) pipeline, with unified findings of differentially expressed genes (DEG) and a pseudo-time trajectory transcriptomic result. Finally, based on the simulation results, we made recommendations for cell and gene filtering strategies to achieve optimal experimental goals.
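The methods benchmarked above include pseudo-bulk approaches, which first aggregate counts per subject and cell type and then apply bulk-style DE tools. A minimal Python sketch of that aggregation step follows. It assumes raw counts stored as lists of per-cell gene vectors, and `pseudo_bulk` is a hypothetical helper. The negative binomial mixed models the authors favor (glmmTMB, NEBULA-HL) are R packages fitted on cell-level data and are not reproduced here.

```python
def pseudo_bulk(counts, subjects, cell_types):
    """Sum per-cell count vectors within each (subject, cell type) pair.

    counts: list of per-cell gene-count lists (cells x genes).
    subjects, cell_types: per-cell labels, same length as counts.
    Returns a dict mapping (subject, cell_type) -> summed count vector.
    """
    totals = {}
    for cell, subject, cell_type in zip(counts, subjects, cell_types):
        key = (subject, cell_type)
        if key not in totals:
            totals[key] = [0] * len(cell)
        for j, value in enumerate(cell):
            totals[key][j] += value
    return totals


# Toy data: 5 cells x 3 genes from two subjects (made-up counts).
counts = [[1, 0, 2], [0, 3, 1], [4, 1, 0], [2, 2, 2], [0, 0, 5]]
subjects = ["p1", "p1", "p1", "p2", "p2"]
cell_types = ["neuron", "neuron", "astrocyte", "neuron", "astrocyte"]

pb = pseudo_bulk(counts, subjects, cell_types)
for key, vector in sorted(pb.items()):
    print(key, vector)
```

Summing (rather than averaging) keeps the integer count scale that negative binomial models expect. Producing one observation per subject and cell type also keeps subject-to-subject variation from being mistaken for cell-to-cell replication.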
Affiliation(s)
- Jake Gagnon
  - Analytics and Data Sciences, Biogen, Inc., 225 Binney St., Cambridge, MA 02142, USA
- Lira Pi
  - PharmaLex, 1700 District Ave., Burlington, MA 01803, USA
- Matthew Ryals
  - PharmaLex, 1700 District Ave., Burlington, MA 01803, USA
- Qingwen Wan
  - PharmaLex, 1700 District Ave., Burlington, MA 01803, USA
- Wenxing Hu
  - Research Department, Biogen, Inc., 225 Binney St., Cambridge, MA 02142, USA
- Zhengyu Ouyang
  - BioInfoRx, Inc., 510 Charmany Dr., Suite 275A, Madison, WI 53719, USA
- Baohong Zhang
  - Research Department, Biogen, Inc., 225 Binney St., Cambridge, MA 02142, USA
  - Corresponding author
- Kejie Li
  - Research Department, Biogen, Inc., 225 Binney St., Cambridge, MA 02142, USA
  - Corresponding author