1. Wang H, Torous W, Gong B, Purdom E. Visualizing scRNA-Seq Data at Population Scale with GloScope. bioRxiv 2024:2023.05.29.542786. [PMID: 37398321 PMCID: PMC10312527 DOI: 10.1101/2023.05.29.542786] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/04/2023]
Abstract
Increasingly, scRNA-Seq studies explore cell populations across different samples and the effect of sample heterogeneity on an organism's phenotype. However, relatively few bioinformatic methods have been developed that adequately address the variation between samples for such population-level analyses. We propose a framework for representing the entire single-cell profile of a sample, which we call a GloScope representation. We implement GloScope on scRNA-Seq datasets from study designs ranging from 12 to over 300 samples and demonstrate how GloScope allows researchers to perform essential bioinformatic tasks at the sample level, in particular visualization and quality control assessment.
Affiliation(s)
- Hao Wang
- Division of Biostatistics, University of California, Berkeley, CA, USA
- William Torous
- Department of Statistics, University of California, Berkeley, CA, USA
- Boying Gong
- Division of Biostatistics, University of California, Berkeley, CA, USA
- Elizabeth Purdom
- Department of Statistics, University of California, Berkeley, CA, USA
- Center for Computational Biology, University of California, Berkeley, CA, USA
2. Sun F, Li H, Sun D, Fu S, Gu L, Shao X, Wang Q, Dong X, Duan B, Xing F, Wu J, Xiao M, Zhao F, Han JDJ, Liu Q, Fan X, Li C, Wang C, Shi T. Single-cell omics: experimental workflow, data analyses and applications. Sci China Life Sci 2024:10.1007/s11427-023-2561-0. [PMID: 39060615 DOI: 10.1007/s11427-023-2561-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2023] [Accepted: 04/18/2024] [Indexed: 07/28/2024]
Abstract
Cells are the fundamental units of biological systems and exhibit unique development trajectories and molecular features. Our exploration of how the genomes orchestrate the formation and maintenance of each cell, and control the cellular phenotypes of various organisms, is both captivating and intricate. Since the inception of the first single-cell RNA technology, technologies related to single-cell sequencing have experienced rapid advancements in recent years. These technologies have expanded horizontally to include single-cell genome, epigenome, proteome, and metabolome, while vertically, they have progressed to integrate multiple omics data and incorporate additional information such as spatial scRNA-seq and CRISPR screening. Single-cell omics represent a groundbreaking advancement in the biomedical field, offering profound insights into the understanding of complex diseases, including cancers. Here, we comprehensively summarize recent advances in single-cell omics technologies, with a specific focus on the methodology section. This overview aims to guide researchers in selecting appropriate methods for single-cell sequencing and related data analysis.
Affiliation(s)
- Fengying Sun
- Department of Clinical Laboratory, the Affiliated Wuhu Hospital of East China Normal University (The Second People's Hospital of Wuhu City), Wuhu, 241000, China
- Haoyan Li
- Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
- Dongqing Sun
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China
- Shaliu Fu
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Research Institute of Intelligent Computing, Zhejiang Lab, Hangzhou, 311121, China
- Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai, 201210, China
- Lei Gu
- Center for Single-cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China
- Xin Shao
- Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
- National Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing, 314103, China
- Qinqin Wang
- Center for Single-cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China
- Xin Dong
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China
- Bin Duan
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Research Institute of Intelligent Computing, Zhejiang Lab, Hangzhou, 311121, China
- Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai, 201210, China
- Feiyang Xing
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China
- Jun Wu
- Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, 200241, China
- Minmin Xiao
- Department of Clinical Laboratory, the Affiliated Wuhu Hospital of East China Normal University (The Second People's Hospital of Wuhu City), Wuhu, 241000, China.
- Fangqing Zhao
- Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing, 100101, China.
- Jing-Dong J Han
- Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Center for Quantitative Biology (CQB), Peking University, Beijing, 100871, China.
- Qi Liu
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China.
- Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China.
- Research Institute of Intelligent Computing, Zhejiang Lab, Hangzhou, 311121, China.
- Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai, 201210, China.
- Xiaohui Fan
- Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China.
- National Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing, 314103, China.
- Zhejiang Key Laboratory of Precision Diagnosis and Therapy for Major Gynecological Diseases, Women's Hospital, Zhejiang University School of Medicine, Hangzhou, 310006, China.
- Chen Li
- Center for Single-cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China.
- Chenfei Wang
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China.
- Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China.
- Tieliu Shi
- Department of Clinical Laboratory, the Affiliated Wuhu Hospital of East China Normal University (The Second People's Hospital of Wuhu City), Wuhu, 241000, China.
- Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, 200241, China.
- Key Laboratory of Advanced Theory and Application in Statistics and Data Science-MOE, School of Statistics, East China Normal University, Shanghai, 200062, China.
3. Han G, Yan D, Sun Z, Fang J, Chang X, Wilson L, Liu Y. Bayesian-frequentist hybrid inference framework for single cell RNA-seq analyses. Hum Genomics 2024; 18:69. [PMID: 38902839 DOI: 10.1186/s40246-024-00638-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/25/2023] [Accepted: 06/12/2024] [Indexed: 06/22/2024] Open
Abstract
BACKGROUND: Single cell RNA sequencing technology (scRNA-seq) has been proven useful in understanding cell-specific disease mechanisms. However, identifying genes of interest remains a key challenge. Pseudo-bulk methods that pool scRNA-seq counts in the same biological replicates have been commonly used to identify differentially expressed genes. However, such methods may lack power due to the limited sample size of scRNA-seq datasets, which can be prohibitively expensive. RESULTS: Motivated by this, we proposed to use the Bayesian-frequentist hybrid (BFH) framework to increase the power, and we showed that, in a simulated scenario, the proposed BFH would be an optimal method when compared with other popular single cell differential expression methods if both FDR and power were considered. As an example, the method was applied to an idiopathic pulmonary fibrosis (IPF) case study. CONCLUSION: In our IPF example, we demonstrated that with a proper informative prior, the BFH approach identified more genes of interest. Furthermore, these genes were reasonable based on the current knowledge of IPF. Thus, the BFH offers a unique and flexible framework for future scRNA-seq analyses.
Affiliation(s)
- Gang Han
- Department of Epidemiology and Biostatistics, School of Public Health, Texas A&M University, College Station, TX, USA
- Dongyan Yan
- Eli Lilly and Company, Lilly Corporate Center, 893 Delaware St, Indianapolis, IN, 46225, USA
- Zhe Sun
- Eli Lilly and Company, Lilly Corporate Center, 893 Delaware St, Indianapolis, IN, 46225, USA
- Jiyuan Fang
- Eli Lilly and Company, Lilly Corporate Center, 893 Delaware St, Indianapolis, IN, 46225, USA
- Xinyue Chang
- Eli Lilly and Company, Lilly Corporate Center, 893 Delaware St, Indianapolis, IN, 46225, USA
- Lucas Wilson
- Department of Epidemiology and Biostatistics, School of Public Health, Texas A&M University, College Station, TX, USA
- Yushi Liu
- Eli Lilly and Company, Lilly Corporate Center, 893 Delaware St, Indianapolis, IN, 46225, USA.
4. Guo X, Ning J, Chen Y, Liu G, Zhao L, Fan Y, Sun S. Recent advances in differential expression analysis for single-cell RNA-seq and spatially resolved transcriptomic studies. Brief Funct Genomics 2024; 23:95-109. [PMID: 37022699 DOI: 10.1093/bfgp/elad011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2022] [Revised: 12/09/2022] [Accepted: 03/10/2023] [Indexed: 04/07/2023] Open
Abstract
Differential expression (DE) analysis is a necessary step in the analysis of single-cell RNA sequencing (scRNA-seq) and spatially resolved transcriptomics (SRT) data. Unlike traditional bulk RNA-seq, DE analysis for scRNA-seq or SRT data has unique characteristics that may contribute to the difficulty of detecting DE genes. However, the plethora of DE tools that work with various assumptions makes it difficult to choose an appropriate one. Furthermore, a comprehensive review on detecting DE genes for scRNA-seq data or SRT data from multi-condition, multi-sample experimental designs is lacking. To bridge such a gap, here, we first focus on the challenges of DE detection, then highlight potential opportunities that facilitate further progress in scRNA-seq or SRT analysis, and finally provide insights and guidance in selecting appropriate DE tools or developing new computational DE methods.
Affiliation(s)
- Xiya Guo
- School of Public Health, Xi'an Jiaotong University, Xi'an, Shaanxi 710061, P.R. China
- Key Laboratory of Trace Elements and Endemic Diseases, Center for Single Cell Omics and Health, Xi'an Jiaotong University, Xi'an, Shaanxi 710061, P.R. China
- Jin Ning
- School of Public Health, Xi'an Jiaotong University, Xi'an, Shaanxi 710061, P.R. China
- Key Laboratory of Trace Elements and Endemic Diseases, Center for Single Cell Omics and Health, Xi'an Jiaotong University, Xi'an, Shaanxi 710061, P.R. China
- Yuanze Chen
- School of Public Health, Xi'an Jiaotong University, Xi'an, Shaanxi 710061, P.R. China
- Key Laboratory of Trace Elements and Endemic Diseases, Center for Single Cell Omics and Health, Xi'an Jiaotong University, Xi'an, Shaanxi 710061, P.R. China
- Guoliang Liu
- School of Public Health, Xi'an Jiaotong University, Xi'an, Shaanxi 710061, P.R. China
- Key Laboratory of Trace Elements and Endemic Diseases, Center for Single Cell Omics and Health, Xi'an Jiaotong University, Xi'an, Shaanxi 710061, P.R. China
- Liyan Zhao
- School of Public Health, Xi'an Jiaotong University, Xi'an, Shaanxi 710061, P.R. China
- Key Laboratory of Trace Elements and Endemic Diseases, Center for Single Cell Omics and Health, Xi'an Jiaotong University, Xi'an, Shaanxi 710061, P.R. China
- Yue Fan
- School of Public Health, Xi'an Jiaotong University, Xi'an, Shaanxi 710061, P.R. China
- Key Laboratory of Trace Elements and Endemic Diseases, Center for Single Cell Omics and Health, Xi'an Jiaotong University, Xi'an, Shaanxi 710061, P.R. China
- Shiquan Sun
- School of Public Health, Xi'an Jiaotong University, Xi'an, Shaanxi 710061, P.R. China
- Key Laboratory of Trace Elements and Endemic Diseases, Center for Single Cell Omics and Health, Xi'an Jiaotong University, Xi'an, Shaanxi 710061, P.R. China
5. Lin KZ, Qiu Y, Roeder K. eSVD-DE: cohort-wide differential expression in single-cell RNA-seq data using exponential-family embeddings. BMC Bioinformatics 2024; 25:113. [PMID: 38486150 PMCID: PMC10941434 DOI: 10.1186/s12859-024-05724-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2023] [Accepted: 02/28/2024] [Indexed: 03/17/2024] Open
Abstract
BACKGROUND: Single-cell RNA-sequencing (scRNA) datasets are becoming increasingly popular in clinical and cohort studies, but there is a lack of methods to investigate differentially expressed (DE) genes among such datasets with numerous individuals. While numerous methods exist to find DE genes for scRNA data from limited individuals, differential-expression testing for large cohorts of case and control individuals using scRNA data poses unique challenges due to substantial effects of human variation, i.e., individual-level confounding covariates that are difficult to account for in the presence of sparsely-observed genes. RESULTS: We develop eSVD-DE, a matrix factorization that pools information across genes and removes confounding covariate effects, followed by a novel two-sample test in mean expression between case and control individuals. In general, differential testing after dimension reduction yields an inflation of Type-1 errors. However, we overcome this by testing for differences between the case and control individuals' posterior mean distributions via a hierarchical model. In previously published datasets of various biological systems, eSVD-DE has more accuracy and power compared to other DE methods typically repurposed for analyzing cohort-wide differential expression. CONCLUSIONS: eSVD-DE proposes a novel and powerful way to test for DE genes among cohorts after performing a dimension reduction. Accurate identification of differential expression on the individual level, instead of the cell level, is important for linking scRNA-seq studies to our understanding of the human population.
Affiliation(s)
- Kevin Z Lin
- Department of Biostatistics, University of Washington, Seattle, WA, USA.
- Yixuan Qiu
- School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai, People's Republic of China
- Kathryn Roeder
- Department of Statistics and Data Science, Carnegie Mellon University, Pittsburgh, PA, USA
6. Liang P, Li H, Long C, Liu M, Zhou J, Zuo Y. Chromatin region binning of gene expression for improving embryo cell subtype identification. Comput Biol Med 2024; 170:108049. [PMID: 38290319 DOI: 10.1016/j.compbiomed.2024.108049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2023] [Revised: 01/01/2024] [Accepted: 01/26/2024] [Indexed: 02/01/2024]
Abstract
Mammalian embryonic development is a complex process, characterized by intricate spatiotemporal dynamics and distinct chromatin preferences. However, the quick diversification in early embryogenesis leads to significant cellular diversity and the sparsity of scRNA-seq data, posing challenges in accurately determining cell fate decisions. In this study, we introduce a chromatin region binning method using scChrBin, designed to identify chromatin regions that elucidate the dynamics of embryonic development and lineage differentiation. This method transforms scRNA-seq data into a chromatin-based matrix, leveraging genomic annotations. Our results showed that the scChrBin method achieves high accuracy, with 98.0% and 89.2% on two single-cell embryonic datasets, demonstrating its effectiveness in analyzing complex developmental processes. We also systematically and comprehensively analyze these key chromatin binning regions and their associated genes, focusing on their roles in lineage and stage development. The chromatin region binning perspective enables a comprehensive analysis of transcriptome data at the chromatin level, allowing us to unveil the dynamic expression of chromatin regions across temporal and spatial development. The tool is available as an application at https://github.com/liameihao/scChrBin.
Affiliation(s)
- Pengfei Liang
- The State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Institutes of Biomedical Sciences, College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China
- Hanshuang Li
- The State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Institutes of Biomedical Sciences, College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China
- Chunshen Long
- The State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Institutes of Biomedical Sciences, College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China
- Mingzhu Liu
- The State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Institutes of Biomedical Sciences, College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China
- Jian Zhou
- The State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Institutes of Biomedical Sciences, College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China
- Yongchun Zuo
- The State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Institutes of Biomedical Sciences, College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China.
7. Mallick H, Porwal A, Saha S, Basak P, Svetnik V, Paul E. An integrated Bayesian framework for multi-omics prediction and classification. Stat Med 2024; 43:983-1002. [PMID: 38146838 DOI: 10.1002/sim.9953] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2022] [Revised: 10/06/2023] [Accepted: 10/24/2023] [Indexed: 12/27/2023]
Abstract
With the growing commonality of multi-omics datasets, there is now increasing evidence that integrated omics profiles lead to more efficient discovery of clinically actionable biomarkers that enable better disease outcome prediction and patient stratification. Several methods exist to perform host phenotype prediction from cross-sectional, single-omics data modalities but decentralized frameworks that jointly analyze multiple time-dependent omics data to highlight the integrative and dynamic impact of repeatedly measured biomarkers are currently limited. In this article, we propose a novel Bayesian ensemble method to consolidate prediction by combining information across several longitudinal and cross-sectional omics data layers. Unlike existing frequentist paradigms, our approach enables uncertainty quantification in prediction as well as interval estimation for a variety of quantities of interest based on posterior summaries. We apply our method to four published multi-omics datasets and demonstrate that it recapitulates known biology in addition to providing novel insights while also outperforming existing methods in estimation, prediction, and uncertainty quantification. Our open-source software is publicly available at https://github.com/himelmallick/IntegratedLearner.
Affiliation(s)
- Himel Mallick
- Division of Biostatistics, Department of Population Health Sciences, Weill Cornell Medicine, Cornell University, New York, 10065, New York, USA
- Department of Statistics and Data Science, Cornell University, Ithaca, New York, USA
- Anupreet Porwal
- Department of Statistics, University of Washington, Seattle, Washington, USA
- Satabdi Saha
- Department of Biostatistics, University of Texas MD Anderson Cancer Center, Houston, Texas, USA
- Piyali Basak
- Biostatistics and Research Decision Sciences, Merck & Co., Inc., Rahway, New Jersey, USA
- Vladimir Svetnik
- Biostatistics and Research Decision Sciences, Merck & Co., Inc., Rahway, New Jersey, USA
- Erina Paul
- Biostatistics and Research Decision Sciences, Merck & Co., Inc., Rahway, New Jersey, USA
8. Gilis J, Perin L, Malfait M, Van den Berge K, Takele Assefa A, Verbist B, Risso D, Clement L. Differential detection workflows for multi-sample single-cell RNA-seq data. bioRxiv 2023:2023.12.17.572043. [PMID: 38187695 PMCID: PMC10769270 DOI: 10.1101/2023.12.17.572043] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/09/2024]
Abstract
In single-cell transcriptomics, differential gene expression (DE) analyses typically focus on testing differences in the average expression of genes between cell types or conditions of interest. Single-cell transcriptomics, however, also has the promise to prioritise genes for which the expression differs in other aspects of the distribution. Here we develop a workflow for assessing differential detection (DD), which tests for differences in the average fraction of samples or cells in which a gene is detected. After benchmarking eight different DD data analysis strategies, we provide a unified workflow for jointly assessing DE and DD. Using simulations and two case studies, we show that DE and DD analysis provide complementary information, both in terms of the individual genes they report and in the functional interpretation of those genes.
Affiliation(s)
- Jeroen Gilis
- These authors contributed equally
- Applied Mathematics, Computer science and Statistics, Ghent University, Ghent, 9000, Belgium
- Bioinformatics Institute, Ghent University, Ghent, 9000, Belgium
- Data Mining and Modeling for Biomedicine, VIB Flemish Institute for Biotechnology, Ghent, 9000, Belgium
- Laura Perin
- These authors contributed equally
- Department of Statistical Sciences, University of Padova, Padova, Italy
- Milan Malfait
- Applied Mathematics, Computer science and Statistics, Ghent University, Ghent, 9000, Belgium
- Koen Van den Berge
- Statistics and Decision Sciences, Johnson and Johnson Innovative Medicine, Beerse, Belgium
- Alemu Takele Assefa
- Statistics and Decision Sciences, Johnson and Johnson Innovative Medicine, Beerse, Belgium
- Bie Verbist
- Statistics and Decision Sciences, Johnson and Johnson Innovative Medicine, Beerse, Belgium
- Davide Risso
- Department of Statistical Sciences, University of Padova, Padova, Italy
- Padua Center for Network Medicine, University of Padova, Padova, Italy
- Lieven Clement
- Applied Mathematics, Computer science and Statistics, Ghent University, Ghent, 9000, Belgium
- Bioinformatics Institute, Ghent University, Ghent, 9000, Belgium
9. Liang X, Cao L, Chen H, Wang L, Wang Y, Fu L, Tan X, Chen E, Ding Y, Tang J. A critical assessment of clustering algorithms to improve cell clustering and identification in single-cell transcriptome study. Brief Bioinform 2023; 25:bbad497. [PMID: 38168839 PMCID: PMC10782910 DOI: 10.1093/bib/bbad497] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2023] [Revised: 10/13/2023] [Accepted: 12/05/2023] [Indexed: 01/05/2024] Open
Abstract
Cell clustering is typically the initial step in single-cell RNA sequencing (scRNA-seq) analyses. The performance of clustering considerably impacts the validity and reproducibility of cell identification. A variety of clustering algorithms have been developed for scRNA-seq data. These algorithms generate cell label sets that assign each cell to a cluster. However, different algorithms usually yield different label sets, which can introduce variations in cell-type identification based on the generated label sets. Currently, the performance of these algorithms has not been systematically evaluated in single-cell transcriptome studies. Herein, we performed a critical assessment of seven state-of-the-art clustering algorithms including four deep learning-based clustering algorithms and commonly used methods Seurat, Cosine-based Tanimoto similarity-refined graph for community detection using Leiden's algorithm (CosTaL) and Single-cell consensus clustering (SC3). We used diverse evaluation indices based on 10 different scRNA-seq benchmarks to systematically evaluate their clustering performance. Our results show that CosTaL, Seurat, Deep Embedding for Single-cell Clustering (DESC) and SC3 consistently outperformed Single-Cell Clustering Assessment Framework and scDeepCluster based on nine effectiveness scores. Notably, CosTaL and DESC demonstrated superior performance in clustering specific cell types. The performance of the single-cell Variational Inference tools varied across different datasets, suggesting their sensitivity to certain dataset characteristics. Notably, DESC exhibited promising results for cell subtype identification and capturing cellular heterogeneity. In addition, SC3 requires more memory and exhibits slower computation speed compared to other algorithms for the same dataset. In sum, this study provides useful guidance for selecting appropriate clustering methods in scRNA-seq data analysis.
Affiliation(s)
- Xiao Liang
- Department of Obstetrics and Gynecology, Women and Children’s Hospital of Chongqing Medical University, Chongqing 401147, China
- School of Basic Medicine, Chongqing Medical University, Chongqing 400016, China
- Lijie Cao
- School of Basic Medicine, Chongqing Medical University, Chongqing 400016, China
- Hao Chen
- School of Basic Medicine, Chongqing Medical University, Chongqing 400016, China
- Lidan Wang
- School of Basic Medicine, Chongqing Medical University, Chongqing 400016, China
- Yangyun Wang
- School of Basic Medicine, Chongqing Medical University, Chongqing 400016, China
- Lijuan Fu
- Joint International Research Laboratory of Reproduction and Development of the Ministry of Education of China, School of Public Health, Chongqing Medical University, Chongqing 400016, China
- Department of Pharmacology, Academician Workstation, Changsha Medical University, Changsha 410219, China
- Xiaqin Tan
- The First Affiliated Hospital of Chongqing Medical University, Chongqing 400016, China
- Enxiang Chen
- School of Basic Medicine, Chongqing Medical University, Chongqing 400016, China
- Joint International Research Laboratory of Reproduction and Development of the Ministry of Education of China, School of Public Health, Chongqing Medical University, Chongqing 400016, China
- Yubin Ding
- Department of Obstetrics and Gynecology, Women and Children’s Hospital of Chongqing Medical University, Chongqing 401147, China
- Joint International Research Laboratory of Reproduction and Development of the Ministry of Education of China, School of Public Health, Chongqing Medical University, Chongqing 400016, China
- Jing Tang
- Department of Obstetrics and Gynecology, Women and Children’s Hospital of Chongqing Medical University, Chongqing 401147, China
- School of Basic Medicine, Chongqing Medical University, Chongqing 400016, China
10. Lee L, Yu M, Li X, Zhu C, Zhang Y, Yu H, Chen Z, Mishra S, Ren B, Li Y, Hu M. SnapHiC-D: a computational pipeline to identify differential chromatin contacts from single-cell Hi-C data. Brief Bioinform 2023; 24:bbad315. [PMID: 37649383 PMCID: PMC10516352 DOI: 10.1093/bib/bbad315] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2023] [Revised: 08/04/2023] [Accepted: 08/07/2023] [Indexed: 09/01/2023] Open
Abstract
Single-cell high-throughput chromatin conformation capture technologies (scHi-C) have been used to map chromatin spatial organization in complex tissues. However, computational tools to detect differential chromatin contacts (DCCs) from scHi-C datasets in development and through disease pathogenesis are still lacking. Here, we present SnapHiC-D, a computational pipeline to identify DCCs between two scHi-C datasets. Compared to methods designed for bulk Hi-C data, SnapHiC-D detects DCCs with high sensitivity and accuracy. We used SnapHiC-D to identify cell-type-specific chromatin contacts at 10 Kb resolution in mouse hippocampal and human prefrontal cortical tissues, demonstrating that DCCs detected in the hippocampal and cortical cell types are generally associated with cell-type-specific gene expression patterns and epigenomic features. SnapHiC-D is freely available at https://github.com/HuMingLab/SnapHiC-D.
Affiliation(s)
- Lindsay Lee
- Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic Foundation, Cleveland, OH, USA
| | - Miao Yu
- Ludwig Institute for Cancer Research, La Jolla, CA, USA
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Fudan University, Shanghai, China
| | - Xiaoqi Li
- Carolina Health Informatics Program, University of North Carolina, Chapel Hill, NC, USA
| | - Chenxu Zhu
- Ludwig Institute for Cancer Research, La Jolla, CA, USA
- New York Genome Center, New York, NY, USA
- Department of Physiology and Biophysics, Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY, USA
| | - Yanxiao Zhang
- Ludwig Institute for Cancer Research, La Jolla, CA, USA
- Westlake University, Hangzhou, Zhejiang, China
| | - Hongyu Yu
- Department of Statistics, University of Wisconsin Madison, Madison, WI, USA
- Department of Biochemistry, University of Wisconsin Madison, Madison, WI, USA
| | - Ziyin Chen
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Fudan University, Shanghai, China
| | - Shreya Mishra
- Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic Foundation, Cleveland, OH, USA
| | - Bing Ren
- Ludwig Institute for Cancer Research, La Jolla, CA, USA
- Center for Epigenomics & Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA, USA
| | - Yun Li
- Department of Biostatistics, University of North Carolina, Chapel Hill, NC, USA
- Department of Genetics, University of North Carolina, Chapel Hill, NC, USA
- Department of Computer Science, University of North Carolina, Chapel Hill, NC, USA
| | - Ming Hu
- Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic Foundation, Cleveland, OH, USA
| |
|
11
|
Sun Y, Shim WJ, Shen S, Sinniah E, Pham D, Su Z, Mizikovsky D, White MD, Ho JK, Nguyen Q, Bodén M, Palpant N. Inferring cell diversity in single cell data using consortium-scale epigenetic data as a biological anchor for cell identity. Nucleic Acids Res 2023; 51:e62. [PMID: 37125641 PMCID: PMC10287941 DOI: 10.1093/nar/gkad307] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 10/26/2022] [Accepted: 04/28/2023] [Indexed: 05/02/2023]
Abstract
Methods for cell clustering and gene expression from single-cell RNA sequencing (scRNA-seq) data are essential for biological interpretation of cell processes. Here, we present TRIAGE-Cluster which uses genome-wide epigenetic data from diverse bio-samples to identify genes demarcating cell diversity in scRNA-seq data. By integrating patterns of repressive chromatin deposited across diverse cell types with weighted density estimation, TRIAGE-Cluster determines cell type clusters in a 2D UMAP space. We then present TRIAGE-ParseR, a machine learning method which evaluates gene expression rank lists to define gene groups governing the identity and function of cell types. We demonstrate the utility of this two-step approach using atlases of in vivo and in vitro cell diversification and organogenesis. We also provide a web accessible dashboard for analysis and download of data and software. Collectively, genome-wide epigenetic repression provides a versatile strategy to define cell diversity and study gene regulation of scRNA-seq data.
Affiliation(s)
- Yuliangzi Sun
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, Australia
| | - Woo Jun Shim
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, Australia
| | - Sophie Shen
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, Australia
| | - Enakshi Sinniah
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, Australia
| | - Duy Pham
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, Australia
| | - Zezhuo Su
- School of Biomedical Sciences, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong SAR, China
- Laboratory of Data Discovery for Health Limited (D24H), Hong Kong Science Park, Hong Kong SAR, China
| | - Dalia Mizikovsky
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, Australia
| | - Melanie D White
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, Australia
| | - Joshua W K Ho
- School of Biomedical Sciences, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong SAR, China
- Laboratory of Data Discovery for Health Limited (D24H), Hong Kong Science Park, Hong Kong SAR, China
| | - Quan Nguyen
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, Australia
| | - Mikael Bodén
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, QLD, Australia
| | - Nathan J Palpant
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, Australia
| |
|
12
|
Gutiérrez-Franco A, Ake F, Hassan MN, Cayuela NC, Mularoni L, Plass M. Methanol fixation is the method of choice for droplet-based single-cell transcriptomics of neural cells. Commun Biol 2023; 6:522. [PMID: 37188816 DOI: 10.1038/s42003-023-04834-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 08/05/2022] [Accepted: 04/12/2023] [Indexed: 05/17/2023]
Abstract
The main critical step in single-cell transcriptomics is sample preparation. Several methods have been developed to preserve cells after dissociation to uncouple sample handling from library preparation. Yet, the suitability of these methods depends on the cell types to be processed. In this project, we perform a systematic comparison of preservation methods for droplet-based single-cell RNA-seq on neural and glial cells derived from induced pluripotent stem cells. Our results show that while DMSO provides the highest cell quality in terms of RNA molecules and genes detected per cell, it strongly affects the cellular composition and induces the expression of stress and apoptosis genes. In contrast, methanol-fixed samples display a cellular composition similar to fresh samples and provide good cell quality with little expression bias. Taken together, our results show that methanol fixation is the method of choice for performing droplet-based single-cell transcriptomics experiments on neural cell populations.
Affiliation(s)
- Ana Gutiérrez-Franco
- Gene Regulation of Cell Identity, Regenerative Medicine Program, Bellvitge Institute for Biomedical Research (IDIBELL), L'Hospitalet del Llobregat, Barcelona, Spain
- Program for Advancing Clinical Translation of Regenerative Medicine of Catalonia, P-CMR[C], L'Hospitalet del Llobregat, Barcelona, Spain
| | - Franz Ake
- Gene Regulation of Cell Identity, Regenerative Medicine Program, Bellvitge Institute for Biomedical Research (IDIBELL), L'Hospitalet del Llobregat, Barcelona, Spain
- Program for Advancing Clinical Translation of Regenerative Medicine of Catalonia, P-CMR[C], L'Hospitalet del Llobregat, Barcelona, Spain
| | - Mohamed N Hassan
- Gene Regulation of Cell Identity, Regenerative Medicine Program, Bellvitge Institute for Biomedical Research (IDIBELL), L'Hospitalet del Llobregat, Barcelona, Spain
- Program for Advancing Clinical Translation of Regenerative Medicine of Catalonia, P-CMR[C], L'Hospitalet del Llobregat, Barcelona, Spain
| | - Natalie Chaves Cayuela
- Gene Regulation of Cell Identity, Regenerative Medicine Program, Bellvitge Institute for Biomedical Research (IDIBELL), L'Hospitalet del Llobregat, Barcelona, Spain
- Program for Advancing Clinical Translation of Regenerative Medicine of Catalonia, P-CMR[C], L'Hospitalet del Llobregat, Barcelona, Spain
| | - Loris Mularoni
- Program for Advancing Clinical Translation of Regenerative Medicine of Catalonia, P-CMR[C], L'Hospitalet del Llobregat, Barcelona, Spain
- Regenerative Medicine Program, Bellvitge Institute for Biomedical Research (IDIBELL), L'Hospitalet del Llobregat, Barcelona, Spain
| | - Mireya Plass
- Gene Regulation of Cell Identity, Regenerative Medicine Program, Bellvitge Institute for Biomedical Research (IDIBELL), L'Hospitalet del Llobregat, Barcelona, Spain.
- Program for Advancing Clinical Translation of Regenerative Medicine of Catalonia, P-CMR[C], L'Hospitalet del Llobregat, Barcelona, Spain.
- Center for Networked Biomedical Research on Bioengineering, Biomaterials and Nanomedicine (CIBER-BBN), Madrid, Spain.
| |
|
13
|
Dong X, Bacher R. Analysis of Single-Cell RNA-seq Data. Methods Mol Biol 2023; 2629:95-114. [PMID: 36929075 DOI: 10.1007/978-1-0716-2986-4_6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/18/2023]
Abstract
As single-cell RNA sequencing experiments continue to advance scientific discoveries across biological disciplines, an increasing number of analysis tools and workflows for analyzing the data have been developed. In this chapter, we describe a standard workflow and elaborate on relevant data analysis tools for analyzing single-cell RNA sequencing data. We provide recommendations for the appropriate use of commonly used methods, with code examples and analysis interpretations.
Affiliation(s)
- Xiaoru Dong
- Department of Biostatistics, University of Florida, Gainesville, Florida, USA
| | - Rhonda Bacher
- Department of Biostatistics, University of Florida, Gainesville, Florida, USA.
| |
|
14
|
Ke Y, Jian-yuan H, Ping Z, Yue W, Na X, Jian Y, Kai-xuan L, Yi-fan S, Han-bin L, Rong L. The progressive application of single-cell RNA sequencing technology in cardiovascular diseases. Biomed Pharmacother 2022; 154:113604. [DOI: 10.1016/j.biopha.2022.113604] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2022] [Revised: 08/20/2022] [Accepted: 08/23/2022] [Indexed: 11/02/2022]
|
15
|
Miao Z, Kong W, Vinayak RK, Sun W, Han F. Fisher-Pitman permutation tests based on nonparametric Poisson mixtures with application to single cell genomics. J Am Stat Assoc 2022. [DOI: 10.1080/01621459.2022.2120401] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/14/2022]
Affiliation(s)
- Zhen Miao
- Department of Statistics, University of Washington, Seattle
| | - Wei Sun
- Public Health Science Division, Fred Hutchinson Cancer Research Center
| | - Fang Han
- Department of Statistics, University of Washington, Seattle
| |
|
16
|
Zhong W, Liu W, Chen J, Sun Q, Hu M, Li Y. Understanding the function of regulatory DNA interactions in the interpretation of non-coding GWAS variants. Front Cell Dev Biol 2022; 10:957292. [PMID: 36060805 PMCID: PMC9437546 DOI: 10.3389/fcell.2022.957292] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/30/2022] [Accepted: 07/21/2022] [Indexed: 01/11/2023]
Abstract
Genome-wide association studies (GWAS) have identified a vast number of variants associated with various complex human diseases and traits. However, most of these GWAS variants reside in non-coding regions producing no proteins, making the interpretation of these variants a daunting challenge. Prior evidence indicates that a subset of non-coding variants detected within or near cis-regulatory elements (e.g., promoters, enhancers, silencers, and insulators) might play a key role in disease etiology by regulating gene expression. Advanced sequencing- and imaging-based technologies, together with powerful computational methods that enable comprehensive characterization of regulatory DNA interactions, have substantially improved our understanding of the three-dimensional (3D) genome architecture. The recent literature offers many examples in which chromosome conformation capture (3C)-based technologies successfully link non-coding variants to their target genes and prioritize relevant tissues or cell types. These examples illustrate the critical capability of 3D genome organization in annotating non-coding GWAS variants. This review discusses how 3D genome organization information contributes to elucidating the potential roles of non-coding GWAS variants in disease etiology.
Affiliation(s)
- Wujuan Zhong
- Biostatistics and Research Decision Sciences, Merck & Co, Inc, Rahway, NJ, United States
| | - Weifang Liu
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
| | - Jiawen Chen
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
| | - Quan Sun
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
| | - Ming Hu
- Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic Foundation, Cleveland, OH, United States
| | - Yun Li
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
- Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
| |
|
17
|
Mallick H, Chatterjee S, Chowdhury S, Chatterjee S, Rahnavard A, Hicks SC. Differential expression of single-cell RNA-seq data using Tweedie models. Stat Med 2022; 41:3492-3510. [PMID: 35656596 PMCID: PMC9288986 DOI: 10.1002/sim.9430] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 06/25/2021] [Revised: 04/21/2022] [Accepted: 04/22/2022] [Indexed: 12/13/2022]
Abstract
The performance of computational methods and software to identify differentially expressed features in single-cell RNA-sequencing (scRNA-seq) has been shown to be influenced by several factors, including the choice of the normalization method used and the choice of the experimental platform (or library preparation protocol) to profile gene expression in individual cells. Currently, it is up to the practitioner to choose the most appropriate differential expression (DE) method out of over 100 DE tools available to date, each relying on their own assumptions to model scRNA-seq expression features. To model the technological variability in cross-platform scRNA-seq data, here we propose to use Tweedie generalized linear models that can flexibly capture a large dynamic range of observed scRNA-seq expression profiles across experimental platforms induced by platform- and gene-specific statistical properties such as heavy tails, sparsity, and gene expression distributions. We also propose a zero-inflated Tweedie model that allows zero probability mass to exceed a traditional Tweedie distribution to model zero-inflated scRNA-seq data with excessive zero counts. Using both synthetic and published plate- and droplet-based scRNA-seq datasets, we perform a systematic benchmark evaluation of more than 10 representative DE methods and demonstrate that our method (Tweedieverse) outperforms the state-of-the-art DE approaches across experimental platforms in terms of statistical power and false discovery rate control. Our open-source software (R/Bioconductor package) is available at https://github.com/himelmallick/Tweedieverse.
Affiliation(s)
- Himel Mallick
- Biostatistics and Research Decision Sciences, Merck & Co., Inc., Rahway, NJ 07065, USA
| | - Suvo Chatterjee
- Epidemiology Branch, Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, MD 20892, USA
| | - Shrabanti Chowdhury
- Department of Genetics and Genomic Sciences and Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Saptarshi Chatterjee
- Department of Statistics, Data and Analytics, Eli Lilly & Company, Indianapolis, IN 46225, USA
| | - Ali Rahnavard
- Computational Biology Institute, Department of Biostatistics and Bioinformatics, Milken Institute School of Public Health, The George Washington University, Washington, DC 20052, USA
| | - Stephanie C. Hicks
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, USA
| |
|
18
|
Das S, Rai A, Rai SN. Differential Expression Analysis of Single-Cell RNA-Seq Data: Current Statistical Approaches and Outstanding Challenges. ENTROPY 2022; 24:e24070995. [PMID: 35885218 PMCID: PMC9315519 DOI: 10.3390/e24070995] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 05/20/2022] [Revised: 06/25/2022] [Accepted: 07/09/2022] [Indexed: 01/11/2023]
Abstract
With the advent of single-cell RNA-sequencing (scRNA-seq), it is possible to measure the expression dynamics of genes at the single-cell level. Through scRNA-seq, a huge amount of expression data for several thousand(s) of genes over million(s) of cells are generated in a single experiment. Differential expression analysis is the primary downstream analysis of such data to identify gene markers for cell type detection and also provide inputs to other secondary analyses. Many statistical approaches for differential expression analysis have been reported in the literature. Therefore, we critically discuss the underlying statistical principles of the approaches and distinctly divide them into six major classes, i.e., generalized linear, generalized additive, Hurdle, mixture models, two-class parametric, and non-parametric approaches. We also succinctly discuss the limitations that are specific to each class of approaches, and how they are addressed by other subsequent classes of approach. A number of challenges are identified in this study that must be addressed to develop the next class of innovative approaches. Furthermore, we also emphasize the methodological challenges involved in differential expression analysis of scRNA-seq data that researchers must address to draw maximum benefit from this recent single-cell technology. This study will serve as a guide to genome researchers and experimental biologists to objectively select options for their analysis.
Affiliation(s)
- Samarendra Das
- ICAR-Directorate of Foot and Mouth Disease, Arugul, Bhubaneswar 752050, India
- International Centre for Foot and Mouth Disease, Arugul, Bhubaneswar 752050, India
- Correspondence: S.D. and S.N.R.
| | - Anil Rai
- ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India
| | - Shesh N. Rai
- School of Interdisciplinary and Graduate Studies, University of Louisville, Louisville, KY 40292, USA
- Biostatistics and Bioinformatics Facility, Brown Cancer Center, University of Louisville, Louisville, KY 40202, USA
- Biostatistics and Informatics Facility, Center for Integrative Environmental Health Sciences, University of Louisville, Louisville, KY 40202, USA
- Data Analysis and Sample Management Facility, The University of Louisville Super Fund Center, University of Louisville, Louisville, KY 40202, USA
- Hepatobiology and Toxicology Center, University of Louisville, Louisville, KY 40202, USA
- Christina Lee Brown Envirome Institute, University of Louisville, Louisville, KY 40202, USA
- Correspondence: S.D. and S.N.R.
| |
|
19
|
Gagnon J, Pi L, Ryals M, Wan Q, Hu W, Ouyang Z, Zhang B, Li K. Recommendations of scRNA-seq Differential Gene Expression Analysis Based on Comprehensive Benchmarking. LIFE (BASEL, SWITZERLAND) 2022; 12:life12060850. [PMID: 35743881 PMCID: PMC9225332 DOI: 10.3390/life12060850] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/23/2022] [Revised: 05/31/2022] [Accepted: 06/04/2022] [Indexed: 12/13/2022]
Abstract
To guide analysts to select the right tool and parameters in differential gene expression analyses of single-cell RNA sequencing (scRNA-seq) data, we developed a novel simulator that recapitulates the data characteristics of real scRNA-seq datasets while accounting for all the relevant sources of variation in a multi-subject, multi-condition scRNA-seq experiment: the cell-to-cell variation within a subject, the variation across subjects, the variability across cell types, the mean/variance relationship of gene expression across genes, library size effects, group effects, and covariate effects. By applying it to benchmark 12 differential gene expression analysis methods (including cell-level and pseudo-bulk methods) on simulated multi-condition, multi-subject data of the 10x Genomics platform, we demonstrated that methods originating from the negative binomial mixed model such as glmmTMB and NEBULA-HL outperformed other methods. Utilizing NEBULA-HL in a statistical analysis pipeline for single-cell analysis will enable scientists to better understand the cell-type-specific transcriptomic response to disease or treatment effects and to discover new drug targets. Further, application to two real datasets showed the outperformance of our differential expression (DE) pipeline, with unified findings of differentially expressed genes (DEG) and a pseudo-time trajectory transcriptomic result. In the end, we made recommendations for filtering strategies of cells and genes based on simulation results to achieve optimal experimental goals.
Affiliation(s)
- Jake Gagnon
- Analytics and Data Sciences, Biogen, Inc., 225 Binney St., Cambridge, MA 02142, USA
| | - Lira Pi
- PharmaLex, 1700 District Ave., Burlington, MA 01803, USA
| | - Matthew Ryals
- PharmaLex, 1700 District Ave., Burlington, MA 01803, USA
| | - Qingwen Wan
- PharmaLex, 1700 District Ave., Burlington, MA 01803, USA
| | - Wenxing Hu
- Research Department, Biogen, Inc., 225 Binney St., Cambridge, MA 02142, USA
| | - Zhengyu Ouyang
- BioInfoRx, Inc., 510 Charmany Dr., Suite 275A, Madison, WI 53719, USA
| | - Baohong Zhang
- Research Department, Biogen, Inc., 225 Binney St., Cambridge, MA 02142, USA
- Correspondence: B.Z. and K.L.
| | - Kejie Li
- Research Department, Biogen, Inc., 225 Binney St., Cambridge, MA 02142, USA
- Correspondence: B.Z. and K.L.
| |
|
20
|
Zhang M, Guo FR. BSDE: barycenter single-cell differential expression for case-control studies. Bioinformatics 2022; 38:2765-2772. [PMID: 35561165 PMCID: PMC9113363 DOI: 10.1093/bioinformatics/btac171] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2021] [Revised: 03/14/2022] [Accepted: 03/23/2022] [Indexed: 02/03/2023]
Abstract
MOTIVATION: Single-cell sequencing brings about a revolutionarily high resolution for finding differentially expressed genes (DEGs) by disentangling highly heterogeneous cell tissues. Yet, such analysis is so far mostly focused on comparing between different cell types from the same individual. As single-cell sequencing becomes cheaper and easier to use, an increasing number of datasets from case-control studies are becoming available, which call for new methods for identifying differential expressions between case and control individuals. RESULTS: To bridge this gap, we propose barycenter single-cell differential expression (BSDE), a nonparametric method for finding DEGs for case-control studies. Through the use of optimal transportation for aggregating distributions and computing their distances, our method overcomes the restrictive parametric assumptions imposed by standard mixed-effect-modeling approaches. Through simulations, we show that BSDE can accurately detect a variety of differential expressions while maintaining the type-I error at a prescribed level. Further, 1345 and 1568 cell type-specific DEGs are identified by BSDE from datasets on pulmonary fibrosis and multiple sclerosis, among which the top findings are supported by previous results from the literature. AVAILABILITY AND IMPLEMENTATION: R package BSDE is freely available from doi.org/10.5281/zenodo.6332254. For real data analysis with the R package, see doi.org/10.5281/zenodo.6332566. These can also be accessed through GitHub at github.com/mqzhanglab/BSDE and github.com/mqzhanglab/BSDE_pipeline. The two single-cell sequencing datasets can be downloaded with UCSC cell browser from cells.ucsc.edu/?ds=ms and cells.ucsc.edu/?ds=lung-pf-control. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Mengqi Zhang
- Department of Surgery, Perelman Medical School, University of Pennsylvania, Philadelphia, PA 19104, USA
| |
|
21
|
Rochman M, Wen T, Kotliar M, Dexheimer PJ, Ben-Baruch Morgenstern N, Caldwell JM, Lim HW, Rothenberg ME. Single-cell RNA sequencing of human esophageal epithelium in homeostasis and allergic inflammation. JCI Insight 2022; 7:159093. [PMID: 35472002 PMCID: PMC9208762 DOI: 10.1172/jci.insight.159093] [Citation(s) in RCA: 35] [Impact Index Per Article: 17.5] [Received: 02/03/2022] [Accepted: 04/22/2022] [Indexed: 11/17/2022]
Abstract
Inflammation of the esophageal epithelium is a hallmark of eosinophilic esophagitis (EoE), an emerging chronic allergic disease. Herein, we probed human esophageal epithelial cells at single-cell resolution during homeostasis and EoE. During allergic inflammation, the epithelial differentiation program was blocked, leading to loss of KRT6high differentiated populations and expansion of TOP2high proliferating and DSPhigh, SERPINB3high transitioning populations; however, there was stability of the stem cell-enriched PDPNhigh basal epithelial compartment. This differentiation program blockade was associated with dysregulation of transcription factors, including nuclear receptor signalers, in the most differentiated epithelial cells and altered NOTCH-related cell-to-cell communication. Each epithelial population expressed genes with allergic disease risk variants, supporting their functional interplay. The esophageal epithelium differed notably between EoE in histologic remission and controls, indicating that remission is a transitory state poised to relapse. Collectively, our data uncover the dynamic nature of the inflamed human esophageal epithelium and provide a framework to better understand esophageal health and disease.
Affiliation(s)
- Mark Rochman
- Division of Allergy and Immunology, Cincinnati Children's Hospital Medical Center, Cincinnati, United States of America
| | - Ting Wen
- Division of Allergy and Immunology, Cincinnati Children's Hospital Medical Center, Cincinnati, United States of America
| | - Michael Kotliar
- Division of Allergy and Immunology, Cincinnati Children's Hospital Medical Center, Cincinnati, United States of America
| | - Phillip J Dexheimer
- Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, United States of America
| | - Netali Ben-Baruch Morgenstern
- Division of Allergy and Immunology, Cincinnati Children's Hospital Medical Center, Cincinnati, United States of America
| | - Julie M Caldwell
- Division of Allergy and Immunology, Cincinnati Children's Hospital Medical Center, Cincinnati, United States of America
| | - Hee-Woong Lim
- Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, United States of America
| | - Marc E Rothenberg
- Division of Allergy and Immunology, Cincinnati Children's Hospital Medical Center, Cincinnati, United States of America
| |
|