1
Lin KZ, Qiu Y, Roeder K. eSVD-DE: cohort-wide differential expression in single-cell RNA-seq data using exponential-family embeddings. BMC Bioinformatics 2024; 25:113. [PMID: 38486150 PMCID: PMC10941434 DOI: 10.1186/s12859-024-05724-7] [Received: 12/06/2023] [Accepted: 02/28/2024] [Indexed: 03/17/2024]
Abstract
BACKGROUND Single-cell RNA-sequencing (scRNA) datasets are becoming increasingly popular in clinical and cohort studies, but there is a lack of methods to investigate differentially expressed (DE) genes among such datasets with numerous individuals. While numerous methods exist to find DE genes for scRNA data from limited individuals, differential-expression testing for large cohorts of case and control individuals using scRNA data poses unique challenges due to substantial effects of human variation, i.e., individual-level confounding covariates that are difficult to account for in the presence of sparsely-observed genes. RESULTS We develop eSVD-DE, a matrix factorization that pools information across genes and removes confounding covariate effects, followed by a novel two-sample test in mean expression between case and control individuals. In general, differential testing after dimension reduction yields an inflation of Type I errors. However, we overcome this by testing for differences between the case and control individuals' posterior mean distributions via a hierarchical model. In previously published datasets of various biological systems, eSVD-DE has greater accuracy and power than other DE methods typically repurposed for analyzing cohort-wide differential expression. CONCLUSIONS eSVD-DE proposes a novel and powerful way to test for DE genes among cohorts after performing a dimension reduction. Accurate identification of differential expression on the individual level, instead of the cell level, is important for linking scRNA-seq studies to our understanding of the human population.
Affiliation(s)
- Kevin Z Lin
- Department of Biostatistics, University of Washington, Seattle, WA, USA
- Yixuan Qiu
- School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai, People's Republic of China
- Kathryn Roeder
- Department of Statistics and Data Science, Carnegie Mellon University, Pittsburgh, PA, USA
2
Lin KZ, Qiu Y, Roeder K. eSVD-DE: Cohort-wide differential expression in single-cell RNA-seq data using exponential-family embeddings. bioRxiv: the preprint server for biology 2024:2023.11.22.568369. [PMID: 38045428 PMCID: PMC10690270 DOI: 10.1101/2023.11.22.568369] [Indexed: 12/05/2023]
Abstract
Background Single-cell RNA-sequencing (scRNA) datasets are becoming increasingly popular in clinical and cohort studies, but there is a lack of methods to investigate differentially expressed (DE) genes among such datasets with numerous individuals. While numerous methods exist to find DE genes for scRNA data from limited individuals, differential-expression testing for large cohorts of case and control individuals using scRNA data poses unique challenges due to substantial effects of human variation, i.e., individual-level confounding covariates that are difficult to account for in the presence of sparsely-observed genes. Results We develop eSVD-DE, a matrix factorization that pools information across genes and removes confounding covariate effects, followed by a novel two-sample test in mean expression between case and control individuals. In general, differential testing after dimension reduction yields an inflation of Type I errors. However, we overcome this by testing for differences between the case and control individuals' posterior mean distributions via a hierarchical model. In previously published datasets of various biological systems, eSVD-DE has greater accuracy and power than other DE methods typically repurposed for analyzing cohort-wide differential expression. Conclusions eSVD-DE proposes a novel and powerful way to test for DE genes among cohorts after performing a dimension reduction. Accurate identification of differential expression on the individual level, instead of the cell level, is important for linking scRNA-seq studies to our understanding of the human population.
Affiliation(s)
- Kevin Z Lin
- Department of Biostatistics, University of Washington, Seattle, Washington, United States of America
- Yixuan Qiu
- School of Statistics & Management, Shanghai University of Finance and Economics, Shanghai, People's Republic of China
- Kathryn Roeder
- Department of Statistics & Data Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
3
Ruan X, Huang Y, Geng L, Tian M, Liu Y, Tao M, Zheng X, Li P, Zhao M. Consistent analysis of differentially expressed genes across 7 cell types in papillary thyroid carcinoma. Comput Struct Biotechnol J 2023; 21:5337-5349. [PMID: 37954148 PMCID: PMC10637855 DOI: 10.1016/j.csbj.2023.10.045] [Received: 07/04/2023] [Revised: 10/22/2023] [Accepted: 10/23/2023] [Indexed: 11/14/2023]
Abstract
Single-cell transcriptome sequencing (scRNA-seq) provides a higher resolution of cellular differences than bulk RNA-seq, enabling the dissection of cell-type-specific responses to perturbations in papillary thyroid carcinoma (PTC). However, cellular genomic features are highly heterogeneous and include a large number of genes with no detectable expression signal, which reduces the statistical power to identify differentially expressed genes and may generate many false-positive results. To overcome this challenge, we conducted an integrative analysis of two PTC scRNA-seq datasets and cross-validated consistent differential expression. By combining results from 32 common cell types in the two studies, we identified 31 consistently differentially expressed genes (DEGs) across seven cell types: B cells, endothelial cells, epithelial cells, monocytes, NK cells, smooth muscle cells, and T cells. Functional enrichment analysis revealed that these genes are important for the adaptive immune response and autoimmune thyroid diseases. An additional disease-free survival analysis in a large-scale thyroid cancer cohort also confirmed that these 31 genes significantly affect patient survival time. Furthermore, we experimentally validated KRT7, one of the top consistent DEGs, as a potential biomarker gene of PTC epithelial cells; KRT7 may act upstream of the NF-κB signaling pathway. The results suggest that KRT7 may promote thyroid cancer metastasis through the epithelial-mesenchymal transition and the NF-κB signaling pathway. In summary, our single-cell transcriptome integration-based approach may provide insights into the important role of NF-κB in the underlying biology of PTC.
Affiliation(s)
- Xianhui Ruan
- Department of Thyroid and Neck Tumor, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy, Tianjin’s Clinical Research Center for Cancer, Tianjin 300060, China
- Yue Huang
- Department of Thyroid and Neck Tumor, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy, Tianjin’s Clinical Research Center for Cancer, Tianjin 300060, China
- Lin Geng
- Department of Thyroid and Neck Tumor, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy, Tianjin’s Clinical Research Center for Cancer, Tianjin 300060, China
- Mengran Tian
- Department of Thyroid and Neck Tumor, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy, Tianjin’s Clinical Research Center for Cancer, Tianjin 300060, China
- School of Medicine, Nankai University, Tianjin, China
- Department of Thyroid and Breast Surgery, Tianjin Key Laboratory of General Surgery in Construction, Tianjin Union Medical Center, Tianjin, China
- Yu Liu
- Department of Thyroid and Neck Tumor, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy, Tianjin’s Clinical Research Center for Cancer, Tianjin 300060, China
- Mei Tao
- Department of Thyroid and Neck Tumor, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy, Tianjin’s Clinical Research Center for Cancer, Tianjin 300060, China
- Xiangqian Zheng
- Department of Thyroid and Neck Tumor, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy, Tianjin’s Clinical Research Center for Cancer, Tianjin 300060, China
- Peng Li
- State Key Laboratory of Medicinal Chemical Biology, College of Life Sciences, Nankai University, 300071 Tianjin, China
- Min Zhao
- School of Science, Technology and Engineering, University of the Sunshine Coast, Maroochydore DC, Queensland 4558, Australia
4
Liu Y, Zhao J, Adams TS, Wang N, Schupp JC, Wu W, McDonough JE, Chupp GL, Kaminski N, Wang Z, Yan X. Correction: iDESC: identifying differential expression in single-cell RNA sequencing data with multiple subjects. BMC Bioinformatics 2023; 24:394. [PMID: 37858060 PMCID: PMC10588114 DOI: 10.1186/s12859-023-05523-6] [Indexed: 10/21/2023]
Affiliation(s)
- Yunqing Liu
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, 06520, USA
- Jiayi Zhao
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, 06520, USA
- Taylor S Adams
- Section of Pulmonary, Critical Care and Sleep Medicine, Yale School of Medicine, New Haven, CT, 06520, USA
- Ningya Wang
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, 06520, USA
- Jonas C Schupp
- Section of Pulmonary, Critical Care and Sleep Medicine, Yale School of Medicine, New Haven, CT, 06520, USA
- Department of Respiratory Medicine, Hannover Medical School and Biomedical Research in End-Stage and Obstructive Lung Disease Hannover, German Center for Lung Research (DZL), Hannover, Germany
- Weimiao Wu
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, 06520, USA
- Meta Platforms, Inc, Cambridge, USA
- John E McDonough
- Section of Pulmonary, Critical Care and Sleep Medicine, Yale School of Medicine, New Haven, CT, 06520, USA
- Geoffrey L Chupp
- Section of Pulmonary, Critical Care and Sleep Medicine, Yale School of Medicine, New Haven, CT, 06520, USA
- Naftali Kaminski
- Section of Pulmonary, Critical Care and Sleep Medicine, Yale School of Medicine, New Haven, CT, 06520, USA
- Zuoheng Wang
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, 06520, USA
- Xiting Yan
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, 06520, USA
- Section of Pulmonary, Critical Care and Sleep Medicine, Yale School of Medicine, New Haven, CT, 06520, USA