1
Xu Z, Wang Y, Cai W, Chen Y, Wang Y. Single microorganism RNA sequencing of microbiomes using smRandom-Seq. Nat Protoc 2025:10.1038/s41596-025-01181-5. [PMID: 40404925 DOI: 10.1038/s41596-025-01181-5] [Received: 09/01/2024] [Accepted: 03/21/2025] [Indexed: 05/24/2025]
Abstract
Bacteria colonize nearly every part of the human body and various environments, displaying remarkable diversity. Traditional population-level transcriptomics measurements provide only average population behaviors, often overlooking the heterogeneity within bacterial communities. To address this limitation, we have developed a droplet-based, high-throughput single-microorganism RNA sequencing method (smRandom-seq) that offers highly species-specific and sensitive gene detection. Here we detail procedures for microbial sample preprocessing, in situ preindexed cDNA synthesis, in situ poly(dA) tailing, droplet barcoding, ribosomal RNA depletion and library preparation. The main smRandom-seq workflow, including sample processing, in situ reactions and library construction, takes ~2 days. This method features enhanced RNA coverage, reduced doublet rates and minimized ribosomal RNA contamination, thus enabling in-depth analysis of microbial heterogeneity. smRandom-seq is compatible with microorganisms from both laboratory cultures and complex microbial community samples, making it well suited for constructing single-microorganism transcriptomic atlases of bacterial strains and diverse microbial communities. This Protocol requires experience in molecular biology and RNA sequencing techniques, and it holds promising potential for researchers investigating bacterial resistance, microbiome heterogeneity and host-microorganism interactions.
Affiliation(s)
- Ziye Xu
  - Department of Laboratory Medicine of The First Affiliated Hospital and Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, China
  - Zhejiang Key Laboratory of Clinical In Vitro Diagnostic Techniques, Hangzhou, China
- Yuting Wang
  - College of Biomedical Engineering and Instrument Science, Zhejiang University, Hangzhou, China
- Wenjie Cai
  - Department of Laboratory Medicine of The First Affiliated Hospital and Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, China
- Yu Chen
  - Department of Laboratory Medicine of The First Affiliated Hospital and Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, China
  - Zhejiang Key Laboratory of Clinical In Vitro Diagnostic Techniques, Hangzhou, China
- Yongcheng Wang
  - Department of Laboratory Medicine of The First Affiliated Hospital and Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, China
  - College of Biomedical Engineering and Instrument Science, Zhejiang University, Hangzhou, China
2
Xu C, Zhao LY, Ye CS, Xu KC, Xu KY. The application of machine learning in clinical microbiology and infectious diseases. Front Cell Infect Microbiol 2025; 15:1545646. [PMID: 40375898 PMCID: PMC12078339 DOI: 10.3389/fcimb.2025.1545646] [Received: 12/15/2024] [Accepted: 04/08/2025] [Indexed: 05/18/2025] [Open access]
Abstract
With the development of artificial intelligence (AI) in computer science and statistics, AI has been increasingly applied to the medical field. These applications include the management of infectious diseases, in which machine learning has made inroads in clinical microbiology, radiology, genomics, and the analysis of electronic health record data. In particular, the role of machine learning in microbiology has gradually become prominent: it is used in etiological diagnosis, prediction of antibiotic resistance, association of human microbiome characteristics with complex host diseases, prognostic assessment, and prevention and control of infectious diseases. Machine learning in microbiology mainly adopts supervised and unsupervised learning, involving algorithms from classification and regression to clustering and dimensionality reduction. This Review explains crucial concepts in machine learning for unfamiliar readers, describes machine learning's current applications in clinical microbiology and infectious diseases, and summarizes important approaches clinicians must be aware of when evaluating research using machine learning.
Affiliation(s)
- Cheng Xu
  - Clinical Laboratory of Chun’an First People’s Hospital, Zhejiang Provincial People’s Hospital Chun’an Branch, Hangzhou Medical College Affiliated Chun’an Hospital, Hangzhou, Zhejiang, China
- Ling-Yun Zhao
  - Department of Medicine & Therapeutics, Prince of Wales Hospital, The Chinese University of Hong Kong, Hong Kong, Hong Kong SAR, China
- Cun-Si Ye
  - Department of Clinical Laboratory Medicine, Institution of Microbiology and Infectious Diseases, The First Affiliated Hospital, Hengyang Medical School, University of South China, Hengyang, Hunan, China
- Ke-Chen Xu
  - School of Psychology, Zhejiang Normal University, Jinhua, China
  - Key Laboratory of Intelligent Education Technology and Application of Zhejiang Province, Zhejiang Normal University, Jinhua, China
- Ke-Yang Xu
  - Faculty of Chinese Medicine, and State Key Laboratory of Quality Research in Chinese Medicine, Macau University of Science and Technology, Macao SAR, China
3
Wang Y, Gao H, Li X, Li D, Huang F, Sun Y, Liu X, Yang J, Sun F. PRC1 as an independent adverse prognostic factor in Wilms tumor via integrated bioinformatics and experimental validation. Sci Rep 2025; 15:13282. [PMID: 40247060 PMCID: PMC12006549 DOI: 10.1038/s41598-025-98030-y] [Received: 12/16/2024] [Accepted: 04/09/2025] [Indexed: 04/19/2025] [Open access]
Abstract
Wilms tumor (WT), a prevalent pediatric renal malignancy, exhibits marked heterogeneity and variable clinical outcomes. Epithelial-mesenchymal transition (EMT), a biological process enabling epithelial cells to acquire mesenchymal traits associated with enhanced migratory and invasive capacities, plays a crucial role in cancer progression. Protein Regulator of Cytokinesis 1 (PRC1) is a critical protein in cell division, whose overexpression is linked to poor prognosis in various cancers. This study investigates the role of PRC1 as a key prognostic factor in WT and explores its mechanism through comprehensive bioinformatic and experimental approaches. Using bulk RNA-seq data from the TARGET database, we identified PRC1 as significantly up-regulated in WT and associated with poor overall survival. Functional enrichment analyses (GO, KEGG, GSEA) demonstrated PRC1's involvement in cell division, chromatin dynamics, and activation of oncogenic pathways including Wnt/β-catenin, PI3K/AKT/mTOR, and Hedgehog signaling. Immunological analysis showed that elevated PRC1 expression correlates with diminished immune cell activity, particularly in NK cells, suggesting potential immune evasion mechanisms. Single-cell RNA-seq analysis (GSE200256) confirmed PRC1's elevated expression in anaplastic Wilms tumor (AWT) compared to favorable Wilms tumor (FWT), and highlighted its involvement in intercellular communication and metastasis via the EMT process. Genomic analyses identified copy number variations (CNVs) and downregulated PRC1-targeting microRNAs as drivers of its overexpression. In vitro, PRC1 knockdown in WIT-49 cells significantly impaired migratory capacity, invasive potential, EMT progression, and glycolytic metabolism. These findings collectively position PRC1 as a promising therapeutic target and prognostic biomarker in WT.
Affiliation(s)
- Yanping Wang
  - Department of Pediatric Surgery, Qilu Hospital of Shandong University, Jinan, China
- Hongjie Gao
  - Department of Pediatrics, Qilu Hospital of Shandong University, Jinan, China
- Xuetian Li
  - Department of Pediatric Surgery, Qilu Hospital of Shandong University, Jinan, China
- Ding Li
  - Department of Pediatric Surgery, Qilu Hospital of Shandong University, Jinan, China
- Fan Huang
  - Department of Pediatric Surgery, Qilu Hospital of Shandong University, Jinan, China
- Yuqiang Sun
  - Department of Pediatric Surgery, Qilu Hospital of Shandong University, Jinan, China
- Xingjian Liu
  - Department of Pediatric Surgery, Qilu Hospital of Shandong University, Jinan, China
- Junli Yang
  - Department of Pediatrics, Qilu Hospital of Shandong University, Jinan, China
- Fengyin Sun
  - Department of Pediatric Surgery, Qilu Hospital of Shandong University, Jinan, China
4
Weinand K, Langan EM, Curtis M, Raychaudhuri S. Defining effective strategies to integrate multi-sample single-nucleus ATAC-seq datasets via a multimodal-guided approach. bioRxiv 2025:2025.04.02.646871. [PMID: 40236024 PMCID: PMC11996549 DOI: 10.1101/2025.04.02.646871] [Indexed: 04/17/2025]
Abstract
Background Chromatin accessibility, measured via single-nucleus Assay for Transposase-Accessible Chromatin with sequencing (snATAC-seq), can reveal the underpinnings of transcriptional regulation across heterogeneous cell states. As the number and scale of snATAC-seq datasets increase, we need robust computational pipelines to integrate samples within a dataset and datasets across studies. These integration pipelines should correct cell-state-obfuscating technical effects while conserving underlying biological cell states, as has been shown for single-cell RNA-seq (scRNA-seq) pipelines. However, scRNA-seq integration methods have performed inconsistently on snATAC-seq datasets, potentially due to sparsity and genomic feature differences. Results Using single-nucleus multimodal datasets profiling ATAC and RNA simultaneously, we can measure snATAC-seq integration method performance by comparison to independently integrated snRNA-seq gold standard embeddings and annotations. Here, we benchmark 58 pipelines, incorporating 7 integration methods plus 1 embedding correction method with 5 feature sets. Using our command-line tool, we assessed 5 multimodal datasets at 3 different resolutions using 2 novel metrics to determine the best practices for multi-sample snATAC-seq integration. ATAC features outperformed Gene Activity Score (GAS) features, and embedding correction with Harmony was generally useful. SnapATAC2, PeakVI, and ArchR's iterative Latent Semantic Indexing (LSI) performed well. Conclusions We recommend SnapATAC2 + Harmony with pre-defined ENCODE candidate cis-regulatory element (cCRE) features as a first-pass pipeline given its metric performance, generalizability of features, and method resource-efficiency. This and other high-performing pipelines will guide future comprehensive gene regulation maps.
5
Legenkaia M, Bourdieu L, Monasson R. Uncertainties in signal recovery from heterogeneous and convoluted time series with principal component analysis. Phys Rev E 2025; 111:044314. [PMID: 40411023 DOI: 10.1103/physreve.111.044314] [Received: 12/12/2024] [Accepted: 03/03/2025] [Indexed: 05/26/2025]
Abstract
Principal component analysis (PCA) is one of the most used tools for extracting low-dimensional representations of data, in particular for time series. Performances are known to strongly depend on the quality (amount of noise) and the quantity of data. We here investigate the impact of heterogeneities, often present in real data, on the reconstruction of low-dimensional trajectories and of their associated modes. We focus in particular on the effects of sample-to-sample fluctuations and of component-dependent temporal convolution and noise in the measurements. We derive analytical predictions for the error on the reconstructed trajectory and the confusion between the modes using the replica method in a high-dimensional setting, in which the number and the dimension of the data are comparable. We find in particular that sample-to-sample variability is deleterious for the reconstruction of the signal trajectory, but beneficial for the inference of the modes, and that the fluctuations in the temporal convolution kernels prevent perfect recovery of the latent modes even for very weak measurement noise. Our predictions are corroborated by simulations with synthetic data for a variety of control parameters.
Affiliation(s)
- Mariia Legenkaia
  - Université PSL, Institut de Biologie de l'Ecole Normale Supérieure (IBENS), Ecole Normale Supérieure, CNRS, INSERM, Paris F-75005, France
  - Sorbonne Université, Laboratoire de Physique de l'ENS, PSL and CNRS-UMR8023, 24 Rue Lhomond, 75005 Paris, France
- Laurent Bourdieu
  - Université PSL, Institut de Biologie de l'Ecole Normale Supérieure (IBENS), Ecole Normale Supérieure, CNRS, INSERM, Paris F-75005, France
- Rémi Monasson
  - Sorbonne Université, Laboratoire de Physique de l'ENS, PSL and CNRS-UMR8023, 24 Rue Lhomond, 75005 Paris, France
6
Zhao R, Zhang X, Geng Y, Lu D, Wang Y, Xie H, Zhang X, Xu S, Cao Y. SPRY1 regulates macrophage M1 polarization in skin aging and melanoma prognosis. Transl Oncol 2025; 54:102331. [PMID: 40023001 PMCID: PMC11915026 DOI: 10.1016/j.tranon.2025.102331] [Received: 11/28/2024] [Revised: 01/28/2025] [Accepted: 02/10/2025] [Indexed: 03/04/2025] [Open access]
Abstract
INTRODUCTION Skin aging is a complex, multifactorial process involving cellular damage, inflammation, and increased susceptibility to diseases. Despite its importance, the role of SPRY1 in skin aging remains poorly understood. This study aims to investigate the function of SPRY1 in skin aging, particularly its impact on macrophage M1 polarization, and explore its potential as a therapeutic target for mitigating skin aging and melanoma. METHODS Bioinformatics analyses were performed using datasets from the GTEx and GEO databases, alongside in vitro cellular experiments. These included Weighted Gene Co-expression Network Analysis (WGCNA), single-cell sequencing, and various cellular assays in RAW264.7 murine monocyte/macrophage leukemia cells and NIH/3T3 mouse skin fibroblasts. The assays comprised gene transfection, Cell Counting Kit-8 (CCK-8) assays, quantitative real-time PCR (qRT-PCR), and measurements of reactive oxygen species (ROS) and superoxide dismutase (SOD) activity. RESULTS SPRY1 was identified as a key gene within modules linked to skin aging. Single-cell sequencing revealed its enrichment in macrophages and keratinocytes. Knockdown of SPRY1 in RAW264.7 cells resulted in a shift from M1 to M2 macrophage polarization, reduced oxidative stress, and decreased expression of inflammatory markers. In NIH/3T3 cells, SPRY1 knockdown reduced cell viability and lowered the expression of inflammatory genes. Additionally, SPRY1 expression was downregulated in melanoma, and its reduced levels were associated with poorer survival outcomes. CONCLUSIONS SPRY1 accelerates skin aging by promoting macrophage M1 polarization and may serve as a promising therapeutic target. Future research should focus on in vivo validation and further exploration of its regulatory networks to develop novel treatments.
Affiliation(s)
- Rongxin Zhao
  - Department of Dermatology, Pudong New Area People's Hospital, 490 Chuanhuang South Road, Pudong New Area, Shanghai, China
- Xun Zhang
  - Digestive Endoscopy Center, Tongren Hospital, Shanghai Jiao Tong University School of Medicine, 1111 XianXia Road, Shanghai 200336, China
- Yingnan Geng
  - Department of Burns and Plastic Surgery, Second Affiliated Hospital of Naval Medical University, 415 Fengyang Road, Huangpu District, Shanghai 200003, China
- Dan Lu
  - Department of Dermatology, Pudong New Area People's Hospital, 490 Chuanhuang South Road, Pudong New Area, Shanghai, China
- Yuqing Wang
  - Department of Dermatology, Xuzhou Huamei Cosmetology Hospital, Jiangsu, West Huaihai Road, Quanshan District, Xuzhou, Jiangsu, China
- Han Xie
  - The Fifth People's Hospital of Shanghai, Fudan University, No. 128, Ruili Road, Minhang District, Shanghai, China
- Xiaofei Zhang
  - Shanghai Xinmei Medical Beauty Outpatient Department, 202A, No.285, Jianguo West Road, Xuhui District, Shanghai, China
- Shunming Xu
  - Department of Dermatology, Pudong New Area People's Hospital, 490 Chuanhuang South Road, Pudong New Area, Shanghai, China
- Yanyun Cao
  - Department of Dermatology, Pudong New Area People's Hospital, 490 Chuanhuang South Road, Pudong New Area, Shanghai, China
7
Gu R, Jiang L, Dai S, Yue Y, Li S, Zheng S, Wu L, Zhao S. Identification of exosome-related SERPINB1 as a novel predictor for tumor immune microenvironment and clinical outcomes in ovarian cancer. J Ovarian Res 2025; 18:65. [PMID: 40155942 PMCID: PMC11954311 DOI: 10.1186/s13048-025-01589-3] [Received: 04/26/2024] [Accepted: 01/06/2025] [Indexed: 04/01/2025] [Open access]
Abstract
BACKGROUND With a high global incidence of over three million new cases in 2020 and a high mortality of over two million fatalities, ovarian cancer is one of the most common malignant tumors in gynecology. Exosomes can control the immunological condition of the tumor microenvironment (TME) by participating in intercellular interactions. Therefore, we aimed to construct an exosome-related prognostic model to predict the clinical outcomes of ovarian cancer patients. METHODS In this research, expression patterns of exosome-related genes were examined in multiple single-cell RNA-sequencing and bulk RNA-sequencing datasets. In addition, a novel exosome-related prognostic model was established by the least absolute shrinkage and selection operator (LASSO) regression method. Then, the correlations between the risk score and immunological characteristics of the TME were explored. Moreover, SERPINB1, a gene in the prognostic signature, was further analyzed to reveal its value as a novel biomarker. RESULTS In the current study, combining single-cell and bulk omics datasets, we constructed an exosome-related prognostic model of four genes (LGALS3BP, SAT1, SERPINB1, and SH3BGRL3). Moreover, the risk score was associated with worse overall survival (OS) in ovarian cancer patients. Further analysis found that patients with a high risk score tended to have a desert TME with hardly any infiltration of immune cells. Then, SERPINB1, which correlated positively with favorable OS and negatively with the risk score, was chosen as the representative biomarker of the model. Moreover, SERPINB1 was positively correlated with the infiltration of immune subpopulations in both public and in-house cohorts. In addition, the high-resolution analysis found that SERPINB1+ tumor cells communicated frequently with microenvironment cells, which may explain the shaping of an inflamed TME.
CONCLUSION To sum up, we established a novel exosome-related prognostic model (LGALS3BP, SAT1, SERPINB1, and SH3BGRL3) to predict the prognosis of patients with ovarian cancer and identify the immunological characteristics of the TME. In addition, SERPINB1 was identified as a promising biomarker for prognostic prediction in ovarian cancer.
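The abstract does not state how the risk score is computed. As a rough illustration only, a signature of this kind is often scored as a weighted sum of gene expression values, with patients split at the median score. The coefficients, the synthetic expression matrix, and the median split below are all assumptions for demonstration and are not values or procedures from the paper.

```python
import numpy as np

# Gene names come from the abstract; everything else is hypothetical.
GENES = ["LGALS3BP", "SAT1", "SERPINB1", "SH3BGRL3"]

# Hypothetical coefficients for illustration only; NOT the published values.
COEF = np.array([0.20, 0.15, -0.30, 0.10])


def risk_scores(expr: np.ndarray, coef: np.ndarray) -> np.ndarray:
    """Linear risk score: one weighted sum of expression values per patient."""
    return expr @ coef


def split_by_median(scores: np.ndarray) -> np.ndarray:
    """Label patients 'high' if strictly above the median score, else 'low'."""
    return np.where(scores > np.median(scores), "high", "low")


# Synthetic expression matrix: 6 patients x 4 genes (made-up data).
rng = np.random.default_rng(1)
expr = rng.normal(size=(6, len(GENES)))

scores = risk_scores(expr, COEF)
groups = split_by_median(scores)
```

In practice the coefficients would come from fitting a penalized survival model to real cohort data, and the cut-off would need to be justified rather than assumed.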
Affiliation(s)
- Rui Gu
  - Department of Obstetrics and Gynecology, Wuxi School of Medicine, Wuxi Maternity and Child Health Care Hospital, Jiangnan University, Jiangsu, 214002, China
- Liping Jiang
  - Department of Obstetrics and Gynecology, Wuxi School of Medicine, Wuxi Maternity and Child Health Care Hospital, Jiangnan University, Jiangsu, 214002, China
- Shuqin Dai
  - Department of Obstetrics and Gynecology, Wuxi School of Medicine, Wuxi Maternity and Child Health Care Hospital, Jiangnan University, Jiangsu, 214002, China
- Yajie Yue
  - Department of Obstetrics and Gynecology, Wuxi School of Medicine, Wuxi Maternity and Child Health Care Hospital, Jiangnan University, Jiangsu, 214002, China
- Shangjin Li
  - Department of Obstetrics and Gynecology, Wuxi School of Medicine, Wuxi Maternity and Child Health Care Hospital, Jiangnan University, Jiangsu, 214002, China
- Shudan Zheng
  - Department of Obstetrics and Gynecology, Wuxi School of Medicine, Wuxi Maternity and Child Health Care Hospital, Jiangnan University, Jiangsu, 214002, China
- Liwei Wu
  - Department of Obstetrics and Gynecology, Wuxi School of Medicine, Wuxi Maternity and Child Health Care Hospital, Jiangnan University, Jiangsu, 214002, China
- Shaojie Zhao
  - Department of Obstetrics and Gynecology, Wuxi School of Medicine, Wuxi Maternity and Child Health Care Hospital, Jiangnan University, Jiangsu, 214002, China
8
Liu Q, Liu Z, Qian Y, Wu M, Mo J, Wang C, Xu G, Leng L, Zhang S. Alterations in Gene Expression and Alternative Splicing Induced by Plasmid-Mediated Overexpression of GFP and P2RY12 Within the A549 Cell Line. Int J Mol Sci 2025; 26:2973. [PMID: 40243586 PMCID: PMC11988474 DOI: 10.3390/ijms26072973] [Received: 02/21/2025] [Revised: 03/17/2025] [Accepted: 03/20/2025] [Indexed: 04/18/2025] [Open access]
Abstract
Phenotypic modifications and their effects on cellular functions through the up-regulation of target gene expression have frequently been observed in genetic studies, but the unique roles of cell lines and their introduced plasmids in influencing these functions have not been fully revealed. In this research, we developed two distinct cell lines derived from the A549 cell line: one that stably overexpresses GFP and another that is a polyclonal stable line overexpressing both GFP and P2RY12. We then used transcriptome sequencing (RNA-seq) to screen for differentially expressed genes (DEGs) and genes with differential transcript usage (gDTUs) after GFP overexpression (GFP-OE) and P2RY12 overexpression (P2RY12-OE). We found that, compared with A549, there were more than 1700 DEGs in both GFP-OE and P2RY12-OE cells, while only 866 DEGs were identified between GFP-OE and P2RY12-OE cells. Notably, the differences in transcript usage were relatively minor, with just over 400 genes exhibiting changes across all three groups. The functional analysis of DEGs and gDTUs showed that both were highly enriched in pathways associated with cell proliferation and migration. In summary, we performed an extensive analysis of the transcriptome profile of gene expression and alternative splicing with GFP-OE and P2RY12-OE, enhancing our comprehension of how genes function within cells and the processes that control gene expression.
Affiliation(s)
- Qingqing Liu
  - College of Basic Medical Sciences, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China
  - Institute of Herbgenomics, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China
- Zhaoyu Liu
  - Institute of Herbgenomics, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China
  - School of Chinese Materia Medica, Tianjin University of Traditional Chinese Medicine, Tianjin 300193, China
- Yongqi Qian
  - College of Basic Medical Sciences, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China
  - Institute of Herbgenomics, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China
- Mingxu Wu
  - College of Basic Medical Sciences, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China
  - Institute of Herbgenomics, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China
- Jing Mo
  - Institute of Herbgenomics, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China
- Can Wang
  - Institute of Herbgenomics, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China
- Guoqing Xu
  - Institute of Herbgenomics, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China
- Liang Leng
  - Institute of Herbgenomics, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China
- Sanyin Zhang
  - Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China
9
Luan Y, Zhang Y, Li S, Gao C, Ying X, Zhao S, Zhang B. CD47 is a tumor cell-derived exosomal signature and regulates tumor immune microenvironment and immunotherapy responses. Transl Oncol 2025; 53:102291. [PMID: 39864342 PMCID: PMC11803903 DOI: 10.1016/j.tranon.2025.102291] [Received: 10/09/2024] [Revised: 12/26/2024] [Accepted: 01/16/2025] [Indexed: 01/28/2025] [Open access]
Abstract
BACKGROUND The pathogenesis of ovarian cancer (OvCa) involves a complex interplay of genetic, environmental, and hormonal factors. As exploration of the tumor ecosystem has deepened, exosomes have been shown to mediate the immunological status of the tumor microenvironment (TME). Therefore, we aimed to identify tumor-derived exosomes (TEXs) that can distinguish immune-hot from immune-cold tumors and reflect immunotherapeutic responses. METHODS A large set of transcriptomic and single-cell RNA-sequencing (scRNA-seq) datasets were downloaded and used to analyze the expression pattern of CD47 and its immuno-correlations in OvCa and multiple epithelial cell carcinomas such as breast cancers. In addition, a pan-gynecological cancer cohort was used to validate the correlation between CD47 and the inflamed TME. RESULTS In the current study, we found that CD47 was a TEX signature and had no transcriptional differences among patients with different clinicopathological features. Moreover, CD47 expression was positively correlated with the activation of immunological signaling pathways and enrichment of immune cell subpopulations in OvCa. Furthermore, in breast cancer and gynecological cancers, CD47, specifically expressed in tumor cells, also showed a favorable ability to distinguish immune-hot from immune-cold carcinomas. Moreover, in immunotherapy cohorts of breast cancer and other epithelial cell carcinomas, patients with a CD47-high phenotype were more sensitive to immunotherapy and tended to achieve remission after treatment. Results from the tissue microarray (TMA) showed that CD47 was upregulated in tumor tissues and positively correlated with CD8 level. CONCLUSION In conclusion, CD47 is associated with an inflammatory TME, immune-hot tumors, and sensitivity to immunotherapy, highlighting the value of CD47 in identifying immunological traits and immunotherapeutic response.
Affiliation(s)
- Yifei Luan
  - School of Innovation and Entrepreneurship, Hangzhou Medical College, Hangzhou 310053, PR China
- Yinghui Zhang
  - Wuxi Maternal and Child Health Care Hospital, The Affiliated Women's Hospital of Jiangnan University, Wuxi 214002, PR China
- Shangjin Li
  - Wuxi Maternal and Child Health Care Hospital, The Affiliated Women's Hospital of Jiangnan University, Wuxi 214002, PR China
- Caiyun Gao
  - Market Supervision and Law Enforcement Guarantee Service Center of Xihu District, Hangzhou 310013, PR China
- Xinyi Ying
  - Department of Clinical Medicine, Hangzhou Medical College, Hangzhou 310053, PR China
- Shaojie Zhao
  - Wuxi Maternal and Child Health Care Hospital, The Affiliated Women's Hospital of Jiangnan University, Wuxi 214002, PR China
- Bing Zhang
  - Wuxi Maternal and Child Health Care Hospital, The Affiliated Women's Hospital of Jiangnan University, Wuxi 214002, PR China
10
Abdelnaby M, Moussa MR. A Benchmarking Study of Random Projections and Principal Components for Dimensionality Reduction Strategies in Single Cell Analysis. bioRxiv 2025:2025.02.04.636499. [PMID: 39974925 PMCID: PMC11838541 DOI: 10.1101/2025.02.04.636499] [Indexed: 02/21/2025]
Abstract
Principal Component Analysis (PCA) has long been a cornerstone in dimensionality reduction for high-dimensional data, including single-cell RNA sequencing (scRNA-seq). However, PCA's performance typically degrades with increasing data size, can be sensitive to outliers, and assumes linearity. Recently, Random Projection (RP) methods have emerged as promising alternatives, addressing some of these limitations. This study systematically and comprehensively evaluates PCA and RP approaches, including Singular Value Decomposition (SVD) and randomized SVD, alongside Sparse and Gaussian Random Projection algorithms, with a focus on computational efficiency and downstream analysis effectiveness. We benchmark performance using multiple scRNA-seq datasets including labeled and unlabeled publicly available datasets. We apply Hierarchical Clustering and Spherical K-Means clustering algorithms to assess downstream clustering quality. For labeled datasets, clustering accuracy is measured using the Hungarian algorithm and Mutual Information. For unlabeled datasets, the Dunn Index and Gap Statistic capture cluster separation. Across both dataset types, the Within-Cluster Sum of Squares (WCSS) metric is used to assess variability. Additionally, locality preservation is examined, with RP outperforming PCA in several of the evaluated metrics. Our results demonstrate that RP not only surpasses PCA in computational speed but also rivals and, in some cases, exceeds PCA in preserving data variability and clustering quality. By providing a thorough benchmarking of PCA and RP methods, this work offers valuable insights into selecting optimal dimensionality reduction techniques, balancing computational performance, scalability, and the quality of downstream analyses.
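To make the comparison concrete, the following is a minimal sketch of the two families of methods named in the abstract: PCA via SVD and a dense Gaussian random projection. It is not the authors' benchmark. The toy Poisson matrix, the target dimension of 50, and the helper names `pca_reduce`, `gaussian_rp_reduce` and `mean_distance_ratio` are assumptions chosen for illustration, and the distance ratio is only one simple proxy for how well pairwise structure is preserved.

```python
import numpy as np


def pca_reduce(X: np.ndarray, k: int) -> np.ndarray:
    """Project rows of X onto the top-k principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T


def gaussian_rp_reduce(X: np.ndarray, k: int, rng: np.random.Generator) -> np.ndarray:
    """Project rows of X with a Gaussian random matrix scaled by 1/sqrt(k)."""
    R = rng.normal(0.0, 1.0, size=(X.shape[1], k)) / np.sqrt(k)
    return X @ R


def mean_distance_ratio(X: np.ndarray, Z: np.ndarray) -> float:
    """Mean ratio of pairwise Euclidean distances after vs. before reduction."""
    def pairwise(A: np.ndarray) -> np.ndarray:
        sq = (A ** 2).sum(axis=1)
        d2 = sq[:, None] + sq[None, :] - 2.0 * (A @ A.T)
        return np.sqrt(np.clip(d2, 0.0, None))

    iu = np.triu_indices(X.shape[0], k=1)  # each unordered pair once
    return float(np.mean(pairwise(Z)[iu] / pairwise(X)[iu]))


rng = np.random.default_rng(0)
# Toy stand-in for a cells x genes count matrix (synthetic, not real scRNA-seq).
X = rng.poisson(1.0, size=(200, 1000)).astype(float)

Z_pca = pca_reduce(X, 50)
Z_rp = gaussian_rp_reduce(X, 50, rng)

ratio_pca = mean_distance_ratio(X, Z_pca)
ratio_rp = mean_distance_ratio(X, Z_rp)
```

The random projection needs only one matrix product and no decomposition, which is where its speed advantage comes from. PCA is an orthogonal projection, so it can only shrink pairwise distances, whereas the scaled Gaussian projection preserves them approximately in expectation.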
Affiliation(s)
- Marmar R. Moussa
  - School of Computer Science, University of Oklahoma, Norman, OK, USA
11
Mei J, Luo Z, Cai Y, Wan R, Qian Z, Chu J, Sun Y, Shi Y, Jiang Y, Zhang Y, Yin Y, Chen S. Altered Atlas of Exercise-Responsive MicroRNAs Revealing miR-29a-3p Attacks Armored and Cold Tumors and Boosts Anti-B7-H3 Therapy. Research (Wash D C) 2025; 8:0590. [PMID: 39845707 PMCID: PMC11751204 DOI: 10.34133/research.0590] [Received: 08/25/2024] [Revised: 12/03/2024] [Accepted: 12/26/2024] [Indexed: 01/24/2025]
Abstract
Increasing evidence has shown that physical exercise remarkably inhibits oncogenesis and progression of numerous cancers, and that exercise-responsive microRNAs (miRNAs) play a marked role in exercise-mediated tumor suppression. In this research, expression and prognostic values of exercise-responsive miRNAs were examined in breast cancer (BRCA) and further pan-cancer types. In addition, multiple independent public and in-house cohorts, in vitro assays involving macrophages, fibroblasts, and tumor cells, and in vivo models were utilized to uncover the tumor-suppressive roles of miR-29a-3p in cancers. Here, we report that miR-29a-3p is an exercise-responsive miRNA that is expressed at low levels in tumor tissues and is associated with unfavorable prognosis in BRCA. Mechanistically, miR-29a-3p targeted macrophages, fibroblasts, and tumor cells to down-regulate B7 homolog 3 (B7-H3) expression. Single-cell RNA sequencing (scRNA-seq) and cytometry by time-of-flight (CyTOF) demonstrated that miR-29a-3p attacked the armored and cold tumors, thereby shaping an immuno-hot tumor microenvironment (TME). Translationally, liposomes were developed and loaded with miR-29a-3p (lipo@miR-29a-3p), and lipo@miR-29a-3p exhibited promising antitumor effects in a mouse model with great biocompatibility. In conclusion, we uncovered that miR-29a-3p is a critical exercise-responsive miRNA that attacks armored and cold tumors by inhibiting B7-H3 expression. Thus, miR-29a-3p restoration could be an alternative strategy for antitumor therapy.
Affiliation(s)
- Jie Mei
- Department of Oncology, The First Affiliated Hospital of Nanjing Medical University, Nanjing 210029, China
- The First Clinical Medicine College, Nanjing Medical University, Nanjing 211166, China
| | - Zhiwen Luo
- Department of Sports Medicine, Huashan Hospital Affiliated to Fudan University, Shanghai 200040, China
| | - Yun Cai
- Department of Central Laboratory, Changzhou Jintan First People’s Hospital, Jiangsu University, Changzhou 213200, China
| | - Renwen Wan
- Department of Sports Medicine, Huashan Hospital Affiliated to Fudan University, Shanghai 200040, China
| | - Zhiwen Qian
- Departments of Gynecology, Wuxi Maternal and Child Health Care Hospital, Wuxi Medical Center, Nanjing Medical University, Wuxi 214023, China
| | - Jiahui Chu
- Department of Oncology, The First Affiliated Hospital of Nanjing Medical University, Nanjing 210029, China
- The First Clinical Medicine College, Nanjing Medical University, Nanjing 211166, China
| | - Yaying Sun
- Department of Sports Medicine, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200080, China
| | - Yuxin Shi
- Department of Oncology, The Affiliated Wuxi People’s Hospital of Nanjing Medical University, Wuxi People’s Hospital, Wuxi Medical Center, Nanjing Medical University, Nanjing 211166, China
| | - Ying Jiang
- Department of Gynecology, The Obstetrics and Gynecology Hospital Affiliated to Jiangnan University, Wuxi 214023, China
| | - Yan Zhang
- Departments of Gynecology, Wuxi Maternal and Child Health Care Hospital, Wuxi Medical Center, Nanjing Medical University, Wuxi 214023, China
- Department of Gynecology, The Obstetrics and Gynecology Hospital Affiliated to Jiangnan University, Wuxi 214023, China
| | - Yongmei Yin
- Department of Oncology, The First Affiliated Hospital of Nanjing Medical University, Nanjing 210029, China
- Jiangsu Key Lab of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Personalized Cancer Medicine, Nanjing Medical University, Nanjing 211166, China
- Collaborative Innovation Center for Cancer Personalized Medicine, Nanjing Medical University, Nanjing, 211166, China
| | - Shiyi Chen
- Department of Sports Medicine, Huashan Hospital Affiliated to Fudan University, Shanghai 200040, China
| |
|
12
|
Lahaie SC, Brezner N, Murai KK. Single-cell omics and heterogeneity of neuroglial cells. HANDBOOK OF CLINICAL NEUROLOGY 2025; 209:265-275. [PMID: 40122628 DOI: 10.1016/b978-0-443-19104-6.00013-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/25/2025]
Abstract
Our bodies contain a rich diversity of cell types with unique physiologic properties. Interestingly, cells within our bodies contain the same DNA content, yet they can vary dramatically with respect to their molecular, structural, and functional properties. The need to better understand cellular complexity and diversity in biologic systems has led to a technical revolution in the field through the development of sophisticated single-cell "omic" approaches. This allows the investigation of the genome, epigenome, transcriptome, and proteome of individual cells derived from complex samples or tissues, such as nervous system tissue. These methods are allowing scientists to detect distinct cell populations and cellular states in different species (including rodent and human) and molecular transitions of cell populations across the lifespan. Recent studies have revealed that astrocytes, oligodendrocytes, and microglia exhibit greater molecular and functional heterogeneity than originally thought and innovative single-cell technologies have allowed a more comprehensive and less biased view of this cellular diversity. The chapter begins by providing a primer of single-cell transcriptomic and spatial transcriptomic approaches that have been particularly influential in uncovering single-cell diversity of neuroglial cells in the brain. It then takes a closer look at how these technologies have been pivotal in defining neuroglial cell subtypes and for determining their spatial relationships within the CNS. Then, it concludes with discussion of how the recent technical advances and discoveries have provoked new questions about the origin, organization, and functional purpose of diverse neuroglial cell subtypes.
Affiliation(s)
- Sylvie C Lahaie
- Centre for Research in Neuroscience, Department of Neurology & Neurosurgery, Brain Repair and Integrative Neuroscience Program, Research Institute of the McGill University Health Centre, Montreal General Hospital, Montreal, QC, Canada
| | - Naama Brezner
- Centre for Research in Neuroscience, Department of Neurology & Neurosurgery, Brain Repair and Integrative Neuroscience Program, Research Institute of the McGill University Health Centre, Montreal General Hospital, Montreal, QC, Canada
| | - Keith K Murai
- Centre for Research in Neuroscience, Department of Neurology & Neurosurgery, Brain Repair and Integrative Neuroscience Program, Research Institute of the McGill University Health Centre, Montreal General Hospital, Montreal, QC, Canada; Quantitative Life Sciences Graduate Program, McGill University, Montreal, QC, Canada.
| |
|
13
|
Lu S, Liu L, Lei W, Wang D, Zhu H, Lai Q, Ma L, Ru D. Cryptic divergence in and evolutionary dynamics of endangered hybrid Picea brachytyla sensu stricto in the Qinghai-Tibet Plateau. BMC PLANT BIOLOGY 2024; 24:1202. [PMID: 39701948 DOI: 10.1186/s12870-024-05851-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2024] [Accepted: 11/19/2024] [Indexed: 12/21/2024]
Abstract
BACKGROUND The visual similarities observed across various plant groups often conceal underlying genetic distinctions. This occurrence, known as cryptic diversity, underscores the key importance of identifying and understanding cryptic intraspecific evolutionary lineages in evolutionary ecology and conservation biology. RESULTS In this study, we conducted transcriptome analysis of 81 individuals from 18 natural populations of a northern lineage of Picea brachytyla sensu stricto that is endemic to the Qinghai-Tibet Plateau. Our analysis revealed the presence of two distinct local lineages, emerging approximately 444.8 thousand years ago (kya), within this endangered species. The divergence event aligns well with the geographic and climatic oscillations that occurred across the distributional range during the Mid-Pleistocene epoch. Additionally, we identified numerous environmentally correlated gene variants, as well as many other genes showing signals of positive selection across the genome. These factors likely contributed to the persistence and adaptation of the two distinct local lineages. CONCLUSIONS Our findings shed light on the highly dynamic evolutionary processes underlying the remarkably similar phenotypes of the two lineages of this endangered species. Importantly, these results enhance our understanding of the evolutionary past for this and for other endangered species with similar histories, and also provide guidance for the development of conservation plans.
Affiliation(s)
- Shengming Lu
- State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystem, College of Ecology, Lanzhou University, Lanzhou, 730000, China
| | - Lian Liu
- State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystem, College of Ecology, Lanzhou University, Lanzhou, 730000, China
| | - Weixiao Lei
- Xi'an Center for Disease Control and Prevention, Xi'an, China
| | - Donglei Wang
- Key Laboratory for Bio-resource and Eco-environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, 610065, Sichuan, China
| | - Hui Zhu
- State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystem, College of Ecology, Lanzhou University, Lanzhou, 730000, China
| | - Qing Lai
- State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystem, College of Ecology, Lanzhou University, Lanzhou, 730000, China
| | - Liru Ma
- State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystem, College of Ecology, Lanzhou University, Lanzhou, 730000, China
| | - Dafu Ru
- State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystem, College of Ecology, Lanzhou University, Lanzhou, 730000, China.
| |
|
14
|
Krijnen K, Blenkinsopp P, Heeren RMA, Anthony IGM. Processing Next-Generation Mass Spectrometry Imaging Data: Principal Component Analysis at Scale. JOURNAL OF THE AMERICAN SOCIETY FOR MASS SPECTROMETRY 2024; 35:3063-3069. [PMID: 39467687 PMCID: PMC11622226 DOI: 10.1021/jasms.4c00314] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/22/2024] [Revised: 10/10/2024] [Accepted: 10/18/2024] [Indexed: 10/30/2024]
Abstract
Mass spectrometry imaging (MSI) is constantly improving in spatial resolving power, throughput and mass resolution. Although beneficial, these improvements increase data set size and content. Larger data sets require correspondingly fast computer-based analyses. However, these analyses often do not scale well with increased data size. Principal component analysis (PCA) is an important analytical tool commonly used with MSI data; however, most PCA algorithms load and process the entire data set within random access memory (RAM), which is most often insufficient for large data sets. PCA algorithms that use less RAM than the data set exist but are usually much slower or sacrifice precision and are rarely used for MSI data processing. Incremental PCA (IPCA) is an alternative algorithm that avoids large RAM allocations while also preserving speed and analytical precision. Here, we demonstrate and benchmark the use of differing implementations of IPCA, PCA, and commercial software on large and often complex MSI data sets. We show that, using an already-published Python-based IPCA algorithm, IPCA can be successfully applied to MSI data sets too large to fit within RAM. Furthermore, our benchmarks demonstrate that, contrary to expectations, IPCA is faster than all other tested PCA implementations on all large data sets that can be directly compared.
Affiliation(s)
- Kasper Krijnen
- The Maastricht MultiModal Molecular Imaging Institute (M4i), Division of Imaging Mass Spectrometry, Maastricht University, Maastricht 6229 ER, The Netherlands
| | - Paul Blenkinsopp
- Ionoptika Ltd., Unit B6, Millbrook Cl, Chandler’s Ford, Eastleigh, SO53 4BZ, United Kingdom
| | - Ron M. A. Heeren
- The Maastricht MultiModal Molecular Imaging Institute (M4i), Division of Imaging Mass Spectrometry, Maastricht University, Maastricht 6229 ER, The Netherlands
| | - Ian G. M. Anthony
- The Maastricht MultiModal Molecular Imaging Institute (M4i), Division of Imaging Mass Spectrometry, Maastricht University, Maastricht 6229 ER, The Netherlands
| |
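The memory argument in the abstract above, that PCA need not hold the whole data set in RAM, can be illustrated with a small out-of-core sketch. It is a deliberately simpler stand-in and not the IPCA algorithm the authors benchmarked: it streams the data in batches, keeps only running sums (a d x d matrix plus a mean vector), and then extracts the leading component by power iteration. The synthetic generator `make_batches`, the planted direction `DIRECTION` and all other names are assumptions made for this example.

```python
import math
import random

# Hypothetical ground-truth direction used to generate the toy data.
DIRECTION = [0.6, 0.8, 0.0]


def make_batches(n_batches, batch_size, rng):
    """Yield batches one at a time, as if read from disk."""
    for _ in range(n_batches):
        batch = []
        for _ in range(batch_size):
            t = rng.gauss(0.0, 5.0)  # strong signal along DIRECTION
            batch.append([t * a + rng.gauss(0.0, 0.1) for a in DIRECTION])
        yield batch


def stream_covariance(batches, d):
    """Accumulate the row count, column sums and sum of outer products,
    so only O(d*d) numbers are held in memory at any time."""
    n = 0
    col_sum = [0.0] * d
    outer_sum = [[0.0] * d for _ in range(d)]
    for batch in batches:
        for row in batch:
            n += 1
            for i in range(d):
                col_sum[i] += row[i]
                for j in range(d):
                    outer_sum[i][j] += row[i] * row[j]
    mean = [s / n for s in col_sum]
    cov = [[(outer_sum[i][j] - n * mean[i] * mean[j]) / (n - 1)
            for j in range(d)] for i in range(d)]
    return mean, cov


def leading_component(cov, iters=200):
    """Power iteration for the eigenvector with the largest eigenvalue."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


rng = random.Random(1)
mean, cov = stream_covariance(make_batches(10, 100, rng), d=3)
pc1 = leading_component(cov)
alignment = abs(sum(a * b for a, b in zip(pc1, DIRECTION)))
print("first principal component:", pc1)
print("alignment with planted direction:", alignment)
```

Accumulating a covariance matrix is practical only when the number of features is moderate. MSI data with very many m/z channels is one reason true incremental-SVD approaches are attractive.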
|
15
|
Subedi S, Sumida TS, Park YP. A scalable approach to topic modelling in single-cell data by approximate pseudobulk projection. Life Sci Alliance 2024; 7:e202402713. [PMID: 39107066 PMCID: PMC11303850 DOI: 10.26508/lsa.202402713] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2024] [Revised: 07/29/2024] [Accepted: 07/30/2024] [Indexed: 08/09/2024] Open
Abstract
Probabilistic topic modelling has become essential in many types of single-cell data analysis. Based on probabilistic topic assignments in each cell, we identify the latent representation of cellular states. A dictionary matrix, consisting of topic-specific gene frequency vectors, provides interpretable bases to be compared with known cell type-specific marker genes and other pathway annotations. However, fitting a topic model on a large number of cells would require heavy computational resources: specialized computing units, computing time and memory. Here, we present a scalable approximation method customized for single-cell RNA-seq data analysis, termed ASAP, short for Annotating a Single-cell data matrix by Approximate Pseudobulk estimation. Our approach is more accurate than existing methods but requires orders of magnitude less computing time and much less memory. We also show that our approach is widely applicable for atlas-scale data analysis; our method seamlessly integrates single-cell and bulk data in joint analysis, not requiring additional preprocessing or feature selection steps.
Affiliation(s)
- Sishir Subedi
- Graduate Program, University of British Columbia, Vancouver, Canada
- BC Cancer Research, Vancouver, Canada
| | - Tomokazu S Sumida
- Neurology, Program for Neuroinflammation, Yale School of Medicine, New Haven, CT, USA
| | - Yongjin P Park
- BC Cancer Research, Vancouver, Canada
- Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, Canada
- Department of Statistics, University of British Columbia, Vancouver, Canada
| |
|
16
|
Cahill R, Wang Y, Xian RP, Lee AJ, Zeng H, Yu B, Tasic B, Abbasi-Asl R. Unsupervised pattern identification in spatial gene expression atlas reveals mouse brain regions beyond established ontology. Proc Natl Acad Sci U S A 2024; 121:e2319804121. [PMID: 39226356 PMCID: PMC11406299 DOI: 10.1073/pnas.2319804121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/12/2023] [Accepted: 07/24/2024] [Indexed: 09/05/2024] Open
Abstract
The rapid growth of large-scale spatial gene expression data demands efficient and reliable computational tools to extract major trends of gene expression in their native spatial context. Here, we used stability-driven unsupervised learning (i.e., staNMF) to identify principal patterns (PPs) of 3D gene expression profiles and understand spatial gene distribution and anatomical localization at the whole mouse brain level. Our subsequent spatial correlation analysis systematically compared the PPs to known anatomical regions and ontology from the Allen Mouse Brain Atlas using spatial neighborhoods. We demonstrate that our stable and spatially coherent PPs, whose linear combinations accurately approximate the spatial gene data, are highly correlated with combinations of expert-annotated brain regions. These PPs yield a brain ontology based purely on spatial gene expression. Our PP identification approach outperforms principal component analysis and typical clustering algorithms on the same task. Moreover, we show that the stable PPs reveal marked regional imbalance of brainwide genetic architecture, leading to region-specific marker genes and gene coexpression networks. Our findings highlight the advantages of stability-driven machine learning for plausible biological discovery from dense spatial gene expression data, streamlining tasks that are infeasible by conventional manual approaches.
Affiliation(s)
- Robert Cahill
- Department of Neurology, University of California, San Francisco, CA 94143
- UCSF Weill Institute for Neurosciences, San Francisco, CA 94143
| | - Yu Wang
- Department of Statistics, University of California, Berkeley, CA 94720
| | - R Patrick Xian
- Department of Neurology, University of California, San Francisco, CA 94143
- UCSF Weill Institute for Neurosciences, San Francisco, CA 94143
| | - Alex J Lee
- Department of Neurology, University of California, San Francisco, CA 94143
- UCSF Weill Institute for Neurosciences, San Francisco, CA 94143
| | - Hongkui Zeng
- Allen Institute for Brain Science, Seattle, WA 98109
| | - Bin Yu
- Department of Statistics, University of California, Berkeley, CA 94720
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA 94720
| | - Reza Abbasi-Asl
- Department of Neurology, University of California, San Francisco, CA 94143
- UCSF Weill Institute for Neurosciences, San Francisco, CA 94143
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA 94143
| |
|
17
|
Song R, Shi P, Xiang L, He Y, Dong Y, Miao Y, Qi J. Evaluation of barley genotypes for drought adaptability: based on stress indices and comprehensive evaluation as criteria. FRONTIERS IN PLANT SCIENCE 2024; 15:1436872. [PMID: 39253570 PMCID: PMC11381406 DOI: 10.3389/fpls.2024.1436872] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/22/2024] [Accepted: 08/08/2024] [Indexed: 09/11/2024]
Abstract
The prevalence of drought events worldwide emphasizes the importance of screening and cultivating drought-adapted crops. In this study, 206 germplasm resources were used as materials, dry weight as the target trait, and two genotyping methods as criteria to evaluate drought adaptability at the seedling establishment stage. The results showed a significant decrease in the average dry weight of the tested germplasm resources (from 746.90 mg to 285.40 mg) and rich variation among genotypes in the response of dry weight to drought (CV=61.14%). In the traditional evaluation method, the drought resistance coefficient (DC), geometric mean productivity index (GMP), mean productivity index (MP), stress susceptibility index (SSI), stress tolerance index (STI), and tolerance index (TOL) also exhibited diversity in the tested genotypes (CV>30%). However, these indices showed varying degrees of explanation for dry weight under stress and non-stress environments and failed to differentiate drought adaptability among genotypes clearly. In the new evaluation method, four stress indices were developed to quantify barley seedling production and stability capacities. Compared to traditional stress indices, the stress production index (SI) explained dry weight more comprehensively under stress conditions (R2 = 0.98), while the ideal production index (II) explained dry weight better under non-stress conditions (R2 = 0.89). Furthermore, the potential index (PI) and elasticity index (EI) eliminated disparities in traditional stress indices and comprehensively clarified the contribution of elasticity and potential to production capacity under drought stress. Ultimately, through grading evaluation and cluster analysis, the tested germplasm resources were effectively categorized, and 11 genotypes were identified as suitable for cultivation in arid areas. Overall, the comprehensive evaluation method based on the newly developed stress indices surpasses the traditional method in screening drought adaptability of crops and serves as a vital tool for identifying high-stability and high-production-capacity genotypes in various environments, which is expected to provide practical guidance for barley planting and breeding in arid areas.
Affiliation(s)
- Ruijiao Song
- The Key Laboratory of Oasis Eco-agriculture, Xinjiang Production and Construction Group-College of Agriculture, Shihezi University, Shihezi, China
| | - Peichun Shi
- The Key Laboratory of Oasis Eco-agriculture, Xinjiang Production and Construction Group-College of Agriculture, Shihezi University, Shihezi, China
| | - Li Xiang
- Qitai Triticeae Crops Experimental Station, Xinjiang Academy of Agricultural Sciences, Qitai, China
| | - Yu He
- The Key Laboratory of Oasis Eco-agriculture, Xinjiang Production and Construction Group-College of Agriculture, Shihezi University, Shihezi, China
| | - Yusheng Dong
- Qitai Triticeae Crops Experimental Station, Xinjiang Academy of Agricultural Sciences, Qitai, China
| | - Yu Miao
- Qitai Triticeae Crops Experimental Station, Xinjiang Academy of Agricultural Sciences, Qitai, China
| | - Juncang Qi
- The Key Laboratory of Oasis Eco-agriculture, Xinjiang Production and Construction Group-College of Agriculture, Shihezi University, Shihezi, China
| |
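The traditional indices named in the abstract above have widely used textbook definitions, which the sketch below applies. The abstract itself gives no formulas, so treating these standard definitions as the ones used in the study is an assumption, and the four newly proposed indices (SI, II, PI, EI) are not reproduced because their definitions are not stated. The population means are the 746.90 mg and 285.40 mg reported in the abstract, which correspond to a stress intensity of 1 - 285.40/746.90, roughly 0.618. The `tolerant` genotype values are hypothetical.

```python
import math


def stress_indices(yp, ys, mean_yp, mean_ys):
    """Classical drought indices for one genotype.

    yp, ys           dry weight without and with stress
    mean_yp, mean_ys population means without and with stress
    """
    stress_intensity = 1.0 - mean_ys / mean_yp
    return {
        "DC": ys / yp,                              # drought resistance coefficient
        "GMP": math.sqrt(yp * ys),                  # geometric mean productivity
        "MP": (yp + ys) / 2.0,                      # mean productivity
        "SSI": (1.0 - ys / yp) / stress_intensity,  # stress susceptibility index
        "STI": (yp * ys) / mean_yp ** 2,            # stress tolerance index
        "TOL": yp - ys,                             # tolerance index
    }


# Population means reported in the abstract (mg dry weight).
MEAN_YP, MEAN_YS = 746.90, 285.40

# A genotype that behaves exactly like the population average.
average = stress_indices(MEAN_YP, MEAN_YS, MEAN_YP, MEAN_YS)

# A hypothetical genotype that loses less dry weight under drought.
tolerant = stress_indices(700.0, 420.0, MEAN_YP, MEAN_YS)

for name, result in (("average", average), ("tolerant", tolerant)):
    print(name, {key: round(value, 3) for key, value in result.items()})
```

An SSI below 1 marks a genotype that suffers less than the population average, while TOL and MP depend on absolute yield. This is one reason such indices can rank the same genotypes differently.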
|
18
|
Weine E, Carbonetto P, Stephens M. Accelerated dimensionality reduction of single-cell RNA sequencing data with fastglmpca. Bioinformatics 2024; 40:btae494. [PMID: 39110511 PMCID: PMC11322042 DOI: 10.1093/bioinformatics/btae494] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2024] [Revised: 07/04/2024] [Accepted: 08/05/2024] [Indexed: 08/15/2024] Open
Abstract
SUMMARY Motivated by theoretical and practical issues that arise when applying Principal component analysis (PCA) to count data, Townes et al. introduced "Poisson GLM-PCA", a variation of PCA adapted to count data, as a tool for dimensionality reduction of single-cell RNA sequencing (scRNA-seq) data. However, fitting GLM-PCA is computationally challenging. Here we study this problem, and show that a simple algorithm, which we call "Alternating Poisson Regression" (APR), produces better quality fits, and in less time, than existing algorithms. APR is also memory-efficient and lends itself to parallel implementation on multi-core processors, both of which are helpful for handling large scRNA-seq datasets. We illustrate the benefits of this approach in three publicly available scRNA-seq datasets. The new algorithms are implemented in an R package, fastglmpca. AVAILABILITY AND IMPLEMENTATION The fastglmpca R package is released on CRAN for Windows, macOS and Linux, and the source code is available at github.com/stephenslab/fastglmpca under the open source GPL-3 license. Scripts to reproduce the results in this paper are also available in the GitHub repository and on Zenodo.
Affiliation(s)
- Eric Weine
- Laboratory for Information and Decision Systems, Massachusetts Institute of Technology, Cambridge, MA 02139, United States
- Department of Data Science, Dana Farber Cancer Institute, Boston, MA 02215, United States
| | - Peter Carbonetto
- Department of Human Genetics, University of Chicago, Chicago, IL 60637, United States
| | - Matthew Stephens
- Department of Human Genetics, University of Chicago, Chicago, IL 60637, United States
- Department of Statistics, University of Chicago, Chicago, IL 60637, United States
| |
|
19
|
Weine E, Carbonetto P, Stephens M. Accelerated dimensionality reduction of single-cell RNA sequencing data with fastglmpca. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.03.23.586420. [PMID: 38585920 PMCID: PMC10996495 DOI: 10.1101/2024.03.23.586420] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/09/2024]
Abstract
Summary Motivated by theoretical and practical issues that arise when applying Principal Components Analysis (PCA) to count data, Townes et al introduced "Poisson GLM-PCA", a variation of PCA adapted to count data, as a tool for dimensionality reduction of single-cell RNA sequencing (RNA-seq) data. However, fitting GLM-PCA is computationally challenging. Here we study this problem, and show that a simple algorithm, which we call "Alternating Poisson Regression" (APR), produces better quality fits, and in less time, than existing algorithms. APR is also memory-efficient, and lends itself to parallel implementation on multi-core processors, both of which are helpful for handling large single-cell RNA-seq data sets. We illustrate the benefits of this approach in two published single-cell RNA-seq data sets. The new algorithms are implemented in an R package, fastglmpca. Availability and implementation The fastglmpca R package is released on CRAN for Windows, macOS and Linux, and the source code is available at github.com/stephenslab/fastglmpca under the open source GPL-3 license. Scripts to reproduce the results in this paper are also available in the GitHub repository. Contact mstephens@uchicago.edu. Supplementary information Supplementary data are available on BioRxiv online.
|
20
|
Wu L, Jin W, Yu H, Liu B. Modulating autophagy to treat diseases: A revisited review on in silico methods. J Adv Res 2024; 58:175-191. [PMID: 37192730 PMCID: PMC10982871 DOI: 10.1016/j.jare.2023.05.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2022] [Revised: 05/05/2023] [Accepted: 05/09/2023] [Indexed: 05/18/2023] Open
Abstract
BACKGROUND Autophagy refers to the conserved cellular catabolic process relevant to lysosome activity and plays a vital role in maintaining the dynamic equilibrium of intracellular matter by degrading harmful and abnormally accumulated cellular components. Accumulating evidence has recently revealed that dysregulation of autophagy by genetic and exogenous interventions may disrupt cellular homeostasis in human diseases. In silico approaches, as powerful aids to experiments, have also been extensively reported to play critical roles in the storage, prediction, and analysis of massive amounts of experimental data. Thus, in silico methods are expected to support modulating autophagy to treat diseases. AIM OF REVIEW Here, we focus on summarizing the updated in silico approaches, including databases, systems biology network approaches, omics-based analyses, mathematical models, and artificial intelligence (AI) methods, that seek to modulate autophagy for potential therapeutic purposes, which will provide new insight into more promising therapeutic strategies. KEY SCIENTIFIC CONCEPTS OF REVIEW Autophagy-related databases are the data basis of the in silico method, storing a large amount of information about DNA, RNA, proteins, small molecules and diseases. The systems biology approach is a method to systematically study the interrelationships among biological processes including autophagy from a macroscopic perspective. Omics-based analyses are based on high-throughput data to analyze gene expression at different levels of biological processes involving autophagy. Mathematical models are visualization methods to describe the dynamic process of autophagy, and their accuracy is related to the selection of parameters. AI methods use big data related to autophagy to predict autophagy targets, design targeted small molecules, and classify diverse human diseases for potential therapeutic applications.
Affiliation(s)
- Lifeng Wu
- Department of Biotherapy, Cancer Center and State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu 610041, China
| | - Wenke Jin
- Department of Biotherapy, Cancer Center and State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu 610041, China
| | - Haiyang Yu
- State Key Laboratory of Component-based Chinese Medicine, Tianjin University of Traditional Chinese Medicine, Tianjin 301617, China; Haihe Laboratory of Modern Chinese Medicine, Tianjin 301617, China.
| | - Bo Liu
- Department of Biotherapy, Cancer Center and State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu 610041, China.
| |
|
21
|
Feng H, Cottrell S, Hozumi Y, Wei GW. Multiscale differential geometry learning of networks with applications to single-cell RNA sequencing data. Comput Biol Med 2024; 171:108211. [PMID: 38422960 PMCID: PMC10965033 DOI: 10.1016/j.compbiomed.2024.108211] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 01/02/2024] [Revised: 02/02/2024] [Accepted: 02/25/2024] [Indexed: 03/02/2024]
Abstract
Single-cell RNA sequencing (scRNA-seq) has emerged as a transformative technology, offering unparalleled insights into the intricate landscape of cellular diversity and gene expression dynamics. scRNA-seq analysis represents a challenging and cutting-edge frontier within the field of biological research. Differential geometry serves as a powerful mathematical tool in various applications of scientific research. In this study, we introduce, for the first time, a multiscale differential geometry (MDG) strategy for addressing the challenges encountered in scRNA-seq data analysis. We assume that intrinsic properties of cells lie on a family of low-dimensional manifolds embedded in the high-dimensional space of scRNA-seq data. Multiscale cell-cell interactive manifolds are constructed to reveal complex relationships in the cell-cell network, where curvature-based features for cells can decipher the intricate structural and biological information. We showcase the utility of our novel approach by demonstrating its effectiveness in classifying cell types. This innovative application of differential geometry in scRNA-seq analysis opens new avenues for understanding the intricacies of biological networks and holds great potential for network analysis in other fields.
Affiliation(s)
- Hongsong Feng
- Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA
| | - Sean Cottrell
- Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA
| | - Yuta Hozumi
- Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA
| | - Guo-Wei Wei
- Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA; Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI 48824, USA; Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824, USA.
| |
|
22
|
Gong M, Yu Y, Wang Z, Zhang J, Wang X, Fu C, Zhang Y, Wang X. scAuto as a comprehensive framework for single-cell chromatin accessibility data analysis. Comput Biol Med 2024; 171:108230. [PMID: 38442554 DOI: 10.1016/j.compbiomed.2024.108230] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2023] [Revised: 02/06/2024] [Accepted: 02/25/2024] [Indexed: 03/07/2024]
Abstract
Interpreting single-cell chromatin accessibility data is crucial for understanding the regulation of intercellular heterogeneity. Despite progress in computational methods for analyzing these data, a comprehensive analytical framework and a user-friendly online analysis tool are still lacking. To fill this gap, we developed a pre-trained deep learning-based framework, single-cell auto-correlation transformers (scAuto). Following DNABERT's methodology of pre-training and fine-tuning, scAuto learns a general understanding of the grammar of DNA sequences by being pre-trained on the unlabeled human genome via self-supervision; it is then transferred to the single-cell chromatin accessibility analysis task on scATAC-seq data for supervised fine-tuning. We extensively validated scAuto on the Buenrostro2018 dataset, demonstrating its superior performance on chromatin accessibility prediction, single-cell clustering, and data denoising. Based on scAuto, we further developed an interactive web server for single-cell chromatin accessibility data analysis. It integrates tutorial-style interfaces for those with limited programming skills. The platform is accessible at http://zhanglab.icaup.cn. This work is expected to help analyze single-cell chromatin accessibility data and facilitate the development of precision medicine.
Affiliation(s)
- Meiqin Gong
- Department of Obstetrics and Gynecology, West China Second University Hospital, Sichuan University, Chengdu, 610041, China
| | - Yun Yu
- School of Computer Science, Chengdu University of Information Technology, Chengdu, 610225, China
| | - Zixuan Wang
- College of Electronics and Information Engineering, Sichuan University, Chengdu, 610065, China
| | - Junming Zhang
- School of Computer Science, Chengdu University of Information Technology, Chengdu, 610225, China
| | - Xiongyi Wang
- School of Computer Science, Chengdu University of Information Technology, Chengdu, 610225, China
| | - Cheng Fu
- School of Computer Science, Chengdu University of Information Technology, Chengdu, 610225, China
| | - Yongqing Zhang
- School of Computer Science, Chengdu University of Information Technology, Chengdu, 610225, China
| | - Xiaodong Wang
- Department of Obstetrics and Gynecology, West China Second University Hospital, Sichuan University, Chengdu, 610041, China.
23
Zhou M, Li Y. Spatial distribution and source identification of potentially toxic elements in Yellow River Delta soils, China: An interpretable machine-learning approach. Sci Total Environ 2024; 912:169092. [PMID: 38056655 DOI: 10.1016/j.scitotenv.2023.169092] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/29/2023] [Revised: 11/15/2023] [Accepted: 12/02/2023] [Indexed: 12/08/2023]
Abstract
Identifying the driving factors and quantifying the sources of potentially toxic elements (PTEs) are essential for protecting the ecological environment of the Yellow River Delta. In this study, data from 201 surface soil samples and 16 environmental variables were collected, and the random forest (RF) and Shapley additive explanations (SHAP) methods were then combined to explore the key factors affecting soil PTEs. An innovative t-distributed random neighbor embedding-RF-SHAP model was then constructed, based on the absolute principal component score and multivariate linear regression model, to quantitatively determine PTE sources. Although average PTE concentrations did not exceed the risk control values, PTE distributions exhibited significant differences. It was found that sodium, soil organic matter, and phosphorus contents were the three most important factors affecting PTEs, and human activities and natural environmental factors both influence PTE contents by altering the soil properties. The proposed model successfully determined PTE sources in the soil, outperforming the original linear regression model with a significantly lower RMSE. Source analysis revealed that the parent material was the main contributor to soil PTEs, accounting for more than half of the total PTE content. Industrial and agricultural activities also contributed to an increase in soil PTEs, with average contributions of 19.91 % and 17.44 %, respectively. Unknown sources accounted for 10.83 % of the total PTE content. Thus, the proposed model provides innovative perspectives on source parsing. These findings provide valuable scientific insights for policymakers seeking to develop effective environmental protection measures and improve the quality of saline-alkali land in the Yellow River Delta.
Affiliation(s)
- Mengge Zhou
- Key Laboratory of Land Surface Pattern and Simulation, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
| | - Yonghua Li
- Key Laboratory of Land Surface Pattern and Simulation, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China.
24
Najm M, Cornet M, Albergante L, Zinovyev A, Sermet-Gaudelus I, Stoven V, Calzone L, Martignetti L. Representation and quantification of module activity from omics data with rROMA. NPJ Syst Biol Appl 2024; 10:8. [PMID: 38242871 PMCID: PMC10799004 DOI: 10.1038/s41540-024-00331-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2023] [Accepted: 01/03/2024] [Indexed: 01/21/2024] Open
Abstract
The efficiency of analyzing high-throughput data in systems biology has been demonstrated in numerous studies, where molecular data, such as transcriptomics and proteomics, offer great opportunities for understanding the complexity of biological processes. One important aspect of data analysis in systems biology is the shift from a reductionist approach that focuses on individual components to a more integrative perspective that considers the system as a whole, with the emphasis moving from differential expression of individual genes to determining the activity of gene sets. Here, we present the rROMA software package for fast and accurate computation of the activity of gene sets with coordinated expression. The rROMA package incorporates significant improvements in the calculation algorithm, along with the implementation of several functions for statistical analysis and visualizing results. These additions greatly expand the package's capabilities and offer valuable tools for data analysis and interpretation. It is an open-source package available on GitHub at www.github.com/sysbio-curie/rROMA. Based on publicly available transcriptomic datasets, we applied rROMA to cystic fibrosis, highlighting biological mechanisms potentially involved in the establishment and progression of the disease and the associated genes. Results indicate that rROMA can detect disease-related active signaling pathways using transcriptomic and proteomic data. The results notably identified a significant mechanism relevant to cystic fibrosis, raised awareness of a possible bias related to cell culture, and uncovered an intriguing gene that warrants further investigation.
Affiliation(s)
- Matthieu Najm
- INSERM U900, 75428, Paris, France
- Center for Computational Biology, Mines ParisTech, PSL Research University, 75006, Paris, France
- Institut Curie, PSL Research University, 75248, Paris, France
| | - Matthieu Cornet
- INSERM U900, 75428, Paris, France
- Center for Computational Biology, Mines ParisTech, PSL Research University, 75006, Paris, France
- Institut Curie, PSL Research University, 75248, Paris, France
| | - Luca Albergante
- INSERM U900, 75428, Paris, France
- Center for Computational Biology, Mines ParisTech, PSL Research University, 75006, Paris, France
- Institut Curie, PSL Research University, 75248, Paris, France
| | - Andrei Zinovyev
- INSERM U900, 75428, Paris, France
- Center for Computational Biology, Mines ParisTech, PSL Research University, 75006, Paris, France
- Institut Curie, PSL Research University, 75248, Paris, France
| | - Isabelle Sermet-Gaudelus
- Faculté de Médecine, Université de Paris, Paris, France
- Institut Necker Enfants Malades, INSERM U1151, Paris, France
- AP-HP. Centre - Université Paris Cité; Hôpital Necker Enfants Malades, Centre de Référence Maladie Rare - Mucoviscidose, Paris, France
| | - Véronique Stoven
- INSERM U900, 75428, Paris, France
- Center for Computational Biology, Mines ParisTech, PSL Research University, 75006, Paris, France
- Institut Curie, PSL Research University, 75248, Paris, France
| | - Laurence Calzone
- INSERM U900, 75428, Paris, France
- Center for Computational Biology, Mines ParisTech, PSL Research University, 75006, Paris, France
- Institut Curie, PSL Research University, 75248, Paris, France
| | - Loredana Martignetti
- INSERM U900, 75428, Paris, France.
- Center for Computational Biology, Mines ParisTech, PSL Research University, 75006, Paris, France.
- Institut Curie, PSL Research University, 75248, Paris, France.
25
Yekelchyk M, Li X, Guenther S, Braun T. Single-Nucleus ATAC-seq for Mapping Chromatin Accessibility in Individual Cells of Murine Hearts. Methods Mol Biol 2024; 2752:245-257. [PMID: 38194039 DOI: 10.1007/978-1-0716-3621-3_16] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/10/2024]
Abstract
During the last decade, a wide range of single-cell and single-nucleus next-generation sequencing techniques have been developed, which have revolutionized the detection of rare cell populations and enabled the creation of comprehensive cell atlases of complex organs and tissues. State-of-the-art methods not only allow classical transcriptomics of individual cells but also comprise a number of epigenetic approaches, including assessment of chromatin accessibility by the single-nucleus assay for transposase-accessible chromatin with sequencing (snATAC-seq). The snATAC-seq assay detects "open chromatin," a term for low nucleosome occupancy of genomic regions, which is a prerequisite for effective transcription factor binding. Information about open chromatin at the single-nucleus level helps to recognize epigenetic changes, sometimes before transcription of the respective genes occurs. snATAC-seq detects cellular heterogeneity in cell populations that are otherwise still transcriptionally and/or morphologically homogeneous. Chromatin accessibility assays may be used to detect epigenetic changes in cardiac lineages during heart development, chromatin landscape changes during aging, and epigenetic alterations in heart diseases. Here, we provide an optimized protocol for snATAC-seq of murine hearts. We describe isolation of single nuclei from snap-frozen hearts, provide hints for preparation of libraries suitable for snATAC-seq next-generation sequencing (NGS) using the Chromium 10× platform, and give general recommendations for downstream analysis using conventional bioinformatic pipelines and packages. The protocol should serve as a beginner's guide to generating high-quality snATAC-seq datasets and performing chromatin accessibility analysis of individual heart-derived cell nuclei.
Affiliation(s)
- Michail Yekelchyk
- Department of Cardiac Development and Remodelling, Max Planck Institute for Heart and Lung Research, Bad Nauheim, Germany
| | - Xiang Li
- Department of Cardiac Development and Remodelling, Max Planck Institute for Heart and Lung Research, Bad Nauheim, Germany
| | - Stefan Guenther
- Department of Cardiac Development and Remodelling, Max Planck Institute for Heart and Lung Research, Bad Nauheim, Germany
| | - Thomas Braun
- Department of Cardiac Development and Remodelling, Max Planck Institute for Heart and Lung Research, Bad Nauheim, Germany.
- German Centre for Cardiovascular Research (DZHK), Partner site Rhein-Main, Frankfurt am Main, Germany.
26
Li Z, Meisner J, Albrechtsen A. Fast and accurate out-of-core PCA framework for large scale biobank data. Genome Res 2023; 33:1599-1608. [PMID: 37620119 PMCID: PMC10620046 DOI: 10.1101/gr.277525.122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2022] [Accepted: 08/18/2023] [Indexed: 08/26/2023]
Abstract
Principal component analysis (PCA) is widely used in statistics, machine learning, and genomics for dimensionality reduction and uncovering low-dimensional latent structure. To address the challenges posed by ever-growing data size, fast and memory-efficient PCA methods have gained prominence. In this paper, we propose a novel randomized singular value decomposition (RSVD) algorithm implemented in PCAone, featuring a window-based optimization scheme that enables accelerated convergence while improving the accuracy. Additionally, PCAone incorporates out-of-core and multithreaded implementations for the existing Implicitly Restarted Arnoldi Method (IRAM) and RSVD. Through comprehensive evaluations using multiple large-scale real-world data sets in different fields, we show the advantage of PCAone over existing methods. The new algorithm achieves significantly faster computation time while maintaining accuracy comparable to the slower IRAM method. Notably, our analyses of UK Biobank, comprising around 0.5 million individuals and 6.1 million common single nucleotide polymorphisms, show that PCAone accurately computes the top 40 principal components within 9 h. This analysis effectively captures population structure, signals of selection, structural variants, and low recombination regions, utilizing <20 GB of memory and 20 CPU threads. Furthermore, when applied to single-cell RNA sequencing data featuring 1.3 million cells, PCAone accurately captures the top 40 principal components in 49 min. This performance represents a 10-fold improvement over state-of-the-art tools.
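The randomized SVD idea named in this abstract can be illustrated with a minimal NumPy sketch. It shows only the basic randomized range finder with power iterations, not PCAone's window-based scheme or its out-of-core implementation; the function name `randomized_svd` and the parameters `oversample` and `power_iters` are assumptions made for illustration.

```python
import numpy as np

def randomized_svd(A, k, oversample=10, power_iters=2, seed=0):
    """Approximate the top-k SVD of A with a basic randomized range finder.

    Illustrative only: this is not the PCAone algorithm.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = A.shape
    ell = min(k + oversample, n_rows, n_cols)
    # Random test matrix: A @ omega approximately spans the top singular subspace.
    omega = rng.standard_normal((n_cols, ell))
    Q, _ = np.linalg.qr(A @ omega)
    # Power iterations sharpen the subspace when singular values decay slowly.
    for _ in range(power_iters):
        Q, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Q)
    # Project A onto the small subspace and take an exact SVD there.
    B = Q.T @ A
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ U_small)[:, :k], s[:k], Vt[:k]

# Toy data: 200 samples x 50 features with rank 5, column-centered as PCA requires.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 50))
X -= X.mean(axis=0)
U, s, Vt = randomized_svd(X, k=5)
pcs = U * s  # principal component scores, one row per sample
```

Because only products with `A` and `A.T` are needed, the same steps could in principle be run over blocks of a matrix read from disk, which is the general motivation for out-of-core variants.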
Affiliation(s)
- Zilong Li
- Section for Computational and RNA Biology, Department of Biology, University of Copenhagen, 2200 København, Denmark;
| | - Jonas Meisner
- Biological and Precision Psychiatry, Mental Health Centre Copenhagen, Copenhagen University Hospital, 2100 København, Denmark
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, 2200 København, Denmark
| | - Anders Albrechtsen
- Section for Computational and RNA Biology, Department of Biology, University of Copenhagen, 2200 København, Denmark
27
Gunawan I, Vafaee F, Meijering E, Lock JG. An introduction to representation learning for single-cell data analysis. Cell Rep Methods 2023; 3:100547. [PMID: 37671013 PMCID: PMC10475795 DOI: 10.1016/j.crmeth.2023.100547] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 09/07/2023]
Abstract
Single-cell-resolved systems biology methods, including omics- and imaging-based measurement modalities, generate a wealth of high-dimensional data characterizing the heterogeneity of cell populations. Representation learning methods are routinely used to analyze these complex, high-dimensional data by projecting them into lower-dimensional embeddings. This facilitates the interpretation and interrogation of the structures, dynamics, and regulation of cell heterogeneity. Reflecting their central role in analyzing diverse single-cell data types, a myriad of representation learning methods exist, with new approaches continually emerging. Here, we contrast general features of representation learning methods spanning statistical, manifold learning, and neural network approaches. We consider key steps involved in representation learning with single-cell data, including data pre-processing, hyperparameter optimization, downstream analysis, and biological validation. Interdependencies and contingencies linking these steps are also highlighted. This overview is intended to guide researchers in the selection, application, and optimization of representation learning strategies for current and future single-cell research applications.
Affiliation(s)
- Ihuan Gunawan
- School of Biomedical Sciences, Faculty of Medicine and Health, University of New South Wales, Sydney, NSW, Australia
- School of Computer Science and Engineering, Faculty of Engineering, University of New South Wales, Sydney, NSW, Australia
| | - Fatemeh Vafaee
- School of Biotechnology and Biomolecular Sciences, Faculty of Science, University of New South Wales, Sydney, NSW, Australia
- UNSW Data Science Hub, University of New South Wales, Sydney, NSW, Australia
| | - Erik Meijering
- School of Computer Science and Engineering, Faculty of Engineering, University of New South Wales, Sydney, NSW, Australia
| | - John George Lock
- School of Biomedical Sciences, Faculty of Medicine and Health, University of New South Wales, Sydney, NSW, Australia
- UNSW Data Science Hub, University of New South Wales, Sydney, NSW, Australia
- Ingham Institute for Applied Medical Research, Liverpool, NSW, Australia
28
Orrapin S, Thongkumkoon P, Udomruk S, Moonmuang S, Sutthitthasakul S, Yongpitakwattana P, Pruksakorn D, Chaiyawat P. Deciphering the Biology of Circulating Tumor Cells through Single-Cell RNA Sequencing: Implications for Precision Medicine in Cancer. Int J Mol Sci 2023; 24:12337. [PMID: 37569711 PMCID: PMC10418766 DOI: 10.3390/ijms241512337] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 06/29/2023] [Revised: 07/25/2023] [Accepted: 07/27/2023] [Indexed: 08/13/2023] Open
Abstract
Circulating tumor cells (CTCs) hold unique biological characteristics that directly involve them in hematogenous dissemination. Studying CTCs systematically is technically challenging due to their extreme rarity and heterogeneity and the lack of specific markers for metastasis-initiating CTCs. As a cutting-edge technology, single-cell RNA sequencing (scRNA-seq) provides insights into the biology of metastatic processes driven by CTCs. Transcriptomic analysis of single CTCs can decipher tumor heterogeneity and phenotypic plasticity for exploring promising novel therapeutic targets. The integrated approach provides a perspective on the mechanisms underlying tumor development and interrogates CTC interactions with other blood cell types, particularly those of the immune system. This review aims to comprehensively describe current studies on CTC transcriptomic analysis through scRNA-seq technology. We emphasize the workflow for scRNA-seq analysis of CTCs, including enrichment, single-cell isolation, and the bioinformatic tools applied for this purpose. Furthermore, we elucidate the translational knowledge gained from the transcriptomic profiles of individual CTCs and the biology of cancer metastasis for developing effective therapeutics through targeting key pathways in CTCs.
Affiliation(s)
- Santhasiri Orrapin
- Center of Multidisciplinary Technology for Advanced Medicine (CMUTEAM), Faculty of Medicine, Chiang Mai University, Muang, Chiang Mai 50200, Thailand; (S.O.); (P.T.); (S.U.); (S.M.); (S.S.); (P.Y.); (D.P.)
| | - Patcharawadee Thongkumkoon
- Center of Multidisciplinary Technology for Advanced Medicine (CMUTEAM), Faculty of Medicine, Chiang Mai University, Muang, Chiang Mai 50200, Thailand; (S.O.); (P.T.); (S.U.); (S.M.); (S.S.); (P.Y.); (D.P.)
| | - Sasimol Udomruk
- Center of Multidisciplinary Technology for Advanced Medicine (CMUTEAM), Faculty of Medicine, Chiang Mai University, Muang, Chiang Mai 50200, Thailand; (S.O.); (P.T.); (S.U.); (S.M.); (S.S.); (P.Y.); (D.P.)
- Musculoskeletal Science and Translational Research (MSTR) Center, Faculty of Medicine, Chiang Mai University, Muang, Chiang Mai 50200, Thailand
| | - Sutpirat Moonmuang
- Center of Multidisciplinary Technology for Advanced Medicine (CMUTEAM), Faculty of Medicine, Chiang Mai University, Muang, Chiang Mai 50200, Thailand; (S.O.); (P.T.); (S.U.); (S.M.); (S.S.); (P.Y.); (D.P.)
| | - Songphon Sutthitthasakul
- Center of Multidisciplinary Technology for Advanced Medicine (CMUTEAM), Faculty of Medicine, Chiang Mai University, Muang, Chiang Mai 50200, Thailand; (S.O.); (P.T.); (S.U.); (S.M.); (S.S.); (P.Y.); (D.P.)
| | - Petlada Yongpitakwattana
- Center of Multidisciplinary Technology for Advanced Medicine (CMUTEAM), Faculty of Medicine, Chiang Mai University, Muang, Chiang Mai 50200, Thailand; (S.O.); (P.T.); (S.U.); (S.M.); (S.S.); (P.Y.); (D.P.)
| | - Dumnoensun Pruksakorn
- Center of Multidisciplinary Technology for Advanced Medicine (CMUTEAM), Faculty of Medicine, Chiang Mai University, Muang, Chiang Mai 50200, Thailand; (S.O.); (P.T.); (S.U.); (S.M.); (S.S.); (P.Y.); (D.P.)
- Musculoskeletal Science and Translational Research (MSTR) Center, Faculty of Medicine, Chiang Mai University, Muang, Chiang Mai 50200, Thailand
- Department of Orthopedics, Faculty of Medicine, Chiang Mai University, Muang, Chiang Mai 50200, Thailand
| | - Parunya Chaiyawat
- Center of Multidisciplinary Technology for Advanced Medicine (CMUTEAM), Faculty of Medicine, Chiang Mai University, Muang, Chiang Mai 50200, Thailand; (S.O.); (P.T.); (S.U.); (S.M.); (S.S.); (P.Y.); (D.P.)
- Musculoskeletal Science and Translational Research (MSTR) Center, Faculty of Medicine, Chiang Mai University, Muang, Chiang Mai 50200, Thailand
29
Ceglia N, Sethna Z, Freeman SS, Uhlitz F, Bojilova V, Rusk N, Burman B, Chow A, Salehi S, Kabeer F, Aparicio S, Greenbaum BD, Shah SP, McPherson A. Identification of transcriptional programs using dense vector representations defined by mutual information with GeneVector. Nat Commun 2023; 14:4400. [PMID: 37474509 PMCID: PMC10359421 DOI: 10.1038/s41467-023-39985-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/08/2022] [Accepted: 07/04/2023] [Indexed: 07/22/2023] Open
Abstract
Deciphering individual cell phenotypes from cell-specific transcriptional processes requires high dimensional single cell RNA sequencing. However, current dimensionality reduction methods aggregate sparse gene information across cells, without directly measuring the relationships that exist between genes. By performing dimensionality reduction with respect to gene co-expression, low-dimensional features can model these gene-specific relationships and leverage shared signal to overcome sparsity. We describe GeneVector, a scalable framework for dimensionality reduction implemented as a vector space model using mutual information between gene expression. Unlike other methods, including principal component analysis and variational autoencoders, GeneVector uses latent space arithmetic in a lower dimensional gene embedding to identify transcriptional programs and classify cell types. In this work, we show in four single cell RNA-seq datasets that GeneVector was able to capture phenotype-specific pathways, perform batch effect correction, interactively annotate cell types, and identify pathway variation with treatment over time.
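The quantity this abstract builds on, mutual information between the expression of two genes, can be sketched in plain Python. This is not GeneVector's implementation: the binarised expression values, the plug-in estimator, and the function name `mutual_information` are assumptions chosen to keep the example small.

```python
import math
from collections import Counter

def mutual_information(x, y):
    """Plug-in mutual information (in bits) between two discrete sequences."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    joint = Counter(zip(x, y))  # counts of co-occurring value pairs
    px = Counter(x)             # marginal counts for x
    py = Counter(y)             # marginal counts for y
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        # p(a, b) / (p(a) * p(b)) written with raw counts.
        mi += p_ab * math.log2(count * n / (px[a] * py[b]))
    return mi

# Toy binarised expression (1 = detected) across eight cells.
gene_a = [1, 1, 1, 1, 0, 0, 0, 0]
gene_b = [1, 1, 1, 1, 0, 0, 0, 0]  # always co-detected with gene_a
gene_c = [1, 0, 1, 0, 1, 0, 1, 0]  # statistically independent of gene_a

mi_ab = mutual_information(gene_a, gene_b)  # 1 bit: fully informative
mi_ac = mutual_information(gene_a, gene_c)  # 0 bits: no shared information
```

A matrix of such pairwise scores is one plausible input for learning a gene embedding; real count data would need a discretisation or density-estimation step that is not shown here.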
Affiliation(s)
- Nicholas Ceglia
- Computational Oncology, Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY, USA.
| | - Zachary Sethna
- Computational Oncology, Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Immuno-Oncology Service, Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Hepatopancreatobiliary Service, Department of Surgery, Memorial Sloan Kettering Cancer Center, New York, NY, USA
| | - Samuel S Freeman
- Computational Oncology, Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
| | - Florian Uhlitz
- Computational Oncology, Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
| | - Viktoria Bojilova
- Computational Oncology, Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
| | - Nicole Rusk
- Computational Oncology, Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
| | - Bharat Burman
- Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, NY, USA
| | - Andrew Chow
- Department of Medicine, Thoracic Oncology Service, Memorial Sloan Kettering Cancer Center, New York, NY, USA
| | - Sohrab Salehi
- Computational Oncology, Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
| | - Farhia Kabeer
- Department of Molecular Oncology, British Columbia Cancer Research Centre, Vancouver, British Columbia, Canada
- Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, British Columbia, Canada
| | - Samuel Aparicio
- Department of Molecular Oncology, British Columbia Cancer Research Centre, Vancouver, British Columbia, Canada
- Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, British Columbia, Canada
| | - Benjamin D Greenbaum
- Computational Oncology, Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Physiology, Biophysics & Systems Biology, Weill Cornell Medicine, Weill Cornell Medical College, New York, NY, USA
| | - Sohrab P Shah
- Computational Oncology, Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
| | - Andrew McPherson
- Computational Oncology, Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
30
Li Y, Nguyen J, Anastasiu DC, Arriaga EA. CosTaL: an accurate and scalable graph-based clustering algorithm for high-dimensional single-cell data analysis. Brief Bioinform 2023; 24:bbad157. [PMID: 37150778 PMCID: PMC10199777 DOI: 10.1093/bib/bbad157] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/10/2022] [Revised: 03/28/2023] [Accepted: 04/02/2023] [Indexed: 05/09/2023] Open
Abstract
To analyze large multidimensional single-cell datasets, we describe a method for Cosine-based Tanimoto similarity-refined graph for community detection using Leiden's algorithm (CosTaL). As a graph-based clustering method, CosTaL transforms the cells with high-dimensional features into a weighted k-nearest-neighbor (kNN) graph. The cells are represented by the vertices of the graph, while an edge between two vertices in the graph represents the close relatedness between the two cells. Specifically, CosTaL builds an exact kNN graph using cosine similarity and uses the Tanimoto coefficient as the refining strategy to re-weight the edges in order to improve the effectiveness of clustering. We demonstrate that CosTaL generally achieves equivalent or higher effectiveness scores on seven benchmark cytometry datasets and six single-cell RNA-sequencing datasets using six different evaluation metrics, compared with other state-of-the-art graph-based clustering methods, including PhenoGraph, Scanpy and PARC. As indicated by the combined evaluation metrics, CosTaL has high efficiency with small datasets and acceptable scalability for large datasets, which is beneficial for large-scale analysis.
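The graph construction described in this abstract can be sketched in plain Python. The abstract does not say how the Tanimoto coefficient is computed, so this sketch assumes it is the set overlap (Jaccard form) of the two cells' closed kNN neighbourhoods; the function names are hypothetical and the Leiden community-detection step is omitted.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def knn_sets(points, k):
    """Exact k nearest neighbours of every point by cosine similarity (brute force)."""
    neighbours = []
    for i, p in enumerate(points):
        ranked = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: cosine(p, points[j]),
            reverse=True,
        )
        neighbours.append(set(ranked[:k]))
    return neighbours

def tanimoto_weighted_edges(neighbours):
    """Re-weight each kNN edge by the Tanimoto overlap of the two neighbourhoods."""
    edges = {}
    for i, nbrs in enumerate(neighbours):
        a = nbrs | {i}  # closed neighbourhood of cell i (assumption)
        for j in nbrs:
            b = neighbours[j] | {j}
            edges[(min(i, j), max(i, j))] = len(a & b) / len(a | b)
    return edges

# Toy cells: two tight groups (0-2 and 3-5) plus cell 6 lying between them.
cells = [(1, 0.1), (1, 0.2), (1, 0.0), (0.1, 1), (0.2, 1), (0.0, 1), (1, 0.8)]
edges = tanimoto_weighted_edges(knn_sets(cells, k=2))
```

In this toy graph, edges inside a group keep weight 1.0 while the edges attaching cell 6 drop to 0.5, so a community-detection step run on these weights would be less inclined to pull the in-between cell into a tight group.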
Affiliation(s)
- Yijia Li
- Department of Biochemistry, Molecular Biology, and Biophysics, University of Minnesota, 420 Washington Ave. S.E., Minneapolis, 55455, Minnesota, USA
| | - Jonathan Nguyen
- Department of Computer Science and Engineering, Santa Clara University, 500 El Camino Real, Santa Clara, 95053, California, USA
| | - David C Anastasiu
- Department of Computer Science and Engineering, Santa Clara University, 500 El Camino Real, Santa Clara, 95053, California, USA
| | - Edgar A Arriaga
- Department of Biochemistry, Molecular Biology, and Biophysics, University of Minnesota, 420 Washington Ave. S.E., Minneapolis, 55455, Minnesota, USA
- Department of Chemistry, University of Minnesota, Smith Hall, 139 Smith Hall, Pleasant St SE, Minneapolis, 55455, Minnesota, USA
31
Gundogdu P, Alamo I, Nepomuceno-Chamorro IA, Dopazo J, Loucera C. SigPrimedNet: A Signaling-Informed Neural Network for scRNA-seq Annotation of Known and Unknown Cell Types. Biology (Basel) 2023; 12:579. [PMID: 37106779 PMCID: PMC10135788 DOI: 10.3390/biology12040579] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/27/2022] [Revised: 03/04/2023] [Accepted: 04/08/2023] [Indexed: 04/29/2023]
Abstract
Single-cell RNA sequencing is increasing our understanding of the behavior of complex tissues or organs by providing unprecedented details on the complex cell type landscape at the level of individual cells. Cell type definition and functional annotation are key steps to understanding the molecular processes behind the underlying cellular communication machinery. However, the exponential growth of scRNA-seq data has made the task of manually annotating cells unfeasible, due not only to the unparalleled resolution of the technology but also to the ever-increasing heterogeneity of the data. Many supervised and unsupervised methods have been proposed to automatically annotate cells. Supervised approaches for cell-type annotation outperform unsupervised methods except when new (unknown) cell types are present. Here, we introduce SigPrimedNet, an artificial neural network approach that leverages (i) efficient training by means of a sparsity-inducing signaling circuits-informed layer, (ii) feature representation learning through supervised training, and (iii) unknown cell-type identification by fitting an anomaly detection method on the learned representation. We show that SigPrimedNet can efficiently annotate known cell types while keeping a low false-positive rate for unseen cells across a set of publicly available datasets. In addition, the learned representation acts as a proxy for signaling circuit activity measurements, which provide useful estimations of the cell functionalities.
Affiliation(s)
- Pelin Gundogdu
- Computational Medicine Platform, Andalusian Public Foundation Progress and Health-FPS, 41013 Sevilla, Spain
- Computational Systems Medicine, Institute of Biomedicine of Seville (IBIS), Hospital Virgen del Rocio, 41013 Sevilla, Spain
| | - Inmaculada Alamo
- Computational Medicine Platform, Andalusian Public Foundation Progress and Health-FPS, 41013 Sevilla, Spain
- Computational Systems Medicine, Institute of Biomedicine of Seville (IBIS), Hospital Virgen del Rocio, 41013 Sevilla, Spain
| | - Joaquin Dopazo
- Computational Medicine Platform, Andalusian Public Foundation Progress and Health-FPS, 41013 Sevilla, Spain
- Computational Systems Medicine, Institute of Biomedicine of Seville (IBIS), Hospital Virgen del Rocio, 41013 Sevilla, Spain
- Bioinformatics in Rare Diseases (BiER), Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), FPS, Hospital Virgen del Rocío, 41013 Sevilla, Spain
- FPS/ELIXIR-es, Hospital Virgen del Rocío, 42013 Sevilla, Spain
| | - Carlos Loucera
- Computational Medicine Platform, Andalusian Public Foundation Progress and Health-FPS, 41013 Sevilla, Spain
- Computational Systems Medicine, Institute of Biomedicine of Seville (IBIS), Hospital Virgen del Rocio, 41013 Sevilla, Spain
| |
Collapse
|
32
|
Knight CH, Khan F, Patel A, Gill US, Okosun J, Wang J. IBRAP: integrated benchmarking single-cell RNA-sequencing analytical pipeline. Brief Bioinform 2023; 24:bbad061. [PMID: 36847692 PMCID: PMC10025434 DOI: 10.1093/bib/bbad061] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/28/2022] [Revised: 12/19/2022] [Accepted: 02/02/2023] [Indexed: 03/01/2023] Open
Abstract
Single-cell ribonucleic acid (RNA)-sequencing (scRNA-seq) is a powerful tool to study cellular heterogeneity. The high dimensional data generated from this technology are complex and require specialized expertise for analysis and interpretation. The core of scRNA-seq data analysis contains several key analytical steps, which include pre-processing, quality control, normalization, dimensionality reduction, integration and clustering. Each step often has many algorithms developed with varied underlying assumptions and implications. With such a diverse choice of tools available, benchmarking analyses have compared their performances and demonstrated that tools operate differentially according to the data types and complexity. Here, we present Integrated Benchmarking scRNA-seq Analytical Pipeline (IBRAP), which contains a suite of analytical components that can be interchanged throughout the pipeline alongside multiple benchmarking metrics that enable users to compare results and determine the optimal pipeline combinations for their data. We apply IBRAP to single- and multi-sample integration analysis using primary pancreatic tissue, cancer cell line and simulated data accompanied with ground truth cell labels, demonstrating the interchangeable and benchmarking functionality of IBRAP. Our results confirm that the optimal pipelines are dependent on individual samples and studies, further supporting the rationale and necessity of our tool. We then compare reference-based cell annotation with unsupervised analysis, both included in IBRAP, and demonstrate the superiority of the reference-based method in identifying robust major and minor cell types. Thus, IBRAP presents a valuable tool to integrate multiple samples and studies to create reference maps of normal and diseased tissues, facilitating novel biological discovery using the vast volume of scRNA-seq data available.
Affiliation(s)
- Connor H Knight
  - Centre for Cancer Genomics and Computational Biology, Barts Cancer Institute, Queen Mary University of London, London EC1M 6BQ
- Faraz Khan
  - Centre for Cancer Genomics and Computational Biology, Barts Cancer Institute, Queen Mary University of London, London EC1M 6BQ
- Ankit Patel
  - Centre for Cancer Genomics and Computational Biology, Barts Cancer Institute, Queen Mary University of London, London EC1M 6BQ
- Upkar S Gill
  - Centre for Immunobiology, Blizard Institute, Faculty of Medicine and Dentistry, Queen Mary University of London, London E1 2AT, United Kingdom
- Jessica Okosun
  - Centre for Haemato-Oncology, Barts Cancer Institute, Queen Mary University of London, London EC1M 6BQ
- Jun Wang
  - Centre for Cancer Genomics and Computational Biology, Barts Cancer Institute, Queen Mary University of London, London EC1M 6BQ
33
Lotfollahi M, Rybakov S, Hrovatin K, Hediyeh-Zadeh S, Talavera-López C, Misharin AV, Theis FJ. Biologically informed deep learning to query gene programs in single-cell atlases. Nat Cell Biol 2023; 25:337-350. [PMID: 36732632 PMCID: PMC9928587 DOI: 10.1038/s41556-022-01072-x] [Citation(s) in RCA: 27] [Impact Index Per Article: 13.5] [Received: 03/02/2022] [Accepted: 12/08/2022] [Indexed: 02/04/2023]
Abstract
The increasing availability of large-scale single-cell atlases has enabled the detailed description of cell states. In parallel, advances in deep learning allow rapid analysis of newly generated query datasets by mapping them into reference atlases. However, existing data transformations learned to map query data are not easily explainable using biologically known concepts such as genes or pathways. Here we propose expiMap, a biologically informed deep-learning architecture that enables single-cell reference mapping. ExpiMap learns to map cells into biologically understandable components representing known 'gene programs'. The activity of each cell for a gene program is learned while simultaneously refining them and learning de novo programs. We show that expiMap compares favourably to existing methods while bringing an additional layer of interpretability to integrative single-cell analysis. Furthermore, we demonstrate its applicability to analyse single-cell perturbation responses in different tissues and species and resolve responses of patients who have coronavirus disease 2019 to different treatments across cell types.
Affiliation(s)
- Mohammad Lotfollahi
  - Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
  - Wellcome Sanger Institute, Cambridge, UK
- Sergei Rybakov
  - Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
  - Department of Mathematics, Technical University of Munich, Munich, Germany
- Karin Hrovatin
  - Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
  - TUM School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany
- Soroor Hediyeh-Zadeh
  - Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
  - Bioinformatics Division, WEHI, Melbourne, Victoria, Australia
- Carlos Talavera-López
  - Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
  - Division of Infectious Diseases and Tropical Medicine, Ludwig-Maximilian-Universität Klinikum, Munich, Germany
- Alexander V Misharin
  - Division of Pulmonary and Critical Care Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
- Fabian J Theis
  - Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
  - Wellcome Sanger Institute, Cambridge, UK
  - Department of Mathematics, Technical University of Munich, Munich, Germany
  - TUM School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany
34
Pandey D, Onkara PP. Improved downstream functional analysis of single-cell RNA-sequence data using DGAN. Sci Rep 2023; 13:1618. [PMID: 36709340 PMCID: PMC9884242 DOI: 10.1038/s41598-023-28952-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 08/10/2022] [Accepted: 01/27/2023] [Indexed: 01/29/2023] Open
Abstract
The dramatic increase in the number of single-cell RNA-sequence (scRNA-seq) investigations reflects the capacity of next-generation sequencing technologies to accurately measure tens of thousands of RNA expression levels at cellular resolution. Nevertheless, missing values arising from RNA amplification persist and remain a significant computational challenge, as these data omissions induce further noise in the cellular data and ultimately impede downstream functional analysis of scRNA-seq data. Consequently, it is imperative to develop robust and efficient scRNA-seq data imputation methods for improved downstream functional analysis. To address this, we have designed an imputation framework, the deep generative autoencoder network (DGAN). In essence, DGAN is an evolved variational autoencoder designed to robustly impute data dropouts in scRNA-seq data manifested as a sparse gene expression matrix. DGAN principally accounts for the count distribution, as well as data sparsity, using a Gaussian model, whereby cell dependencies are exploited to detect and exclude outlier cells via imputation. When tested on five publicly available scRNA-seq datasets, DGAN outperformed every baseline method it was compared against with respect to downstream functional analysis, including cell data visualization, clustering, classification and differential expression analysis. DGAN is implemented in Python and is accessible at https://github.com/dikshap11/DGAN .
Affiliation(s)
- Diksha Pandey
  - Department of Biotechnology, National Institute of Technology, Warangal, India
- Perumal P Onkara
  - Department of Biotechnology, National Institute of Technology, Warangal, India
35
Mirkes EM, Bac J, Fouché A, Stasenko SV, Zinovyev A, Gorban AN. Domain Adaptation Principal Component Analysis: Base Linear Method for Learning with Out-of-Distribution Data. ENTROPY (BASEL, SWITZERLAND) 2022; 25:33. [PMID: 36673174 PMCID: PMC9858254 DOI: 10.3390/e25010033] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/26/2022] [Revised: 12/18/2022] [Accepted: 12/21/2022] [Indexed: 06/17/2023]
Abstract
Domain adaptation is a popular paradigm in modern machine learning which aims at tackling the problem of divergence (or shift) between the labeled training and validation datasets (source domain) and a potentially large unlabeled dataset (target domain). The task is to embed both datasets into a common space in which the source dataset is informative for training while the divergence between source and target is minimized. The most popular domain adaptation solutions are based on training neural networks that combine classification and adversarial learning modules, frequently making them both data-hungry and difficult to train. We present a method called Domain Adaptation Principal Component Analysis (DAPCA) that identifies a linear reduced data representation useful for solving the domain adaptation task. DAPCA algorithm introduces positive and negative weights between pairs of data points, and generalizes the supervised extension of principal component analysis. DAPCA is an iterative algorithm that solves a simple quadratic optimization problem at each iteration. The convergence of the algorithm is guaranteed, and the number of iterations is small in practice. We validate the suggested algorithm on previously proposed benchmarks for solving the domain adaptation task. We also show the benefit of using DAPCA in analyzing single-cell omics datasets in biomedical applications. Overall, DAPCA can serve as a practical preprocessing step in many machine learning applications leading to reduced dataset representations, taking into account possible divergence between source and target domains.
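The abstract above describes weighting pairs of data points positively or negatively as a generalization of supervised PCA. The following is a minimal sketch of that single ingredient only, not of the DAPCA algorithm itself (which also handles the unlabeled target domain iteratively). The weighting scheme is an assumption made for illustration: weight +1 between points of different classes and weight `-alpha` between points of the same class, where `alpha` and the function name `weighted_pair_pca` are hypothetical.

```python
import numpy as np

def weighted_pair_pca(x, labels, n_components, alpha=1.0):
    """Return directions maximizing sum_ij w_ij * ||P(x_i) - P(x_j)||^2.

    Assumed (illustrative) weights: +1 for pairs from different classes
    (repulsion) and -alpha for pairs from the same class (attraction).
    """
    same = labels[:, None] == labels[None, :]
    w = np.where(same, -alpha, 1.0)
    # Identity used: sum_ij w_ij (x_i - x_j)(x_i - x_j)^T = 2 * X^T (D - W) X,
    # where D is the diagonal matrix of row sums of W.
    laplacian = np.diag(w.sum(axis=1)) - w
    q = x.T @ laplacian @ x
    eigvals, eigvecs = np.linalg.eigh(q)  # q is symmetric
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, order]

# Toy data: classes differ along axis 0, but axis 1 has far more variance.
x = np.array([[-1.0, -5.0], [-1.0, 0.0], [-1.0, 5.0],
              [1.0, -5.0], [1.0, 0.0], [1.0, 5.0]])
labels = np.array([0, 0, 0, 1, 1, 1])
components = weighted_pair_pca(x, labels, n_components=1)
print(np.abs(components[:, 0]))
```

On this toy data, ordinary PCA would pick axis 1 because it carries the most variance, whereas the weighted version selects axis 0, the direction that separates the two classes.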
Affiliation(s)
- Evgeny M. Mirkes
  - School of Computing and Mathematical Sciences, University of Leicester, Leicester LE1 7RH, UK
- Jonathan Bac
  - Institut Curie, PSL Research University, 75005 Paris, France
  - Institut National de la Santé et de la Recherche Médicale (INSERM), U900, 75012 Paris, France
  - CBIO-Centre for Computational Biology, Mines ParisTech, PSL Research University, 75005 Paris, France
- Aziz Fouché
  - Institut Curie, PSL Research University, 75005 Paris, France
  - Institut National de la Santé et de la Recherche Médicale (INSERM), U900, 75012 Paris, France
  - CBIO-Centre for Computational Biology, Mines ParisTech, PSL Research University, 75005 Paris, France
- Sergey V. Stasenko
  - Laboratory of Advanced Methods for High-Dimensional Data Analysis, Lobachevsky University, 603000 Nizhniy Novgorod, Russia
- Andrei Zinovyev
  - Institut Curie, PSL Research University, 75005 Paris, France
  - Institut National de la Santé et de la Recherche Médicale (INSERM), U900, 75012 Paris, France
  - CBIO-Centre for Computational Biology, Mines ParisTech, PSL Research University, 75005 Paris, France
- Alexander N. Gorban
  - School of Computing and Mathematical Sciences, University of Leicester, Leicester LE1 7RH, UK
36
Su M, Pan T, Chen QZ, Zhou WW, Gong Y, Xu G, Yan HY, Li S, Shi QZ, Zhang Y, He X, Jiang CJ, Fan SC, Li X, Cairns MJ, Wang X, Li YS. Data analysis guidelines for single-cell RNA-seq in biomedical studies and clinical applications. Mil Med Res 2022; 9:68. [PMID: 36461064 PMCID: PMC9716519 DOI: 10.1186/s40779-022-00434-8] [Citation(s) in RCA: 35] [Impact Index Per Article: 11.7] [Received: 09/27/2022] [Accepted: 11/18/2022] [Indexed: 12/03/2022] Open
Abstract
The application of single-cell RNA sequencing (scRNA-seq) in biomedical research has advanced our understanding of the pathogenesis of disease and provided valuable insights into new diagnostic and therapeutic strategies. With the expansion of capacity for high-throughput scRNA-seq, including clinical samples, the analysis of these huge volumes of data has become a daunting prospect for researchers entering this field. Here, we review the workflow for typical scRNA-seq data analysis, covering raw data processing and quality control, basic data analysis applicable for almost all scRNA-seq data sets, and advanced data analysis that should be tailored to specific scientific questions. While summarizing the current methods for each analysis step, we also provide an online repository of software and wrapped-up scripts to support the implementation. Recommendations and caveats are pointed out for some specific analysis tasks and approaches. We hope this resource will be helpful to researchers engaging with scRNA-seq, in particular for emerging clinical applications.
Affiliation(s)
- Min Su
  - State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, 211166 China
- Tao Pan
  - College of Biomedical Information and Engineering, the First Affiliated Hospital of Hainan Medical University, Hainan Medical University, Haikou, 571199 Hainan China
- Qiu-Zhen Chen
  - State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, 211166 China
- Wei-Wei Zhou
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081 Heilongjiang China
- Yi Gong
  - State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, 211166 China
  - Department of Immunology, Nanjing Medical University, Nanjing, 211166 China
- Gang Xu
  - College of Biomedical Information and Engineering, the First Affiliated Hospital of Hainan Medical University, Hainan Medical University, Haikou, 571199 Hainan China
- Huan-Yu Yan
  - State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, 211166 China
- Si Li
  - College of Biomedical Information and Engineering, the First Affiliated Hospital of Hainan Medical University, Hainan Medical University, Haikou, 571199 Hainan China
- Qiao-Zhen Shi
  - State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, 211166 China
- Ya Zhang
  - College of Biomedical Information and Engineering, the First Affiliated Hospital of Hainan Medical University, Hainan Medical University, Haikou, 571199 Hainan China
- Xiao He
  - Department of Laboratory Medicine, Women and Children’s Hospital of Chongqing Medical University, Chongqing, 401174 China
- Shi-Cai Fan
  - Shenzhen Institute for Advanced Study, University of Electronic Science and Technology of China, Shenzhen, 518110 Guangdong China
- Xia Li
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081 Heilongjiang China
- Murray J. Cairns
  - School of Biomedical Sciences and Pharmacy, Faculty of Health and Medicine, the University of Newcastle, University Drive, Callaghan, NSW 2308 Australia
  - Precision Medicine Research Program, Hunter Medical Research Institute, New Lambton Heights, NSW 2305 Australia
- Xi Wang
  - State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, 211166 China
- Yong-Sheng Li
  - College of Biomedical Information and Engineering, the First Affiliated Hospital of Hainan Medical University, Hainan Medical University, Haikou, 571199 Hainan China
37
Watson ER, Mora A, Taherian Fard A, Mar JC. How does the structure of data impact cell-cell similarity? Evaluating how structural properties influence the performance of proximity metrics in single cell RNA-seq data. Brief Bioinform 2022; 23:bbac387. [PMID: 36151725 PMCID: PMC9677483 DOI: 10.1093/bib/bbac387] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/09/2022] [Revised: 07/26/2022] [Accepted: 08/11/2022] [Indexed: 12/14/2022] Open
Abstract
Accurately identifying cell populations is paramount to the quality of downstream analyses and overall interpretations of single-cell RNA-seq (scRNA-seq) datasets but remains a challenge. The quality of single-cell clustering depends on the proximity metric used to generate cell-to-cell distances. Accordingly, proximity metrics have been benchmarked for scRNA-seq clustering, typically with results averaged across datasets to identify a highest performing metric. However, the 'best-performing' metric varies between studies, with the performance differing significantly between datasets. This suggests that the unique structural properties of an scRNA-seq dataset, specific to the biological system under study, have a substantial impact on proximity metric performance. Previous benchmarking studies have not factored the structural properties into their evaluations. To address this gap, we developed a framework for the in-depth evaluation of the performance of 17 proximity metrics with respect to core structural properties of scRNA-seq data, including sparsity, dimensionality, cell-population distribution and rarity. We find that clustering performance can be improved substantially by the selection of an appropriate proximity metric and neighbourhood size for the structural properties of a dataset, in addition to performing suitable pre-processing and dimensionality reduction. Furthermore, popular metrics such as Euclidean and Manhattan distance performed poorly in comparison to several less commonly applied metrics, suggesting that the default metric for many scRNA-seq methods should be re-evaluated. Our findings highlight the critical nature of tailoring scRNA-seq analysis pipelines to the dataset under study and provide practical guidance for researchers looking to optimize cell-similarity search for the structural properties of their own data.
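To make the idea of comparing proximity metrics concrete, here is a small sketch that scores three metrics by how often a cell's nearest neighbour shares its population label. It is an illustration under stated assumptions, not the evaluation framework of the study: the four-cell toy dataset, the function names `pairwise_distance` and `knn_label_agreement`, and the agreement score are all hypothetical choices.

```python
import numpy as np

def pairwise_distance(x, metric):
    """Dense cell-by-cell distance matrix for a small expression matrix."""
    if metric == "euclidean":
        return np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=2))
    if metric == "manhattan":
        return np.abs(x[:, None, :] - x[None, :, :]).sum(axis=2)
    if metric == "cosine":
        norms = np.linalg.norm(x, axis=1, keepdims=True)
        norms[norms == 0] = 1.0  # leave all-zero cells unscaled
        unit = x / norms
        return 1.0 - unit @ unit.T
    raise ValueError(f"unknown metric: {metric}")

def knn_label_agreement(dist, labels, k):
    """Mean fraction of each cell's k nearest neighbours sharing its label."""
    d = dist.copy()
    np.fill_diagonal(d, np.inf)  # a cell is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]
    return float((labels[neighbours] == labels[:, None]).mean())

# Toy data: two populations with distinct profiles but very different depth.
cells = np.array([[1.0, 0.0], [10.0, 0.0], [0.0, 1.0], [0.0, 10.0]])
labels = np.array([0, 0, 1, 1])
agreement = {
    metric: knn_label_agreement(pairwise_distance(cells, metric), labels, k=1)
    for metric in ("euclidean", "manhattan", "cosine")
}
print(agreement)  # {'euclidean': 0.5, 'manhattan': 0.5, 'cosine': 1.0}
```

In this contrived case the magnitude-sensitive metrics pair the two shallow cells together regardless of population, while cosine distance, which ignores overall magnitude, recovers the populations. Real datasets will not behave this cleanly, which is consistent with the abstract's point that the appropriate metric depends on the structure of the data.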
Affiliation(s)
- Ebony Rose Watson
  - Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, Brisbane, QLD, Australia
- Ariane Mora
  - School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, QLD, Australia
- Atefeh Taherian Fard
  - Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, Brisbane, QLD, Australia
- Jessica Cara Mar
  - Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, Brisbane, QLD, Australia
38
Lee H, Han B. FastRNA: An efficient solution for PCA of single-cell RNA-sequencing data based on a batch-accounting count model. Am J Hum Genet 2022; 109:1974-1985. [PMID: 36206757 PMCID: PMC9674949 DOI: 10.1016/j.ajhg.2022.09.008] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 05/23/2022] [Accepted: 09/14/2022] [Indexed: 01/26/2023] Open
Abstract
Almost always, the analysis of single-cell RNA-sequencing (scRNA-seq) data begins with the generation of the low dimensional embedding of the data by principal-component analysis (PCA). Because scRNA-seq data are count data, log transformation is routinely applied to correct skewness prior to PCA, which is often argued to have added bias to data. Alternatively, studies have proposed methods that directly assume a count model and use approximately normally distributed count residuals for PCA. Despite their theoretical advantage of directly modeling count data, these methods are extremely slow for large datasets. In fact, when the data size grows, even the standard log normalization becomes inefficient. Here, we present FastRNA, a highly efficient solution for PCA of scRNA-seq data based on a count model accounting for both batches and cell size factors. Although we assume the same general count model as previous methods, our method uses two orders of magnitude less time and memory than the other count-based methods and an order of magnitude less time and memory than the standard log normalization. This achievement results from our unique algebraic optimization that completely avoids the formation of the large dense residual matrix in memory. In addition, our method enjoys a benefit that the batch effects are eliminated from data prior to PCA. Generating a batch-accounted PC of an atlas-scale dataset with 2 million cells takes less than a minute and 1 GB memory with our method.
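For orientation, the conventional baseline that the abstract contrasts against (log normalization followed by PCA) can be sketched as below. This is not FastRNA's algorithm: it deliberately forms the dense centered matrix in memory, which is the cost the abstract says FastRNA avoids. The scale factor of 10,000, the toy Poisson counts, and the function names are assumptions made for illustration.

```python
import numpy as np

def log_normalize(counts, scale=1e4):
    """Scale each cell (row) to a common total count, then apply log1p."""
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # avoid division by zero for empty cells
    return np.log1p(counts / totals * scale)

def pca_scores(matrix, n_components):
    """PCA via SVD of the column-centered matrix; returns per-cell scores."""
    centered = matrix - matrix.mean(axis=0, keepdims=True)  # dense residuals
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

rng = np.random.default_rng(0)
counts = rng.poisson(lam=2.0, size=(50, 200)).astype(float)  # toy cells x genes
scores = pca_scores(log_normalize(counts), n_components=5)
print(scores.shape)  # (50, 5)
```

Both the normalized matrix and its centered copy here are dense with one entry per cell and gene, so memory grows with the product of the two; that scaling is what makes this baseline costly at atlas scale.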
Affiliation(s)
- Hanbin Lee
  - Department of Medicine, Seoul National University College of Medicine, Seoul, Republic of Korea
- Buhm Han
  - Department of Medicine, Seoul National University College of Medicine, Seoul, Republic of Korea
  - Department of Biomedical Sciences, Seoul National University College of Medicine, Seoul, Republic of Korea
  - Interdisciplinary Program in Bioengineering, Seoul National University, Seoul, Republic of Korea
  - Genealogy Inc., Seoul, Republic of Korea
39
Cuevas-Diaz Duran R, González-Orozco JC, Velasco I, Wu JQ. Single-cell and single-nuclei RNA sequencing as powerful tools to decipher cellular heterogeneity and dysregulation in neurodegenerative diseases. Front Cell Dev Biol 2022; 10:884748. [PMID: 36353512 PMCID: PMC9637968 DOI: 10.3389/fcell.2022.884748] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Received: 02/27/2022] [Accepted: 10/06/2022] [Indexed: 08/10/2023] Open
Abstract
Neurodegenerative diseases affect millions of people worldwide and there are currently no cures. Two types of common neurodegenerative diseases are Alzheimer's (AD) and Parkinson's disease (PD). Single-cell and single-nuclei RNA sequencing (scRNA-seq and snRNA-seq) have become powerful tools to elucidate the inherent complexity and dynamics of the central nervous system at cellular resolution. This technology has allowed the identification of cell types and states, providing new insights into cellular susceptibilities and molecular mechanisms underlying neurodegenerative conditions. Exciting research using high throughput scRNA-seq and snRNA-seq technologies to study AD and PD is emerging. Herein we review the recent progress in understanding these neurodegenerative diseases using these state-of-the-art technologies. We discuss the fundamental principles and implications of single-cell sequencing of the human brain. Moreover, we review some examples of the computational and analytical tools required to interpret the extensive amount of data generated from these assays. We conclude by highlighting challenges and limitations in the application of these technologies in the study of AD and PD.
Affiliation(s)
- Iván Velasco
  - Instituto de Fisiología Celular—Neurociencias, Universidad Nacional Autónoma de México, Mexico City, Mexico
  - Laboratorio de Reprogramación Celular, Instituto Nacional de Neurología y Neurocirugía “Manuel Velasco Suárez”, Mexico City, Mexico
- Jia Qian Wu
  - The Vivian L. Smith Department of Neurosurgery, McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, TX, United States
  - Center for Stem Cell and Regenerative Medicine, UT Brown Foundation Institute of Molecular Medicine, Houston, TX, United States
  - MD Anderson Cancer Center UTHealth Graduate School of Biomedical Sciences, Houston, TX, United States
40
Sardoo AM, Zhang S, Ferraro TN, Keck TM, Chen Y. Decoding brain memory formation by single-cell RNA sequencing. Brief Bioinform 2022; 23:6713514. [PMID: 36156112 PMCID: PMC9677489 DOI: 10.1093/bib/bbac412] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 05/18/2022] [Revised: 07/10/2022] [Accepted: 08/25/2022] [Indexed: 12/14/2022] Open
Abstract
To understand how distinct memories are formed and stored in the brain is an important and fundamental question in neuroscience and computational biology. A population of neurons, termed engram cells, represents the physiological manifestation of a specific memory trace and is characterized by dynamic changes in gene expression, which in turn alters the synaptic connectivity and excitability of these cells. Recent applications of single-cell RNA sequencing (scRNA-seq) and single-nucleus RNA sequencing (snRNA-seq) are promising approaches for delineating the dynamic expression profiles in these subsets of neurons, and thus understanding memory-specific genes, their combinatorial patterns and regulatory networks. The aim of this article is to review and discuss the experimental and computational procedures of sc/snRNA-seq, new studies of molecular mechanisms of memory aided by sc/snRNA-seq in human brain diseases and related mouse models, and computational challenges in understanding the regulatory mechanisms underlying long-term memory formation.
Affiliation(s)
- Atlas M Sardoo
  - Department of Biological & Biomedical Sciences, Rowan University, Glassboro, NJ 08028, USA
- Shaoqiang Zhang
  - College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China
- Thomas N Ferraro
  - Department of Biomedical Sciences, Cooper Medical School of Rowan University, Camden, NJ 08103, USA
- Thomas M Keck
  - Department of Biological & Biomedical Sciences, Rowan University, Glassboro, NJ 08028, USA
  - Department of Chemistry & Biochemistry, Rowan University, Glassboro, NJ 08028, USA
- Yong Chen
  - Corresponding author. Yong Chen, Department of Biological and Biomedical Sciences, Rowan University, Glassboro, NJ 08028, USA. Tel.: +1 856 256 4500; E-mail:
41
Points of Significance: Principal Component Analysis for Biocentric Data Visualization. BIONANOSCIENCE 2022. [DOI: 10.1007/s12668-022-01021-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/15/2022]
42
Wang R, Peng G, Tam PPL, Jing N. Integration of computational analysis and spatial transcriptomics in single-cell study. GENOMICS, PROTEOMICS & BIOINFORMATICS 2022:S1672-0229(22)00084-5. [PMID: 35901961 PMCID: PMC10372908 DOI: 10.1016/j.gpb.2022.06.006] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 09/04/2020] [Revised: 06/08/2022] [Accepted: 06/19/2022] [Indexed: 04/08/2023]
Abstract
Recent advances of single-cell transcriptomics technologies and allied computational methodologies have revolutionized molecular cell biology. Meanwhile, pioneering explorations in spatial transcriptomics have opened avenues to address fundamental biological questions in health and diseases. Here, we review the technical attributes of single-cell RNA sequencing and spatial transcriptomics, and the core concepts of computational data analysis. We further highlight the challenges in the application of data integration methodologies and the interpretation of the biological context of the findings.
Affiliation(s)
- Ran Wang
  - State Key Laboratory of Cell Biology, CAS Center for Excellence in Molecular Cell Science, Shanghai Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai 200031, China
- Guangdun Peng
  - CAS Key Laboratory of Regenerative Biology, Guangdong Provincial Key Laboratory of Stem Cell and Regenerative Medicine, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China
  - Institute for Stem Cell and Regeneration, Chinese Academy of Sciences, Beijing 100101, China
- Patrick P L Tam
  - Embryology Research Unit, Children's Medical Research Institute, University of Sydney, Sydney, NSW 2145, Australia
  - School of Medical Sciences, Faculty of Medicine and Health, University of Sydney, Sydney, NSW 2145, Australia
- Naihe Jing
  - State Key Laboratory of Cell Biology, CAS Center for Excellence in Molecular Cell Science, Shanghai Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai 200031, China
  - Guangzhou Laboratory, Guangzhou 510005, China
  - CAS Key Laboratory of Regenerative Biology, Guangdong Provincial Key Laboratory of Stem Cell and Regenerative Medicine, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China
  - Institute for Stem Cell and Regeneration, Chinese Academy of Sciences, Beijing 100101, China
43
Obayashi T, Hibara H, Kagaya Y, Aoki Y, Kinoshita K. ATTED-II v11: A Plant Gene Coexpression Database Using a Sample Balancing Technique by Subagging of Principal Components. PLANT & CELL PHYSIOLOGY 2022; 63:869-881. [PMID: 35353884 DOI: 10.1093/pcp/pcac041] [Citation(s) in RCA: 54] [Impact Index Per Article: 18.0] [Received: 09/16/2021] [Revised: 02/06/2022] [Accepted: 03/29/2022] [Indexed: 05/25/2023]
Abstract
ATTED-II (https://atted.jp) is a gene coexpression database for nine plant species based on publicly available RNAseq and microarray data. One of the challenges in constructing condition-independent coexpression data based on publicly available gene expression data is managing the inherent sampling bias. Here, we report ATTED-II version 11, wherein we adopted a coexpression calculation methodology to balance the samples using principal component analysis and ensemble calculation. This approach has two advantages. First, omitting principal components with low contribution rates reduces the main contributors of noise. Second, balancing large differences in contribution rates enables considering various sample conditions entirely. In addition, based on RNAseq- and microarray-based coexpression data, we provide species-representative, integrated coexpression information to enhance the efficiency of interspecies comparison of the coexpression data. These coexpression data are provided as a standardized z-score to facilitate integrated analysis with different data sources. We believe that with these improvements, ATTED-II is more valuable and powerful for supporting interspecies comparative studies and integrated analyses using heterogeneous data.
Affiliation(s)
- Takeshi Obayashi
- Graduate School of Information Sciences, Tohoku University, 6-3-09, Aramaki-Aza-Aoba, Aoba-ku, Sendai, 980-8679 Japan
- Himiko Hibara
- Graduate School of Information Sciences, Tohoku University, 6-3-09, Aramaki-Aza-Aoba, Aoba-ku, Sendai, 980-8679 Japan
- Yuki Kagaya
- Graduate School of Information Sciences, Tohoku University, 6-3-09, Aramaki-Aza-Aoba, Aoba-ku, Sendai, 980-8679 Japan
- Yuichi Aoki
- Graduate School of Information Sciences, Tohoku University, 6-3-09, Aramaki-Aza-Aoba, Aoba-ku, Sendai, 980-8679 Japan
- Tohoku Medical Megabank Organization, Tohoku University, 2-1 Seiryo-machi, Aoba-ku, Sendai, 980-8573 Japan
- Kengo Kinoshita
- Graduate School of Information Sciences, Tohoku University, 6-3-09, Aramaki-Aza-Aoba, Aoba-ku, Sendai, 980-8679 Japan
- Tohoku Medical Megabank Organization, Tohoku University, 2-1 Seiryo-machi, Aoba-ku, Sendai, 980-8573 Japan
- Institute of Development, Aging, and Cancer, Tohoku University, 4-1 Seiryo-machi, Aoba-ku, Sendai, 980-8575 Japan
44
Zandavi SM, Koch FC, Vijayan A, Zanini F, Mora F, Ortega D, Vafaee F. Disentangling single-cell omics representation with a power spectral density-based feature extraction. Nucleic Acids Res 2022; 50:5482-5492. [PMID: 35639509 PMCID: PMC9178020 DOI: 10.1093/nar/gkac436] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/21/2021] [Revised: 04/26/2022] [Accepted: 05/10/2022] [Indexed: 12/13/2022]
Abstract
Emerging single-cell technologies provide high-resolution measurements of distinct cellular modalities, opening new avenues for generating detailed cellular atlases of many diverse tissues. The high dimensionality, sparsity, and inaccuracy of single-cell sequencing measurements, however, can obscure discriminatory information, mask cellular subtype variations and complicate downstream analyses, which can limit our understanding of cell function and tissue heterogeneity. Here, we present a novel pre-processing method (scPSD), inspired by power spectral density analysis, that improves the accuracy of cell subtype separation in large-scale single-cell omics data. We comprehensively benchmarked our method on a wide range of single-cell RNA-sequencing datasets and showed that scPSD pre-processing, while being fast and scalable, significantly reduces data complexity, enhances cell-type separation, and enables rare cell identification. Additionally, we applied scPSD to transcriptomics and chromatin accessibility cell atlases and demonstrated its capacity to discriminate over 100 cell types across the whole organism and across different modalities of single-cell omics data.
Affiliation(s)
- Seid Miad Zandavi
- School of Biotechnology and Biomolecular Sciences, University of New South Wales (UNSW Sydney), Australia
- Programs in Metabolism and Medical & Population Genetics, Broad Institute, Cambridge, MA, USA
- Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA
- Department of Pediatrics, Harvard Medical School, Boston, MA, USA
- Forrest C Koch
- School of Biotechnology and Biomolecular Sciences, University of New South Wales (UNSW Sydney), Australia
- Abhishek Vijayan
- School of Biotechnology and Biomolecular Sciences, University of New South Wales (UNSW Sydney), Australia
- Fabio Zanini
- Prince of Wales Clinical School, UNSW Sydney, Australia
- Cellular Genomics Future Institute, UNSW Sydney, Australia
- Fatima Valdes Mora
- Children's Cancer Institute, Lowy Cancer Research Centre, UNSW Sydney, Australia
- School of Women's and Children's Health, Faculty of Medicine, UNSW, Sydney, Australia
- David Gallego Ortega
- School of Biomedical Engineering, University of Technology Sydney (UTS), Australia
- Fatemeh Vafaee
- School of Biotechnology and Biomolecular Sciences, University of New South Wales (UNSW Sydney), Australia
- Cellular Genomics Future Institute, UNSW Sydney, Australia
- UNSW Data Science Hub (uDASH), UNSW Sydney, Australia
45
Kim C, Lee H, Jeong J, Jung K, Han B. MarcoPolo: a method to discover differentially expressed genes in single-cell RNA-seq data without depending on prior clustering. Nucleic Acids Res 2022; 50:e71. [PMID: 35420135 PMCID: PMC9262626 DOI: 10.1093/nar/gkac216] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 03/02/2021] [Revised: 03/16/2022] [Accepted: 03/22/2022] [Indexed: 11/13/2022]
Abstract
The standard analysis pipeline for single-cell RNA-seq data consists of sequential steps initiated by clustering the cells. An innate limitation of this pipeline is that an imperfect clustering result can irreversibly affect the succeeding steps. For example, there can be cell types not well distinguished by clustering because they largely share the global structure, such as the anterior primitive streak and mid primitive streak cells. If one searches for differentially expressed genes (DEGs) solely based on clustering, marker genes for distinguishing these types will be missed. Moreover, clustering depends on many parameters and is often subject to manual decisions. To overcome these limitations, we propose MarcoPolo, a method that identifies informative DEGs independently of prior clustering. MarcoPolo ranks genes by evaluating whether their expression distributions are bimodal, whether similar expression patterns are observed in other genes, and whether the expressing cells are proximal in a low-dimensional space. Using real datasets with FACS-purified cell labels, we demonstrate that MarcoPolo recovers marker genes better than competing methods. Notably, MarcoPolo finds key genes that can distinguish cell types that are not distinguishable by standard clustering. MarcoPolo is built as a convenient software package that provides analysis results in an HTML file.
Affiliation(s)
- Chanwoo Kim
- Department of Electrical and Computer Engineering, Seoul National University, Seoul, Republic of Korea
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
- Hanbin Lee
- Department of Medicine, Seoul National University College of Medicine, Seoul, Republic of Korea
- Juhee Jeong
- Department of Biomedical Sciences, BK21 Plus Biomedical Science Project, Seoul National University College of Medicine, Seoul, Republic of Korea
- Keehoon Jung
- Department of Biomedical Sciences, BK21 Plus Biomedical Science Project, Seoul National University College of Medicine, Seoul, Republic of Korea
- Department of Anatomy and Cell Biology, Seoul National University College of Medicine, Seoul, Republic of Korea
- Institute of Allergy and Clinical Immunology, Seoul National University Medical Research Center, Seoul, Republic of Korea
- Buhm Han
- Department of Biomedical Sciences, BK21 Plus Biomedical Science Project, Seoul National University College of Medicine, Seoul, Republic of Korea
- Interdisciplinary Program in Bioengineering, Seoul National University, Seoul, Republic of Korea
46
Wang Y, Zhao H. Non-linear archetypal analysis of single-cell RNA-seq data by deep autoencoders. PLoS Comput Biol 2022; 18:e1010025. [PMID: 35363784 PMCID: PMC9007392 DOI: 10.1371/journal.pcbi.1010025] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 10/13/2021] [Revised: 04/13/2022] [Accepted: 03/15/2022] [Indexed: 12/25/2022]
Abstract
Advances in single-cell RNA sequencing (scRNA-seq) have led to successes in discovering novel cell types and understanding cellular heterogeneity among complex cell populations through cluster analysis. However, cluster analysis cannot reveal a continuous spectrum of states or the underlying gene expression programs (GEPs) shared across cell types. We introduce scAAnet, an autoencoder for single-cell non-linear archetypal analysis, to identify GEPs and infer the relative activity of each GEP across cells. We use a count distribution-based loss term to account for the sparsity and overdispersion of the raw count data and add an archetypal constraint to the loss function of scAAnet. We first show through simulations that scAAnet outperforms existing methods for archetypal analysis across different metrics. We then demonstrate the ability of scAAnet to extract biologically meaningful GEPs using publicly available scRNA-seq datasets, including a pancreatic islet dataset, a lung idiopathic pulmonary fibrosis dataset and a prefrontal cortex dataset.
Affiliation(s)
- Yuge Wang
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, United States of America
- Hongyu Zhao
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, United States of America
- Program of Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, United States of America
- Department of Genetics, Yale School of Medicine, New Haven, Connecticut, United States of America
47
Liu B, Li Y, Zhang L. Analysis and Visualization of Spatial Transcriptomic Data. Front Genet 2022; 12:785290. [PMID: 35154244 PMCID: PMC8829434 DOI: 10.3389/fgene.2021.785290] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Received: 09/29/2021] [Accepted: 12/24/2021] [Indexed: 12/21/2022]
Abstract
Human and animal tissues consist of heterogeneous cell types that organize and interact in highly structured manners. Bulk and single-cell sequencing technologies remove cells from their original microenvironments, resulting in a loss of spatial information. Spatial transcriptomics is a recent technological innovation that measures transcriptomic information while preserving spatial information. Spatial transcriptomic data can be generated in several ways. RNA molecules are measured by in situ sequencing, in situ hybridization, or spatial barcoding to recover original spatial coordinates. The inclusion of spatial information expands the range of possibilities for analysis and visualization, and spurred the development of numerous novel methods. In this review, we summarize the core concepts of spatial genomics technology and provide a comprehensive review of current analysis and visualization methods for spatial transcriptomics.
48
Nowling RJ, Fallas-Moya F, Sadovnik A, Emrich S, Aleck M, Leskiewicz D, Peters JG. Fast, low-memory detection and localization of large, polymorphic inversions from SNPs. PeerJ 2022; 10:e12831. [PMID: 35116204 PMCID: PMC8784018 DOI: 10.7717/peerj.12831] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/02/2021] [Accepted: 01/04/2022] [Indexed: 01/10/2023]
Abstract
BACKGROUND Large (>1 Mb), polymorphic inversions have substantial impacts on population structure and maintenance of genotypes. These large inversions can be detected from single nucleotide polymorphism (SNP) data using unsupervised learning techniques like PCA. Constructing and analyzing a feature matrix from millions of SNPs requires a large amount of memory and limits the sizes of data sets that can be analyzed. METHODS We propose using feature hashing to construct a feature matrix from a VCF file of SNPs to reduce memory usage. The matrix is constructed in a streaming fashion such that the entire VCF file is never loaded into memory at one time. RESULTS When evaluated on Anopheles mosquito and Drosophila fly data sets, our approach reduced memory usage by 97% with minimal reductions in accuracy for inversion detection and localization tasks. CONCLUSION With these changes, inversions in larger data sets can be analyzed easily and efficiently on common laptop and desktop computers. Our method is publicly available through our open-source inversion analysis software, Asaph.
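The streaming, hashed construction described in the METHODS can be sketched as follows. This is a generic illustration of the hashing trick under stated assumptions, not Asaph's actual implementation: the feature key format (`chrom:pos:allele`), the use of MD5, the bucket count, and the function names `hash_index` and `hashed_feature_matrix` are all hypothetical choices.

```python
import hashlib

def hash_index(feature, n_buckets):
    """Map a feature string to a stable column index in [0, n_buckets)."""
    digest = hashlib.md5(feature.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_buckets

def hashed_feature_matrix(vcf_lines, n_buckets=16):
    """Build a samples x n_buckets allele-count matrix one VCF line at a time.

    Memory scales with (samples x n_buckets), not with the number of SNPs.
    """
    sample_names, rows = [], []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            sample_names = fields[9:]  # sample columns follow FORMAT
            rows = [[0.0] * n_buckets for _ in sample_names]
            continue
        chrom, pos, ref, alt = fields[0], fields[1], fields[3], fields[4]
        alleles = [ref] + alt.split(",")
        for row, call in zip(rows, fields[9:]):
            genotype = call.split(":")[0].replace("|", "/")
            for allele_idx in genotype.split("/"):
                if allele_idx == ".":
                    continue  # missing call contributes nothing
                key = f"{chrom}:{pos}:{alleles[int(allele_idx)]}"
                row[hash_index(key, n_buckets)] += 1.0
    return sample_names, rows

# Tiny made-up VCF; in practice vcf_lines would be an open file handle.
demo_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ts1\ts2",
    "2L\t100\t.\tA\tG\t.\t.\t.\tGT\t0/1\t0/0",
    "2L\t200\t.\tC\tT\t.\t.\t.\tGT\t1/1\t./.",
]
names, matrix = hashed_feature_matrix(demo_vcf, n_buckets=8)
```

Because each line is consumed and discarded, passing a file handle instead of a list keeps the whole VCF out of memory. Hash collisions merge unrelated features into one column, which is the accuracy cost traded for the fixed matrix width.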
Affiliation(s)
- Ronald J. Nowling
- Electrical Engineering and Computer Science, Milwaukee School of Engineering, Milwaukee, Wisconsin, United States of America
- Fabian Fallas-Moya
- Electrical Engineering and Computer Science, University of Tennessee-Knoxville, Knoxville, Tennessee, United States
- Amir Sadovnik
- Electrical Engineering and Computer Science, University of Tennessee-Knoxville, Knoxville, Tennessee, United States
- Scott Emrich
- Electrical Engineering and Computer Science, University of Tennessee-Knoxville, Knoxville, Tennessee, United States
- Matthew Aleck
- Electrical Engineering and Computer Science, Milwaukee School of Engineering, Milwaukee, Wisconsin, United States of America
- Daniel Leskiewicz
- Electrical Engineering and Computer Science, Milwaukee School of Engineering, Milwaukee, Wisconsin, United States of America
- John G. Peters
- Electrical Engineering and Computer Science, Milwaukee School of Engineering, Milwaukee, Wisconsin, United States of America
49
Gundogdu P, Loucera C, Alamo-Alvarez I, Dopazo J, Nepomuceno I. Integrating pathway knowledge with deep neural networks to reduce the dimensionality in single-cell RNA-seq data. BioData Min 2022; 15:1. [PMID: 34980200 PMCID: PMC8722116 DOI: 10.1186/s13040-021-00285-4] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 08/30/2021] [Accepted: 12/04/2021] [Indexed: 11/13/2022]
Abstract
BACKGROUND Single-cell RNA sequencing (scRNA-seq) data provide valuable insights into cellular heterogeneity, which is significantly improving current knowledge of biology and human disease. One of the main applications of scRNA-seq data analysis is the identification of new cell types and cell states. Deep neural networks (DNNs) are among the best methods to address this problem. However, this performance comes at the cost of a lack of interpretability in the results. In this work we propose an intelligible pathway-driven neural network to correctly solve cell-type related problems at single-cell resolution while providing a biologically meaningful representation of the data. RESULTS In this study, we explored deep neural networks constrained by several types of prior biological information, e.g. signaling pathway information, as a way to reduce the dimensionality of the scRNA-seq data. We tested the proposed biologically based architectures on thousands of cells of human and mouse origin across a collection of public datasets in order to check the performance of the model. Specifically, we tested the architecture across different validation scenarios that try to mimic how unknown cell types are clustered by the DNN and how it correctly annotates cell types by querying a database in a retrieval problem. Moreover, our approach proved to be comparable to other less interpretable DNN approaches constrained by using protein-protein interactions gene regulation data. Finally, we show how the latent structure learned by the network could be used to visualize and to interpret the composition of human single-cell datasets. CONCLUSIONS Here we demonstrate how the integration of pathways, which convey fundamental information on functional relationships between genes, with DNNs, which provide an excellent classification framework, results in an excellent alternative for learning a biologically meaningful representation of scRNA-seq data. In addition, the introduction of prior biological knowledge in the DNN reduces the size of the network architecture. Comparative results demonstrate a superior performance of this approach with respect to other similar approaches. As an additional advantage, the use of pathways within the DNN structure enables easy interpretability of the results by connecting features to cell functionalities by means of the pathway nodes, as demonstrated with an example with human melanoma tumor cells. SUPPLEMENTARY INFORMATION The online version contains supplementary material available at 10.1186/s13040-021-00285-4.
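One common way to constrain a network layer with pathway membership, in the spirit of the pathway nodes mentioned above, is to mask the weights so that each hidden node only receives input from genes in its pathway. The sketch below is an assumption-laden illustration, not the authors' architecture: the gene names, pathways, weights, `tanh` activation, and the function names `pathway_mask` and `pathway_layer` are all hypothetical.

```python
import math

def pathway_mask(genes, pathways):
    """One row per pathway node: 1.0 where the gene belongs to the pathway, else 0.0."""
    return [[1.0 if gene in members else 0.0 for gene in genes]
            for members in pathways.values()]

def pathway_layer(expression, weights, mask, bias):
    """Forward pass of a dense layer whose weights are zeroed outside the mask."""
    activations = []
    for w_row, m_row, b in zip(weights, mask, bias):
        total = b + sum(w * m * x for w, m, x in zip(w_row, m_row, expression))
        activations.append(math.tanh(total))
    return activations

# Hypothetical genes and pathway memberships.
genes = ["A", "B", "C"]
pathways = {"P1": {"A", "B"}, "P2": {"C"}}
mask = pathway_mask(genes, pathways)

weights = [[0.5, -0.2, 0.9], [0.3, 0.3, 0.7]]  # arbitrary illustrative values
bias = [0.0, 0.0]

base = pathway_layer([1.0, 2.0, 3.0], weights, mask, bias)
changed = pathway_layer([1.0, 2.0, 10.0], weights, mask, bias)
```

Changing the expression of gene C alters only the P2 node, so each hidden activation can be read as the activity of one pathway, and masked-out weights never need to be stored or trained, which is one way the network size shrinks.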
Affiliation(s)
- Pelin Gundogdu
- Clinical Bioinformatics Area. Fundación Progreso y Salud (FPS). CDCA, Hospital Virgen del Rocio, 41013, Sevilla, Spain
- Carlos Loucera
- Clinical Bioinformatics Area. Fundación Progreso y Salud (FPS). CDCA, Hospital Virgen del Rocio, 41013, Sevilla, Spain
- Computational Systems Medicine, Institute of Biomedicine of Seville (IBIS), Hospital Virgen del Rocio, 41013, Sevilla, Spain
- Inmaculada Alamo-Alvarez
- Clinical Bioinformatics Area. Fundación Progreso y Salud (FPS). CDCA, Hospital Virgen del Rocio, 41013, Sevilla, Spain
- Computational Systems Medicine, Institute of Biomedicine of Seville (IBIS), Hospital Virgen del Rocio, 41013, Sevilla, Spain
- Joaquin Dopazo
- Clinical Bioinformatics Area. Fundación Progreso y Salud (FPS). CDCA, Hospital Virgen del Rocio, 41013, Sevilla, Spain
- Computational Systems Medicine, Institute of Biomedicine of Seville (IBIS), Hospital Virgen del Rocio, 41013, Sevilla, Spain
- Bioinformatics in Rare Diseases (BiER), Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), FPS, Hospital Virgen del Rocío, 41013, Sevilla, Spain
- FPS/ELIXIR-es, Hospital Virgen del Rocío, 42013, Sevilla, Spain
- Isabel Nepomuceno
- Department of Computer Languages and Systems, Universidad de Sevilla, Sevilla, Spain
50
Chau MJ, Quintero JE, Monje PV, Voss SR, Welleford AS, Gerhardt GA, van Horne CG. Using a Transection Paradigm to Enhance the Repair Mechanisms of an Investigational Human Cell Therapy. Cell Transplant 2022; 31:9636897221123515. [PMID: 36169034 PMCID: PMC9523845 DOI: 10.1177/09636897221123515] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 02/14/2022] [Accepted: 08/16/2022] [Indexed: 12/02/2022]
Abstract
One promising strategy in cell therapies for Parkinson's disease (PD) is to harness a patient's own cells to provide neuroprotection in areas of the brain affected by neurodegeneration. No treatment exists to replace cells in the brain. Thus, our goal has been to support sick neurons and slow neurodegeneration by transplanting living repair tissue from the peripheral nervous system into the substantia nigra of those with PD. Our group has pioneered the transplantation of transection-activated sural nerve fascicles into the brain of human subjects with PD. Our experience in sural nerve transplantation has supported the safety and feasibility of this approach. As part of a paradigm to assess the reparative properties of human sural nerve following a transection injury, we collected nerve tissue approximately 2 weeks after sural nerve transection for immunoassays from 15 participants, and collected samples from two additional participants for single nuclei RNA sequencing. We quantified the expression of key neuroprotective and select anti-apoptotic genes along with their corresponding protein levels using immunoassays. The single nuclei data clustered into 10 distinctive groups defined on the basis of previously published cell type-specific genes. Transection-induced reparative peripheral nerve tissue showed RNA expression of neuroprotective factors and anti-apoptotic factors across multiple cell types after nerve injury induction. Key proteins of interest (BDNF, GDNF, beta-NGF, PDGFB, and VEGF) were upregulated in reparative tissue. These results provide insight on this repair tissue's utility as a neuroprotective cell therapy.
Affiliation(s)
- Monica J. Chau
- Brain Restoration Center, College of Medicine, University of Kentucky, Lexington, KY, USA
- Department of Neurosurgery, College of Medicine, University of Kentucky, Lexington, KY, USA
- Jorge E. Quintero
- Brain Restoration Center, College of Medicine, University of Kentucky, Lexington, KY, USA
- Department of Neurosurgery, College of Medicine, University of Kentucky, Lexington, KY, USA
- Department of Neuroscience, College of Medicine, University of Kentucky, Lexington, KY, USA
- Paula V. Monje
- Stark Neurosciences Research Institute, Department of Neurological Surgery, Indiana University School of Medicine, Indianapolis, IN, USA
- Stephen Randal Voss
- Department of Neuroscience, College of Medicine, University of Kentucky, Lexington, KY, USA
- Andrew S. Welleford
- Department of Neurology, College of Medicine, University of Kentucky, Lexington, KY, USA
- Greg A. Gerhardt
- Brain Restoration Center, College of Medicine, University of Kentucky, Lexington, KY, USA
- Department of Neurosurgery, College of Medicine, University of Kentucky, Lexington, KY, USA
- Department of Neuroscience, College of Medicine, University of Kentucky, Lexington, KY, USA
- Department of Neurology, College of Medicine, University of Kentucky, Lexington, KY, USA
- Craig G. van Horne
- Brain Restoration Center, College of Medicine, University of Kentucky, Lexington, KY, USA
- Department of Neurosurgery, College of Medicine, University of Kentucky, Lexington, KY, USA
- Department of Neuroscience, College of Medicine, University of Kentucky, Lexington, KY, USA