1
Xiao Y, Jin W, Qian K, Ju L, Wang G, Wu K, Cao R, Chang L, Xu Z, Luo J, Shan L, Yu F, Chen X, Liu D, Cao H, Wang Y, Cao X, Zhou W, Cui D, Tian Y, Ji C, Luo Y, Hong X, Chen F, Peng M, Zhang Y, Wang X. Integrative Single Cell Atlas Revealed Intratumoral Heterogeneity Generation from an Adaptive Epigenetic Cell State in Human Bladder Urothelial Carcinoma. Adv Sci (Weinh) 2024:e2308438. [PMID: 38582099 DOI: 10.1002/advs.202308438] [Received: 11/06/2023] [Revised: 03/22/2024] [Indexed: 04/08/2024]
Abstract
Intratumor heterogeneity (ITH) of bladder cancer (BLCA) contributes to therapy resistance and immune evasion, affecting clinical prognosis. The molecular and cellular mechanisms that generate ITH in BLCA remain elusive. Integrative single cell atlas analysis showed that a TM4SF1-positive cancer subpopulation (TPCS) can generate ITH in BLCA. Extensive profiling of the epigenome and transcriptome across all stages of BLCA revealed their evolutionary trajectories. Distinct ancestor cells gave rise to low-grade noninvasive and high-grade invasive BLCA. Epigenome reprogramming led to transcriptional heterogeneity in BLCA. During early oncogenesis, epithelial-to-mesenchymal transition generated TPCS. TPCS had stem cell-like properties and exhibited transcriptional plasticity, priming the development of transcriptionally heterogeneous descendant cell lineages. Moreover, TPCS prevalence in tumors was associated with advanced-stage cancer and poor prognosis. These results suggest that bladder cancer interacts with its environment by acquiring a stem cell-like epigenomic landscape, which might generate ITH without additional genetic diversification.
Affiliation(s)
- Yu Xiao
- Department of Urology, Hubei Key Laboratory of Urological Diseases, Department of Biological Repositories, Human Genetic Resources Preservation Center of Hubei Province, Zhongnan Hospital of Wuhan University, Wuhan, 430071, China
- Wan Jin
- Department of Urology, Hubei Key Laboratory of Urological Diseases, Department of Biological Repositories, Human Genetic Resources Preservation Center of Hubei Province, Zhongnan Hospital of Wuhan University, Wuhan, 430071, China
- Euler Technology, Beijing, 102206, China
- Kaiyu Qian
- Department of Urology, Hubei Key Laboratory of Urological Diseases, Department of Biological Repositories, Human Genetic Resources Preservation Center of Hubei Province, Zhongnan Hospital of Wuhan University, Wuhan, 430071, China
- Lingao Ju
- Department of Urology, Hubei Key Laboratory of Urological Diseases, Department of Biological Repositories, Human Genetic Resources Preservation Center of Hubei Province, Zhongnan Hospital of Wuhan University, Wuhan, 430071, China
- Gang Wang
- Department of Urology, Hubei Key Laboratory of Urological Diseases, Department of Biological Repositories, Human Genetic Resources Preservation Center of Hubei Province, Zhongnan Hospital of Wuhan University, Wuhan, 430071, China
- Kai Wu
- Euler Technology, Beijing, 102206, China
- Rui Cao
- Department of Urology, Beijing Friendship Hospital, Capital Medical University, Beijing, 100050, China
- Zilin Xu
- Department of Urology, Hubei Key Laboratory of Urological Diseases, Department of Biological Repositories, Human Genetic Resources Preservation Center of Hubei Province, Zhongnan Hospital of Wuhan University, Wuhan, 430071, China
- Jun Luo
- Department of Pathology, Zhongnan Hospital of Wuhan University, Wuhan, 430071, China
- Fang Yu
- Department of Pathology, Zhongnan Hospital of Wuhan University, Wuhan, 430071, China
- Hong Cao
- Department of Pathology, Zhongnan Hospital of Wuhan University, Wuhan, 430071, China
- Yejinpeng Wang
- Department of Urology, Hubei Key Laboratory of Urological Diseases, Department of Biological Repositories, Human Genetic Resources Preservation Center of Hubei Province, Zhongnan Hospital of Wuhan University, Wuhan, 430071, China
- Xinyue Cao
- Department of Urology, Hubei Key Laboratory of Urological Diseases, Department of Biological Repositories, Human Genetic Resources Preservation Center of Hubei Province, Zhongnan Hospital of Wuhan University, Wuhan, 430071, China
- Clinical Trial Center, Zhongnan Hospital of Wuhan University, Wuhan, 430071, China
- Wei Zhou
- Hubei Key Laboratory of Medical Technology on Transplantation, Institute of Hepatobiliary Diseases of Wuhan University, Transplant Center of Wuhan University, Wuhan, 430071, China
- Diansheng Cui
- Department of Urology, Hubei Cancer Hospital, Wuhan, 430079, China
- Ye Tian
- Department of Urology, Beijing Friendship Hospital, Capital Medical University, Beijing, 100050, China
- Chundong Ji
- Department of Urology, The Affiliated Hospital of Panzhihua University, Panzhihua, 617099, China
- Yongwen Luo
- Department of Urology, Hubei Key Laboratory of Urological Diseases, Department of Biological Repositories, Human Genetic Resources Preservation Center of Hubei Province, Zhongnan Hospital of Wuhan University, Wuhan, 430071, China
- Xin Hong
- Department of Urology, Peking University International Hospital, Beijing, 102206, China
- Fangjin Chen
- Center for Quantitative Biology, School of Life Sciences, Peking University, Beijing, 100091, China
- Minsheng Peng
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, 650201, China
- Kunming College of Life Science, University of Academy of Sciences, Kunming, 650201, China
- Yi Zhang
- Euler Technology, Beijing, 102206, China
- Xinghuan Wang
- Department of Urology, Hubei Key Laboratory of Urological Diseases, Department of Biological Repositories, Human Genetic Resources Preservation Center of Hubei Province, Zhongnan Hospital of Wuhan University, Wuhan, 430071, China
- Medical Research Institute, Wuhan University, Wuhan, 430071, China
2
Ahmad Amshi H, Prasad R, Sharma BK, Yusuf SI, Sani Z. How can machine learning predict cholera: insights from experiments and design science for action research. J Water Health 2024; 22:21-35. [PMID: 38295070 PMCID: wh_2023_026 DOI: 10.2166/wh.2023.026] [Indexed: 02/02/2024]
Abstract
Cholera is a leading cause of mortality in Nigeria. The two most significant predictors of cholera are a lack of access to clean water and poor sanitary conditions. Other factors, such as natural disasters, illiteracy, and internal conflicts that drive people to seek sanctuary in refugee camps, may also contribute to the spread of cholera in Nigeria. The aim of this research was to develop a cholera outbreak risk prediction (CORP) model for detecting cholera outbreaks in Nigeria, combining design science perspectives with machine learning tools and data science. Nonnegative matrix factorization (NMF) was used for dimensionality reduction, and the synthetic minority oversampling technique (SMOTE) was used for data balancing. Outliers detected using density-based spatial clustering of applications with noise (DBSCAN) were removed, improving the overall performance of the model, and the extreme gradient boosting (XGBoost) algorithm was used for prediction. The CORP model achieved a best accuracy of 99.62%, a Matthews correlation coefficient of 0.976, and an area under the curve of 99.2%, improving on previously reported results. The developed model can help healthcare providers predict possible cholera outbreaks.
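The SMOTE balancing step named above creates new minority-class samples by interpolating between existing ones. Below is a minimal, stdlib-only Python sketch of that interpolation idea, assuming numeric feature vectors. It is not the CORP authors' code, and the function names and toy data are hypothetical. In practice, the `SMOTE` class in the imbalanced-learn package (`imblearn.over_sampling`) is a common ready-made implementation.

```python
import math
import random

# Illustrative sketch only: function names and data are hypothetical.


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority, n_synthetic, k=3, seed=0):
    """Generate synthetic minority-class samples by SMOTE-style interpolation.

    For each synthetic point: pick a random minority sample, pick one of its
    k nearest minority neighbours, and place a new point at a random position
    on the line segment between the two.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` within the minority class (excluding itself)
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: euclidean(base, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


# Hypothetical toy data: 4 minority-class rows with 2 features each.
minority = [[1.0, 2.0], [1.5, 1.8], [0.8, 2.2], [1.2, 2.5]]
new_rows = smote(minority, n_synthetic=6, k=2)
print(len(new_rows))  # 6
```

Every synthetic row is a convex combination of two real minority rows, so it stays inside the region the minority class already occupies rather than being a plain duplicate.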
Affiliation(s)
- Hauwa Ahmad Amshi
- African University of Science and Technology, Abuja, Nigeria
- Rajesh Prasad
- Department of Computer Science and Engineering, Ajay Kumar Garg Engineering College, Ghaziabad, India
3
Carbonetto P, Luo K, Sarkar A, Hung A, Tayeb K, Pott S, Stephens M. GoM DE: interpreting structure in sequence count data with differential expression analysis allowing for grades of membership. Genome Biol 2023; 24:236. [PMID: 37858253 PMCID: PMC10588049 DOI: 10.1186/s13059-023-03067-9] [Received: 03/03/2023] [Accepted: 09/20/2023] [Indexed: 10/21/2023]
Abstract
Parts-based representations, such as non-negative matrix factorization and topic modeling, have been used to identify structure in single-cell sequencing data sets, in particular structure that is not as well captured by clustering or other dimensionality reduction methods. However, interpreting the individual parts remains a challenge. To address this challenge, we extend methods for differential expression analysis by allowing cells to have partial membership in multiple groups. We call this grade of membership differential expression (GoM DE). We illustrate the benefits of GoM DE for annotating topics identified in several single-cell RNA-seq and ATAC-seq data sets.
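The idea of partial membership can be made concrete with a Poisson topic model of the kind GoM DE builds on, assumed here to take the form x_ij ~ Poisson(s_i * sum_k l_ik f_jk). In this model, a cell's expected counts are a membership-weighted mixture of topic-specific expression rates. The Python sketch below is a hedged illustration under that assumption. It only evaluates the expected counts and a naive per-gene log-fold change between two topics. It does not reproduce the paper's fitting, shrinkage or testing procedure, and all names and values are hypothetical.

```python
import math

# Illustrative sketch only: names and values are hypothetical, not from GoM DE.


def expected_counts(size_factors, memberships, topic_rates):
    """Expected count matrix under a grade-of-membership (topic) model.

    size_factors: s_i for each cell i
    memberships:  l[i][k] >= 0, each row summing to 1 (membership of cell i in topic k)
    topic_rates:  f[j][k] >= 0, expression rate of gene j in topic k
    Returns mu[i][j] = s_i * sum_k l[i][k] * f[j][k].
    """
    mu = []
    for s, l in zip(size_factors, memberships):
        if abs(sum(l) - 1.0) > 1e-9:
            raise ValueError("membership proportions must sum to 1")
        mu.append([s * sum(lk * fk for lk, fk in zip(l, gene_rates))
                   for gene_rates in topic_rates])
    return mu


def topic_log2_fold_change(topic_rates, gene, k, other):
    """Naive log2 fold change of one gene's rate in topic k versus topic `other`."""
    return math.log2(topic_rates[gene][k] / topic_rates[gene][other])


# Hypothetical example: 2 topics, 3 genes, 3 cells.
f = [[10.0, 1.0],   # gene 0: high in topic 0
     [1.0, 8.0],    # gene 1: high in topic 1
     [4.0, 4.0]]    # gene 2: shared by both topics
l = [[1.0, 0.0],    # cell 0 belongs fully to topic 0
     [0.5, 0.5],    # cell 1 is an even mixture of both topics
     [0.0, 1.0]]    # cell 2 belongs fully to topic 1
s = [1.0, 2.0, 1.0]
mu = expected_counts(s, l, f)
print(mu[1])  # [11.0, 9.0, 8.0]
lfc = topic_log2_fold_change(f, gene=0, k=0, other=1)
print(lfc)
```

Cell 1 is not forced into either group. Its expected counts blend both topics, which is the situation that ordinary cluster-based differential expression cannot represent.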
Affiliation(s)
- Peter Carbonetto
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Research Computing Center, University of Chicago, Chicago, IL, USA
- Kaixuan Luo
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Abhishek Sarkar
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Vesalius Therapeutics, Cambridge, MA, USA
- Anthony Hung
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Section of Genetic Medicine, University of Chicago, Chicago, IL, USA
- Karl Tayeb
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Committee on Genetics, Genomics and Systems Biology, University of Chicago, Chicago, IL, USA
- Sebastian Pott
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Section of Genetic Medicine, University of Chicago, Chicago, IL, USA
- Matthew Stephens
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Department of Statistics, University of Chicago, Chicago, IL, USA
4
Carbonetto P, Luo K, Sarkar A, Hung A, Tayeb K, Pott S, Stephens M. GoM DE: interpreting structure in sequence count data with differential expression analysis allowing for grades of membership. bioRxiv 2023:2023.03.03.531029. [PMID: 36945441 PMCID: PMC10028846 DOI: 10.1101/2023.03.03.531029] [Indexed: 03/11/2023]
Abstract
Parts-based representations, such as non-negative matrix factorization and topic modeling, have been used to identify structure in single-cell sequencing data sets, in particular structure that is not as well captured by clustering or other dimensionality reduction methods. However, interpreting the individual parts remains a challenge. To address this challenge, we extend methods for differential expression analysis by allowing cells to have partial membership in multiple groups. We call this grade of membership differential expression (GoM DE). We illustrate the benefits of GoM DE for annotating topics identified in several single-cell RNA-seq and ATAC-seq data sets.
Affiliation(s)
- Peter Carbonetto
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Research Computing Center, University of Chicago, Chicago, IL, USA
- Kaixuan Luo
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Abhishek Sarkar
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Vesalius Therapeutics, Cambridge, MA, USA
- Anthony Hung
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Section of Genetic Medicine, University of Chicago, Chicago, IL, USA
- Karl Tayeb
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Committee on Genetics, Genomics and Systems Biology, University of Chicago, Chicago, IL, USA
- Sebastian Pott
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Section of Genetic Medicine, University of Chicago, Chicago, IL, USA
- Matthew Stephens
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Department of Statistics, University of Chicago, Chicago, IL, USA
5
Liu Q, Wang D, Zhou L, Li J, Wang G. MTGDC: A Multi-Scale Tensor Graph Diffusion Clustering for Single-Cell RNA Sequencing Data. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:3056-3067. [PMID: 37418411 DOI: 10.1109/tcbb.2023.3293112] [Indexed: 07/09/2023]
Abstract
Single-cell RNA sequencing (scRNA-seq) is a relatively new technology that measures gene expression in individual cells, making it possible to study cellular heterogeneity. New computational methods tailored to scRNA-seq data are therefore needed to detect cell types among heterogeneous cell populations. Herein, we propose Multi-scale Tensor Graph Diffusion Clustering (MTGDC) for scRNA-seq data. It has the following mechanisms. (1) To mine potential similarity distributions among cells, we design a multi-scale affinity learning method that constructs a fully connected graph between cells. (2) For each affinity matrix, we propose an efficient tensor graph diffusion learning framework to learn high-order information among the multi-scale affinity matrices. First, the tensor graph is explicitly introduced to measure cell-cell edges with local high-order relationship information. To preserve more global topological structure in the tensor graph, MTGDC then implicitly models information propagation through a data diffusion process, using a simple and efficient tensor graph diffusion update algorithm. (3) Finally, we fuse the multi-scale tensor graphs into a single high-order affinity matrix and apply spectral clustering to it. Experiments and case studies showed that MTGDC had clear advantages over state-of-the-art algorithms in robustness, accuracy, visualization, and speed.
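The abstract does not describe MTGDC's tensor graph diffusion in enough detail to reproduce it. As a rough, hedged illustration of two ingredients it names, multi-scale affinities and diffusion before fusion, the Python sketch below takes the following steps:

- It builds Gaussian-kernel affinity matrices at several bandwidths.
- It applies a generic diffusion update A <- P A P^T + I, where P is the row-normalised affinity.
- It averages the diffused matrices.

The kernel, the diffusion rule, the averaging step and all names are assumptions rather than the published algorithm. The fused matrix would then be passed to a spectral clustering routine, which is not shown.

```python
import math

# Illustrative sketch only: a generic multi-scale diffusion, not MTGDC itself.


def gaussian_affinity(points, sigma):
    """Fully connected cell-cell affinity matrix using a Gaussian kernel of bandwidth sigma."""
    n = len(points)
    return [[math.exp(-sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                      / (2 * sigma ** 2))
             for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Dense matrix product of nested lists."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def diffuse(affinity, steps=3):
    """Generic graph diffusion A <- P A P^T + I, where P is the row-normalised affinity."""
    n = len(affinity)
    p = []
    for row in affinity:
        total = sum(row)
        p.append([v / total for v in row])
    a = [row[:] for row in affinity]
    for _ in range(steps):
        a = matmul(matmul(p, a), transpose(p))
        for i in range(n):
            a[i][i] += 1.0
    return a


def fused_affinity(points, sigmas, steps=3):
    """Average the diffused affinity matrices computed at several kernel scales."""
    mats = [diffuse(gaussian_affinity(points, s), steps) for s in sigmas]
    n = len(points)
    return [[sum(m[i][j] for m in mats) / len(mats) for j in range(n)] for i in range(n)]


# Hypothetical toy data: two well-separated groups of two cells in a 2-D embedding.
cells = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
fused = fused_affinity(cells, sigmas=[0.5, 1.0, 2.0])
print(fused[0][1] > fused[0][2])  # True: same-group cells stay strongly connected
```

Using several bandwidths avoids committing to a single kernel scale. The diffusion step lets similarity propagate along chains of neighbouring cells instead of relying only on direct pairwise distances.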
6
Qiu Y, Yan C, Zhao P, Zou Q. SSNMDI: a novel joint learning model of semi-supervised non-negative matrix factorization and data imputation for clustering of single-cell RNA-seq data. Brief Bioinform 2023; 24:7147025. [PMID: 37122068 DOI: 10.1093/bib/bbad149] [Received: 12/29/2022] [Revised: 02/18/2023] [Accepted: 03/28/2023] [Indexed: 05/02/2023]
Abstract
MOTIVATION Single-cell RNA sequencing (scRNA-seq) technology has attracted extensive attention in the biomedical field. It can be used to measure gene expression and analyze the transcriptome at the single-cell level, enabling the identification of cell types through unsupervised clustering. Because scRNA-seq data suffer from a high 'dropout' rate, noise and linear inseparability, data imputation and dimension reduction are conducted before clustering. However, performing dimension reduction, imputation and clustering independently cannot fully characterize the patterns in scRNA-seq data, resulting in poor clustering performance. Herein, we propose a novel and accurate algorithm, SSNMDI, that uses a joint learning approach to simultaneously perform imputation, dimensionality reduction and cell clustering in a non-negative matrix factorization (NMF) framework. In addition, we integrate cell annotations as prior information, transforming the joint learning into a semi-supervised NMF model. Through experiments on 14 datasets, we demonstrate that SSNMDI converges faster, achieves better dimensionality reduction and clusters cells more accurately than previous methods, providing an accurate and robust strategy for analyzing scRNA-seq data. Biological analyses, including pseudotime analysis, gene ontology and survival analysis, were also conducted to validate the biological significance of our method. We believe that we are among the first to combine imputation, partial label information, dimension reduction and clustering in the single-cell field. AVAILABILITY AND IMPLEMENTATION The source code for SSNMDI is available at https://github.com/yushanqiu/SSNMDI.
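SSNMDI is built on a non-negative matrix factorization framework. For readers less familiar with NMF itself, the Python sketch below shows plain NMF, V ≈ WH, fitted with the classic Lee and Seung multiplicative updates for the Frobenius loss. It then assigns each cell to the factor with the largest coefficient in H. The sketch omits SSNMDI's joint imputation and semi-supervised label terms, and the names and toy matrix are hypothetical.

```python
import random

# Illustrative sketch only: plain NMF, not the SSNMDI model.


def matmul(a, b):
    """Dense matrix product of nested lists."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, iters=200, seed=0, eps=1e-12):
    """Plain NMF, V (genes x cells) ~ W H, via Lee-Seung multiplicative updates."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)] for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)] for i in range(n)]
    return w, h


def cluster_labels(h):
    """Assign each cell (column of H) to the factor with the largest coefficient."""
    return [max(range(len(h)), key=lambda k: h[k][j]) for j in range(len(h[0]))]


# Hypothetical toy count matrix: 4 genes x 4 cells with two obvious cell groups.
V = [[5.0, 5.0, 0.0, 0.0],
     [4.0, 4.0, 0.0, 0.0],
     [0.0, 0.0, 3.0, 3.0],
     [0.0, 0.0, 6.0, 6.0]]
W, H = nmf(V, rank=2)
print(cluster_labels(H))
```

The multiplicative form keeps W and H non-negative automatically, because each update multiplies by a ratio of non-negative quantities. Methods such as SSNMDI add extra terms to this objective rather than replacing the factorization.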
Affiliation(s)
- Yushan Qiu
- College of Mathematics and Statistics, Shenzhen University, 518000, Guangdong, China
- Chang Yan
- College of Mathematics and Statistics, Shenzhen University, 518000, Guangdong, China
- Pu Zhao
- College of Life and Health Sciences, Northeastern University, Shenyang, 110169, China
- Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, 610056, China
7
CASSL: A cell-type annotation method for single cell transcriptomics data using semi-supervised learning. Appl Intell 2022. [DOI: 10.1007/s10489-022-03440-4] [Indexed: 11/02/2022]
8
Song D, Li K, Hemminger Z, Wollman R, Li JJ. scPNMF: sparse gene encoding of single cells to facilitate gene selection for targeted gene profiling. Bioinformatics 2021; 37:i358-i366. [PMID: 34252925 PMCID: PMC8275345 DOI: 10.1093/bioinformatics/btab273] [Indexed: 11/15/2022]
Abstract
Motivation Single-cell RNA sequencing (scRNA-seq) captures whole-transcriptome information from individual cells. While scRNA-seq measures thousands of genes, researchers are often interested in only dozens to hundreds of genes for closer study. The question, then, is how to select those informative genes from scRNA-seq data. Moreover, single-cell targeted gene profiling technologies are gaining popularity for their low costs, high sensitivity and extra (e.g. spatial) information; however, they typically can measure only up to a few hundred genes. A second challenge is therefore how to select genes for targeted gene profiling based on existing scRNA-seq data. Results Here, we develop the single-cell Projective Non-negative Matrix Factorization (scPNMF) method to select informative genes from scRNA-seq data in an unsupervised way. Compared with existing gene selection methods, scPNMF has two advantages. First, its selected informative genes can better distinguish cell types. Second, it enables the alignment of new targeted gene profiling data with reference data in a low-dimensional space to facilitate the prediction of cell types in the new data. Technically, scPNMF modifies the PNMF algorithm for gene selection by changing the initialization and adding a basis selection step, which selects informative bases to distinguish cell types. We demonstrate that scPNMF outperforms state-of-the-art gene selection methods on diverse scRNA-seq datasets. Moreover, we show that scPNMF can guide the design of targeted gene profiling experiments and cell-type annotation on targeted gene profiling data. Availability and implementation The R package is open-access and available at https://github.com/JSB-UCLA/scPNMF. The data used in this work are available at Zenodo: https://doi.org/10.5281/zenodo.4797997. Supplementary information Supplementary data are available at Bioinformatics online.
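The projective NMF (PNMF) model that scPNMF starts from approximates a genes-by-cells matrix X as W W^T X using a single non-negative weight matrix W. Each basis, that is each column of W, is therefore a non-negative weighting of genes. The Python sketch below is a hedged illustration, not the scPNMF R package. It takes the following steps:

- It fits plain PNMF with a multiplicative update from the PNMF literature (Yang and Oja).
- After each step, it rescales W by its spectral norm, a stabilising step assumed here.
- It takes the union of the top-weighted genes in each basis as a simple stand-in for gene selection.

scPNMF's modified initialization and its basis selection step are omitted, and all names and toy values are hypothetical.

```python
import math
import random

# Illustrative sketch only: plain PNMF plus a naive gene pick, not the scPNMF package.


def matmul(a, b):
    """Dense matrix product of nested lists."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def spectral_norm(w, iters=50):
    """Largest singular value of W, via power iteration on W^T W."""
    wtw = matmul(transpose(w), w)
    v = [1.0] * len(wtw)
    lam = 0.0
    for _ in range(iters):
        u = [sum(wtw[i][j] * v[j] for j in range(len(v))) for i in range(len(v))]
        lam = math.sqrt(sum(x * x for x in u))
        v = [x / lam for x in u]
    return math.sqrt(lam)


def pnmf(x, rank, iters=200, seed=0, eps=1e-12):
    """Projective NMF, X (genes x cells) ~ W W^T X with W >= 0 (genes x rank).

    Multiplicative rule for the Euclidean loss, with G = X X^T:
    W <- W * 2 G W / (W W^T G W + G W W^T W), then W <- W / ||W||_2.
    """
    rng = random.Random(seed)
    p = len(x)
    g = matmul(x, transpose(x))
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(p)]
    for _ in range(iters):
        wt = transpose(w)
        gw = matmul(g, w)
        den1 = matmul(w, matmul(wt, gw))  # W W^T G W
        den2 = matmul(gw, matmul(wt, w))  # G W W^T W
        w = [[w[i][k] * 2.0 * gw[i][k] / (den1[i][k] + den2[i][k] + eps)
              for k in range(rank)] for i in range(p)]
        norm = spectral_norm(w)
        w = [[val / norm for val in row] for row in w]
    return w


def top_genes_per_basis(w, top_m):
    """Union of the top_m highest-weighted genes (row indices) across all bases."""
    selected = set()
    for k in range(len(w[0])):
        ranked = sorted(range(len(w)), key=lambda i: w[i][k], reverse=True)
        selected.update(ranked[:top_m])
    return sorted(selected)


# Hypothetical toy log-expression matrix: 4 genes x 4 cells.
X = [[5.0, 5.0, 0.0, 0.0],
     [4.0, 4.0, 0.0, 0.0],
     [0.0, 0.0, 3.0, 3.0],
     [0.0, 0.0, 6.0, 6.0]]
W = pnmf(X, rank=2)
print(top_genes_per_basis(W, top_m=1))
```

Because a single W both encodes and projects the data, each basis directly names a sparse set of genes. This property is what makes a PNMF-style factorization a natural starting point for choosing a small targeted gene panel.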
Affiliation(s)
- Dongyuan Song
- Bioinformatics Interdepartmental Ph.D. Program, University of California, Los Angeles, CA 90095-7246, USA
- Kexin Li
- Department of Statistics, University of California, Los Angeles, CA 90095-1554, USA
- Zachary Hemminger
- Institute for Quantitative and Computational Biosciences, University of California, Los Angeles, CA 90095, USA
- Department of Integrative Biology and Physiology, University of California, Los Angeles, CA 90095-7239, USA
- Roy Wollman
- Institute for Quantitative and Computational Biosciences, University of California, Los Angeles, CA 90095, USA
- Department of Integrative Biology and Physiology, University of California, Los Angeles, CA 90095-7239, USA
- Department of Chemistry and Biochemistry, University of California, Los Angeles, CA 90095-1569, USA
- Jingyi Jessica Li
- Department of Statistics, University of California, Los Angeles, CA 90095-1554, USA
- Department of Human Genetics, University of California, Los Angeles, CA 90095-7088, USA
- Department of Computational Medicine, University of California, Los Angeles, CA 90095-1766, USA
- Department of Biostatistics, University of California, Los Angeles, CA 90095-1772, USA