1
|
Bonev B, Castelo-Branco G, Chen F, Codeluppi S, Corces MR, Fan J, Heiman M, Harris K, Inoue F, Kellis M, Levine A, Lotfollahi M, Luo C, Maynard KR, Nitzan M, Ramani V, Satijia R, Schirmer L, Shen Y, Sun N, Green GS, Theis F, Wang X, Welch JD, Gokce O, Konopka G, Liddelow S, Macosko E, Ali Bayraktar O, Habib N, Nowakowski TJ. Opportunities and challenges of single-cell and spatially resolved genomics methods for neuroscience discovery. Nat Neurosci 2024; 27:2292-2309. [PMID: 39627587 PMCID: PMC11999325 DOI: 10.1038/s41593-024-01806-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2023] [Accepted: 09/23/2024] [Indexed: 12/13/2024]
Abstract
Over the past decade, single-cell genomics technologies have allowed scalable profiling of cell-type-specific features, which has substantially increased our ability to study cellular diversity and transcriptional programs in heterogeneous tissues. Yet our understanding of mechanisms of gene regulation or the rules that govern interactions between cell types is still limited. The advent of new computational pipelines and technologies, such as single-cell epigenomics and spatially resolved transcriptomics, has created opportunities to explore two new axes of biological variation: cell-intrinsic regulation of cell states and expression programs and interactions between cells. Here, we summarize the most promising and robust technologies in these areas, discuss their strengths and limitations and discuss key computational approaches for analysis of these complex datasets. We highlight how data sharing and integration, documentation, visualization and benchmarking of results contribute to transparency, reproducibility, collaboration and democratization in neuroscience, and discuss needs and opportunities for future technology development and analysis.
Affiliation(s)
- Boyan Bonev
- Helmholtz Pioneer Campus, Helmholtz Zentrum München, Neuherberg, Germany
- Physiological Genomics, Biomedical Center, Ludwig-Maximilians-Universität München, Munich, Germany
| | - Gonçalo Castelo-Branco
- Laboratory of Molecular Neurobiology, Department of Medical Biochemistry and Biophysics, Karolinska Institutet, Stockholm, Sweden
| | - Fei Chen
- The Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | | | - M Ryan Corces
- Gladstone Institute of Neurological Disease, San Francisco, CA, USA
- Gladstone Institute of Data Science and Biotechnology, San Francisco, CA, USA
- Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
| | - Jean Fan
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Myriam Heiman
- Department of Brain and Cognitive Sciences, MIT, Cambridge, MA, USA
- The Picower Institute for Learning and Memory, MIT, Cambridge, MA, USA
| | - Kenneth Harris
- UCL Queen Square Institute of Neurology, University College London, London, UK
| | - Fumitaka Inoue
- Institute for the Advanced Study of Human Biology (WPI-ASHBi), Kyoto University, Kyoto, Japan
| | - Manolis Kellis
- The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Ariel Levine
- Spinal Circuits and Plasticity Unit, National Institute of Neurological Disorders and Stroke, Bethesda, MD, USA
| | - Mo Lotfollahi
- Institute of Computational Biology, Helmholtz Center Munich - German Research Center for Environmental Health, Neuherberg, Germany
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
| | - Chongyuan Luo
- Department of Human Genetics, University of California, Los Angeles, Los Angeles, CA, USA
| | - Kristen R Maynard
- Lieber Institute for Brain Development, Baltimore, MD, USA
- Department of Psychiatry, Johns Hopkins School of Medicine, Baltimore, MD, USA
- Department of Neuroscience, Johns Hopkins School of Medicine, Baltimore, MD, USA
| | - Mor Nitzan
- School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel
- Racah Institute of Physics, The Hebrew University of Jerusalem, Jerusalem, Israel
- Faculty of Medicine, The Hebrew University of Jerusalem, Jerusalem, Israel
| | - Vijay Ramani
- Gladstone Institute of Data Science and Biotechnology, San Francisco, CA, USA
- Department of Biochemistry and Biophysics, University of California, San Francisco, San Francisco, CA, USA
- Bakar Computational Health Sciences Institute, San Francisco, CA, USA
| | - Rahul Satijia
- New York Genome Center, New York, NY, USA
- Center for Genomics and Systems Biology, New York University, New York, NY, USA
| | - Lucas Schirmer
- Department of Neurology, Mannheim Center for Translational Neuroscience, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
| | - Yin Shen
- Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
- Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
- Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA
| | - Na Sun
- The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Gilad S Green
- The Edmond and Lily Safra Center for Brain Sciences, The Hebrew University of Jerusalem, Jerusalem, Israel
| | - Fabian Theis
- Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
- Department of Human Genetics, University of California, Los Angeles, Los Angeles, CA, USA
| | - Xiao Wang
- The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Joshua D Welch
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
| | - Ozgun Gokce
- German Center for Neurodegenerative Diseases (DZNE), Bonn, Germany.
- Department of Neurodegenerative Diseases and Geriatric Psychiatry, University Hospital Bonn, Bonn, Germany.
| | - Genevieve Konopka
- Department of Neuroscience, UT Southwestern Medical Center, Dallas, TX, USA.
- Peter O'Donnell Jr. Brain Institute, UT Southwestern Medical Center, Dallas, TX, USA.
| | - Shane Liddelow
- Neuroscience Institute, NYU Grossman School of Medicine, New York, NY, USA.
- Department of Neuroscience & Physiology, NYU Grossman School of Medicine, New York, NY, USA.
- Parekh Center for Interdisciplinary Neurology, NYU Grossman School of Medicine, New York, NY, USA.
- Department of Ophthalmology, NYU Grossman School of Medicine, New York, NY, USA.
| | - Evan Macosko
- The Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Department of Neurobiology, Harvard Medical School, Boston, MA, USA.
- Department of Psychiatry, Massachusetts General Hospital, Boston, MA, USA.
| | | | - Naomi Habib
- The Edmond and Lily Safra Center for Brain Sciences, The Hebrew University of Jerusalem, Jerusalem, Israel.
| | - Tomasz J Nowakowski
- Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA.
- Department of Neurological Surgery, University of California, San Francisco, San Francisco, CA, USA.
- Department of Anatomy, University of California, San Francisco, San Francisco, CA, USA.
- Department of Psychiatry and Behavioral Sciences, University of California, San Francisco, San Francisco, CA, USA.
- The Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California, San Francisco, San Francisco, CA, USA.
|
2
|
Xiao Y, Jin W, Qian K, Ju L, Wang G, Wu K, Cao R, Chang L, Xu Z, Luo J, Shan L, Yu F, Chen X, Liu D, Cao H, Wang Y, Cao X, Zhou W, Cui D, Tian Y, Ji C, Luo Y, Hong X, Chen F, Peng M, Zhang Y, Wang X. Integrative Single Cell Atlas Revealed Intratumoral Heterogeneity Generation from an Adaptive Epigenetic Cell State in Human Bladder Urothelial Carcinoma. ADVANCED SCIENCE (WEINHEIM, BADEN-WURTTEMBERG, GERMANY) 2024; 11:e2308438. [PMID: 38582099 PMCID: PMC11200000 DOI: 10.1002/advs.202308438] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Revised: 03/22/2024] [Indexed: 04/08/2024]
Abstract
Intratumor heterogeneity (ITH) of bladder cancer (BLCA) contributes to therapy resistance and immune evasion, affecting clinical prognosis. The molecular and cellular mechanisms contributing to BLCA ITH generation remain elusive. Integrative single-cell atlas analysis shows that a TM4SF1-positive cancer subpopulation (TPCS) can generate ITH in BLCA. Extensive profiling of the epigenome and transcriptome of all stages of BLCA revealed their evolutionary trajectories. Distinct ancestor cells gave rise to low-grade noninvasive and high-grade invasive BLCA. Epigenome reprogramming led to transcriptional heterogeneity in BLCA. During early oncogenesis, epithelial-to-mesenchymal transition generated TPCS. TPCS has stem-cell-like properties and exhibits transcriptional plasticity, priming the development of transcriptionally heterogeneous descendant cell lineages. Moreover, TPCS prevalence in tumors is associated with advanced-stage cancer and poor prognosis. The results of this study suggest that bladder cancer interacts with its environment by acquiring a stem-cell-like epigenomic landscape, which might generate ITH without additional genetic diversification.
Affiliation(s)
- Yu Xiao
- Department of Urology, Hubei Key Laboratory of Urological Diseases, Department of Biological Repositories, Human Genetic Resources Preservation Center of Hubei Province, Zhongnan Hospital of Wuhan University, Wuhan 430071, China
| | - Wan Jin
- Department of Urology, Hubei Key Laboratory of Urological Diseases, Department of Biological Repositories, Human Genetic Resources Preservation Center of Hubei Province, Zhongnan Hospital of Wuhan University, Wuhan 430071, China
- Euler Technology, Beijing 102206, China
| | - Kaiyu Qian
- Department of Urology, Hubei Key Laboratory of Urological Diseases, Department of Biological Repositories, Human Genetic Resources Preservation Center of Hubei Province, Zhongnan Hospital of Wuhan University, Wuhan 430071, China
| | - Lingao Ju
- Department of Urology, Hubei Key Laboratory of Urological Diseases, Department of Biological Repositories, Human Genetic Resources Preservation Center of Hubei Province, Zhongnan Hospital of Wuhan University, Wuhan 430071, China
| | - Gang Wang
- Department of Urology, Hubei Key Laboratory of Urological Diseases, Department of Biological Repositories, Human Genetic Resources Preservation Center of Hubei Province, Zhongnan Hospital of Wuhan University, Wuhan 430071, China
| | - Kai Wu
- Euler Technology, Beijing 102206, China
| | - Rui Cao
- Department of Urology, Beijing Friendship Hospital, Capital Medical University, Beijing 100050, China
| | | | - Zilin Xu
- Department of Urology, Hubei Key Laboratory of Urological Diseases, Department of Biological Repositories, Human Genetic Resources Preservation Center of Hubei Province, Zhongnan Hospital of Wuhan University, Wuhan 430071, China
| | - Jun Luo
- Department of Pathology, Zhongnan Hospital of Wuhan University, Wuhan 430071, China
| | | | - Fang Yu
- Department of Pathology, Zhongnan Hospital of Wuhan University, Wuhan 430071, China
| | | | | | - Hong Cao
- Department of Pathology, Zhongnan Hospital of Wuhan University, Wuhan 430071, China
| | - Yejinpeng Wang
- Department of Urology, Hubei Key Laboratory of Urological Diseases, Department of Biological Repositories, Human Genetic Resources Preservation Center of Hubei Province, Zhongnan Hospital of Wuhan University, Wuhan 430071, China
| | - Xinyue Cao
- Department of Urology, Hubei Key Laboratory of Urological Diseases, Department of Biological Repositories, Human Genetic Resources Preservation Center of Hubei Province, Zhongnan Hospital of Wuhan University, Wuhan 430071, China
- Clinical Trial Center, Zhongnan Hospital of Wuhan University, Wuhan 430071, China
| | - Wei Zhou
- Hubei Key Laboratory of Medical Technology on Transplantation, Institute of Hepatobiliary Diseases of Wuhan University, Transplant Center of Wuhan University, Wuhan 430071, China
| | - Diansheng Cui
- Department of Urology, Hubei Cancer Hospital, Wuhan 430079, China
| | - Ye Tian
- Department of Urology, Beijing Friendship Hospital, Capital Medical University, Beijing 100050, China
| | - Chundong Ji
- Department of Urology, The Affiliated Hospital of Panzhihua University, Panzhihua 617099, China
| | - Yongwen Luo
- Department of Urology, Hubei Key Laboratory of Urological Diseases, Department of Biological Repositories, Human Genetic Resources Preservation Center of Hubei Province, Zhongnan Hospital of Wuhan University, Wuhan 430071, China
| | - Xin Hong
- Department of Urology, Peking University International Hospital, Beijing 102206, China
| | - Fangjin Chen
- Center for Quantitative Biology, School of Life Sciences, Peking University, Beijing 100091, China
| | - Minsheng Peng
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650201, China
- Kunming College of Life Science, University of Academy of Sciences, Kunming 650201, China
| | - Yi Zhang
- Euler Technology, Beijing 102206, China
| | - Xinghuan Wang
- Department of Urology, Hubei Key Laboratory of Urological Diseases, Department of Biological Repositories, Human Genetic Resources Preservation Center of Hubei Province, Zhongnan Hospital of Wuhan University, Wuhan 430071, China
- Medical Research Institute, Wuhan University, Wuhan 430071, China
|
3
|
Ahmad Amshi H, Prasad R, Sharma BK, Yusuf SI, Sani Z. How can machine learning predict cholera: insights from experiments and design science for action research. JOURNAL OF WATER AND HEALTH 2024; 22:21-35. [PMID: 38295070 PMCID: wh_2023_026 DOI: 10.2166/wh.2023.026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/02/2024]
Abstract
Cholera is a leading cause of mortality in Nigeria. The two most significant predictors of cholera are a lack of access to clean water and poor sanitary conditions. Other factors such as natural disasters, illiteracy, and internal conflicts that drive people to seek sanctuary in refugee camps may contribute to the spread of cholera in Nigeria. The aim of this research is to develop a cholera outbreak risk prediction (CORP) model using machine learning tools and data science. In this study, we developed a CORP model using design science perspectives and machine learning to detect cholera outbreaks in Nigeria. Nonnegative matrix factorization (NMF) was used for dimensionality reduction, and the synthetic minority oversampling technique (SMOTE) was used for data balancing. Outliers detected using density-based spatial clustering of applications with noise (DBSCAN) were removed, which improved the overall performance of the model, and the extreme gradient boosting algorithm was used for prediction. The CORP model achieved a best accuracy of 99.62%, a Matthews correlation coefficient of 0.976, and an area under the curve of 99.2%, an improvement over previous findings. The developed model can help healthcare providers predict possible cholera outbreaks.
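The abstract reports a Matthews correlation coefficient (MCC) alongside accuracy. As a minimal sketch of why both are useful on imbalanced outbreak data, the block below computes the two metrics from a binary confusion matrix using the standard MCC formula. The function names and the counts in the example are hypothetical illustrations and are not taken from the study.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

    Returns 0.0 when the denominator is zero (a common convention).
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


# Hypothetical imbalanced case: 5 true outbreaks among 100 samples,
# of which the classifier finds only 1.
example_accuracy = accuracy(tp=1, tn=95, fp=0, fn=4)
example_mcc = matthews_corrcoef(tp=1, tn=95, fp=0, fn=4)
```

In this made-up case accuracy stays high because the majority class dominates, while MCC is much lower because most outbreaks are missed. This is the usual motivation for reporting MCC after rebalancing steps such as SMOTE.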
Affiliation(s)
- Hauwa Ahmad Amshi
- African University of Science and Technology, Abuja, Nigeria
| | - Rajesh Prasad
- Department of Computer Science and Engineering, Ajay Kumar Garg Engineering College, Ghaziabad, India
|
4
|
Carbonetto P, Luo K, Sarkar A, Hung A, Tayeb K, Pott S, Stephens M. GoM DE: interpreting structure in sequence count data with differential expression analysis allowing for grades of membership. Genome Biol 2023; 24:236. [PMID: 37858253 PMCID: PMC10588049 DOI: 10.1186/s13059-023-03067-9] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 03/03/2023] [Accepted: 09/20/2023] [Indexed: 10/21/2023] Open
Abstract
Parts-based representations, such as non-negative matrix factorization and topic modeling, have been used to identify structure from single-cell sequencing data sets, in particular structure that is not as well captured by clustering or other dimensionality reduction methods. However, interpreting the individual parts remains a challenge. To address this challenge, we extend methods for differential expression analysis by allowing cells to have partial membership to multiple groups. We call this grade of membership differential expression (GoM DE). We illustrate the benefits of GoM DE for annotating topics identified in several single-cell RNA-seq and ATAC-seq data sets.
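To make "partial membership to multiple groups" concrete, the following is a minimal sketch of a parts-based factorization on a toy count matrix. It uses plain Frobenius-loss NMF with Lee-Seung multiplicative updates as a stand-in; the paper itself works with topic models, and the GoM DE test is not implemented here. All function names and the toy counts are hypothetical.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Squared reconstruction error ||V - WH||_F^2."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, k, n_iter=200, seed=0, eps=1e-9):
    """Factorize V (cells x genes) as W (cells x k) times H (k x genes)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(n)]
    return w, h


def memberships(w):
    """Rescale each cell's loadings to sum to 1 (grades of membership)."""
    return [[x / sum(row) for x in row] for row in w]


# Toy counts: two cells from one program, one from another, one mixed cell.
counts = [[5, 4, 0, 0], [4, 5, 0, 0], [0, 0, 6, 5], [2, 2, 3, 3]]
w_init, h_init = nmf(counts, k=2, n_iter=0)
w_fit, h_fit = nmf(counts, k=2)
grades = memberships(w_fit)
```

Each row of `grades` gives one cell's proportional membership in the two parts, which is the quantity a grade-of-membership analysis would then interpret, in contrast to the single hard label a clustering assigns.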
Affiliation(s)
- Peter Carbonetto
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Research Computing Center, University of Chicago, Chicago, IL, USA
| | - Kaixuan Luo
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
| | - Abhishek Sarkar
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Vesalius Therapeutics, Cambridge, MA, USA
| | - Anthony Hung
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Section of Genetic Medicine, University of Chicago, Chicago, IL, USA
| | - Karl Tayeb
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Committee on Genetics, Genomics and Systems Biology, University of Chicago, Chicago, IL, USA
| | - Sebastian Pott
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Section of Genetic Medicine, University of Chicago, Chicago, IL, USA
| | - Matthew Stephens
- Department of Human Genetics, University of Chicago, Chicago, IL, USA.
- Department of Statistics, University of Chicago, Chicago, IL, USA.
|
5
|
Carbonetto P, Luo K, Sarkar A, Hung A, Tayeb K, Pott S, Stephens M. GoM DE: interpreting structure in sequence count data with differential expression analysis allowing for grades of membership. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.03.03.531029. [PMID: 36945441 PMCID: PMC10028846 DOI: 10.1101/2023.03.03.531029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/11/2023]
Abstract
Parts-based representations, such as non-negative matrix factorization and topic modeling, have been used to identify structure from single-cell sequencing data sets, in particular structure that is not as well captured by clustering or other dimensionality reduction methods. However, interpreting the individual parts remains a challenge. To address this challenge, we extend methods for differential expression analysis by allowing cells to have partial membership to multiple groups. We call this grade of membership differential expression (GoM DE). We illustrate the benefits of GoM DE for annotating topics identified in several single-cell RNA-seq and ATAC-seq data sets.
Affiliation(s)
- Peter Carbonetto
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Research Computing Center, University of Chicago, Chicago, IL, USA
| | - Kaixuan Luo
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
| | - Abhishek Sarkar
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Vesalius Therapeutics, Cambridge, MA, USA
| | - Anthony Hung
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Section of Genetic Medicine, University of Chicago, Chicago, IL, USA
| | - Karl Tayeb
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Committee on Genetics, Genomics and Systems Biology, University of Chicago, Chicago, IL, USA
| | - Sebastian Pott
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Section of Genetic Medicine, University of Chicago, Chicago, IL, USA
| | - Matthew Stephens
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Department of Statistics, University of Chicago, Chicago, IL, USA
|
6
|
Liu Q, Wang D, Zhou L, Li J, Wang G. MTGDC: A Multi-Scale Tensor Graph Diffusion Clustering for Single-Cell RNA Sequencing Data. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:3056-3067. [PMID: 37418411 DOI: 10.1109/tcbb.2023.3293112] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/09/2023]
Abstract
Single-cell RNA sequencing (scRNA-seq) is a new technology that measures the expression levels of each cell to study cell heterogeneity. New computational methods suited to scRNA-seq are therefore designed to detect cell types among various cell groups. Herein, we propose a Multi-scale Tensor Graph Diffusion Clustering (MTGDC) for single-cell RNA sequencing data. It has the following mechanisms: 1) To mine potential similarity distributions among cells, we design a multi-scale affinity learning method to construct a fully connected graph between cells; 2) For each affinity matrix, we propose an efficient tensor graph diffusion learning framework to learn high-order information among multi-scale affinity matrices. First, the tensor graph is explicitly introduced to measure cell-cell edges with local high-order relationship information. To further preserve more global topology structure information in the tensor graph, MTGDC implicitly considers the propagation of information via a data diffusion process by designing a simple and efficient tensor graph diffusion update algorithm. 3) Finally, we fuse the multi-scale tensor graphs to obtain a high-order affinity matrix and apply it to spectral clustering. Experiments and case studies showed that MTGDC had obvious advantages over state-of-the-art algorithms in robustness, accuracy, visualization, and speed.
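The diffusion idea in step 2 can be illustrated with a generic update on a single affinity matrix. The sketch below assumes the commonly used form F <- alpha * P F P^T + (1 - alpha) * I, where P is the row-normalized affinity matrix. This is an assumption chosen for illustration: the abstract does not give MTGDC's update rule, and the multi-scale fusion and spectral clustering steps are not shown. The toy graph and all names are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def row_normalize(a):
    """Turn an affinity matrix into a row-stochastic transition matrix."""
    return [[x / sum(row) for x in row] for row in a]


def diffuse(affinity, alpha=0.8, n_iter=20):
    """Iterate F <- alpha * P F P^T + (1 - alpha) * I, starting from F = I."""
    n = len(affinity)
    p = row_normalize(affinity)
    pt = transpose(p)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    f = identity
    for _ in range(n_iter):
        pfp = matmul(matmul(p, f), pt)
        f = [[alpha * pfp[i][j] + (1 - alpha) * identity[i][j]
              for j in range(n)] for i in range(n)]
    return f


# Toy cell-cell affinities: cells 0 and 1 are similar, cells 2 and 3 are
# similar, and the two pairs are only weakly connected to each other.
toy_affinity = [
    [1.0, 0.9, 0.1, 0.0],
    [0.9, 1.0, 0.0, 0.1],
    [0.1, 0.0, 1.0, 0.9],
    [0.0, 0.1, 0.9, 1.0],
]
diffused = diffuse(toy_affinity)
```

After diffusion, cells 0 and 3, which have no direct edge in `toy_affinity`, receive a nonzero affinity through shared neighbors, while within-pair affinities remain the largest. A matrix of this kind is what a spectral clustering step would then take as input.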
|
7
|
Qiu Y, Yan C, Zhao P, Zou Q. SSNMDI: a novel joint learning model of semi-supervised non-negative matrix factorization and data imputation for clustering of single-cell RNA-seq data. Brief Bioinform 2023; 24:7147025. [PMID: 37122068 DOI: 10.1093/bib/bbad149] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 12/29/2022] [Revised: 02/18/2023] [Accepted: 03/28/2023] [Indexed: 05/02/2023] Open
Abstract
MOTIVATION: Single-cell RNA sequencing (scRNA-seq) technology attracts extensive attention in the biomedical field. It can be used to measure gene expression and analyze the transcriptome at the single-cell level, enabling the identification of cell types based on unsupervised clustering. Data imputation and dimension reduction are conducted before clustering because scRNA-seq has a high 'dropout' rate, noise and linear inseparability. However, performing dimension reduction, imputation and clustering independently cannot fully characterize the pattern of the scRNA-seq data, resulting in poor clustering performance. Herein, we propose a novel and accurate algorithm, SSNMDI, that utilizes a joint learning approach to simultaneously perform imputation, dimensionality reduction and cell clustering in a non-negative matrix factorization (NMF) framework. In addition, we integrate the cell annotation as prior information, then transform the joint learning into a semi-supervised NMF model. Through experiments on 14 datasets, we demonstrate that SSNMDI has a faster convergence speed, better dimensionality reduction performance and a more accurate cell clustering performance than previous methods, providing an accurate and robust strategy for analyzing scRNA-seq data. Biological analyses are also conducted to validate the biological significance of our method, including pseudotime analysis, gene ontology and survival analysis. We believe that we are among the first to introduce imputation, partial label information, dimension reduction and clustering to the single-cell field. AVAILABILITY AND IMPLEMENTATION: The source code for SSNMDI is available at https://github.com/yushanqiu/SSNMDI.
Affiliation(s)
- Yushan Qiu
- College of Mathematics and Statistics, Shenzhen University, 518000, Guangdong, China
| | - Chang Yan
- College of Mathematics and Statistics, Shenzhen University, 518000, Guangdong, China
| | - Pu Zhao
- College of Life and Health Sciences, Northeastern University, Shenyang, 110169, China
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, 610056, China
|
8
|
CASSL: A cell-type annotation method for single cell transcriptomics data using semi-supervised learning. APPL INTELL 2022. [DOI: 10.1007/s10489-022-03440-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2022]
|
9
|
Song D, Li K, Hemminger Z, Wollman R, Li JJ. scPNMF: sparse gene encoding of single cells to facilitate gene selection for targeted gene profiling. Bioinformatics 2021; 37:i358-i366. [PMID: 34252925 PMCID: PMC8275345 DOI: 10.1093/bioinformatics/btab273] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/15/2022] Open
Abstract
Motivation: Single-cell RNA sequencing (scRNA-seq) captures whole transcriptome information of individual cells. While scRNA-seq measures thousands of genes, researchers are often interested in only dozens to hundreds of genes for a closer study. Then, a question is how to select those informative genes from scRNA-seq data. Moreover, single-cell targeted gene profiling technologies are gaining popularity for their low costs, high sensitivity and extra (e.g. spatial) information; however, they typically can only measure up to a few hundred genes. Then another challenging question is how to select genes for targeted gene profiling based on existing scRNA-seq data. Results: Here, we develop the single-cell Projective Non-negative Matrix Factorization (scPNMF) method to select informative genes from scRNA-seq data in an unsupervised way. Compared with existing gene selection methods, scPNMF has two advantages. First, its selected informative genes can better distinguish cell types. Second, it enables the alignment of new targeted gene profiling data with reference data in a low-dimensional space to facilitate the prediction of cell types in the new data. Technically, scPNMF modifies the PNMF algorithm for gene selection by changing the initialization and adding a basis selection step, which selects informative bases to distinguish cell types. We demonstrate that scPNMF outperforms the state-of-the-art gene selection methods on diverse scRNA-seq datasets. Moreover, we show that scPNMF can guide the design of targeted gene profiling experiments and the cell-type annotation on targeted gene profiling data. Availability and implementation: The R package is open-access and available at https://github.com/JSB-UCLA/scPNMF. The data used in this work are available at Zenodo: https://doi.org/10.5281/zenodo.4797997. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Dongyuan Song
- Bioinformatics Interdepartmental Ph.D. Program, University of California, Los Angeles, CA 90095-7246, USA
| | - Kexin Li
- Department of Statistics, University of California, Los Angeles, CA 90095-1554, USA
| | - Zachary Hemminger
- Institute for Quantitative and Computational Biosciences, University of California, Los Angeles, CA 90095, USA
- Department of Integrative Biology and Physiology, University of California, Los Angeles, CA 90095-7239, USA
| | - Roy Wollman
- Institute for Quantitative and Computational Biosciences, University of California, Los Angeles, CA 90095, USA
- Department of Integrative Biology and Physiology, University of California, Los Angeles, CA 90095-7239, USA
- Department of Chemistry and Biochemistry, University of California, Los Angeles, CA 90095-1569, USA
| | - Jingyi Jessica Li
- Department of Statistics, University of California, Los Angeles, CA 90095-1554, USA
- Department of Human Genetics, University of California, Los Angeles, CA 90095-7088, USA
- Department of Computational Medicine, University of California, Los Angeles, CA 90095-1766, USA
- Department of Biostatistics, University of California, Los Angeles, CA 90095-1772, USA
|