1
Jiang J, Zhang C, Ke L, Hayes N, Zhu Y, Qiu H, Zhang B, Zhou T, Wei GW. A review of machine learning methods for imbalanced data challenges in chemistry. Chem Sci 2025; 16:7637-7658. [PMID: 40271022; PMCID: PMC12013631; DOI: 10.1039/d5sc00270b] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/13/2025] [Accepted: 04/06/2025] [Indexed: 04/25/2025] Open
Abstract
Imbalanced data, where certain classes are significantly underrepresented in a dataset, is a widespread machine learning (ML) challenge across various fields of chemistry, yet it remains inadequately addressed. This data imbalance can lead to biased ML or deep learning (DL) models, which fail to accurately predict the underrepresented classes, thus limiting the robustness and applicability of these models. With the rapid advancement of ML and DL algorithms, several promising solutions to this issue have emerged, prompting the need for a comprehensive review of current methodologies. In this review, we examine the prominent ML approaches used to tackle the imbalanced data challenge in different areas of chemistry, including resampling techniques, data augmentation techniques, algorithmic approaches, and feature engineering strategies. Each of these methods is evaluated in the context of its application across various aspects of chemistry, such as drug discovery, materials science, cheminformatics, and catalysis. We also explore future directions for overcoming the imbalanced data challenge and emphasize data augmentation via physical models, large language models (LLMs), and advanced mathematics. The benefit of balanced data in new material design and production and the persistent challenges are discussed. Overall, this review aims to elucidate the prevalent ML techniques applied to mitigate the impacts of imbalanced data within the field of chemistry and offer insights into future directions for research and application.
Affiliation(s)
- Jian Jiang
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
  - Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, USA
- Chunhuan Zhang
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Lu Ke
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Nicole Hayes
  - Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, USA
- Yueying Zhu
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Huahai Qiu
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Bengong Zhang
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Tianshou Zhou
  - Key Laboratory of Computational Mathematics, Guangdong Province, School of Mathematics, Sun Yat-sen University, Guangzhou 510006, P. R. China
- Guo-Wei Wei
  - Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, USA
  - Department of Electrical and Computer Engineering, Michigan State University, East Lansing, Michigan 48824, USA
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, USA
2
Hackenberg M, Brunn N, Vogel T, Binder H. Infusing structural assumptions into dimensionality reduction for single-cell RNA sequencing data to identify small gene sets. Commun Biol 2025; 8:414. [PMID: 40069486; PMCID: PMC11897155; DOI: 10.1038/s42003-025-07872-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2024] [Accepted: 03/03/2025] [Indexed: 03/15/2025] Open
Abstract
Dimensionality reduction greatly facilitates the exploration of cellular heterogeneity in single-cell RNA sequencing data. While most such approaches are data-driven, it can be useful to incorporate biologically plausible assumptions about the underlying structure or the experimental design. We propose the boosting autoencoder (BAE) approach, which combines the advantages of unsupervised deep learning for dimensionality reduction and boosting for formalizing assumptions. Specifically, our approach selects small sets of genes that explain latent dimensions. As illustrative applications, we explore the diversity of neural cell identities and temporal patterns of embryonic development.
Grants
- Deutsche Forschungsgemeinschaft (DFG, German Research Foundation): Project-ID 322977937, GRK 2344
- Deutsche Forschungsgemeinschaft (DFG, German Research Foundation): Project-ID 499552394, SFB 1597
Affiliation(s)
- Maren Hackenberg
  - Institute of Medical Biometry and Statistics (IMBI), Faculty of Medicine and Medical Center, University of Freiburg, Freiburg, Germany
  - Freiburg Center for Data Analysis, Modeling and AI, University of Freiburg, Freiburg, Germany
- Niklas Brunn
  - Institute of Medical Biometry and Statistics (IMBI), Faculty of Medicine and Medical Center, University of Freiburg, Freiburg, Germany
  - Freiburg Center for Data Analysis, Modeling and AI, University of Freiburg, Freiburg, Germany
- Tanja Vogel
  - Institute of Anatomy and Cell Biology, Department Molecular Embryology, Faculty of Medicine, University of Freiburg, Freiburg, Germany
- Harald Binder
  - Institute of Medical Biometry and Statistics (IMBI), Faculty of Medicine and Medical Center, University of Freiburg, Freiburg, Germany
  - Freiburg Center for Data Analysis, Modeling and AI, University of Freiburg, Freiburg, Germany
  - Centre for Integrative Biological Signaling Studies (CIBSS), University of Freiburg, Freiburg, Germany
3
Shin SW, Mudvari P, Thaploo S, Wheeler MA, Douek DC, Quintana FJ, Boritz EA, Abate AR, Clark IC. FIND-seq: high-throughput nucleic acid cytometry for rare single-cell transcriptomics. Nat Protoc 2024; 19:3191-3218. [PMID: 39039320; PMCID: PMC11537836; DOI: 10.1038/s41596-024-01021-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2023] [Accepted: 05/09/2024] [Indexed: 07/24/2024]
Abstract
Rare cells have an important role in development and disease, and methods for isolating and studying cell subsets are therefore an essential part of biology research. Such methods traditionally rely on labeled antibodies targeted to cell surface proteins, but large public databases and sophisticated computational approaches increasingly define cell subsets on the basis of genomic, epigenomic and transcriptomic sequencing data. Methods for isolating cells on the basis of nucleic acid sequences powerfully complement these approaches by providing experimental access to cell subsets discovered in cell atlases, as well as those that cannot be otherwise isolated, including cells infected with pathogens, with specific DNA mutations or with unique transcriptional or splicing signatures. We recently developed a nucleic acid cytometry platform called 'focused interrogation of cells by nucleic acid detection and sequencing' (FIND-seq), capable of isolating rare cells on the basis of RNA or DNA markers, followed by bulk or single-cell transcriptomic analysis. This platform has previously been used to characterize the splicing-dependent activation of the transcription factor XBP1 in astrocytes and HIV persistence in memory CD4 T cells from people on long-term antiretroviral therapy. Here, we outline the molecular and microfluidic steps involved in performing FIND-seq, including protocol updates that allow detection and whole transcriptome sequencing of rare HIV-infected cells that harbor genetically intact virus genomes. FIND-seq requires knowledge of microfluidics, optics and molecular biology. We expect that FIND-seq, and this comprehensive protocol, will enable mechanistic studies of rare HIV+ cells, as well as other cell subsets that were previously difficult to recover and sequence.
Affiliation(s)
- Seung Won Shin
  - Department of Bioengineering, College of Engineering, California Institute for Quantitative Biosciences (QB3), University of California Berkeley, Berkeley, CA, USA
- Prakriti Mudvari
  - Virus Persistence and Dynamics Section, Vaccine Research Center, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, USA
- Shravan Thaploo
  - Ann Romney Center for Neurologic Diseases, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Michael A Wheeler
  - Ann Romney Center for Neurologic Diseases, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Daniel C Douek
  - Human Immunology Section, Vaccine Research Center, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, USA
- Francisco J Quintana
  - Ann Romney Center for Neurologic Diseases, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Eli A Boritz
  - Virus Persistence and Dynamics Section, Vaccine Research Center, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, USA
- Adam R Abate
  - Department of Bioengineering and Therapeutic Sciences, School of Pharmacy, University of California San Francisco, San Francisco, CA, USA
- Iain C Clark
  - Department of Bioengineering, College of Engineering, California Institute for Quantitative Biosciences (QB3), University of California Berkeley, Berkeley, CA, USA
4
Silkwood K, Dollinger E, Gervin J, Atwood S, Nie Q, Lander AD. Leveraging gene correlations in single cell transcriptomic data. BMC Bioinformatics 2024; 25:305. [PMID: 39294560; PMCID: PMC11411778; DOI: 10.1186/s12859-024-05926-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2023] [Accepted: 09/09/2024] [Indexed: 09/20/2024] Open
Abstract
BACKGROUND: Many approaches have been developed to overcome technical noise in single cell RNA-sequencing (scRNAseq). As researchers dig deeper into data (looking for rare cell types, subtleties of cell states, and details of gene regulatory networks), there is a growing need for algorithms with controllable accuracy and fewer ad hoc parameters and thresholds. Impeding this goal is the fact that an appropriate null distribution for scRNAseq cannot simply be extracted from data in which ground truth about biological variation is unknown (i.e., usually). RESULTS: We approach this problem analytically, assuming that scRNAseq data reflect only cell heterogeneity (what we seek to characterize), transcriptional noise (temporal fluctuations randomly distributed across cells), and sampling error (i.e., Poisson noise). We analyze scRNAseq data without normalization (a step that skews distributions, particularly for sparse data) and calculate p values associated with key statistics. We develop an improved method for selecting features for cell clustering and identifying gene-gene correlations, both positive and negative. Using simulated data, we show that this method, which we call BigSur (Basic Informatics and Gene Statistics from Unnormalized Reads), captures even weak yet significant correlation structures in scRNAseq data. Applying BigSur to data from a clonal human melanoma cell line, we identify thousands of correlations that, when clustered without supervision into gene communities, align with known cellular components and biological processes, and highlight potentially novel cell biological relationships. CONCLUSIONS: New insights into functionally relevant gene regulatory networks can be obtained using a statistically grounded approach to the identification of gene-gene correlations.
Affiliation(s)
- Kai Silkwood
  - Center for Complex Biological Systems, University of California, Irvine, Irvine, CA, USA
  - Department of Developmental and Cell Biology, University of California, Irvine, Irvine, CA, USA
- Emmanuel Dollinger
  - Center for Complex Biological Systems, University of California, Irvine, Irvine, CA, USA
  - Department of Developmental and Cell Biology, University of California, Irvine, Irvine, CA, USA
  - Department of Mathematics, University of California, Irvine, Irvine, CA, USA
- Joshua Gervin
  - Center for Complex Biological Systems, University of California, Irvine, Irvine, CA, USA
  - Department of Developmental and Cell Biology, University of California, Irvine, Irvine, CA, USA
- Scott Atwood
  - Center for Complex Biological Systems, University of California, Irvine, Irvine, CA, USA
  - Department of Developmental and Cell Biology, University of California, Irvine, Irvine, CA, USA
- Qing Nie
  - Center for Complex Biological Systems, University of California, Irvine, Irvine, CA, USA
  - Department of Developmental and Cell Biology, University of California, Irvine, Irvine, CA, USA
  - Department of Mathematics, University of California, Irvine, Irvine, CA, USA
- Arthur D Lander
  - Center for Complex Biological Systems, University of California, Irvine, Irvine, CA, USA
  - Department of Developmental and Cell Biology, University of California, Irvine, Irvine, CA, USA
5
Logotheti S, Pavlopoulou A, Rudsari HK, Galow AM, Kafalı Y, Kyrodimos E, Giotakis AI, Marquardt S, Velalopoulou A, Verginadis II, Koumenis C, Stiewe T, Zoidakis J, Balasingham I, David R, Georgakilas AG. Intercellular pathways of cancer treatment-related cardiotoxicity and their therapeutic implications: the paradigm of radiotherapy. Pharmacol Ther 2024; 260:108670. [PMID: 38823489; DOI: 10.1016/j.pharmthera.2024.108670] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2023] [Revised: 05/16/2024] [Accepted: 05/25/2024] [Indexed: 06/03/2024]
Abstract
Advances in cancer therapeutics have improved patient survival rates. However, cancer survivors may suffer from adverse events either at the time of therapy or later in life. Cardiovascular diseases (CVDs) represent a clinically important but mechanistically understudied complication, which interferes with the continuation of best-possible care, induces life-threatening risks, and/or leads to long-term morbidity. These concerns are exacerbated by the fact that targeted therapies and immunotherapies are frequently combined with radiotherapy, which induces durable inflammatory and immunogenic responses, thereby providing a fertile ground for the development of CVDs. Stressed and dying irradiated cells produce 'danger' signals including, but not limited to, major histocompatibility complexes, cell-adhesion molecules, proinflammatory cytokines, and damage-associated molecular patterns. These factors activate intercellular signaling pathways which have potentially detrimental effects on heart tissue homeostasis. Herein, we present the clinical crosstalk between cancer and heart diseases, describe how it is potentiated by cancer therapies, and highlight the multifactorial nature of the underlying mechanisms. We particularly focus on radiotherapy, as a case known to often induce cardiovascular complications even decades after treatment. We provide evidence that the secretome of irradiated tumors entails factors that exert systemic, remote effects on the cardiac tissue, potentially predisposing it to CVDs. We suggest how diverse disciplines can utilize pertinent state-of-the-art methods in feasible experimental workflows, to shed light on the molecular mechanisms of radiotherapy-related cardiotoxicity at the organismal level and untangle the desirable immunogenic properties of cancer therapies from their detrimental effects on heart tissue. Results of such highly collaborative efforts hold promise to be translated to next-generation regimens that maximize tumor control, minimize cardiovascular complications, and support quality of life in cancer survivors.
Affiliation(s)
- Stella Logotheti
  - DNA Damage Laboratory, Physics Department, School of Applied Mathematical and Physical Sciences, National Technical University of Athens (NTUA), Zografou, 15780, Athens, Greece; Biomedical Physics in Radiation Oncology, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Athanasia Pavlopoulou
  - Izmir Biomedicine and Genome Center, Izmir, Turkey; Izmir International Biomedicine and Genome Institute, Dokuz Eylul University, Izmir, Turkey
- Anne-Marie Galow
  - Institute for Genome Biology, Research Institute for Farm Animal Biology (FBN), 18196 Dummerstorf, Germany
- Yağmur Kafalı
  - Izmir Biomedicine and Genome Center, Izmir, Turkey; Izmir International Biomedicine and Genome Institute, Dokuz Eylul University, Izmir, Turkey
- Efthymios Kyrodimos
  - First Department of Otorhinolaryngology, Head and Neck Surgery, Hippocrateion General Hospital Athens, National and Kapodistrian University of Athens, Athens, Greece
- Aris I Giotakis
  - First Department of Otorhinolaryngology, Head and Neck Surgery, Hippocrateion General Hospital Athens, National and Kapodistrian University of Athens, Athens, Greece
- Stephan Marquardt
  - Institute of Translational Medicine for Health Care Systems, Medical School Berlin, Hochschule Für Gesundheit Und Medizin, 14197 Berlin, Germany
- Anastasia Velalopoulou
  - Department of Radiation Oncology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Ioannis I Verginadis
  - Department of Radiation Oncology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Constantinos Koumenis
  - Department of Radiation Oncology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Thorsten Stiewe
  - Institute of Molecular Oncology, Philipps-University, 35043 Marburg, Germany; German Center for Lung Research (DZL), Universities of Giessen and Marburg Lung Center (UGMLC), 35043 Marburg, Germany; Genomics Core Facility, Philipps-University, 35043 Marburg, Germany; Institute for Lung Health (ILH), Justus Liebig University, 35392 Giessen, Germany
- Jerome Zoidakis
  - Department of Biotechnology, Biomedical Research Foundation, Academy of Athens, Athens, Greece; Department of Biology, National and Kapodistrian University of Athens, Athens, Greece
- Robert David
  - Department of Cardiac Surgery, Rostock University Medical Center, 18057 Rostock, Germany; Department of Life, Light & Matter, Interdisciplinary Faculty, Rostock University, 18059 Rostock, Germany
- Alexandros G Georgakilas
  - DNA Damage Laboratory, Physics Department, School of Applied Mathematical and Physical Sciences, National Technical University of Athens (NTUA), Zografou, 15780, Athens, Greece
6
Nechanitzky R, Ramachandran P, Nechanitzky D, Li WY, Wakeham AC, Haight J, Saunders ME, Epelman S, Mak TW. CaSSiDI: novel single-cell "Cluster Similarity Scoring and Distinction Index" reveals critical functions for PirB and context-dependent Cebpb repression. Cell Death Differ 2024; 31:265-279. [PMID: 38383888; PMCID: PMC10923835; DOI: 10.1038/s41418-024-01268-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2023] [Revised: 01/15/2024] [Accepted: 02/07/2024] [Indexed: 02/23/2024] Open
Abstract
PirB is an inhibitory cell surface receptor particularly prominent on myeloid cells. PirB curtails the phenotypes of activated macrophages during inflammation or tumorigenesis, but its functions in macrophage homeostasis are obscure. To elucidate PirB-related functions in macrophages at steady-state, we generated and compared single-cell RNA-sequencing (scRNAseq) datasets obtained from myeloid cell subsets of wild type (WT) and PirB-deficient knockout (PirB KO) mice. To facilitate this analysis, we developed a novel approach to clustering parameter optimization called "Cluster Similarity Scoring and Distinction Index" (CaSSiDI). We demonstrate that CaSSiDI is an adaptable computational framework that facilitates tandem analysis of two scRNAseq datasets by optimizing clustering parameters. We further show that CaSSiDI offers more advantages than a standard Seurat analysis because it allows direct comparison of two or more independently clustered datasets, thereby alleviating the need for batch-correction while identifying the most similar and different clusters. Using CaSSiDI, we found that PirB is a novel regulator of Cebpb expression that controls the generation of Ly6Clo patrolling monocytes and the expansion properties of peritoneal macrophages. PirB's effect on Cebpb is tissue-specific since it was not observed in splenic red pulp macrophages (RPMs). However, CaSSiDI revealed a segregation of the WT RPM population into a CD68loIrf8+ "neuronal-primed" subset and a CD68hiFtl1+ "iron-loaded" subset. Our results establish the utility of CaSSiDI for single-cell assay analyses and the determination of optimal clustering parameters. Our application of CaSSiDI in this study has revealed previously unknown roles for PirB in myeloid cell populations. In particular, we have discovered homeostatic functions for PirB that are related to Cebpb expression in distinct macrophage subsets.
Affiliation(s)
- Robert Nechanitzky
  - Princess Margaret Cancer Centre, Ontario Cancer Institute, University Health Network, Toronto, ON, Canada
  - Providence Therapeutics Holdings Inc., Calgary, AB, Canada
- Parameswaran Ramachandran
  - Princess Margaret Cancer Centre, Ontario Cancer Institute, University Health Network, Toronto, ON, Canada
- Duygu Nechanitzky
  - Princess Margaret Cancer Centre, Ontario Cancer Institute, University Health Network, Toronto, ON, Canada
- Wanda Y Li
  - Centre for Oncology and Immunology, Hong Kong Science Park, Hong Kong SAR, China
- Andrew C Wakeham
  - Princess Margaret Cancer Centre, Ontario Cancer Institute, University Health Network, Toronto, ON, Canada
- Jillian Haight
  - Princess Margaret Cancer Centre, Ontario Cancer Institute, University Health Network, Toronto, ON, Canada
- Mary E Saunders
  - Princess Margaret Cancer Centre, Ontario Cancer Institute, University Health Network, Toronto, ON, Canada
- Slava Epelman
  - Toronto General Hospital Research Institute, University Health Network, Toronto, ON, Canada
  - Ted Rogers Centre for Heart Research, Translational Biology and Engineering Program, Toronto, ON, Canada
  - Peter Munk Cardiac Centre, UHN, Toronto, ON, Canada
  - Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, ON, Canada
  - Departments of Immunology and Medical Biophysics, University of Toronto, Toronto, ON, Canada
- Tak W Mak
  - Princess Margaret Cancer Centre, Ontario Cancer Institute, University Health Network, Toronto, ON, Canada
  - Centre for Oncology and Immunology, Hong Kong Science Park, Hong Kong SAR, China
  - Department of Pathology, School of Clinical Medicine, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
7
He Z, Chen Q, Wang K, Lin J, Peng Y, Zhang J, Yan X, Jie Y. Single-cell transcriptomics analysis of cellular heterogeneity and immune mechanisms in neurodegenerative diseases. Eur J Neurosci 2024; 59:333-357. [PMID: 38221677; DOI: 10.1111/ejn.16242] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2023] [Revised: 12/04/2023] [Accepted: 12/12/2023] [Indexed: 01/16/2024]
Abstract
Single-cell transcriptomics analysis is an advanced technology that can describe the intracellular transcriptome in complex tissues. It profiles and analyses datasets by single-cell RNA sequencing. Neurodegenerative diseases are characterized by abnormal apoptosis of neurons in the brain and currently have few or no effective therapies, making them a growing healthcare concern and a great burden to society. The transcriptome of individual cells provides deep insights into previously unforeseen cellular heterogeneity and gene expression differences in neurodegenerative disorders. It detects multiple cell subsets and functional changes during pathological progression, which deepens the understanding of the molecular underpinnings and cellular basis of neurodegenerative diseases. Furthermore, the transcriptome analysis of immune cells shows the regulation of immune response. Different subtypes of immune cells and their interaction are found to contribute to disease progression. This finding enables the discovery of novel targets and biomarkers for early diagnosis. In this review, we emphasize the principles of the technology and its recent progress in the study of cellular heterogeneity and immune mechanisms in neurodegenerative diseases. The application of single-cell transcriptomics analysis in neurodegenerative disorders would help explore the pathogenesis of these diseases and develop novel therapeutic methods.
Affiliation(s)
- Ziping He
  - Department of Forensic Science, School of Basic Medical Science, Central South University, Changsha, China
  - Clinical Medicine Eight-Year Program, Xiangya School of Medicine, Central South University, Changsha, China
- Qianqian Chen
  - Department of Forensic Science, School of Basic Medical Science, Central South University, Changsha, China
- Kaiyue Wang
  - Department of Forensic Science, School of Basic Medical Science, Central South University, Changsha, China
  - Clinical Medicine Eight-Year Program, Xiangya School of Medicine, Central South University, Changsha, China
- Jiang Lin
  - Department of Forensic Science, School of Basic Medical Science, Central South University, Changsha, China
- Yilin Peng
  - Department of Forensic Science, School of Basic Medical Science, Central South University, Changsha, China
- Jinlong Zhang
  - Department of Forensic Science, School of Basic Medical Science, Central South University, Changsha, China
  - Department of Forensic Science, School of Basic Medical Science, Xinjiang Medical University, Urumqi, China
- Xisheng Yan
  - Department of Cardiovascular Medicine, Wuhan Third Hospital & Tongren Hospital of Wuhan University, Wuhan, China
- Yan Jie
  - Department of Forensic Science, School of Basic Medical Science, Central South University, Changsha, China
  - Department of Forensic Science, School of Basic Medical Science, Xinjiang Medical University, Urumqi, China
8
Heller G, Fuereder T, Grandits AM, Wieser R. New perspectives on biology, disease progression, and therapy response of head and neck cancer gained from single cell RNA sequencing and spatial transcriptomics. Oncol Res 2023; 32:1-17. [PMID: 38188682; PMCID: PMC10767240; DOI: 10.32604/or.2023.044774] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2023] [Accepted: 10/12/2023] [Indexed: 01/09/2024] Open
Abstract
Head and neck squamous cell carcinoma (HNSCC) is one of the most frequent cancers worldwide. The main risk factors are consumption of tobacco products and alcohol, as well as infection with human papilloma virus. Approved therapeutic options comprise surgery, radiation, chemotherapy, targeted therapy through epidermal growth factor receptor inhibition, and immunotherapy, but outcome has remained unsatisfactory due to recurrence rates of ~50% and the frequent occurrence of second primaries. The availability of the human genome sequence at the beginning of the millennium heralded the omics era, in which rapid technological progress has advanced our knowledge of the molecular biology of malignant diseases, including HNSCC, at an unprecedented pace. Initially, microarray-based methods, followed by approaches based on next-generation sequencing, were applied to study the genetics, epigenetics, and gene expression patterns of bulk tumors. More recently, the advent of single-cell RNA sequencing (scRNAseq) and spatial transcriptomics methods has facilitated the investigation of the heterogeneity between and within different cell populations in the tumor microenvironment (e.g., cancer cells, fibroblasts, immune cells, endothelial cells), led to the discovery of novel cell types, and advanced the discovery of cell-cell communication within tumors. This review provides an overview of scRNAseq, spatial transcriptomics, and the associated bioinformatics methods, and summarizes how their application has promoted our understanding of the emergence, composition, progression, and therapy responsiveness of, and intercellular signaling within, HNSCC.
Affiliation(s)
- Gerwin Heller
  - Division of Oncology, Department of Medicine I, Medical University of Vienna, Vienna, 1090, Austria
- Thorsten Fuereder
  - Division of Oncology, Department of Medicine I, Medical University of Vienna, Vienna, 1090, Austria
- Rotraud Wieser
  - Division of Oncology, Department of Medicine I, Medical University of Vienna, Vienna, 1090, Austria
  - Ludwig Boltzmann Institute for Hematology and Oncology, Medical University of Vienna, Vienna, 1090, Austria
9
Silkwood K, Dollinger E, Gervin J, Atwood S, Nie Q, Lander AD. Leveraging gene correlations in single cell transcriptomic data. bioRxiv 2023:2023.03.14.532643. [PMID: 36993765; PMCID: PMC10055147; DOI: 10.1101/2023.03.14.532643] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 05/20/2023]
Abstract
BACKGROUND: Many approaches have been developed to overcome technical noise in single cell RNA-sequencing (scRNAseq). As researchers dig deeper into data (looking for rare cell types, subtleties of cell states, and details of gene regulatory networks), there is a growing need for algorithms with controllable accuracy and fewer ad hoc parameters and thresholds. Impeding this goal is the fact that an appropriate null distribution for scRNAseq cannot simply be extracted from data when ground truth about biological variation is unknown (i.e., usually). RESULTS: We approach this problem analytically, assuming that scRNAseq data reflect only cell heterogeneity (what we seek to characterize), transcriptional noise (temporal fluctuations randomly distributed across cells), and sampling error (i.e., Poisson noise). We analyze scRNAseq data without normalization (a step that skews distributions, particularly for sparse data) and calculate p-values associated with key statistics. We develop an improved method for selecting features for cell clustering and identifying gene-gene correlations, both positive and negative. Using simulated data, we show that this method, which we call BigSur (Basic Informatics and Gene Statistics from Unnormalized Reads), captures even weak yet significant correlation structures in scRNAseq data. Applying BigSur to data from a clonal human melanoma cell line, we identify thousands of correlations that, when clustered without supervision into gene communities, align with known cellular components and biological processes, and highlight potentially novel cell biological relationships. CONCLUSIONS: New insights into functionally relevant gene regulatory networks can be obtained using a statistically grounded approach to the identification of gene-gene correlations.
Affiliation(s)
- Kai Silkwood
  - Center for Complex Biological Systems, University of California, Irvine, Irvine CA
  - Department of Developmental and Cell Biology, University of California, Irvine, Irvine CA
- Emmanuel Dollinger
  - Center for Complex Biological Systems, University of California, Irvine, Irvine CA
  - Department of Developmental and Cell Biology, University of California, Irvine, Irvine CA
  - Department of Mathematics, University of California, Irvine, Irvine CA
- Josh Gervin
  - Center for Complex Biological Systems, University of California, Irvine, Irvine CA
  - Department of Developmental and Cell Biology, University of California, Irvine, Irvine CA
- Scott Atwood
  - Center for Complex Biological Systems, University of California, Irvine, Irvine CA
  - Department of Developmental and Cell Biology, University of California, Irvine, Irvine CA
- Qing Nie
  - Center for Complex Biological Systems, University of California, Irvine, Irvine CA
  - Department of Developmental and Cell Biology, University of California, Irvine, Irvine CA
  - Department of Mathematics, University of California, Irvine, Irvine CA
- Arthur D. Lander
  - Center for Complex Biological Systems, University of California, Irvine, Irvine CA
  - Department of Developmental and Cell Biology, University of California, Irvine, Irvine CA
10
Tosoni G, Ayyildiz D, Bryois J, Macnair W, Fitzsimons CP, Lucassen PJ, Salta E. Mapping human adult hippocampal neurogenesis with single-cell transcriptomics: Reconciling controversy or fueling the debate? Neuron 2023; 111:1714-1731.e3. [PMID: 37015226 DOI: 10.1016/j.neuron.2023.03.010] [Citation(s) in RCA: 39] [Impact Index Per Article: 19.5] [Received: 10/11/2022] [Revised: 02/06/2023] [Accepted: 03/08/2023] [Indexed: 04/05/2023]
Abstract
The notion of exploiting the regenerative potential of the human brain in physiological aging or neurological diseases represents a particularly attractive alternative to conventional strategies for enhancing or restoring brain function. However, a major first question to address is whether the human brain does possess the ability to regenerate. The existence of human adult hippocampal neurogenesis (AHN) has been at the center of a fierce scientific debate for many years. The advent of single-cell transcriptomic technologies was initially viewed as a panacea to resolving this controversy. However, recent single-cell RNA sequencing studies in the human hippocampus yielded conflicting results. Here, we critically discuss and re-analyze previously published AHN-related single-cell transcriptomic datasets. We argue that, although promising, the single-cell transcriptomic profiling of AHN in the human brain can be confounded by methodological, conceptual, and biological factors that need to be consistently addressed across studies and openly discussed within the scientific community.
Affiliation(s)
- Giorgia Tosoni
  - Laboratory of Neurogenesis and Neurodegeneration, Netherlands Institute for Neuroscience, 1105 BA, Amsterdam, the Netherlands
- Dilara Ayyildiz
  - Laboratory of Neurogenesis and Neurodegeneration, Netherlands Institute for Neuroscience, 1105 BA, Amsterdam, the Netherlands
- Julien Bryois
  - Roche Pharma Research and Early Development, Neuroscience and Rare Diseases, Roche Innovation Center, CH-4070, Basel, Switzerland
- Will Macnair
  - Roche Pharma Research and Early Development, Neuroscience and Rare Diseases, Roche Innovation Center, CH-4070, Basel, Switzerland
- Carlos P Fitzsimons
  - Brain Plasticity group, Swammerdam Institute for Life Sciences, Faculty of Science, University of Amsterdam, 1098 XH, Amsterdam, the Netherlands
- Paul J Lucassen
  - Brain Plasticity group, Swammerdam Institute for Life Sciences, Faculty of Science, University of Amsterdam, 1098 XH, Amsterdam, the Netherlands
  - Center for Urban Mental Health, University of Amsterdam, 1098 SM, Amsterdam, the Netherlands
- Evgenia Salta
  - Laboratory of Neurogenesis and Neurodegeneration, Netherlands Institute for Neuroscience, 1105 BA, Amsterdam, the Netherlands
11
Cheng Y, Fan X, Zhang J, Li Y. A scalable sparse neural network framework for rare cell type annotation of single-cell transcriptome data. Commun Biol 2023; 6:545. [PMID: 37210444 DOI: 10.1038/s42003-023-04928-6] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 04/13/2023] [Accepted: 05/11/2023] [Indexed: 05/22/2023] Open
Abstract
Automatic cell type annotation methods are increasingly used in single-cell RNA sequencing (scRNA-seq) analysis due to their fast and precise advantages. However, current methods often fail to account for the imbalance of scRNA-seq datasets and ignore information from smaller populations, leading to significant biological analysis errors. Here, we introduce scBalance, an integrated sparse neural network framework that incorporates adaptive weight sampling and dropout techniques for auto-annotation tasks. Using 20 scRNA-seq datasets with varying scales and degrees of imbalance, we demonstrate that scBalance outperforms current methods in both intra- and inter-dataset annotation tasks. Additionally, scBalance displays impressive scalability in identifying rare cell types in million-level datasets, as shown in the bronchoalveolar cell landscape. scBalance is also significantly faster than commonly used tools and comes in a user-friendly format, making it a superior tool for scRNA-seq analysis on the Python-based platform.
Affiliation(s)
- Yuqi Cheng
  - Department of Computer Science and Engineering (CSE), The Chinese University of Hong Kong (CUHK), Hong Kong SAR, China
  - School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA, USA
- Xingyu Fan
  - School of Information and Software Engineering, University of Electronic Science and Technology of China, 610054, Chengdu, China
- Jianing Zhang
  - Department of Computer Science and Engineering (CSE), The Chinese University of Hong Kong (CUHK), Hong Kong SAR, China
- Yu Li
  - Department of Computer Science and Engineering (CSE), The Chinese University of Hong Kong (CUHK), Hong Kong SAR, China
  - The CUHK Shenzhen Research Institute, Hi-Tech Park, Nanshan, 518057, Shenzhen, China
|