1. Marshall L, Raychaudhuri S, Viatte S. Understanding rheumatic disease through continuous cell state analysis. Nat Rev Rheumatol 2025:10.1038/s41584-025-01253-6. [PMID: 40335652 DOI: 10.1038/s41584-025-01253-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 04/03/2025] [Indexed: 05/09/2025]
Abstract
Autoimmune rheumatic diseases are a heterogeneous group of conditions, including rheumatoid arthritis (RA) and systemic lupus erythematosus. With the increasing availability of large single-cell datasets, novel disease-associated cell types continue to be identified and characterized at multiple omics layers, for example, 'T peripheral helper' (TPH) (CXCR5- PD-1hi) cells in RA and systemic lupus erythematosus and MerTK+ myeloid cells in RA. Despite efforts to define disease-relevant cell atlases, the very definition of a 'cell type' or 'lineage' has proven controversial as higher resolution assays emerge. This Review explores the cell types and states involved in disease pathogenesis, with a focus on the shifting perspectives on immune and stromal cell taxonomy. These understandings of cell identity are closely related to the computational methods adopted for analysis, with implications for the interpretation of single-cell data. Understanding the underlying cellular architecture of disease is also crucial for therapeutic research as ambiguity hinders translation to the clinical setting. We discuss the implications of different frameworks for cell identity for disease treatment and the discovery of predictive biomarkers for stratified medicine - an unmet clinical need for autoimmune rheumatic diseases.
Affiliation(s)
- Lysette Marshall
- Centre for Genetics and Genomics Versus Arthritis, Centre for Musculoskeletal Research, The University of Manchester, Manchester, UK
- Soumya Raychaudhuri
- Centre for Genetics and Genomics Versus Arthritis, Centre for Musculoskeletal Research, The University of Manchester, Manchester, UK
- Center for Data Sciences, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Divisions of Rheumatology, Inflammation and Immunity and Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Broad Institute, Cambridge, MA, USA
- Sebastien Viatte
- Centre for Genetics and Genomics Versus Arthritis, Centre for Musculoskeletal Research, The University of Manchester, Manchester, UK
- NIHR Manchester Musculoskeletal Biomedical Research Centre, Manchester University NHS Foundation Trust, Manchester Academic Health Science Centre, Manchester, UK
- Lydia Becker Institute of Immunology and Inflammation, Faculty of Biology, Medicine and Health, The University of Manchester, Manchester, UK
2. Wu X, Teo YV, Neretti N, Wu Z. Mouse blood cells types and aging prediction using penalized Latent Dirichlet Allocation. BMC Genomics 2024; 23:866. [PMID: 39294566 PMCID: PMC11409595 DOI: 10.1186/s12864-024-10763-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2021] [Accepted: 09/02/2024] [Indexed: 09/20/2024]
Abstract
BACKGROUND Aging is a complex, heterogeneous process with multiple causes. Knowledge of the genomic, epigenomic and transcriptomic changes that occur during aging sheds light on its mechanisms. A recent breakthrough in biotechnology, single-cell RNA-seq (scRNA-seq), is revolutionizing the study of aging by providing whole-transcriptome expression profiles of individual cells. With the help of novel computational methods, much can be inferred from this new type of data. RESULTS In this manuscript a novel statistical method, penalized Latent Dirichlet Allocation (pLDA), is applied to an aging mouse blood scRNA-seq data set. A pipeline is built for cell type and aging prediction. The sequence of models in the pipeline takes scRNA-seq expression counts as input, preprocesses the data using pLDA and predicts the cell type and aging status. CONCLUSIONS pLDA learns a dimension-reduced representation of the expression profile. This representation allows identification of cell types and is predictive of the age of cells.
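The abstract describes reducing each cell's expression counts to a lower-dimensional topic representation. The penalty that distinguishes pLDA is not specified here, so the sketch below fits ordinary LDA by collapsed Gibbs sampling instead, treating each cell as a document and each counted molecule of a gene as a word. The function name `lda_gibbs`, the priors `alpha` and `beta`, and the toy matrix are illustrative assumptions, not the authors' pipeline.

```python
import random

def lda_gibbs(counts, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit plain LDA (NOT the penalized pLDA variant) by collapsed Gibbs sampling.

    counts: cells x genes matrix of non-negative integer counts (list of lists).
    Returns theta, the per-cell topic proportions (each row sums to 1).
    """
    rng = random.Random(seed)
    n_cells, n_genes = len(counts), len(counts[0])
    # Each cell becomes a "document": one gene token per counted molecule.
    docs = [[g for g, c in enumerate(row) for _ in range(c)] for row in counts]
    ndk = [[0] * n_topics for _ in range(n_cells)]  # tokens of topic k in cell d
    nkw = [[0] * n_genes for _ in range(n_topics)]  # tokens of gene w in topic k
    nk = [0] * n_topics                              # total tokens in topic k
    z = []                                           # current topic of every token
    for d, doc in enumerate(docs):
        zd = [rng.randrange(n_topics) for _ in doc]  # random initialization
        for w, k in zip(doc, zd):
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(zd)
    topics = range(n_topics)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the token, then resample its topic from the full conditional.
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + n_genes * beta)
                           for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1
    # Posterior-mean topic proportions for each cell.
    return [[(ndk[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in topics]
            for d, doc in enumerate(docs)]

# Toy usage: four cells, four genes (hypothetical counts).
toy = [[6, 4, 0, 0], [5, 5, 0, 0], [0, 0, 4, 6], [0, 1, 5, 4]]
theta = lda_gibbs(toy, n_topics=2)
```

The rows of `theta` are the kind of reduced representation a downstream cell-type or age classifier could take as input; that classifier is not sketched here.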
Affiliation(s)
- Xiaotian Wu
- Department of Biostatistics, Brown University, Providence, RI, USA
- Yee Voan Teo
- Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, RI, USA
- Nicola Neretti
- Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, RI, USA
- Zhijin Wu
- Department of Biostatistics, Brown University, Providence, RI, USA
3. Rebboah E, Rezaie N, Williams BA, Weimer AK, Shi M, Yang X, Liang HY, Dionne LA, Reese F, Trout D, Jou J, Youngworth I, Reinholdt L, Morabito S, Snyder MP, Wold BJ, Mortazavi A. The ENCODE mouse postnatal developmental time course identifies regulatory programs of cell types and cell states. bioRxiv 2024:2024.06.12.598567. [PMID: 38915583 PMCID: PMC11195270 DOI: 10.1101/2024.06.12.598567] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/26/2024]
Abstract
Postnatal genomic regulation significantly influences tissue and organ maturation but is under-studied relative to existing genomic catalogs of adult tissues or prenatal development in mouse. The ENCODE4 consortium generated the first comprehensive single-nucleus resource of postnatal regulatory events across a diverse set of mouse tissues. The collection spans seven postnatal time points, mirroring human development from childhood to adulthood, and encompasses five core tissues. We identified 30 cell types, further subdivided into 69 subtypes and cell states across adrenal gland, left cerebral cortex, hippocampus, heart, and gastrocnemius muscle. Our annotations cover both known and novel cell differentiation dynamics ranging from early hippocampal neurogenesis to a new sex-specific adrenal gland population during puberty. We used an ensemble Latent Dirichlet Allocation strategy with a curated vocabulary of 2,701 regulatory genes to identify regulatory "topics," each of which is a gene vector, linked to cell type differentiation, subtype specialization, and transitions between cell states. We find recurrent regulatory topics in tissue-resident macrophages, neural cell types, endothelial cells across multiple tissues, and cycling cells of the adrenal gland and heart. Cell-type-specific topics are enriched in transcription factors and microRNA host genes, while chromatin regulators dominate mitosis topics. Corresponding chromatin accessibility data reveal dynamic and sex-specific regulatory elements, with enriched motifs matching transcription factors in regulatory topics. Together, these analyses identify both tissue-specific and common regulatory programs in postnatal development across multiple tissues through the lens of the factors regulating transcription.
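Restricting LDA to a curated vocabulary amounts to dropping every gene column outside the list before fitting. A minimal sketch, assuming a cells x genes count matrix whose columns are labelled with gene symbols; the 2,701-gene list is not reproduced, and the `restrict_vocabulary` helper and the gene names in the example are hypothetical.

```python
def restrict_vocabulary(counts, gene_names, vocabulary, drop_empty=True):
    """Keep only gene columns whose symbol is in the curated vocabulary.

    counts: list of cells (rows) of integer counts, columns ordered as gene_names.
    vocabulary: iterable of gene symbols to retain (e.g. a regulatory-gene list).
    Returns (filtered_counts, kept_gene_names, kept_cell_indices).
    """
    vocab = set(vocabulary)
    keep = [j for j, g in enumerate(gene_names) if g in vocab]
    kept_names = [gene_names[j] for j in keep]
    filtered, kept_cells = [], []
    for i, row in enumerate(counts):
        sub = [row[j] for j in keep]
        if drop_empty and sum(sub) == 0:
            continue  # an empty "document" carries no information for LDA
        filtered.append(sub)
        kept_cells.append(i)
    return filtered, kept_names, kept_cells

# Toy usage with a hypothetical two-gene vocabulary.
genes = ["Actb", "Gata4", "Nr5a1", "Malat1"]
counts = [[10, 2, 0, 5], [8, 0, 3, 1], [7, 0, 0, 9]]
filtered, kept_names, kept_cells = restrict_vocabulary(counts, genes, {"Gata4", "Nr5a1"})
```

The third toy cell expresses none of the retained genes, so it is dropped unless `drop_empty=False`.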
Affiliation(s)
- Elisabeth Rebboah
- Developmental and Cell Biology, University of California Irvine, Irvine, USA
- Center for Complex Biological Systems, University of California Irvine, Irvine, USA
- Narges Rezaie
- Developmental and Cell Biology, University of California Irvine, Irvine, USA
- Center for Complex Biological Systems, University of California Irvine, Irvine, USA
- Brian A. Williams
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, USA
- Annika K. Weimer
- Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, USA
- Minyi Shi
- Department of Next Generation Sequencing and Microchemistry, Proteomics and Lipidomics, Genentech, San Francisco, USA
- Xinqiong Yang
- Department of Genetics, Stanford University School of Medicine, Palo Alto, USA
- Heidi Yahan Liang
- Developmental and Cell Biology, University of California Irvine, Irvine, USA
- Fairlie Reese
- Developmental and Cell Biology, University of California Irvine, Irvine, USA
- Diane Trout
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, USA
- Jennifer Jou
- Department of Genetics, Stanford University School of Medicine, Palo Alto, USA
- Ingrid Youngworth
- Department of Genetics, Stanford University School of Medicine, Palo Alto, USA
- Samuel Morabito
- Developmental and Cell Biology, University of California Irvine, Irvine, USA
- Center for Complex Biological Systems, University of California Irvine, Irvine, USA
- Michael P. Snyder
- Department of Genetics, Stanford University School of Medicine, Palo Alto, USA
- Barbara J. Wold
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, USA
- Ali Mortazavi
- Developmental and Cell Biology, University of California Irvine, Irvine, USA
- Center for Complex Biological Systems, University of California Irvine, Irvine, USA
4. Tiong KL, Luzhbin D, Yeang CH. Assessing transcriptomic heterogeneity of single-cell RNASeq data by bulk-level gene expression data. BMC Bioinformatics 2024; 25:209. [PMID: 38867193 PMCID: PMC11167951 DOI: 10.1186/s12859-024-05825-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/15/2024] [Accepted: 06/03/2024] [Indexed: 06/14/2024]
Abstract
BACKGROUND Single-cell RNA sequencing (sc-RNASeq) data illuminate transcriptomic heterogeneity but also possess a high level of noise, abundant missing entries and sometimes inadequate or no cell type annotations at all. Bulk-level gene expression data lack direct information on cell population composition but are more robust and complete and often better annotated. We propose a modeling framework to integrate bulk-level and single-cell RNASeq data to address the deficiencies and leverage the complementary strengths of each type of data, enabling a more comprehensive inference of their transcriptomic heterogeneity. Contrary to the standard approaches of factorizing the bulk-level data with one algorithm and (for some methods) treating single-cell RNASeq data as references to decompose bulk-level data, we employed multiple deconvolution algorithms to factorize the bulk-level data, constructed the probabilistic graphical models of cell-level gene expressions from the decomposition outcomes, and compared the log-likelihood scores of these models on single-cell data. We term this framework backward deconvolution because inference operates from coarse-grained bulk-level data to fine-grained single-cell data. As the abundant missing entries in sc-RNASeq data have a significant effect on log-likelihood scores, we also developed a criterion for inclusion or exclusion of zero entries in log-likelihood score computation. RESULTS We selected nine deconvolution algorithms and validated backward deconvolution in five datasets. In the in-silico mixtures of mouse sc-RNASeq data, the log-likelihood scores of the deconvolution algorithms were strongly anticorrelated with their errors of mixture coefficients and cell-type-specific gene expression signatures. In the real bulk-level mouse data, the sample mixture coefficients were unknown, but the log-likelihood scores were strongly correlated with accuracy rates of inferred cell types.
In the data of autism spectrum disorder (ASD) and normal controls, we found that ASD brains possessed higher fractions of astrocytes and lower fractions of NRGN-expressing neurons than normal controls. In datasets of breast cancer and low-grade gliomas (LGG), we compared the log-likelihood scores of three simple hypotheses about the gene expression patterns of the cell types underlying the tumor subtypes. The model that tumors of each subtype were dominated by one cell type persistently outperformed an alternative model that each cell type had elevated expression in one gene group and tumors were mixtures of those cell types. Superiority of the former model is also supported by comparing the real breast cancer sc-RNASeq clusters with those generated by simulated sc-RNASeq data. CONCLUSIONS The results indicate that backward deconvolution serves as a sensible model selection tool for deconvolution algorithms and facilitates discerning hypotheses about cell type compositions underlying heterogeneous specimens such as tumors.
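The scoring step of backward deconvolution can be illustrated with an assumed model, since the abstract does not give the exact graphical model: each cell is drawn from one cell type with probability equal to that type's mixture weight, and its gene counts are independent Poisson draws whose means come from the inferred cell-type signature. The `include_zeros` switch only toggles whether zero entries enter the score; the authors' criterion for making that choice is not reproduced. All function names here are hypothetical.

```python
import math

def poisson_logpmf(x, lam):
    # log P(X = x) for X ~ Poisson(lam); requires lam > 0
    return x * math.log(lam) - lam - math.lgamma(x + 1)

def backward_loglik(cells, signatures, weights, include_zeros=True):
    """Log-likelihood of single-cell counts under a deconvolution result.

    ASSUMED model (illustrative only): cell type k has prior weights[k] and
    gene g in a cell of type k is Poisson with mean signatures[k][g] (> 0).
    """
    total = 0.0
    for x in cells:
        per_type = []
        for w_k, sig in zip(weights, signatures):
            ll = math.log(w_k)
            for x_g, lam in zip(x, sig):
                if x_g == 0 and not include_zeros:
                    continue  # optionally ignore dropout-prone zero entries
                ll += poisson_logpmf(x_g, lam)
            per_type.append(ll)
        # Marginalize over cell types with a numerically stable log-sum-exp.
        m = max(per_type)
        total += m + math.log(sum(math.exp(v - m) for v in per_type))
    return total

# Toy usage: compare two hypothetical deconvolution outputs on the same cells.
cells = [[9, 1], [10, 0], [0, 8], [1, 11]]
score_a = backward_loglik(cells, [[10.0, 0.5], [0.5, 10.0]], [0.5, 0.5])
score_b = backward_loglik(cells, [[5.0, 5.0], [5.0, 5.0]], [0.5, 0.5])
```

Under this assumed model, the decomposition with the higher score (here `score_a`) explains the single-cell data better, which mirrors the model-selection use described above.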
Affiliation(s)
- Khong-Loon Tiong
- Institute of Statistical Science, Academia Sinica, Taipei, Taiwan
- Dmytro Luzhbin
- Institute of Statistical Science, Academia Sinica, Taipei, Taiwan
5. Rezaie N, Rebboah E, Williams BA, Liang HY, Reese F, Balderrama-Gutierrez G, Dionne LA, Reinholdt L, Trout D, Wold BJ, Mortazavi A. Identification of robust cellular programs using reproducible LDA that impact sex-specific disease progression in different genotypes of a mouse model of AD. bioRxiv 2024:2024.02.26.582178. [PMID: 38464087 PMCID: PMC10925135 DOI: 10.1101/2024.02.26.582178] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/12/2024]
Abstract
The gene expression profiles of distinct cell types reflect complex genomic interactions among multiple simultaneous biological processes within each cell that can be altered by disease progression as well as genetic background. The identification of these active cellular programs is an open challenge in the analysis of single-cell RNA-seq data. Latent Dirichlet Allocation (LDA) is a generative method used to identify recurring patterns in count data, commonly referred to as topics, which can be used to interpret the state of each cell. However, LDA's interpretability is hindered by several key factors, including the choice of the number of topics as a hyperparameter and the variability in topic definitions due to random initialization. We developed Topyfic, a Reproducible LDA (rLDA) package, to accurately infer the identity and activity of cellular programs in single-cell data, providing insights into the relative contributions of each program in individual cells. We apply Topyfic to brain single-cell and single-nucleus datasets of two 5xFAD mouse models of Alzheimer's disease crossed with C57BL/6J or CAST/EiJ mice to identify distinct cell types, as well as cell states within cell types such as microglia. We find that 8-month-old 5xFAD/Cast F1 males show a higher level of microglial activation than matching 5xFAD/BL6 F1 males, whereas female mice show similar levels of microglial activation. We show that regulatory genes such as TFs, microRNA host genes, and chromatin regulatory genes alone capture cell types and cell states. Our study highlights how topic modeling with a limited vocabulary of regulatory genes can identify gene expression programs in single-cell data in order to quantify similar and divergent cell states in distinct genotypes.
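Variability across random initializations can be reduced by fitting LDA several times and keeping only topics that recur. Topyfic's own procedure is not described in the abstract, so the sketch below uses a simple stand-in: greedy grouping of topic gene-weight vectors by cosine similarity to each group's centroid, followed by averaging. `consensus_topics`, the 0.9 threshold and the support rule are assumptions, not the package's API.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def consensus_topics(runs, threshold=0.9, min_support=None):
    """Group topics from several LDA runs and average each group.

    runs: list of runs; each run is a list of topics; each topic is a list of
    gene weights over the same gene order. A topic joins the first group whose
    centroid it matches with cosine >= threshold, otherwise it starts a new group.
    Groups with fewer than min_support members (default: number of runs) are dropped.
    """
    if min_support is None:
        min_support = len(runs)
    groups = []  # each group is a list of member topic vectors
    for run in runs:
        for topic in run:
            for members in groups:
                centroid = [sum(col) / len(members) for col in zip(*members)]
                if cosine(topic, centroid) >= threshold:
                    members.append(topic)
                    break
            else:
                groups.append([topic])
    return [[sum(col) / len(m) for col in zip(*m)] for m in groups if len(m) >= min_support]

# Toy usage: three hypothetical runs over three genes.
runs = [
    [[0.9, 0.1, 0.0], [0.0, 0.1, 0.9]],
    [[0.0, 0.2, 0.8], [0.8, 0.2, 0.0]],
    [[0.85, 0.15, 0.0], [0.3, 0.4, 0.3]],
]
stable = consensus_topics(runs)
```

Only the first topic recurs in all three toy runs, so it is the single topic retained by default; relaxing `min_support` keeps topics found in fewer runs.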
Affiliation(s)
- Narges Rezaie
- Department of Developmental and Cell Biology, University of California, Irvine, CA, USA
- Center for Complex Biological Systems, University of California, Irvine, CA, USA
- Elisabeth Rebboah
- Department of Developmental and Cell Biology, University of California, Irvine, CA, USA
- Center for Complex Biological Systems, University of California, Irvine, CA, USA
- Brian A Williams
- Division of Biology, California Institute of Technology, Pasadena, CA, USA
- Heidi Yahan Liang
- Department of Developmental and Cell Biology, University of California, Irvine, CA, USA
- Center for Complex Biological Systems, University of California, Irvine, CA, USA
- Fairlie Reese
- Department of Developmental and Cell Biology, University of California, Irvine, CA, USA
- Center for Complex Biological Systems, University of California, Irvine, CA, USA
- Gabriela Balderrama-Gutierrez
- Department of Developmental and Cell Biology, University of California, Irvine, CA, USA
- Center for Complex Biological Systems, University of California, Irvine, CA, USA
- Diane Trout
- Division of Biology, California Institute of Technology, Pasadena, CA, USA
- Barbara J Wold
- Division of Biology, California Institute of Technology, Pasadena, CA, USA
- Ali Mortazavi
- Department of Developmental and Cell Biology, University of California, Irvine, CA, USA
- Center for Complex Biological Systems, University of California, Irvine, CA, USA
6. Peng X, Lee J, Adamow M, Maher C, Postow MA, Callahan MK, Panageas KS, Shen R. A topic modeling approach reveals the dynamic T cell composition of peripheral blood during cancer immunotherapy. Cell Rep Methods 2023; 3:100546. [PMID: 37671017 PMCID: PMC10475788 DOI: 10.1016/j.crmeth.2023.100546] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2022] [Revised: 02/15/2023] [Accepted: 07/10/2023] [Indexed: 09/07/2023]
Abstract
We present TopicFlow, a computational framework for flow cytometry data analysis of patient blood samples for the identification of functional and dynamic topics in circulating T cell populations. This framework applies a Latent Dirichlet Allocation (LDA) model, adapting the concept of topic modeling in text mining to flow cytometry. To demonstrate the utility of our method, we conducted an analysis of ∼17 million T cells collected from 138 peripheral blood samples in 51 patients with melanoma undergoing treatment with immune checkpoint inhibitors (ICIs). Our study highlights three latent dynamic topics identified by LDA: a T cell exhaustion topic that independently recapitulates the previously identified LAG-3+ immunotype associated with ICI resistance, a naive topic and its association with immune-related toxicity, and a T cell activation topic that emerges upon ICI treatment. Our approach can be broadly applied to mine high-parameter flow cytometry data for insights into mechanisms of treatment response and toxicity.
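Adapting topic modeling to flow cytometry requires deciding what counts as a "word" and what counts as a "document". One plausible encoding, assumed here rather than taken from TopicFlow, treats each blood sample as a document and each cell's pattern of positive and negative markers as a word. The marker panel, gating thresholds, sample IDs and helper names below are all hypothetical.

```python
from collections import Counter

def cell_to_word(intensities, markers, thresholds):
    """Encode one cell as a phenotype 'word' by calling each marker + or -.

    thresholds: assumed per-marker gating cut-offs (hypothetical values).
    """
    return "".join(
        f"{m}{'+' if x >= thresholds[m] else '-'}" for m, x in zip(markers, intensities)
    )

def samples_to_documents(samples, markers, thresholds):
    """Turn each sample (a list of cells' marker intensities) into a bag of words."""
    return {
        sample_id: Counter(cell_to_word(cell, markers, thresholds) for cell in cells)
        for sample_id, cells in samples.items()
    }

def document_term_matrix(docs):
    """Samples x phenotype-word count matrix, ready for an LDA implementation."""
    vocab = sorted({w for bag in docs.values() for w in bag})
    ids = sorted(docs)
    matrix = [[docs[s][w] for w in vocab] for s in ids]  # Counter yields 0 if absent
    return ids, vocab, matrix

# Toy usage: two hypothetical samples from one patient.
markers = ["CD4", "CD8", "LAG3"]
thresholds = {"CD4": 1.0, "CD8": 1.0, "LAG3": 0.5}
samples = {"pt1_pre": [[2.0, 0.1, 0.1], [2.5, 0.2, 0.9]], "pt1_on": [[0.1, 3.0, 0.7]]}
ids, vocab, matrix = document_term_matrix(samples_to_documents(samples, markers, thresholds))
```

With millions of cells, this representation stays compact because the vocabulary size is bounded by the number of marker combinations rather than the number of cells.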
Affiliation(s)
- Xiyu Peng
- Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Jasme Lee
- Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Matthew Adamow
- Immune Monitoring Facility, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Parker Institute for Cancer Immunotherapy, San Francisco, CA 94129, USA
- Colleen Maher
- Parker Institute for Cancer Immunotherapy, San Francisco, CA 94129, USA
- Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Michael A. Postow
- Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Weill Cornell Medical College, New York, NY 10065, USA
- Margaret K. Callahan
- Parker Institute for Cancer Immunotherapy, San Francisco, CA 94129, USA
- Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Weill Cornell Medical College, New York, NY 10065, USA
- Katherine S. Panageas
- Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Ronglai Shen
- Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
7. Gunawan I, Vafaee F, Meijering E, Lock JG. An introduction to representation learning for single-cell data analysis. Cell Rep Methods 2023; 3:100547. [PMID: 37671013 PMCID: PMC10475795 DOI: 10.1016/j.crmeth.2023.100547] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 09/07/2023]
Abstract
Single-cell-resolved systems biology methods, including omics- and imaging-based measurement modalities, generate a wealth of high-dimensional data characterizing the heterogeneity of cell populations. Representation learning methods are routinely used to analyze these complex, high-dimensional data by projecting them into lower-dimensional embeddings. This facilitates the interpretation and interrogation of the structures, dynamics, and regulation of cell heterogeneity. Reflecting their central role in analyzing diverse single-cell data types, a myriad of representation learning methods exist, with new approaches continually emerging. Here, we contrast general features of representation learning methods spanning statistical, manifold learning, and neural network approaches. We consider key steps involved in representation learning with single-cell data, including data pre-processing, hyperparameter optimization, downstream analysis, and biological validation. Interdependencies and contingencies linking these steps are also highlighted. This overview is intended to guide researchers in the selection, application, and optimization of representation learning strategies for current and future single-cell research applications.
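Principal component analysis is a classic example of the statistical representation learning methods the review contrasts with manifold learning and neural network approaches. The sketch below is a minimal, pure-Python illustration that computes only the leading component by power iteration on the covariance matrix; the function name is hypothetical, and real analyses would use an optimized library and more components.

```python
import math
import random

def top_principal_component(data, n_iter=500, seed=0):
    """Leading principal component of data (rows = cells, columns = features).

    Uses power iteration on the sample covariance matrix (needs >= 2 rows).
    Returns (component, scores): a unit-length loading vector and each
    cell's 1-D embedding (projection of its centered profile).
    """
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in data]
    # Sample covariance matrix (p x p).
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(p)]  # random positive start vector
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break  # constant data: there is no variance to capture
        v = [x / norm for x in w]
    scores = [sum(r[j] * v[j] for j in range(p)) for r in centered]
    return v, scores

# Toy usage: points lying on the diagonal embed onto a single axis.
component, embedding = top_principal_component([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
```

Power iteration converges to the eigenvector with the largest eigenvalue, which is the direction of maximal variance; further components would require deflation or a full eigendecomposition.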
Affiliation(s)
- Ihuan Gunawan
- School of Biomedical Sciences, Faculty of Medicine and Health, University of New South Wales, Sydney, NSW, Australia
- School of Computer Science and Engineering, Faculty of Engineering, University of New South Wales, Sydney, NSW, Australia
- Fatemeh Vafaee
- School of Biotechnology and Biomolecular Sciences, Faculty of Science, University of New South Wales, Sydney, NSW, Australia
- UNSW Data Science Hub, University of New South Wales, Sydney, NSW, Australia
- Erik Meijering
- School of Computer Science and Engineering, Faculty of Engineering, University of New South Wales, Sydney, NSW, Australia
- John George Lock
- School of Biomedical Sciences, Faculty of Medicine and Health, University of New South Wales, Sydney, NSW, Australia
- UNSW Data Science Hub, University of New South Wales, Sydney, NSW, Australia
- Ingham Institute for Applied Medical Research, Liverpool, NSW, Australia
8. Yang Q, Xu Z, Zhou W, Wang P, Jiang Q, Juan L. An interpretable single-cell RNA sequencing data clustering method based on latent Dirichlet allocation. Brief Bioinform 2023; 24:bbad199. [PMID: 37225419 PMCID: PMC10359080 DOI: 10.1093/bib/bbad199] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 01/26/2023] [Revised: 05/04/2023] [Accepted: 05/08/2023] [Indexed: 05/26/2023]
Abstract
Single-cell RNA sequencing (scRNA-seq) detects whole-transcriptome signals for large numbers of individual cells and is powerful for determining cell-to-cell differences and investigating the functional characteristics of various cell types. scRNA-seq datasets are usually sparse and highly noisy. Many steps in the scRNA-seq analysis workflow, including appropriate gene selection, cell clustering and annotation, as well as discovering the underlying biological mechanisms from such datasets, are difficult. In this study, we proposed an scRNA-seq analysis method based on the latent Dirichlet allocation (LDA) model. The LDA model estimates a series of latent variables, i.e. putative functions (PFs), from the input raw cell-gene data. Thus, we incorporated the 'cell-function-gene' three-layer framework into scRNA-seq analysis, as this framework is capable of discovering latent and complex gene expression patterns via a built-in model approach and obtaining biologically meaningful results through a data-driven functional interpretation process. We compared our method with four classic methods on seven benchmark scRNA-seq datasets. The LDA-based method performed best in the cell clustering test in terms of both accuracy and purity. By analysing three complex public datasets, we demonstrated that our method could distinguish cell types with multiple levels of functional specialization, and precisely reconstruct cell development trajectories. Moreover, the LDA-based method accurately identified the representative PFs and the representative genes for the cell types/cell stages, enabling data-driven cell cluster annotation and functional interpretation. Most of the marker and functionally relevant genes previously reported in the literature were identified.
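The clustering comparison above relies on a purity score. One simple way to turn LDA output into clusters, assumed here rather than taken from the paper, is to assign each cell to its highest-weight putative function; purity then counts, for each cluster, the cells carrying that cluster's most common reference label and divides by the total number of cells. The helper names and toy labels are hypothetical.

```python
from collections import Counter

def dominant_topic_clusters(theta):
    """Assign each cell to its highest-weight topic (putative function).

    theta: per-cell topic proportions (list of lists). Ties go to the lower index.
    """
    return [max(range(len(row)), key=row.__getitem__) for row in theta]

def purity(clusters, labels):
    """Fraction of cells whose label equals the majority label of their cluster."""
    by_cluster = {}
    for c, y in zip(clusters, labels):
        by_cluster.setdefault(c, Counter())[y] += 1
    majority_total = sum(cnt.most_common(1)[0][1] for cnt in by_cluster.values())
    return majority_total / len(labels)

# Toy usage: four cells, two putative functions, hypothetical reference labels.
theta = [[0.8, 0.2], [0.7, 0.3], [0.1, 0.9], [0.4, 0.6]]
clusters = dominant_topic_clusters(theta)
score = purity(clusters, ["T", "T", "B", "T"])
```

Purity rewards clusters that each contain mostly one label, but it does not penalize splitting a label across many clusters, which is why it is usually reported alongside an accuracy-type measure.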
Affiliation(s)
- Qi Yang
- School of Life Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- Zhaochun Xu
- School of Life Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- Wenyang Zhou
- School of Life Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- Pingping Wang
- School of Life Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- Qinghua Jiang
- School of Life Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- Liran Juan
- School of Life Science and Technology, Harbin Institute of Technology, Harbin 150001, China
9. Peng X, Lee J, Adamow M, Maher C, Postow MA, Callahan MK, Panageas KS, Shen R. Uncovering the hidden structure of dynamic T cell composition in peripheral blood during cancer immunotherapy: a topic modeling approach. bioRxiv 2023:2023.04.24.538095. [PMID: 37162890 PMCID: PMC10168231 DOI: 10.1101/2023.04.24.538095] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/11/2023]
Abstract
Immune checkpoint inhibitors (ICIs), now mainstays of cancer treatment, show great potential but benefit only a subset of patients. A more complete understanding of the immunological mechanisms and pharmacodynamics of ICIs in cancer patients will help identify the patients most likely to benefit and will generate knowledge for the development of next-generation ICI regimens. We set out to interrogate the early temporal evolution of T cell populations from longitudinal single-cell flow cytometry data. We developed an innovative statistical and computational approach using a Latent Dirichlet Allocation (LDA) model that extends the concept of topic modeling used in text mining. This powerful unsupervised learning tool allows us to discover compositional topics within immune cell populations that have distinct functional and differentiation states and are biologically and clinically relevant. To illustrate the model's utility, we analyzed ∼17 million T cells obtained from 138 pre- and on-treatment peripheral blood samples from a cohort of melanoma patients treated with ICIs. We identified three latent dynamic topics: a T-cell exhaustion topic that recapitulates a LAG3+ predominant patient subgroup with poor clinical outcome; a naive topic that shows association with immune-related toxicity; and an immune activation topic that emerges upon ICI treatment. We found that a patient subgroup with a high baseline level of the naïve topic had a higher toxicity grade. While the current application is demonstrated using flow cytometry data, our approach has broader utility and creates a new direction for translating single-cell data into biological and clinical insights.
Affiliation(s)
- Xiyu Peng
- Department of Epidemiology and Biostatistics, San Francisco, CA
- Jasme Lee
- Department of Epidemiology and Biostatistics, San Francisco, CA
- Matthew Adamow
- Immune Monitoring Facility, San Francisco, CA
- Parker Institute for Cancer Immunotherapy, San Francisco, CA
- Colleen Maher
- Parker Institute for Cancer Immunotherapy, San Francisco, CA
- Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, NY
- Michael A Postow
- Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, NY
- Weill Cornell Medical College, New York, NY
- Margaret K Callahan
- Parker Institute for Cancer Immunotherapy, San Francisco, CA
- Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, NY
- Weill Cornell Medical College, New York, NY
- Ronglai Shen
- Department of Epidemiology and Biostatistics, San Francisco, CA
10. Molina-Moreno M, González-Díaz I, Sicilia J, Crainiciuc G, Palomino-Segura M, Hidalgo A, Díaz-de-María F. ACME: Automatic feature extraction for cell migration examination through intravital microscopy imaging. Med Image Anal 2022; 77:102358. [DOI: 10.1016/j.media.2022.102358] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/23/2021] [Revised: 01/07/2022] [Accepted: 01/09/2022] [Indexed: 02/06/2023]