1
Maurer K, Grabski IN, Houot R, Gohil SH, Miura S, Redd R, Lyu H, Lu W, Arihara Y, Budka J, McDonough M, Ansuinelli M, Reynolds C, Jacene H, Li S, Livak KJ, Ritz J, Miles B, Mattie M, Neuberg DS, Irizarry RA, Armand P, Wu CJ, Jacobson C. Baseline immune state and T-cell clonal kinetics are associated with durable response to CAR-T therapy in large B-cell lymphoma. Blood 2024; 144:2490-2502. [PMID: 39241199 PMCID: PMC11952007 DOI: 10.1182/blood.2024024381] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2024] [Revised: 07/30/2024] [Accepted: 08/12/2024] [Indexed: 09/08/2024]
Abstract
Engineered cellular therapy with CD19-targeting chimeric antigen receptor T cells (CAR-Ts) has revolutionized outcomes for patients with relapsed/refractory large B-cell lymphoma (LBCL), but the cellular and molecular features associated with response remain largely unresolved. We analyzed serial peripheral blood samples ranging from the day of apheresis (day -28/baseline) to 28 days after CAR-T infusion from 50 patients with LBCL treated with axicabtagene ciloleucel by integrating single-cell RNA and T-cell receptor sequencing, flow cytometry, and mass cytometry to characterize features associated with response to CAR-T. Pretreatment patient characteristics associated with response included the presence of B cells and increased absolute lymphocyte count to absolute monocyte count ratio (ALC/AMC). Infusion products from responders were enriched for clonally expanded, highly activated CD8+ T cells. We expanded these observations to 99 patients from the ZUMA-1 cohort and identified a subset of patients with elevated baseline B cells, 80% of whom were complete responders. We integrated B-cell proportion ≥0.5% and ALC/AMC ≥1.2 into a 2-factor predictive model and applied this model to the ZUMA-1 cohort. Estimated progression-free survival at 1 year in patients meeting 1 or both criteria was 65% vs 31% for patients meeting neither criterion. Our results suggest that patients' immunologic state at baseline affects the likelihood of response to CAR-T through both modulation of the T-cell apheresis product composition and promoting a more favorable circulating immune compartment before therapy. These baseline immunologic features, measured readily in the clinical setting before CAR-T, can be applied to predict response to therapy.
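The 2-factor rule described in this abstract can be sketched in a few lines. This is only an illustration of the two thresholds quoted above (B-cell proportion ≥0.5% and ALC/AMC ≥1.2), not the authors' implementation; the `BaselineLabs` container, its field names, and both function names are hypothetical.

```python
from dataclasses import dataclass

# Thresholds as quoted in the abstract.
B_CELL_PERCENT_MIN = 0.5   # B-cell proportion, in percent
ALC_AMC_RATIO_MIN = 1.2    # lymphocyte-to-monocyte count ratio


@dataclass
class BaselineLabs:
    """Hypothetical container for baseline values; field names are illustrative."""
    b_cell_percent: float  # B-cell proportion, in percent
    alc: float             # absolute lymphocyte count
    amc: float             # absolute monocyte count, same units as alc


def criteria_met(labs: BaselineLabs) -> int:
    """Count how many of the two baseline criteria are met (0, 1, or 2)."""
    if labs.amc <= 0:
        raise ValueError("AMC must be positive to form the ALC/AMC ratio")
    b_cell_ok = labs.b_cell_percent >= B_CELL_PERCENT_MIN
    ratio_ok = (labs.alc / labs.amc) >= ALC_AMC_RATIO_MIN
    return int(b_cell_ok) + int(ratio_ok)


def favorable_group(labs: BaselineLabs) -> bool:
    """True when one or both criteria are met, mirroring the abstract's grouping."""
    return criteria_met(labs) >= 1
```

The grouping is deliberately an "either or both" test, because the abstract compares patients meeting 1 or both criteria against those meeting neither.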
MESH Headings
- Humans
- Immunotherapy, Adoptive/methods
- Lymphoma, Large B-Cell, Diffuse/therapy
- Lymphoma, Large B-Cell, Diffuse/immunology
- Male
- Female
- Middle Aged
- Aged
- Adult
- Receptors, Chimeric Antigen/immunology
- Biological Products/therapeutic use
- Antigens, CD19/immunology
- Receptors, Antigen, T-Cell/immunology
- Receptors, Antigen, T-Cell/genetics
- T-Lymphocytes/immunology
- Treatment Outcome
Affiliation(s)
- Katie Maurer
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA
  - Harvard Medical School, Boston, MA
  - Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA
- Roch Houot
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA
  - Department of Hematology, University Hospital of Rennes, UMR U1236, INSERM, University of Rennes, Rennes, France
- Satyen H. Gohil
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA
  - Department of Haematology, University College London, London, United Kingdom
  - Department of Haematology, University College London Hospitals National Health Service Foundation Trust, London, United Kingdom
- Shogo Miura
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA
- Robert Redd
  - Department of Biostatistics, Harvard University, Boston, MA
- Haoxiang Lyu
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA
  - Translational Immunogenomics Laboratory, Dana-Farber Cancer Institute, Boston, MA
- Wesley Lu
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA
  - Translational Immunogenomics Laboratory, Dana-Farber Cancer Institute, Boston, MA
- Yohei Arihara
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA
- Mikaela McDonough
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA
- Michela Ansuinelli
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA
- Carol Reynolds
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA
- Heather Jacene
  - Harvard Medical School, Boston, MA
  - Department of Imaging, Dana-Farber Cancer Institute, Boston, MA
- Shuqiang Li
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA
  - Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA
  - Translational Immunogenomics Laboratory, Dana-Farber Cancer Institute, Boston, MA
- Kenneth J. Livak
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA
  - Translational Immunogenomics Laboratory, Dana-Farber Cancer Institute, Boston, MA
- Jerome Ritz
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA
  - Harvard Medical School, Boston, MA
- Donna S. Neuberg
  - Department of Data Science, Dana-Farber Cancer Institute, Boston, MA
- Rafael A. Irizarry
  - Department of Biostatistics, Harvard University, Boston, MA
  - Department of Data Science, Dana-Farber Cancer Institute, Boston, MA
- Philippe Armand
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA
  - Harvard Medical School, Boston, MA
- Catherine J. Wu
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA
  - Harvard Medical School, Boston, MA
  - Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA
- Caron Jacobson
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA
  - Harvard Medical School, Boston, MA
2
Ji L, Wang A, Sonthalia S, Naiman DQ, Younes L, Colantuoni C, Geman D. CellCover Captures Neural Stem Cell Progression in Mammalian Neocortical Development. bioRxiv 2024:2023.04.06.535943. [PMID: 37383947 PMCID: PMC10299349 DOI: 10.1101/2023.04.06.535943] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/30/2023]
Abstract
Definition of cell classes across the tissues of living organisms is central in the analysis of growing atlases of single-cell RNA sequencing (scRNA-seq) data across biomedicine. Marker genes for cell classes are most often defined by differential expression (DE) methods that serially assess individual genes across landscapes of diverse cells. This serial approach has been extremely useful, but is limited because it ignores possible redundancy or complementarity across genes that can only be captured by analyzing multiple genes simultaneously. We aim to identify discriminating panels of genes. To efficiently explore the vast space of possible marker panels, leverage the large number of cells often sequenced, and overcome zero-inflation in scRNA-seq data, we propose viewing gene panel selection as a variation of the "minimal set-covering problem" in combinatorial optimization. We show that this new method, CellCover, captures cell-class-specific signals in the developing mouse neocortex that are distinct from those defined by DE methods. Transfer learning experiments across mouse, primate, and human data demonstrate that CellCover identifies markers of conserved cell classes in neurogenesis, as well as temporal progression in both progenitors and neurons. Exploring markers of human outer radial glia (oRG, or basal RG) across mammals, we show that transcriptomic elements of this key cell type in the expansion of the human cortex appeared in gliogenic precursors of the rodent before the full program emerged in the primate lineage. We have assembled the public datasets we use in this report at NeMO analytics where the expression of individual genes {NeMO Individual Genes} and marker gene panels can be freely explored {NeMO: Telley 3 Sets Covering Panels}, {NeMO: Telley 12 Sets Covering Panels}, and {NeMO: Sorted Brain Cell Covering Panels}. CellCover is available in {CellCover R} and {CellCover Python}.
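To make the set-covering view of panel selection concrete, here is a minimal sketch of the classic greedy approximation to minimal set cover. It is a textbook technique swapped in for illustration and is not CellCover itself, which the abstract describes as a variation of the problem. The function name, the input layout (a hypothetical mapping from each gene to the set of cells in which it is detected), and the choice to treat any nonzero count as "expressed" are all assumptions.

```python
from typing import Dict, List, Set


def greedy_cover(expressing_cells: Dict[str, Set[str]],
                 target_cells: Set[str]) -> List[str]:
    """Greedily pick genes until every target cell expresses at least one of them.

    expressing_cells maps a gene name to the set of cell IDs in which that gene
    is detected (assumed here to mean a nonzero count).
    """
    uncovered = set(target_cells)
    panel: List[str] = []
    while uncovered and expressing_cells:
        # Choose the gene detected in the most still-uncovered cells;
        # sorting the names first makes tie-breaking deterministic.
        best_gene = max(sorted(expressing_cells),
                        key=lambda g: len(expressing_cells[g] & uncovered))
        gained = expressing_cells[best_gene] & uncovered
        if not gained:
            break  # remaining cells express none of the candidate genes
        panel.append(best_gene)
        uncovered -= gained
    return panel


genes = {"A": {"c1", "c2", "c3"}, "B": {"c3", "c4"}, "C": {"c4"}}
panel = greedy_cover(genes, {"c1", "c2", "c3", "c4"})
```

Because a cell counts as covered once any panel gene is detected in it, a panel can tolerate dropout of individual genes, which is one intuition for why a covering formulation suits zero-inflated data.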
3
Gonzalez-Ferrer J, Lehrer J, O'Farrell A, Paten B, Teodorescu M, Haussler D, Jonsson VD, Mostajo-Radji MA. SIMS: A deep-learning label transfer tool for single-cell RNA sequencing analysis. Cell Genomics 2024; 4:100581. [PMID: 38823397 PMCID: PMC11228957 DOI: 10.1016/j.xgen.2024.100581] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2023] [Revised: 04/02/2024] [Accepted: 05/09/2024] [Indexed: 06/03/2024]
Abstract
Cell atlases serve as vital references for automating cell labeling in new samples, yet existing classification algorithms struggle with accuracy. Here we introduce SIMS (scalable, interpretable machine learning for single cell), a low-code data-efficient pipeline for single-cell RNA classification. We benchmark SIMS against datasets from different tissues and species. We demonstrate SIMS's efficacy in classifying cells in the brain, achieving high accuracy even with small training sets (<3,500 cells) and across different samples. SIMS accurately predicts neuronal subtypes in the developing brain, shedding light on genetic changes during neuronal differentiation and postmitotic fate refinement. Finally, we apply SIMS to single-cell RNA datasets of cortical organoids to predict cell identities and uncover genetic variations between cell lines. SIMS identifies cell-line differences and misannotated cell lineages in human cortical organoids derived from different pluripotent stem cell lines. Altogether, we show that SIMS is a versatile and robust tool for cell-type classification from single-cell datasets.
Affiliation(s)
- Jesus Gonzalez-Ferrer
  - Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95060, USA
  - Live Cell Biotechnology Discovery Lab, University of California, Santa Cruz, Santa Cruz, CA 95060, USA
  - Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA 95060, USA
- Julian Lehrer
  - Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95060, USA
  - Live Cell Biotechnology Discovery Lab, University of California, Santa Cruz, Santa Cruz, CA 95060, USA
  - Department of Applied Mathematics, University of California, Santa Cruz, Santa Cruz, CA 95060, USA
- Ash O'Farrell
  - Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95060, USA
- Benedict Paten
  - Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95060, USA
  - Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA 95060, USA
- Mircea Teodorescu
  - Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95060, USA
  - Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA 95060, USA
  - Department of Electrical and Computer Engineering, University of California, Santa Cruz, Santa Cruz, CA 95060, USA
- David Haussler
  - Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95060, USA
  - Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA 95060, USA
- Vanessa D Jonsson
  - Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA 95060, USA
  - Department of Applied Mathematics, University of California, Santa Cruz, Santa Cruz, CA 95060, USA
- Mohammed A Mostajo-Radji
  - Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95060, USA
  - Live Cell Biotechnology Discovery Lab, University of California, Santa Cruz, Santa Cruz, CA 95060, USA
4
Gonzalez-Ferrer J, Lehrer J, O’Farrell A, Paten B, Teodorescu M, Haussler D, Jonsson VD, Mostajo-Radji MA. Unraveling Neuronal Identities Using SIMS: A Deep Learning Label Transfer Tool for Single-Cell RNA Sequencing Analysis. bioRxiv 2023:2023.02.28.529615. [PMID: 36909548 PMCID: PMC10002667 DOI: 10.1101/2023.02.28.529615] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/06/2023]
Abstract
Large single-cell RNA datasets have contributed to unprecedented biological insight. Often, these take the form of cell atlases and serve as a reference for automating cell labeling of newly sequenced samples. Yet, classification algorithms have lacked the capacity to accurately annotate cells, particularly in complex datasets. Here we present SIMS (Scalable, Interpretable Machine Learning for Single-Cell), an end-to-end data-efficient machine learning pipeline for discrete classification of single-cell data that can be applied to new datasets with minimal coding. We benchmarked SIMS against common single-cell label transfer tools and demonstrated that it performs as well as or better than state-of-the-art algorithms. We then use SIMS to classify cells in one of the most complex tissues: the brain. We show that SIMS classifies cells of the adult cerebral cortex and hippocampus at a remarkably high accuracy. This accuracy is maintained in trans-sample label transfers of the adult human cerebral cortex. We then apply SIMS to classify cells in the developing brain and demonstrate a high level of accuracy at predicting neuronal subtypes, even in periods of fate refinement, shedding light on genetic changes affecting specific cell types across development. Finally, we apply SIMS to single-cell datasets of cortical organoids to predict cell identities and unveil genetic variations between cell lines. SIMS identifies cell-line differences and misannotated cell lineages in human cortical organoids derived from different pluripotent stem cell lines. When cell types are obscured by stress signals, label transfer from primary tissue improves the accuracy of cortical organoid annotations, serving as a reliable ground truth. Altogether, we show that SIMS is a versatile and robust tool for cell-type classification from single-cell datasets.
Affiliation(s)
- Jesus Gonzalez-Ferrer
  - These authors contributed equally to this work
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, 95060, CA, USA
  - Live Cell Biotechnology Discovery Lab, University of California Santa Cruz, Santa Cruz, 95060, CA, USA
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, 95060, CA, USA
- Julian Lehrer
  - These authors contributed equally to this work
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, 95060, CA, USA
  - Live Cell Biotechnology Discovery Lab, University of California Santa Cruz, Santa Cruz, 95060, CA, USA
  - Department of Applied Mathematics, University of California Santa Cruz, Santa Cruz, 95060, CA, USA
- Ash O’Farrell
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, 95060, CA, USA
- Benedict Paten
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, 95060, CA, USA
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, 95060, CA, USA
- Mircea Teodorescu
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, 95060, CA, USA
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, 95060, CA, USA
  - Department of Electrical and Computer Engineering, University of California Santa Cruz, Santa Cruz, 95060, CA, USA
- David Haussler
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, 95060, CA, USA
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, 95060, CA, USA
- Vanessa D. Jonsson
  - Department of Applied Mathematics, University of California Santa Cruz, Santa Cruz, 95060, CA, USA
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, 95060, CA, USA
  - Co-senior authors
- Mohammed A. Mostajo-Radji
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, 95060, CA, USA
  - Live Cell Biotechnology Discovery Lab, University of California Santa Cruz, Santa Cruz, 95060, CA, USA
  - Co-senior authors
5
Aybey B, Zhao S, Brors B, Staub E. Immune cell type signature discovery and random forest classification for analysis of single cell gene expression datasets. Front Immunol 2023; 14:1194745. [PMID: 37609075 PMCID: PMC10441575 DOI: 10.3389/fimmu.2023.1194745] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2023] [Accepted: 07/14/2023] [Indexed: 08/24/2023]
Abstract
Background: Robust immune cell gene expression signatures are central to the analysis of single cell studies. Nearly all known sets of immune cell signatures have been derived by making use of only single gene expression datasets. Utilizing the power of multiple integrated datasets could lead to high-quality immune cell signatures which could be used as superior inputs to machine learning-based cell type classification approaches. Results: We established a novel workflow for the discovery of immune cell type signatures based primarily on gene-versus-gene expression similarity. It leverages multiple datasets, here seven single cell expression datasets from six different cancer types, and resulted in eleven immune cell type-specific gene expression signatures. We used these to train random forest classifiers for immune cell type assignment for single-cell RNA-seq datasets. We obtained similar or better prediction results compared to commonly used methods for cell type assignment in independent benchmarking datasets. Our gene signature set yields higher prediction scores than other published immune cell type gene sets in random forest-based cell type classification. We further demonstrate how our approach helps to avoid bias in downstream statistical analyses by re-analysis of a published IFN stimulation experiment. Discussion and conclusion: We demonstrated the quality of our immune cell signatures and their strong performance in a random forest-based cell typing approach. We argue that classifying cells based on our comparably slim sets of genes accompanied by a random forest-based approach not only matches or outperforms widely used published approaches but also facilitates unbiased downstream statistical analyses of differential gene expression between cell types for significantly more genes compared to previous cell classification algorithms.
Affiliation(s)
- Bogac Aybey
  - Oncology Data Science, Merck Healthcare KGaA, Darmstadt, Germany
  - Faculty of Biosciences, Heidelberg University, Heidelberg, Germany
- Sheng Zhao
  - Oncology Data Science, Merck Healthcare KGaA, Darmstadt, Germany
- Benedikt Brors
  - Division of Applied Bioinformatics, German Cancer Research Center, Heidelberg, Germany
  - German Cancer Consortium, German Cancer Research Center, Heidelberg, Germany
- Eike Staub
  - Oncology Data Science, Merck Healthcare KGaA, Darmstadt, Germany
6
Grabski IN, Street K, Irizarry RA. Significance analysis for clustering with single-cell RNA-sequencing data. Nat Methods 2023; 20:1196-1202. [PMID: 37429993 PMCID: PMC11282907 DOI: 10.1038/s41592-023-01933-9] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Received: 08/01/2022] [Accepted: 06/01/2023] [Indexed: 07/12/2023]
Abstract
Unsupervised clustering of single-cell RNA-sequencing data enables the identification of distinct cell populations. However, the most widely used clustering algorithms are heuristic and do not formally account for statistical uncertainty. We find that not addressing known sources of variability in a statistically rigorous manner can lead to overconfidence in the discovery of novel cell types. Here we extend a previous method, significance of hierarchical clustering, to propose a model-based hypothesis testing approach that incorporates significance analysis into the clustering algorithm and permits statistical evaluation of clusters as distinct cell populations. We also adapt this approach to permit statistical assessment on the clusters reported by any algorithm. Finally, we extend these approaches to account for batch structure. We benchmarked our approach against popular clustering workflows, demonstrating improved performance. To show practical utility, we applied our approach to the Human Lung Cell Atlas and an atlas of the mouse cerebellar cortex, identifying several cases of over-clustering and recapitulating experimentally validated cell type definitions.
Affiliation(s)
- Isabella N Grabski
  - Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA, USA
- Kelly Street
  - Division of Biostatistics, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Rafael A Irizarry
  - Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
7
Covert I, Gala R, Wang T, Svoboda K, Sümbül U, Lee SI. Predictive and robust gene selection for spatial transcriptomics. Nat Commun 2023; 14:2091. [PMID: 37045821 PMCID: PMC10097645 DOI: 10.1038/s41467-023-37392-1] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 06/08/2022] [Accepted: 03/16/2023] [Indexed: 04/14/2023]
Abstract
A prominent trend in single-cell transcriptomics is providing spatial context alongside a characterization of each cell's molecular state. This typically requires targeting an a priori selection of genes, often covering less than 1% of the genome, and a key question is how to optimally determine the small gene panel. We address this challenge by introducing a flexible deep learning framework, PERSIST, to identify informative gene targets for spatial transcriptomics studies by leveraging reference scRNA-seq data. Using datasets spanning different brain regions, species, and scRNA-seq technologies, we show that PERSIST reliably identifies panels that provide more accurate prediction of the genome-wide expression profile, thereby capturing more information with fewer genes. PERSIST can be adapted to specific biological goals, and we demonstrate that PERSIST's binarization of gene expression levels enables models trained on scRNA-seq data to generalize to spatial transcriptomics data, despite the complex shift between these technologies.
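The binarization idea mentioned in this abstract can be illustrated with a small sketch: once expression is reduced to detected versus not detected, measurements on very different scales can map to the same representation. This is not PERSIST's code; the function name, the list-of-lists input layout, the default threshold of zero, and the toy values are all assumptions made for illustration.

```python
from typing import List, Sequence


def binarize(expression: Sequence[Sequence[float]],
             threshold: float = 0.0) -> List[List[int]]:
    """Map a cells-by-genes matrix to 0/1 flags: 1 where a value exceeds the threshold.

    The threshold of 0 (any detected expression) is an assumption, not a value
    taken from the abstract.
    """
    return [[1 if value > threshold else 0 for value in cell]
            for cell in expression]


# Toy values on two different scales (invented for illustration only).
scrna_counts = [[0, 12, 3], [5, 0, 0]]
spatial_signal = [[0.0, 0.7, 0.2], [1.4, 0.0, 0.0]]

same_pattern = binarize(scrna_counts) == binarize(spatial_signal)
```

In this toy case both inputs reduce to the same detection pattern, so a model that sees only the binary flags receives identical input from either technology.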
Affiliation(s)
- Ian Covert
  - Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA, USA
- Rohan Gala
  - Allen Institute for Brain Science, Seattle, WA, USA
- Tim Wang
  - HHMI Janelia Research Campus, Ashburn, VA, USA
- Karel Svoboda
  - HHMI Janelia Research Campus, Ashburn, VA, USA
  - Allen Institute for Neural Dynamics, Seattle, WA, USA
- Uygar Sümbül
  - Allen Institute for Brain Science, Seattle, WA, USA
- Su-In Lee
  - Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA, USA
8
Yang F, Wang W, Wang F, Fang Y, Tang D, Huang J, Lu H, Yao J. scBERT as a large-scale pretrained deep language model for cell type annotation of single-cell RNA-seq data. Nat Mach Intell 2022. [DOI: 10.1038/s42256-022-00534-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]