Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Grabski IN, Irizarry RA. A probabilistic gene expression barcode for annotation of cell types from single-cell RNA-seq data. Biostatistics 2022;23:1150-1164. [PMID: 35770795 PMCID: PMC9802389 DOI: 10.1093/biostatistics/kxac021] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/21/2021] [Revised: 05/10/2022] [Accepted: 05/22/2022] [Indexed: 01/07/2023] Open

Cited by Other Article(s)

Maurer K, Grabski IN, Houot R, Gohil SH, Miura S, Redd R, Lyu H, Lu W, Arihara Y, Budka J, McDonough M, Ansuinelli M, Reynolds C, Jacene H, Li S, Livak KJ, Ritz J, Miles B, Mattie M, Neuberg DS, Irizarry RA, Armand P, Wu CJ, Jacobson C. Baseline immune state and T-cell clonal kinetics are associated with durable response to CAR-T therapy in large B-cell lymphoma. Blood 2024;144:2490-2502. [PMID: 39241199 PMCID: PMC11952007 DOI: 10.1182/blood.2024024381] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2024] [Revised: 07/30/2024] [Accepted: 08/12/2024] [Indexed: 09/08/2024] Open

Affiliation(s)

Katie Maurer: Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA; Harvard Medical School, Boston, MA; Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA
Isabella N. Grabski: Department of Biostatistics, Harvard University, Boston, MA
Roch Houot: Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA; Department of Hematology, University Hospital of Rennes, UMR U1236, INSERM, University of Rennes, Rennes, France
Satyen H. Gohil: Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA; Department of Haematology, University College London, London, United Kingdom; Department of Haematology, University College London Hospitals National Health Service Foundation Trust, London, United Kingdom
Shogo Miura: Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA
Robert Redd: Department of Biostatistics, Harvard University, Boston, MA
Haoxiang Lyu: Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA; Translational Immunogenomics Laboratory, Dana-Farber Cancer Institute, Boston, MA
Wesley Lu: Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA; Translational Immunogenomics Laboratory, Dana-Farber Cancer Institute, Boston, MA
Yohei Arihara: Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA
Justin Budka: Kite, a Gilead Company, Santa Monica, CA
Mikaela McDonough: Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA
Michela Ansuinelli: Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA
Carol Reynolds: Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA
Heather Jacene: Harvard Medical School, Boston, MA; Department of Imaging, Dana-Farber Cancer Institute, Boston, MA
Shuqiang Li: Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA; Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA; Translational Immunogenomics Laboratory, Dana-Farber Cancer Institute, Boston, MA
Kenneth J. Livak: Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA; Translational Immunogenomics Laboratory, Dana-Farber Cancer Institute, Boston, MA
Jerome Ritz: Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA; Harvard Medical School, Boston, MA
Brodie Miles: Kite, a Gilead Company, Santa Monica, CA
Mike Mattie: Kite, a Gilead Company, Santa Monica, CA
Donna S. Neuberg: Department of Data Science, Dana-Farber Cancer Institute, Boston, MA
Rafael A. Irizarry: Department of Biostatistics, Harvard University, Boston, MA; Department of Data Science, Dana-Farber Cancer Institute, Boston, MA
Philippe Armand: Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA; Harvard Medical School, Boston, MA
Catherine J. Wu: Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA; Harvard Medical School, Boston, MA; Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA
Caron Jacobson: Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA; Harvard Medical School, Boston, MA

Ji L, Wang A, Sonthalia S, Naiman DQ, Younes L, Colantuoni C, Geman D. CellCover Captures Neural Stem Cell Progression in Mammalian Neocortical Development. bioRxiv 2024:2023.04.06.535943. [PMID: 37383947 PMCID: PMC10299349 DOI: 10.1101/2023.04.06.535943] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/30/2023]

Abstract

Definition of cell classes across the tissues of living organisms is central in the analysis of growing atlases of single-cell RNA sequencing (scRNA-seq) data across biomedicine. Marker genes for cell classes are most often defined by differential expression (DE) methods that serially assess individual genes across landscapes of diverse cells. This serial approach has been extremely useful, but is limited because it ignores possible redundancy or complementarity across genes that can only be captured by analyzing multiple genes simultaneously. We aim to identify discriminating panels of genes. To efficiently explore the vast space of possible marker panels, leverage the large number of cells often sequenced, and overcome zero-inflation in scRNA-seq data, we propose viewing gene panel selection as a variation of the "minimal set-covering problem" in combinatorial optimization. We show that this new method, CellCover, captures cell-class-specific signals in the developing mouse neocortex that are distinct from those defined by DE methods. Transfer learning experiments across mouse, primate, and human data demonstrate that CellCover identifies markers of conserved cell classes in neurogenesis, as well as temporal progression in both progenitors and neurons. Exploring markers of human outer radial glia (oRG, or basal RG) across mammals, we show that transcriptomic elements of this key cell type in the expansion of the human cortex appeared in gliogenic precursors of the rodent before the full program emerged in the primate lineage. We have assembled the public datasets we use in this report at NeMO analytics where the expression of individual genes {NeMO Individual Genes} and marker gene panels can be freely explored {NeMO: Telley 3 Sets Covering Panels}, {NeMO: Telley 12 Sets Covering Panels}, and {NeMO: Sorted Brain Cell Covering Panels}. CellCover is available in {CellCover R} and {CellCover Python}.

Gonzalez-Ferrer J, Lehrer J, O'Farrell A, Paten B, Teodorescu M, Haussler D, Jonsson VD, Mostajo-Radji MA. SIMS: A deep-learning label transfer tool for single-cell RNA sequencing analysis. Cell Genomics 2024;4:100581. [PMID: 38823397 PMCID: PMC11228957 DOI: 10.1016/j.xgen.2024.100581] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2023] [Revised: 04/02/2024] [Accepted: 05/09/2024] [Indexed: 06/03/2024]

Affiliation(s)

Jesus Gonzalez-Ferrer: Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95060, USA; Live Cell Biotechnology Discovery Lab, University of California, Santa Cruz, Santa Cruz, CA 95060, USA; Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA 95060, USA
Julian Lehrer: Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95060, USA; Live Cell Biotechnology Discovery Lab, University of California, Santa Cruz, Santa Cruz, CA 95060, USA; Department of Applied Mathematics, University of California, Santa Cruz, Santa Cruz, CA 95060, USA
Ash O'Farrell: Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95060, USA
Benedict Paten: Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95060, USA; Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA 95060, USA
Mircea Teodorescu: Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95060, USA; Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA 95060, USA; Department of Electrical and Computer Engineering, University of California, Santa Cruz, Santa Cruz, CA 95060, USA
David Haussler: Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95060, USA; Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA 95060, USA
Vanessa D Jonsson: Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA 95060, USA; Department of Applied Mathematics, University of California, Santa Cruz, Santa Cruz, CA 95060, USA
Mohammed A Mostajo-Radji: Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95060, USA; Live Cell Biotechnology Discovery Lab, University of California, Santa Cruz, Santa Cruz, CA 95060, USA

Gonzalez-Ferrer J, Lehrer J, O’Farrell A, Paten B, Teodorescu M, Haussler D, Jonsson VD, Mostajo-Radji MA. Unraveling Neuronal Identities Using SIMS: A Deep Learning Label Transfer Tool for Single-Cell RNA Sequencing Analysis. bioRxiv 2023:2023.02.28.529615. [PMID: 36909548 PMCID: PMC10002667 DOI: 10.1101/2023.02.28.529615] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/06/2023]

Abstract

Large single-cell RNA datasets have contributed to unprecedented biological insight. Often, these take the form of cell atlases and serve as a reference for automating cell labeling of newly sequenced samples. Yet, classification algorithms have lacked the capacity to accurately annotate cells, particularly in complex datasets. Here we present SIMS (Scalable, Interpretable Machine Learning for Single-Cell), an end-to-end data-efficient machine learning pipeline for discrete classification of single-cell data that can be applied to new datasets with minimal coding. We benchmarked SIMS against common single-cell label transfer tools and demonstrated that it performs as well or better than state of the art algorithms. We then use SIMS to classify cells in one of the most complex tissues: the brain. We show that SIMS classifies cells of the adult cerebral cortex and hippocampus at a remarkably high accuracy. This accuracy is maintained in trans-sample label transfers of the adult human cerebral cortex. We then apply SIMS to classify cells in the developing brain and demonstrate a high level of accuracy at predicting neuronal subtypes, even in periods of fate refinement, shedding light on genetic changes affecting specific cell types across development. Finally, we apply SIMS to single cell datasets of cortical organoids to predict cell identities and unveil genetic variations between cell lines. SIMS identifies cell-line differences and misannotated cell lineages in human cortical organoids derived from different pluripotent stem cell lines. When cell types are obscured by stress signals, label transfer from primary tissue improves the accuracy of cortical organoid annotations, serving as a reliable ground truth. Altogether, we show that SIMS is a versatile and robust tool for cell-type classification from single-cell datasets.

Affiliation(s)

Jesus Gonzalez-Ferrer (contributed equally to this work): Genomics Institute, University of California Santa Cruz, Santa Cruz, 95060, CA, USA; Live Cell Biotechnology Discovery Lab, University of California Santa Cruz, Santa Cruz, 95060, CA, USA; Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, 95060, CA, USA
Julian Lehrer (contributed equally to this work): Genomics Institute, University of California Santa Cruz, Santa Cruz, 95060, CA, USA; Live Cell Biotechnology Discovery Lab, University of California Santa Cruz, Santa Cruz, 95060, CA, USA; Department of Applied Mathematics, University of California Santa Cruz, Santa Cruz, 95060, CA, USA
Ash O’Farrell: Genomics Institute, University of California Santa Cruz, Santa Cruz, 95060, CA, USA
Benedict Paten: Genomics Institute, University of California Santa Cruz, Santa Cruz, 95060, CA, USA; Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, 95060, CA, USA
Mircea Teodorescu: Genomics Institute, University of California Santa Cruz, Santa Cruz, 95060, CA, USA; Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, 95060, CA, USA; Department of Electrical and Computer Engineering, University of California Santa Cruz, Santa Cruz, 95060, CA, USA
David Haussler: Genomics Institute, University of California Santa Cruz, Santa Cruz, 95060, CA, USA; Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, 95060, CA, USA
Vanessa D. Jonsson (co-senior author): Department of Applied Mathematics, University of California Santa Cruz, Santa Cruz, 95060, CA, USA; Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, 95060, CA, USA
Mohammed A. Mostajo-Radji (co-senior author): Genomics Institute, University of California Santa Cruz, Santa Cruz, 95060, CA, USA; Live Cell Biotechnology Discovery Lab, University of California Santa Cruz, Santa Cruz, 95060, CA, USA

Aybey B, Zhao S, Brors B, Staub E. Immune cell type signature discovery and random forest classification for analysis of single cell gene expression datasets. Front Immunol 2023;14:1194745. [PMID: 37609075 PMCID: PMC10441575 DOI: 10.3389/fimmu.2023.1194745] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2023] [Accepted: 07/14/2023] [Indexed: 08/24/2023] Open

Abstract

Background

Robust immune cell gene expression signatures are central to the analysis of single cell studies. Nearly all known sets of immune cell signatures have been derived by making use of only single gene expression datasets. Utilizing the power of multiple integrated datasets could lead to high-quality immune cell signatures which could be used as superior inputs to machine learning-based cell type classification approaches.

Results

We established a novel workflow for the discovery of immune cell type signatures based primarily on gene-versus-gene expression similarity. It leverages multiple datasets, here seven single cell expression datasets from six different cancer types and resulted in eleven immune cell type-specific gene expression signatures. We used these to train random forest classifiers for immune cell type assignment for single-cell RNA-seq datasets. We obtained similar or better prediction results compared to commonly used methods for cell type assignment in independent benchmarking datasets. Our gene signature set yields higher prediction scores than other published immune cell type gene sets in random forest-based cell type classification. We further demonstrate how our approach helps to avoid bias in downstream statistical analyses by re-analysis of a published IFN stimulation experiment.

Discussion and conclusion

We demonstrated the quality of our immune cell signatures and their strong performance in a random forest-based cell typing approach. We argue that classifying cells based on our comparably slim sets of genes accompanied by a random forest-based approach not only matches or outperforms widely used published approaches but also facilitates unbiased downstream statistical analyses of differential gene expression between cell types for significantly more genes compared to previous cell classification algorithms.

Grabski IN, Street K, Irizarry RA. Significance analysis for clustering with single-cell RNA-sequencing data. Nat Methods 2023;20:1196-1202. [PMID: 37429993 PMCID: PMC11282907 DOI: 10.1038/s41592-023-01933-9] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Received: 08/01/2022] [Accepted: 06/01/2023] [Indexed: 07/12/2023]

Covert I, Gala R, Wang T, Svoboda K, Sümbül U, Lee SI. Predictive and robust gene selection for spatial transcriptomics. Nat Commun 2023;14:2091. [PMID: 37045821 PMCID: PMC10097645 DOI: 10.1038/s41467-023-37392-1] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 06/08/2022] [Accepted: 03/16/2023] [Indexed: 04/14/2023] Open

Yang F, Wang W, Wang F, Fang Y, Tang D, Huang J, Lu H, Yao J. scBERT as a large-scale pretrained deep language model for cell type annotation of single-cell RNA-seq data. Nat Mach Intell 2022. [DOI: 10.1038/s42256-022-00534-z] [Citation(s) in RCA: 166] [Impact Index Per Article: 55.3] [Indexed: 11/09/2022]