1
Noel RL, Kugelman T, Karakatsani ME, Shahriar S, Willner MJ, Choi CS, Nimi Y, Ji R, Agalliu D, Konofagou EE. Safe focused ultrasound-mediated blood-brain barrier opening is driven primarily by transient reorganization of tight junctions. bioRxiv 2025:2025.01.28.635258. [PMID: 39975117 PMCID: PMC11838333 DOI: 10.1101/2025.01.28.635258] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/21/2025]
Abstract
Focused ultrasound (FUS) with microbubbles opens the blood-brain barrier (BBB) to allow targeted drug delivery into the brain. The mechanisms by which endothelial cells (ECs) respond to either low acoustic pressures known to open the BBB transiently, or high acoustic pressures that cause brain damage, remain incompletely characterized. Here, we use a mouse strain where tight junctions between ECs are labelled with eGFP and apply FUS at low (450 kPa) and high (750 kPa) acoustic pressures, after which mice are sacrificed at 1 or 72 hours. We find that the EC response leading to FUS-mediated BBB opening at low pressures is localized primarily in arterioles and capillaries, and characterized by a transient loss and reorganization of tight junctions. BBB opening still occurs at low, safe pressures in mice lacking caveolae, suggesting that it is driven primarily by transient dismantlement and reorganization of tight junctions. In contrast, BBB opening at high pressures is associated with obliteration of EC tight junctions that remain unrepaired even after 72 hours, allowing continuous fibrinogen passage and persistent microglial activation. Single-cell RNA-sequencing of arteriole, capillary and venule ECs from FUS mice reveals that the transcriptomic responses of ECs exposed to high pressure are dominated by genes belonging to the stress response and cell junction disassembly at both 1 and 72 hours, while lower pressures induce primarily genes responsible for intracellular repair responses in ECs. Our findings suggest that at low pressures transient reorganization of tight junctions and repair responses mediate safe BBB opening for therapeutic delivery.
Significance Statement
Focused ultrasound with microbubbles is used as a noninvasive method to safely open the BBB at low acoustic pressures for therapeutic delivery into the CNS, but the mechanisms mediating this process remain unclear. Kugelman et al. demonstrate that FUS-mediated BBB opening at low pressures occurs primarily in arterioles and capillaries due to transient reorganization of tight junctions. BBB opening still occurs at low, safe pressures in mice lacking caveolae, suggesting a transcellular route-independent mechanism. At high, unsafe pressures, cell junctions are obliterated and remain unrepaired even after 72 hours, allowing fibrinogen passage and persistent microglial activation. Single-cell RNA-sequencing supports cell biological findings that safe, FUS-mediated BBB opening may be driven by transient reorganization and repair of EC tight junctions.
2
Colonna M, Konopka G, Liddelow SA, Nowakowski T, Awatramani R, Bateup HS, Cadwell CR, Caglayan E, Chen JL, Gillis J, Kampmann M, Krienen F, Marsh SE, Monje M, O'Dea MR, Patani R, Pollen AA, Quintana FJ, Scavuzzo M, Schmitz M, Sloan SA, Tesar PJ, Tollkuhn J, Tosches MA, Urbanek ME, Werner JM, Bayraktar OA, Gokce O, Habib N. Implementation and validation of single-cell genomics experiments in neuroscience. Nat Neurosci 2024; 27:2310-2325. [PMID: 39627589 DOI: 10.1038/s41593-024-01814-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 11/07/2023] [Accepted: 10/15/2024] [Indexed: 12/13/2024]
Abstract
Single-cell or single-nucleus transcriptomics is a powerful tool for identifying cell types and cell states. However, hypotheses derived from these assays, including gene expression information, require validation, and their functional relevance needs to be established. The choice of validation depends on numerous factors. Here, we present types of orthogonal and functional validation experiments to strengthen preliminary findings obtained using single-cell and single-nucleus transcriptomics, as well as the challenges and limitations of these approaches.
Affiliation(s)
- Marco Colonna
  - Department of Pathology and Immunology, Washington University School of Medicine in St. Louis, St. Louis, MO, USA
- Genevieve Konopka
  - Department of Neuroscience, Peter O'Donnell Jr. Brain Institute, UT Southwestern Medical Center, Dallas, TX, USA
- Shane A Liddelow
  - Neuroscience Institute, NYU Grossman School of Medicine, New York, NY, USA
  - Department of Neuroscience & Physiology, NYU Grossman School of Medicine, New York, NY, USA
  - Department of Ophthalmology, NYU Grossman School of Medicine, New York, NY, USA
  - Parekh Center for Interdisciplinary Neurology, NYU Grossman School of Medicine, New York, NY, USA
- Tomasz Nowakowski
  - Department of Neurological Surgery, University of California, San Francisco, San Francisco, CA, USA
  - Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California, San Francisco, San Francisco, CA, USA
  - Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA
  - Department of Anatomy, University of California, San Francisco, San Francisco, CA, USA
- Rajeshwar Awatramani
  - Department of Microbiology and Immunology, Northwestern University, Chicago, IL, USA
- Helen S Bateup
  - Department of Molecular and Cellular Biology, University of California, Berkeley, Berkeley, CA, USA
  - Department of Neuroscience, University of California, Berkeley, Berkeley, CA, USA
  - Chan Zuckerberg Biohub, San Francisco, CA, USA
- Cathryn R Cadwell
  - Department of Neurological Surgery, University of California, San Francisco, San Francisco, CA, USA
  - Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA
  - Department of Pathology, University of California, San Francisco, San Francisco, CA, USA
  - Kavli Institute for Fundamental Neuroscience, University of California, San Francisco, San Francisco, CA, USA
- Emre Caglayan
  - Department of Neuroscience, Peter O'Donnell Jr. Brain Institute, UT Southwestern Medical Center, Dallas, TX, USA
- Jerry L Chen
  - Department of Biomedical Engineering, Boston University, Boston, MA, USA
  - Center for Neurophotonics, Boston University, Boston, MA, USA
  - Department of Biology, Boston University, Boston, MA, USA
  - Center for Systems Neuroscience, Boston University, Boston, MA, USA
- Jesse Gillis
  - Department of Physiology and Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada
  - Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Martin Kampmann
  - Institute for Neurodegenerative Diseases, University of California, San Francisco, San Francisco, CA, USA
  - Department of Biochemistry and Biophysics, University of California, San Francisco, San Francisco, CA, USA
- Fenna Krienen
  - Princeton Neuroscience Institute, Princeton University, Princeton, NJ, USA
- Samuel E Marsh
  - F.M. Kirby Neurobiology Center, Boston Children's Hospital, Boston, MA, USA
  - Harvard Medical School, Boston, MA, USA
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Michelle Monje
  - Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, USA
  - Howard Hughes Medical Institute, Stanford University, Stanford, CA, USA
- Michael R O'Dea
  - Neuroscience Institute, NYU Grossman School of Medicine, New York, NY, USA
- Rickie Patani
  - Department of Neuromuscular Disease, UCL Queen Square Institute of Neurology, London, UK
  - The Francis Crick Institute, Human Stem Cells and Neurodegeneration Laboratory, London, UK
- Alex A Pollen
  - Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California, San Francisco, San Francisco, CA, USA
  - Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
- Francisco J Quintana
  - Ann Romney Center for Neurologic Diseases, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Marissa Scavuzzo
  - Department of Genetics and Genome Sciences, Case Western Reserve University School of Medicine, Cleveland, OH, USA
  - Institute for Glial Sciences, Case Western Reserve University School of Medicine, Cleveland, OH, USA
- Matthew Schmitz
  - Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California, San Francisco, San Francisco, CA, USA
- Steven A Sloan
  - Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, USA
- Paul J Tesar
  - Department of Genetics and Genome Sciences, Case Western Reserve University School of Medicine, Cleveland, OH, USA
  - Institute for Glial Sciences, Case Western Reserve University School of Medicine, Cleveland, OH, USA
- Madeleine E Urbanek
  - Biomedical Sciences Graduate Program, University of California, San Francisco, San Francisco, CA, USA
- Jonathan M Werner
  - Department of Physiology and Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada
  - Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Ozgun Gokce
  - Department of Old Age Psychiatry and Cognitive Disorders, University Hospital Bonn, Bonn, Germany
  - German Center for Neurodegenerative Diseases (DZNE), Bonn, Germany
- Naomi Habib
  - Edmond & Lily Safra Center for Brain Sciences, The Hebrew University of Jerusalem, Jerusalem, Israel
3
Shahriar S, Biswas S, Zhao K, Akcan U, Tuohy MC, Glendinning MD, Kurt A, Wayne CR, Prochilo G, Price MZ, Stuhlmann H, Brekken RA, Menon V, Agalliu D. VEGF-A-mediated venous endothelial cell proliferation results in neoangiogenesis during neuroinflammation. Nat Neurosci 2024; 27:1904-1917. [PMID: 39256571 DOI: 10.1038/s41593-024-01746-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2022] [Accepted: 08/01/2024] [Indexed: 09/12/2024]
Abstract
Newly formed leaky vessels and blood-brain barrier (BBB) damage are present in demyelinating acute and chronic lesions in multiple sclerosis (MS) and experimental autoimmune encephalomyelitis (EAE). However, the endothelial cell subtypes and signaling pathways contributing to these leaky neovessels are unclear. Here, using single-cell transcriptional profiling and in vivo validation studies, we show that venous endothelial cells express neoangiogenesis gene signatures and show increased proliferation resulting in enlarged veins and higher venous coverage in acute and chronic EAE lesions in female adult mice. These changes correlate with the upregulation of vascular endothelial growth factor A (VEGF-A) signaling. We also confirmed increased expression of neoangiogenic markers in acute and chronic human MS lesions. Treatment with a VEGF-A blocking antibody diminishes the neoangiogenic transcriptomic signatures and vascular proliferation in female adult mice with EAE, but it does not restore BBB function or ameliorate EAE pathology. Our data demonstrate that venous endothelial cells contribute to neoangiogenesis in demyelinating neuroinflammatory conditions.
Affiliation(s)
- Sanjid Shahriar
  - Department of Pathology and Cell Biology, Columbia University Irving Medical Center, New York, NY, USA
  - Wyss Institute for Biologically Inspired Engineering, Boston, MA, USA
- Saptarshi Biswas
  - Department of Neurology, Columbia University Irving Medical Center, New York, NY, USA
- Kaitao Zhao
  - Department of Neurology, Columbia University Irving Medical Center, New York, NY, USA
- Uğur Akcan
  - Department of Neurology, Columbia University Irving Medical Center, New York, NY, USA
- Mary Claire Tuohy
  - Department of Neurology, Columbia University Irving Medical Center, New York, NY, USA
- Michael D Glendinning
  - Department of Neurology, Columbia University Irving Medical Center, New York, NY, USA
- Ali Kurt
  - Department of Neurology, Columbia University Irving Medical Center, New York, NY, USA
- Charlotte R Wayne
  - Department of Neurology, Columbia University Irving Medical Center, New York, NY, USA
- Grace Prochilo
  - Department of Biological Sciences, Columbia University, New York, NY, USA
- Maxwell Z Price
  - Department of Biological Sciences, Columbia University, New York, NY, USA
- Heidi Stuhlmann
  - Department of Cell and Developmental Biology, Weill Cornell Medical College, New York, NY, USA
- Rolf A Brekken
  - Department of Surgery, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Vilas Menon
  - Department of Neurology, Columbia University Irving Medical Center, New York, NY, USA
- Dritan Agalliu
  - Department of Pathology and Cell Biology, Columbia University Irving Medical Center, New York, NY, USA
  - Department of Neurology, Columbia University Irving Medical Center, New York, NY, USA
4
Grobecker P, Sakoparnig T, van Nimwegen E. Identifying cell states in single-cell RNA-seq data at statistically maximal resolution. PLoS Comput Biol 2024; 20:e1012224. [PMID: 38995959 PMCID: PMC11364423 DOI: 10.1371/journal.pcbi.1012224] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Revised: 08/30/2024] [Accepted: 06/04/2024] [Indexed: 07/14/2024]
Abstract
Single-cell RNA sequencing (scRNA-seq) has become a popular experimental method to study variation of gene expression within a population of cells. However, obtaining an accurate picture of the diversity of distinct gene expression states that are present in a given dataset is highly challenging because of the sparsity of the scRNA-seq data and its inhomogeneous measurement noise properties. Although a vast number of different methods is applied in the literature for clustering cells into subsets with 'similar' expression profiles, these methods generally lack rigorously specified objectives, involve multiple complex layers of normalization, filtering, feature selection, dimensionality-reduction, employ ad hoc measures of distance or similarity between cells, often ignore the known measurement noise properties of scRNA-seq measurements, and include a large number of tunable parameters. Consequently, it is virtually impossible to assign concrete biophysical meaning to the clusterings that result from these methods. Here we address the following problem: Given raw unique molecule identifier (UMI) counts of an scRNA-seq dataset, partition the cells into subsets such that the gene expression states of the cells in each subset are statistically indistinguishable, and each subset corresponds to a distinct gene expression state. That is, we aim to partition cells so as to maximally reduce the complexity of the dataset without removing any of its meaningful structure. We show that, given the known measurement noise structure of scRNA-seq data, this problem is mathematically well-defined and derive its unique solution from first principles. We have implemented this solution in a tool called Cellstates which operates directly on the raw data and automatically determines the optimal partition and cluster number, with zero tunable parameters. We show that, on synthetic datasets, Cellstates almost perfectly recovers optimal partitions. 
On real data, Cellstates robustly identifies subtle substructure within groups of cells that are traditionally annotated as a common cell type. Moreover, we show that the diversity of gene expression states that Cellstates identifies systematically depends on the tissue of origin and not on technical features of the experiments such as the total number of cells and total UMI count per cell. In addition to the Cellstates tool we also provide a small toolbox of software to place the identified cellstates into a hierarchical tree of higher-order clusters, to identify the most important differentially expressed genes at each branch of this hierarchy, and to visualize these results.
Affiliation(s)
- Pascal Grobecker
  - Biozentrum, University of Basel and Swiss Institute of Bioinformatics, Basel, Switzerland
- Thomas Sakoparnig
  - Biozentrum, University of Basel and Swiss Institute of Bioinformatics, Basel, Switzerland
- Erik van Nimwegen
  - Biozentrum, University of Basel and Swiss Institute of Bioinformatics, Basel, Switzerland
5
Biswas S, Shahriar S, Bachay G, Arvanitis P, Jamoul D, Brunken WJ, Agalliu D. Glutamatergic neuronal activity regulates angiogenesis and blood-retinal barrier maturation via Norrin/β-catenin signaling. Neuron 2024; 112:1978-1996.e6. [PMID: 38599212 PMCID: PMC11189759 DOI: 10.1016/j.neuron.2024.03.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2023] [Revised: 01/15/2024] [Accepted: 03/11/2024] [Indexed: 04/12/2024]
Abstract
Interactions among neuronal, glial, and vascular components are crucial for retinal angiogenesis and blood-retinal barrier (BRB) maturation. Although synaptic dysfunction precedes vascular abnormalities in many retinal pathologies, how neuronal activity, specifically glutamatergic activity, regulates retinal angiogenesis and BRB maturation remains unclear. Using in vivo genetic studies in mice, single-cell RNA sequencing (scRNA-seq), and functional validation, we show that deep plexus angiogenesis and paracellular BRB maturation are delayed in Vglut1-/- retinas where neurons fail to release glutamate. By contrast, deep plexus angiogenesis and paracellular BRB maturation are accelerated in Gnat1-/- retinas, where constitutively depolarized rods release excessive glutamate. Norrin expression and endothelial Norrin/β-catenin signaling are downregulated in Vglut1-/- retinas and upregulated in Gnat1-/- retinas. Pharmacological activation of endothelial Norrin/β-catenin signaling in Vglut1-/- retinas rescues defects in deep plexus angiogenesis and paracellular BRB maturation. Our findings demonstrate that glutamatergic neuronal activity regulates retinal angiogenesis and BRB maturation by modulating endothelial Norrin/β-catenin signaling.
Affiliation(s)
- Saptarshi Biswas
  - Department of Neurology, Columbia University Irving Medical Center, New York, NY 10032, USA
- Sanjid Shahriar
  - Department of Pathology and Cell Biology, Columbia University Irving Medical Center, New York, NY 10032, USA
  - Wyss Institute for Biologically Inspired Engineering, Boston, MA 02115, USA
- Galina Bachay
  - Department of Ophthalmology and Visual Sciences, SUNY Upstate Medical University, Syracuse, NY 13210, USA
- Panos Arvanitis
  - Warren Alpert Medical School, Brown University, Providence, RI 02903, USA
- Danny Jamoul
  - Department of Neurology, Columbia University Irving Medical Center, New York, NY 10032, USA
  - John Jay College of Criminal Justice, City University of New York, New York, NY 10019, USA
- William J Brunken
  - Department of Ophthalmology and Visual Sciences, SUNY Upstate Medical University, Syracuse, NY 13210, USA
- Dritan Agalliu
  - Department of Neurology, Columbia University Irving Medical Center, New York, NY 10032, USA
  - Department of Pathology and Cell Biology, Columbia University Irving Medical Center, New York, NY 10032, USA
6
Xiong J, Kaur H, Heiser CN, McKinley ET, Roland JT, Coffey RJ, Shrubsole MJ, Wrobel J, Ma S, Lau KS, Vandekar S. GammaGateR: semi-automated marker gating for single-cell multiplexed imaging. Bioinformatics 2024; 40:btae356. [PMID: 38833684 PMCID: PMC11193056 DOI: 10.1093/bioinformatics/btae356] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2023] [Revised: 04/20/2024] [Accepted: 06/03/2024] [Indexed: 06/06/2024]
Abstract
Motivation: Multiplexed immunofluorescence (mIF) is an emerging assay for multichannel protein imaging that can decipher cell-level spatial features in tissues. However, existing automated cell phenotyping methods, such as clustering, face challenges in achieving consistency across experiments and often require subjective evaluation. As a result, mIF analyses often revert to marker gating based on manual thresholding of raw imaging data.
Results: To address the need for an evaluable semi-automated algorithm, we developed GammaGateR, an R package for interactive marker gating designed specifically for segmented cell-level data from mIF images. Based on a novel closed-form gamma mixture model, GammaGateR provides estimates of marker-positive cell proportions and soft clustering of marker-positive cells. The model incorporates user-specified constraints that provide a consistent but slide-specific model fit. We compared GammaGateR against the newest unsupervised approach for annotating mIF data, employing two colon datasets and one ovarian cancer dataset for the evaluation. We showed that GammaGateR produces highly similar results to a silver standard established through manual annotation. Furthermore, we demonstrated its effectiveness in identifying biological signals, achieved by mapping known spatial interactions between CD68 and MUC5AC cells in the colon and by accurately predicting survival in ovarian cancer patients using the phenotype probabilities as input for machine learning methods. GammaGateR is a highly efficient tool that can improve the replicability of marker gating results, while reducing the time of manual segmentation.
Availability and implementation: The R package is available at https://github.com/JiangmeiRubyXiong/GammaGateR.
Affiliation(s)
- Jiangmei Xiong
  - Department of Biostatistics, Vanderbilt University, 2525 West End Avenue, Suite 1100, Nashville, TN 37203-1741, United States
- Harsimran Kaur
  - Program of Chemical and Physical Biology, Vanderbilt University School of Medicine, 340 Light Hall, 2215 Garland Ave, Nashville, TN 37232, United States
  - Epithelial Biology Center, Vanderbilt University Medical Center, MRBIV 10415-E, 2213 Garland Avenue, Nashville, TN 37232, United States
- Cody N Heiser
  - Program of Chemical and Physical Biology, Vanderbilt University School of Medicine, 340 Light Hall, 2215 Garland Ave, Nashville, TN 37232, United States
  - Epithelial Biology Center, Vanderbilt University Medical Center, MRBIV 10415-E, 2213 Garland Avenue, Nashville, TN 37232, United States
  - Regeneron Pharmaceuticals, 777 Old Saw Mill River Road, Tarrytown, NY 10591, United States
- Eliot T McKinley
  - Epithelial Biology Center, Vanderbilt University Medical Center, MRBIV 10415-E, 2213 Garland Avenue, Nashville, TN 37232, United States
  - GlaxoSmithKline, 410 Blackwell St, Durham, NC 27701, United States
- Joseph T Roland
  - Epithelial Biology Center, Vanderbilt University Medical Center, MRBIV 10415-E, 2213 Garland Avenue, Nashville, TN 37232, United States
  - Department of Surgery, Vanderbilt University Medical Center, 2215 Garland Ave Medical Research Building IV, Nashville, TN 37232, United States
- Robert J Coffey
  - Epithelial Biology Center, Vanderbilt University Medical Center, MRBIV 10415-E, 2213 Garland Avenue, Nashville, TN 37232, United States
  - Department of Medicine, Vanderbilt University Medical Center, 1161 21st Ave S, Nashville, TN 37232, United States
- Martha J Shrubsole
  - Department of Medicine, Vanderbilt University Medical Center, 1161 21st Ave S, Nashville, TN 37232, United States
- Julia Wrobel
  - Department of Biostatistics and Bioinformatics, Emory University, 1518 Clifton Rd, Atlanta, GA 30322, United States
- Siyuan Ma
  - Department of Biostatistics, Vanderbilt University, 2525 West End Avenue, Suite 1100, Nashville, TN 37203-1741, United States
- Ken S Lau
  - Program of Chemical and Physical Biology, Vanderbilt University School of Medicine, 340 Light Hall, 2215 Garland Ave, Nashville, TN 37232, United States
  - Epithelial Biology Center, Vanderbilt University Medical Center, MRBIV 10415-E, 2213 Garland Avenue, Nashville, TN 37232, United States
  - Regeneron Pharmaceuticals, 777 Old Saw Mill River Road, Tarrytown, NY 10591, United States
  - Department of Cell and Developmental Biology, Vanderbilt University School of Medicine, 10475 Medical Research Building IV, 2215 Garland Avenue, Nashville, TN 37232, United States
- Simon Vandekar
  - Department of Biostatistics, Vanderbilt University, 2525 West End Avenue, Suite 1100, Nashville, TN 37203-1741, United States
7
Kline-Schoder AR, Chintamen S, Willner MJ, DiBenedetto MR, Noel RL, Batts AJ, Kwon N, Zacharoulis S, Wu CC, Menon V, Kernie SG, Konofagou EE. Characterization of the responses of brain macrophages to focused ultrasound-mediated blood-brain barrier opening. Nat Biomed Eng 2024; 8:650-663. [PMID: 37857722 PMCID: PMC11734153 DOI: 10.1038/s41551-023-01107-0] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Received: 02/09/2022] [Accepted: 09/16/2023] [Indexed: 10/21/2023]
Abstract
The opening of the blood-brain barrier (BBB) by focused ultrasound (FUS) coupled with intravenously injected microbubbles can be leveraged as a form of immunotherapy for the treatment of neurodegenerative disorders. However, how FUS BBB opening affects brain macrophages is not well understood. Here, by using single-cell sequencing to characterize the distinct responses of microglia and central nervous system-associated macrophages (CAMs) to FUS-mediated BBB opening in mice, we show that the treatment remodels the immune landscape via the recruitment of CAMs, the proliferation of microglia, and population size increases in disease-associated microglia. Both microglia and CAMs showed early and late increases in population sizes, yet only the proliferation of microglia increased at both timepoints. The population of disease-associated microglia also increased, accompanied by the upregulation of genes associated with gliogenesis and phagocytosis, with the depletion of brain macrophages significantly decreasing the duration of BBB opening.
Affiliation(s)
- Sana Chintamen
  - Department of Neurobiology and Behaviour, Columbia University, New York, NY, USA
- Moshe J Willner
  - Vagelos College of Physicians and Surgeons, Columbia University, New York, NY, USA
- Rebecca L Noel
  - Department of Biomedical Engineering, Columbia University, New York, NY, USA
- Alec J Batts
  - Department of Biomedical Engineering, Columbia University, New York, NY, USA
- Nancy Kwon
  - Department of Biomedical Engineering, Columbia University, New York, NY, USA
- Cheng-Chia Wu
  - Department of Pediatrics, Columbia University, New York, NY, USA
- Vilas Menon
  - Department of Neurology, Columbia University, New York, NY, USA
- Steven G Kernie
  - Department of Pediatrics, Columbia University, New York, NY, USA
- Elisa E Konofagou
  - Department of Biomedical Engineering, Columbia University, New York, NY, USA
  - Department of Radiology, Columbia University, New York, NY, USA
8
Biswas S, Shahriar S, Bachay G, Arvanitis P, Jamoul D, Brunken WJ, Agalliu D. Glutamatergic neuronal activity regulates angiogenesis and blood-retinal barrier maturation via Norrin/β-catenin signaling. bioRxiv 2024:2023.07.10.548410. [PMID: 37503079 PMCID: PMC10369888 DOI: 10.1101/2023.07.10.548410] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/29/2023]
Abstract
Interactions among neuronal, glial and vascular components are crucial for retinal angiogenesis and blood-retinal barrier (BRB) maturation. Although synaptic dysfunction precedes vascular abnormalities in many retinal pathologies, how neuronal activity, specifically glutamatergic activity, regulates retinal angiogenesis and BRB maturation remains unclear. Using in vivo genetic studies in mice, single-cell RNA-sequencing and functional validation, we show that deep plexus angiogenesis and paracellular BRB maturation are delayed in Vglut1-/- retinas where neurons fail to release glutamate. In contrast, deep plexus angiogenesis and paracellular BRB maturation are accelerated in Gnat1-/- retinas where constitutively depolarized rods release excessive glutamate. Norrin expression and endothelial Norrin/β-catenin signaling are downregulated in Vglut1-/- retinas, and upregulated in Gnat1-/- retinas. Pharmacological activation of endothelial Norrin/β-catenin signaling in Vglut1-/- retinas rescued defects in deep plexus angiogenesis and paracellular BRB maturation. Our findings demonstrate that glutamatergic neuronal activity regulates retinal angiogenesis and BRB maturation by modulating endothelial Norrin/β-catenin signaling.
9
Liu J, Zeng W, Kan S, Li M, Zheng R. CAKE: a flexible self-supervised framework for enhancing cell visualization, clustering and rare cell identification. Brief Bioinform 2023; 25:bbad475. [PMID: 38145950 PMCID: PMC10749894 DOI: 10.1093/bib/bbad475] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2023] [Revised: 11/13/2023] [Accepted: 11/30/2023] [Indexed: 12/27/2023]
Abstract
Single-cell sequencing technology has provided unprecedented opportunities for comprehensively deciphering cell heterogeneity. Nevertheless, the high dimensionality and intricate nature of cell heterogeneity have presented substantial challenges to computational methods. Numerous novel clustering methods have been proposed to address this issue. However, none of these methods achieves consistently better performance across different biological scenarios. In this study, we developed CAKE, a novel and scalable self-supervised clustering method, which consists of a contrastive learning model with a mixture neighborhood augmentation for cell representation learning, and a self-Knowledge Distiller model for the refinement of clustering results. These designs provide more condensed and cluster-friendly cell representations and improve the clustering performance in terms of accuracy and robustness. Furthermore, in addition to accurately identifying the major cell types, CAKE could also find more biologically meaningful cell subgroups and rare cell types. Comprehensive experiments on real single-cell RNA sequencing datasets demonstrated the superiority of CAKE in visualization and clustering over other comparison methods, and indicated its extensive application in the field of cell heterogeneity analysis. Contact: Ruiqing Zheng (rqzheng@csu.edu.cn).
Affiliation(s)
- Jin Liu
- School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, P.R. China
| | - Weixing Zeng
- School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, P.R. China
| | - Shichao Kan
- School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, P.R. China
- Min Li
- School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, P.R. China
- Ruiqing Zheng
- School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, P.R. China
10
Xiong J, Kaur H, Heiser CN, McKinley ET, Roland JT, Coffey RJ, Shrubsole MJ, Wrobel J, Ma S, Lau KS, Vandekar S. GammaGateR: semi-automated marker gating for single-cell multiplexed imaging. bioRxiv 2023:2023.09.20.558645. [PMID: 37781604 PMCID: PMC10541135 DOI: 10.1101/2023.09.20.558645] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/03/2023]
Abstract
Motivation: Multiplexed immunofluorescence (mIF) is an emerging assay for multichannel protein imaging that can decipher cell-level spatial features in tissues. However, existing automated cell phenotyping methods, such as clustering, face challenges in achieving consistency across experiments and often require subjective evaluation. As a result, mIF analyses often revert to marker gating based on manual thresholding of raw imaging data. Results: To address the need for an evaluable semi-automated algorithm, we developed GammaGateR, an R package for interactive marker gating designed specifically for segmented cell-level data from mIF images. Based on a novel closed-form gamma mixture model, GammaGateR provides estimates of marker-positive cell proportions and soft clustering of marker-positive cells. The model incorporates user-specified constraints that provide a consistent but slide-specific model fit. We compared GammaGateR against the newest unsupervised approach for annotating mIF data, employing two colon datasets and one ovarian cancer dataset for the evaluation. We showed that GammaGateR produces highly similar results to a silver standard established through manual annotation. Furthermore, we demonstrated its effectiveness in identifying biological signals, achieved by mapping known spatial interactions between CD68 and MUC5AC cells in the colon and by accurately predicting survival in ovarian cancer patients using the phenotype probabilities as input for machine learning methods. GammaGateR is a highly efficient tool that can improve the replicability of marker gating results, while reducing the time of manual segmentation. Availability and Implementation: The R package is available at https://github.com/JiangmeiRubyXiong/GammaGateR.
Affiliation(s)
- Harsimran Kaur
- Program of Chemical and Physical Biology, Vanderbilt University School of Medicine, USA
- Epithelial Biology Center, Vanderbilt University Medical Center, USA
- Cody N Heiser
- Program of Chemical and Physical Biology, Vanderbilt University School of Medicine, USA
- Epithelial Biology Center, Vanderbilt University Medical Center, USA
- Regeneron Pharmaceuticals, USA
- Eliot T McKinley
- Epithelial Biology Center, Vanderbilt University Medical Center, USA
- GlaxoSmithKline, USA
- Joseph T Roland
- Epithelial Biology Center, Vanderbilt University Medical Center, USA
- Department of Surgery, Vanderbilt University Medical Center, USA
- Robert J Coffey
- Epithelial Biology Center, Vanderbilt University Medical Center, USA
- Department of Medicine, Vanderbilt University Medical Center, USA
- Julia Wrobel
- Department of Biostatistics and Bioinformatics, Emory University, USA
- Siyuan Ma
- Department of Biostatistics, Vanderbilt University, USA
- Ken S Lau
- Program of Chemical and Physical Biology, Vanderbilt University School of Medicine, USA
- Epithelial Biology Center, Vanderbilt University Medical Center, USA
- Department of Surgery, Vanderbilt University Medical Center, USA
- Department of Cell and Developmental Biology, Vanderbilt University School of Medicine, USA
11
Zhang C, Duan ZW, Xu YP, Liu J, Li HD. FEED: a feature selection method based on gene expression decomposition for single cell clustering. Brief Bioinform 2023; 24:bbad389. [PMID: 37935617 DOI: 10.1093/bib/bbad389] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/12/2023] [Revised: 08/31/2023] [Accepted: 09/22/2023] [Indexed: 11/09/2023]
Abstract
Single-cell clustering is a critical step in biological downstream analysis. The clustering performance could be effectively improved by extracting cell-type-specific genes. The state-of-the-art feature selection methods usually calculate the importance of a single gene without considering the information contained in the gene expression distribution. Moreover, these methods ignore the intrinsic expression patterns of genes and heterogeneity within groups of different mean expression levels. In this work, we present a Feature sElection method based on gene Expression Decomposition (FEED) of scRNA-seq data, which selects informative genes to enhance clustering performance. First, the expression levels of genes are decomposed into multiple Gaussian components. Then, a novel gene correlation calculation method is proposed to measure the relationship between genes from the perspective of distribution. Finally, a permutation-based approach is proposed to determine the threshold of gene importance to obtain marker gene subsets. Compared with state-of-the-art feature selection methods, applying FEED on various scRNA-seq datasets including large datasets followed by different common clustering algorithms results in significant improvements in the accuracy of cell-type identification. The source codes for FEED are freely available at https://github.com/genemine/FEED.
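The first step described in this abstract, decomposing a gene's expression levels into Gaussian components, can be illustrated with a small sketch. This is not the FEED implementation: it assumes exactly two components, fits them with a textbook expectation-maximization loop, and uses made-up expression values. The function name `fit_two_gaussians` and all variable names are hypothetical.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_two_gaussians(values, n_iter=100):
    """EM for a 1-D mixture of two Gaussians; returns (weights, means, variances)."""
    overall_mean = sum(values) / len(values)
    overall_var = sum((v - overall_mean) ** 2 for v in values) / len(values)
    means = [min(values), max(values)]      # start the components at the extremes
    variances = [overall_var, overall_var]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each value.
        resp = []
        for v in values:
            dens = [w * normal_pdf(v, m, s)
                    for w, m, s in zip(weights, means, variances)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and variance of each component.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(values)
            means[k] = sum(r[k] * v for r, v in zip(resp, values)) / nk
            spread = sum(r[k] * (v - means[k]) ** 2 for r, v in zip(resp, values)) / nk
            variances[k] = max(spread, 1e-6)  # floor keeps a component from collapsing
    return weights, means, variances


# Made-up log-expression values for one gene: a "low" and a "high" group of cells.
expression = [0.1, 0.2, 0.0, 0.3, 0.15, 4.8, 5.1, 5.0, 5.2, 4.9]
weights, means, variances = fit_two_gaussians(expression)
```

On this toy input the two fitted means settle near the averages of the low and high groups. A gene whose expression separates into distinct components in this way is the kind of signal a distribution-aware feature selector could exploit; the correlation measure and permutation threshold mentioned in the abstract are not sketched here.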
Affiliation(s)
- Chao Zhang
- School of Computer Science and Engineering, Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, Hunan 410083, P.R. China
- Zhi-Wei Duan
- School of Computer Science and Engineering, Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, Hunan 410083, P.R. China
- Yun-Pei Xu
- School of Computer Science and Engineering, Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, Hunan 410083, P.R. China
- Jin Liu
- School of Computer Science and Engineering, Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, Hunan 410083, P.R. China
- Hong-Dong Li
- School of Computer Science and Engineering, Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, Hunan 410083, P.R. China
12
Bhadani R, Chen Z, An L. Attention-Based Graph Neural Network for Label Propagation in Single-Cell Omics. Genes (Basel) 2023; 14:506. [PMID: 36833434 PMCID: PMC9957137 DOI: 10.3390/genes14020506] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/27/2022] [Revised: 02/13/2023] [Accepted: 02/13/2023] [Indexed: 02/19/2023]
Abstract
Single-cell data analysis has been at the forefront of development in biology and medicine since sequencing data became available. An important challenge in single-cell data analysis is the identification of cell types. Several methods have been proposed for cell-type identification. However, these methods do not capture the higher-order topological relationship between different samples. In this work, we propose an attention-based graph neural network that captures the higher-order topological relationship between different samples and performs transductive learning for predicting cell types. The evaluation of our method on both simulated and publicly available datasets demonstrates the superiority of our method, scAGN, in terms of prediction accuracy. In addition, our method works best for highly sparse datasets in terms of F1 score, precision, recall, and Matthews correlation coefficient. Further, our method's runtime is consistently shorter than that of other methods.
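The evaluation metrics named in this abstract have standard definitions, which the following sketch computes from a confusion matrix. It is not taken from the paper: it assumes a binary (one cell type versus the rest) setting, whereas cell-type identification is usually multi-class, and the labels are made up. The function name `binary_metrics` is hypothetical.

```python
import math


def binary_metrics(y_true, y_pred):
    """Return precision, recall, F1 and Matthews correlation for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "mcc": mcc}


# Made-up labels: 1 means the cell belongs to the type of interest.
truth = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 1, 0, 0, 0, 1, 0, 1]
scores = binary_metrics(truth, predicted)
```

Unlike F1, the Matthews correlation coefficient also counts true negatives, which is one reason it is often reported alongside F1 when class sizes are unbalanced.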
Affiliation(s)
- Rahul Bhadani
- Department of Electrical & Computer Engineering, The University of Arizona, Tucson, AZ 85721, USA
- Interdisciplinary Program in Statistics and Data Science, The University of Arizona, Tucson, AZ 85721, USA
- Zhuo Chen
- Interdisciplinary Program in Statistics and Data Science, The University of Arizona, Tucson, AZ 85721, USA
- Lingling An
- Interdisciplinary Program in Statistics and Data Science, The University of Arizona, Tucson, AZ 85721, USA
- Department of Biosystems Engineering, The University of Arizona, Tucson, AZ 85721, USA
- Department of Epidemiology and Biostatistics, The University of Arizona, Tucson, AZ 85721, USA
13
Richman LP, Goyal Y, Jiang CL, Raj A. ClonoCluster: A method for using clonal origin to inform transcriptome clustering. Cell Genomics 2023; 3:100247. [PMID: 36819662 PMCID: PMC9932990 DOI: 10.1016/j.xgen.2022.100247] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 03/21/2022] [Revised: 09/22/2022] [Accepted: 12/16/2022] [Indexed: 01/13/2023]
Abstract
Clustering cells based on their high-dimensional profiles is an important data reduction process by which researchers infer distinct cellular states. The advent of cellular barcoding, however, provides an alternative means by which to group cells: by their clonal origin. We developed ClonoCluster, a computational method that combines both clone and transcriptome information to create hybrid clusters that weight both kinds of data with a tunable parameter. We generated hybrid clusters across six independent datasets and found that ClonoCluster generated qualitatively different clusters in all cases. The markers of these hybrid clusters were different but had equivalent fidelity to transcriptome-only clusters. The genes most strongly associated with the rearrangements in hybrid clusters were ribosomal function and extracellular matrix genes. We also developed the complementary tool Warp Factor that incorporates clone information in popular 2D visualization techniques like UMAP. Integrating ClonoCluster and Warp Factor revealed biologically relevant markers of cell identity.
Affiliation(s)
- Lee P. Richman
- Department of Pathology, Brigham and Women’s Hospital, Boston, MA, USA
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Yogesh Goyal
- Department of Bioengineering, School of Engineering and Applied Sciences, University of Pennsylvania, Philadelphia, PA, USA
- Department of Cell and Developmental Biology, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
- Center for Synthetic Biology, Northwestern University, Chicago, IL, USA
- Connie L. Jiang
- Genetics and Epigenetics, Cell and Molecular Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Arjun Raj
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Department of Bioengineering, School of Engineering and Applied Sciences, University of Pennsylvania, Philadelphia, PA, USA
14
Zhang L, Cervantes MD, Pan S, Lindsley J, Dabney A, Kapler GM. Transcriptome analysis of the binucleate ciliate Tetrahymena thermophila with asynchronous nuclear cell cycles. Mol Biol Cell 2023; 34:rs1. [PMID: 36475712 PMCID: PMC9930529 DOI: 10.1091/mbc.e22-08-0326] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/12/2022]
Abstract
Tetrahymena thermophila harbors two functionally and physically distinct nuclei within a shared cytoplasm. During vegetative growth, the "cell cycles" of the diploid micronucleus and polyploid macronucleus are offset. Micronuclear S phase initiates just before cytokinesis and is completed in daughter cells before onset of macronuclear DNA replication. Mitotic micronuclear division occurs mid-cell cycle, while macronuclear amitosis is coupled to cell division. Here we report the first RNA-seq cell cycle analysis of a binucleated ciliated protozoan. RNA was isolated across 1.5 vegetative cell cycles, starting with a macronuclear G1 population synchronized by centrifugal elutriation. Using MetaCycle, 3244 of the 26,000+ predicted genes were shown to be cell cycle regulated. Proteins present in both nuclei exhibit a single mRNA peak that always precedes their macronuclear function. Nucleus-limited genes, including nucleoporins and importins, are expressed before their respective nucleus-specific role. Cyclin D and A/B gene family members exhibit different expression patterns that suggest nucleus-restricted roles. Periodically expressed genes cluster into seven cyclic patterns. Four clusters have known PANTHER gene ontology terms associated with G1/S and G2/M phase. We propose that these clusters encode known and novel factors that coordinate micro- and macronuclear-specific events such as mitosis, amitosis, DNA replication, and cell division.
Affiliation(s)
- L. Zhang
- Department of Cell Biology and Genetics, Texas A&M University Health Science Center, College Station, TX 77840
- Department of Statistics, Texas A&M University, College Station, TX 77843
- M. D. Cervantes
- Department of Cell Biology and Genetics, Texas A&M University Health Science Center, College Station, TX 77840
- S. Pan
- Department of Cell Biology and Genetics, Texas A&M University Health Science Center, College Station, TX 77840
- Department of Statistics, Texas A&M University, College Station, TX 77843
- J. Lindsley
- Department of Cell Biology and Genetics, Texas A&M University Health Science Center, College Station, TX 77840
- A. Dabney
- Department of Statistics, Texas A&M University, College Station, TX 77843
- *Address correspondence to: Geoffrey Kapler; A. Dabney
- G. M. Kapler
- Department of Cell Biology and Genetics, Texas A&M University Health Science Center, College Station, TX 77840
- *Address correspondence to: Geoffrey Kapler; A. Dabney
15
Wrobel J, Harris C, Vandekar S. Statistical Analysis of Multiplex Immunofluorescence and Immunohistochemistry Imaging Data. Methods Mol Biol 2023; 2629:141-168. [PMID: 36929077 DOI: 10.1007/978-1-0716-2986-4_8] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Indexed: 03/18/2023]
Abstract
Advances in multiplexed single-cell immunofluorescence (mIF) and multiplex immunohistochemistry (mIHC) imaging technologies have enabled the analysis of cell-to-cell spatial relationships that promise to revolutionize our understanding of tissue-based diseases and autoimmune disorders. Multiplex images are collected as multichannel TIFF files; then denoised, segmented to identify cells and nuclei, normalized across slides with protein markers to correct for batch effects, and phenotyped; and then tissue composition and spatial context at the cellular level are analyzed. This chapter discusses methods and software infrastructure for image processing and statistical analysis of mIF/mIHC data.
Affiliation(s)
- Julia Wrobel
- Department of Biostatistics and Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA.
- Coleman Harris
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, USA
- Simon Vandekar
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, USA
16
Shakola F, Palejev D, Ivanov I. A Framework for Comparison and Assessment of Synthetic RNA-Seq Data. Genes (Basel) 2022; 13:2362. [PMID: 36553629 PMCID: PMC9778097 DOI: 10.3390/genes13122362] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/09/2022] [Revised: 12/05/2022] [Accepted: 12/06/2022] [Indexed: 12/16/2022]
Abstract
The ever-growing number of methods for the generation of synthetic bulk and single cell RNA-seq data have multiple and diverse applications. They are often aimed at benchmarking bioinformatics algorithms for purposes such as sample classification, differential expression analysis, correlation and network studies and the optimization of data integration and normalization techniques. Here, we propose a general framework to compare synthetically generated RNA-seq data and select a data-generating tool that is suitable for a set of specific study goals. As there are multiple methods for synthetic RNA-seq data generation, researchers can use the proposed framework to make an informed choice of an RNA-seq data simulation algorithm and software that are best suited for their specific scientific questions of interest.
Affiliation(s)
- Felitsiya Shakola
- GATE Institute, Sofia University, 125 Tsarigradsko Shosse, Bl. 2, 1113 Sofia, Bulgaria
- Dean Palejev
- Institute of Mathematics and Informatics, Bulgarian Academy of Sciences, Acad. G. Bonchev St., Bl. 8, 1113 Sofia, Bulgaria
- Ivan Ivanov
- Department of Veterinary Physiology and Pharmacology, Texas A&M University, College Station, TX 77843, USA
17
Garcia-Ramirez DL, Singh S, McGrath JR, Ha NT, Dougherty KJ. Identification of adult spinal Shox2 neuronal subpopulations based on unbiased computational clustering of electrophysiological properties. Front Neural Circuits 2022; 16:957084. [PMID: 35991345 PMCID: PMC9385948 DOI: 10.3389/fncir.2022.957084] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2022] [Accepted: 07/08/2022] [Indexed: 11/13/2022]
Abstract
Spinal cord neurons integrate sensory and descending information to produce motor output. The expression of transcription factors has been used to dissect out the neuronal components of circuits underlying behaviors. However, most of the canonical populations of interneurons are heterogeneous and require additional criteria to determine functional subpopulations. Neurons expressing the transcription factor Shox2 can be subclassified based on the co-expression of the transcription factor Chx10 and each subpopulation is proposed to have a distinct connectivity and different role in locomotion. Adult Shox2 neurons have recently been shown to be diverse based on their firing properties. Here, in order to subclassify adult mouse Shox2 neurons, we performed multiple analyses of data collected from whole-cell patch clamp recordings of visually-identified Shox2 neurons from lumbar spinal slices. A smaller set of Chx10 neurons was included in the analyses for validation. We performed k-means and hierarchical unbiased clustering approaches, considering electrophysiological variables. Unlike the categorizations by firing type, the clusters displayed electrophysiological properties that could differentiate between clusters of Shox2 neurons. The presence of clusters consisting exclusively of Shox2 neurons in both clustering techniques suggests that it is possible to distinguish Shox2+Chx10- neurons from Shox2+Chx10+ neurons by electrophysiological properties alone. Computational clusters were further validated by immunohistochemistry with accuracy in a small subset of neurons. Thus, unbiased cluster analysis using electrophysiological properties is a tool that can enhance current interneuronal subclassifications and can complement groupings based on transcription factor and molecular expression.
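The k-means step mentioned in this abstract can be sketched as follows. This is a plain Lloyd's-algorithm illustration under stated assumptions, not the authors' analysis pipeline: the two features stand in for standardized electrophysiological variables, the values are made up, and the centroids are initialized from the first k points purely to keep the example deterministic. Hierarchical clustering, the second approach named in the abstract, is not shown.

```python
import math


def kmeans(points, k, n_iter=50):
    """Lloyd's k-means; centroids start at the first k points (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return labels, centroids


# Made-up z-scored features per neuron (two hypothetical electrophysiological variables).
neurons = [(-1.0, -1.2), (1.1, 0.9), (-0.9, -1.0),
           (1.0, 1.2), (-1.1, -0.8), (0.9, 1.0)]
labels, centroids = kmeans(neurons, k=2)
```

In practice the features would be standardized before clustering so that variables measured in different units contribute comparably to the Euclidean distance, and the choice of k would need its own justification.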
Affiliation(s)
- Kimberly J. Dougherty
- Department of Neurobiology and Anatomy, Marion Murray Spinal Cord Research Center, Drexel University College of Medicine, Philadelphia, PA, United States
18
Cui M, Han S, Wang D, Haider MS, Guo J, Zhao Q, Du P, Sun Z, Qi F, Zheng Z, Huang B, Dong W, Li P, Zhang X. Gene Co-expression Network Analysis of the Comparative Transcriptome Identifies Hub Genes Associated With Resistance to Aspergillus flavus L. in Cultivated Peanut (Arachis hypogaea L.). Frontiers in Plant Science 2022; 13:899177. [PMID: 35812950 PMCID: PMC9264616 DOI: 10.3389/fpls.2022.899177] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 03/18/2022] [Accepted: 05/06/2022] [Indexed: 06/08/2023]
Abstract
Cultivated peanut (Arachis hypogaea L.), a cosmopolitan oil crop, is susceptible to a variety of pathogens, especially Aspergillus flavus L., which not only vastly reduces the quality of peanut products but also seriously threatens food safety through aflatoxin contamination. However, the key genes related to resistance to A. flavus in peanuts remain unclear. This study identifies hub genes positively associated with resistance to A. flavus in two genotypes by comparative transcriptome analysis and weighted gene co-expression network analysis (WGCNA). Compared with the susceptible genotype (Zhonghua 12, S), the rapid response to A. flavus and quick preparation for the translation of resistance-related genes in the resistant genotype (J-11, R) may be the drivers of its high resistance. WGCNA revealed that 18 genes encoding pathogenesis-related proteins (PR10), 1-aminocyclopropane-1-carboxylate oxidase (ACO1), MAPK kinase, serine/threonine kinase (STK), pattern recognition receptors (PRRs), cytochrome P450, SNARE protein SYP121, pectinesterase, phosphatidylinositol transfer protein, and pentatricopeptide repeat (PPR) protein play major and active roles in peanut resistance to A. flavus. Collectively, this study provides new insight into resistance to A. flavus by employing WGCNA, and the identification of hub resistance-responsive genes may contribute to the development of resistant cultivars by molecular-assisted breeding.
Affiliation(s)
- Mengjie Cui
- College of Agriculture, Nanjing Agricultural University, Nanjing, China
- The Shennong Laboratory, Henan Academy of Crops Molecular Breeding, Henan Academy of Agricultural Science, Zhengzhou, China
- Key Laboratory of Oil Crops in Huang-Huai-Hai Plains, Ministry of Agriculture, Zhengzhou, China
- Henan Provincial Key Laboratory for Oil Crop Improvement, Zhengzhou, China
- National Centre for Plant Breeding, Xinxiang, China
- Suoyi Han
- College of Agriculture, Nanjing Agricultural University, Nanjing, China
- The Shennong Laboratory, Henan Academy of Crops Molecular Breeding, Henan Academy of Agricultural Science, Zhengzhou, China
- Key Laboratory of Oil Crops in Huang-Huai-Hai Plains, Ministry of Agriculture, Zhengzhou, China
- Henan Provincial Key Laboratory for Oil Crop Improvement, Zhengzhou, China
- National Centre for Plant Breeding, Xinxiang, China
- Du Wang
- Key Laboratory of Detection for Mycotoxins, Oil Crops Research Institute of the Chinese Academy of Agricultural Sciences, Ministry of Agriculture and Rural Affairs, Wuhan, China
- Junjia Guo
- The Shennong Laboratory, Henan Academy of Crops Molecular Breeding, Henan Academy of Agricultural Science, Zhengzhou, China
- Key Laboratory of Oil Crops in Huang-Huai-Hai Plains, Ministry of Agriculture, Zhengzhou, China
- Henan Provincial Key Laboratory for Oil Crop Improvement, Zhengzhou, China
- National Centre for Plant Breeding, Xinxiang, China
- Qi Zhao
- The Shennong Laboratory, Henan Academy of Crops Molecular Breeding, Henan Academy of Agricultural Science, Zhengzhou, China
- Key Laboratory of Oil Crops in Huang-Huai-Hai Plains, Ministry of Agriculture, Zhengzhou, China
- Henan Provincial Key Laboratory for Oil Crop Improvement, Zhengzhou, China
- Pei Du
- College of Agriculture, Nanjing Agricultural University, Nanjing, China
- The Shennong Laboratory, Henan Academy of Crops Molecular Breeding, Henan Academy of Agricultural Science, Zhengzhou, China
- Key Laboratory of Oil Crops in Huang-Huai-Hai Plains, Ministry of Agriculture, Zhengzhou, China
- Henan Provincial Key Laboratory for Oil Crop Improvement, Zhengzhou, China
- National Centre for Plant Breeding, Xinxiang, China
- Ziqi Sun
- The Shennong Laboratory, Henan Academy of Crops Molecular Breeding, Henan Academy of Agricultural Science, Zhengzhou, China
- Key Laboratory of Oil Crops in Huang-Huai-Hai Plains, Ministry of Agriculture, Zhengzhou, China
- Henan Provincial Key Laboratory for Oil Crop Improvement, Zhengzhou, China
- National Centre for Plant Breeding, Xinxiang, China
- Feiyan Qi
- The Shennong Laboratory, Henan Academy of Crops Molecular Breeding, Henan Academy of Agricultural Science, Zhengzhou, China
- Key Laboratory of Oil Crops in Huang-Huai-Hai Plains, Ministry of Agriculture, Zhengzhou, China
- Henan Provincial Key Laboratory for Oil Crop Improvement, Zhengzhou, China
- National Centre for Plant Breeding, Xinxiang, China
- Zheng Zheng
- The Shennong Laboratory, Henan Academy of Crops Molecular Breeding, Henan Academy of Agricultural Science, Zhengzhou, China
- Key Laboratory of Oil Crops in Huang-Huai-Hai Plains, Ministry of Agriculture, Zhengzhou, China
- Henan Provincial Key Laboratory for Oil Crop Improvement, Zhengzhou, China
- National Centre for Plant Breeding, Xinxiang, China
- Bingyan Huang
- The Shennong Laboratory, Henan Academy of Crops Molecular Breeding, Henan Academy of Agricultural Science, Zhengzhou, China
- Key Laboratory of Oil Crops in Huang-Huai-Hai Plains, Ministry of Agriculture, Zhengzhou, China
- Henan Provincial Key Laboratory for Oil Crop Improvement, Zhengzhou, China
- National Centre for Plant Breeding, Xinxiang, China
- Wenzhao Dong
- The Shennong Laboratory, Henan Academy of Crops Molecular Breeding, Henan Academy of Agricultural Science, Zhengzhou, China
- Key Laboratory of Oil Crops in Huang-Huai-Hai Plains, Ministry of Agriculture, Zhengzhou, China
- Henan Provincial Key Laboratory for Oil Crop Improvement, Zhengzhou, China
- National Centre for Plant Breeding, Xinxiang, China
- Peiwu Li
- Key Laboratory of Detection for Mycotoxins, Oil Crops Research Institute of the Chinese Academy of Agricultural Sciences, Ministry of Agriculture and Rural Affairs, Wuhan, China
- Xinyou Zhang
- College of Agriculture, Nanjing Agricultural University, Nanjing, China
- The Shennong Laboratory, Henan Academy of Crops Molecular Breeding, Henan Academy of Agricultural Science, Zhengzhou, China
- Key Laboratory of Oil Crops in Huang-Huai-Hai Plains, Ministry of Agriculture, Zhengzhou, China
- Henan Provincial Key Laboratory for Oil Crop Improvement, Zhengzhou, China
- National Centre for Plant Breeding, Xinxiang, China
19
Long H, Reeves R, Simon MM. Mouse genomic and cellular annotations. Mamm Genome 2022; 33:19-30. [PMID: 35124726 PMCID: PMC8913471 DOI: 10.1007/s00335-021-09936-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2021] [Accepted: 11/22/2021] [Indexed: 11/28/2022]
Abstract
Mice have emerged as one of the most popular and valuable model organisms in the research of human biology. This is due to their genetic and physiological similarity to humans, short generation times, availability of genetically homologous inbred strains, and relatively easy laboratory maintenance. Therefore, following the release of the initial human reference genome, the generation of the mouse reference genome was prioritised and represented an important scientific resource for the mouse genetics community. In 2002, the Mouse Genome Sequencing Consortium published an initial draft of the mouse reference genome, which contained ~96% of the euchromatic genome of female C57BL/6J mice. Almost two decades on from the publication of the initial draft, sequencing efforts have continued to increase the completeness and accuracy of the C57BL/6J reference genome alongside advances in genome annotation. Additionally, new sequencing technologies have provided a wealth of data that has added to the repertoire of annotations associated with traditional genomic annotations, including but not limited to advances in regulatory elements, the 3D genome and individual cellular states. In this review we focus on the reference genome C57BL/6J and summarise the different aspects of genomic and cellular annotations, as well as their relevance to mouse genetic research. We denote a genomic annotation as a functional unit of the genome. Cellular annotations are annotations of cell type or state, defined by the transcriptomic expression profile of a cell. Due to the wide-ranging number and diversity of annotations describing the mouse genome, we focus on gene, repeat and regulatory element annotation as well as two relatively new technologies, 3D genome architecture and single-cell sequencing, outlining their utility in genetic research and their current challenges.
Affiliation(s)
- Helen Long
- MRC Harwell Institute, Mammalian Genetics Unit, Harwell Campus, Oxfordshire, OX11 0RD, UK
- Nuffield Department of Medicine, University of Oxford, Oxford, UK
- Richard Reeves
- MRC Harwell Institute, Mammalian Genetics Unit, Harwell Campus, Oxfordshire, OX11 0RD, UK
- Michelle M Simon
- MRC Harwell Institute, Mammalian Genetics Unit, Harwell Campus, Oxfordshire, OX11 0RD, UK.
20
Li YR, Zhou Y, Kim YJ, Zhu Y, Ma F, Yu J, Wang YC, Chen X, Li Z, Zeng S, Wang X, Lee D, Ku J, Tsao T, Hardoy C, Huang J, Cheng D, Montel-Hagen A, Seet CS, Crooks GM, Larson SM, Sasine JP, Wang X, Pellegrini M, Ribas A, Kohn DB, Witte O, Wang P, Yang L. Development of allogeneic HSC-engineered iNKT cells for off-the-shelf cancer immunotherapy. Cell Rep Med 2021; 2:100449. [PMID: 34841295 PMCID: PMC8607011 DOI: 10.1016/j.xcrm.2021.100449] [Citation(s) in RCA: 45] [Impact Index Per Article: 11.3] [Received: 01/28/2021] [Revised: 08/12/2021] [Accepted: 10/19/2021] [Indexed: 01/19/2023]
Abstract
Cell-based immunotherapy has become the new-generation cancer medicine, and "off-the-shelf" cell products that can be manufactured at large scale and distributed readily to treat patients are necessary. Invariant natural killer T (iNKT) cells are ideal cell carriers for developing allogeneic cell therapy because they are powerful immune cells targeting cancers without graft-versus-host disease (GvHD) risk. However, healthy donor blood contains extremely low numbers of endogenous iNKT cells. Here, by combining hematopoietic stem cell (HSC) gene engineering and in vitro differentiation, we generate human allogeneic HSC-engineered iNKT (AlloHSC-iNKT) cells at high yield and purity; these cells closely resemble endogenous iNKT cells, effectively target tumor cells using multiple mechanisms, and exhibit high safety and low immunogenicity. These cells can be further engineered with a chimeric antigen receptor (CAR) to enhance tumor targeting and/or gene edited to ablate surface human leukocyte antigen (HLA) molecules and further reduce immunogenicity. Collectively, these preclinical studies demonstrate the feasibility and cancer therapy potential of AlloHSC-iNKT cell products and lay a foundation for their translational and clinical development.
Affiliation(s)
- Yan-Ruide Li
- Department of Microbiology, Immunology & Molecular Genetics, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Yang Zhou
- Department of Microbiology, Immunology & Molecular Genetics, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Yu Jeong Kim
- Department of Microbiology, Immunology & Molecular Genetics, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Yanni Zhu
- Department of Microbiology, Immunology & Molecular Genetics, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Feiyang Ma
- Department of Molecular, Cell and Developmental Biology, College of Letters and Sciences, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Jiaji Yu
- Department of Microbiology, Immunology & Molecular Genetics, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Yu-Chen Wang
- Department of Microbiology, Immunology & Molecular Genetics, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Xianhui Chen
- Department of Pharmacology and Pharmaceutical Sciences, University of Southern California, Los Angeles, CA 90089, USA
| | - Zhe Li
- Department of Microbiology, Immunology & Molecular Genetics, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Samuel Zeng
- Department of Microbiology, Immunology & Molecular Genetics, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Xi Wang
- Department of Microbiology, Immunology & Molecular Genetics, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Derek Lee
- Department of Microbiology, Immunology & Molecular Genetics, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Josh Ku
- Department of Microbiology, Immunology & Molecular Genetics, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Tasha Tsao
- Department of Microbiology, Immunology & Molecular Genetics, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Christian Hardoy
- Department of Microbiology, Immunology & Molecular Genetics, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Jie Huang
- Department of Microbiology, Immunology & Molecular Genetics, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Donghui Cheng
- Eli and Edythe Broad Center of Regenerative Medicine and Stem Cell Research, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Amélie Montel-Hagen
- Department of Pathology and Laboratory Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Christopher S. Seet
- Eli and Edythe Broad Center of Regenerative Medicine and Stem Cell Research, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Jonsson Comprehensive Cancer Center, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Gay M. Crooks
- Eli and Edythe Broad Center of Regenerative Medicine and Stem Cell Research, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Pathology and Laboratory Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Jonsson Comprehensive Cancer Center, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Pediatrics, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Sarah M. Larson
- Department of Internal Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Joshua P. Sasine
- Eli and Edythe Broad Center of Regenerative Medicine and Stem Cell Research, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Jonsson Comprehensive Cancer Center, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Division of Hematology/Oncology, Department of Pediatrics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Xiaoyan Wang
- Department of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Matteo Pellegrini
- Department of Molecular, Cell and Developmental Biology, College of Letters and Sciences, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Eli and Edythe Broad Center of Regenerative Medicine and Stem Cell Research, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Antoni Ribas
- Eli and Edythe Broad Center of Regenerative Medicine and Stem Cell Research, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Jonsson Comprehensive Cancer Center, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Molecular and Medical Pharmacology, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Parker Institute for Cancer Immunotherapy, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Donald B. Kohn
- Department of Microbiology, Immunology & Molecular Genetics, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Eli and Edythe Broad Center of Regenerative Medicine and Stem Cell Research, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Division of Hematology/Oncology, Department of Pediatrics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Owen Witte
- Department of Microbiology, Immunology & Molecular Genetics, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Eli and Edythe Broad Center of Regenerative Medicine and Stem Cell Research, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Jonsson Comprehensive Cancer Center, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Parker Institute for Cancer Immunotherapy, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Molecular Biology Institute, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Pin Wang
- Department of Pharmacology and Pharmaceutical Sciences, University of Southern California, Los Angeles, CA 90089, USA
| | - Lili Yang
- Department of Microbiology, Immunology & Molecular Genetics, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Eli and Edythe Broad Center of Regenerative Medicine and Stem Cell Research, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Jonsson Comprehensive Cancer Center, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Molecular Biology Institute, University of California, Los Angeles, Los Angeles, CA 90095, USA
| |
|
21
|
Liu Y, Zhang J, Wang S, Zeng X, Zhang W. Are dropout imputation methods for scRNA-seq effective for scATAC-seq data? Brief Bioinform 2021; 23:6412397. [PMID: 34718405 DOI: 10.1093/bib/bbab442] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 06/26/2021] [Revised: 09/08/2021] [Accepted: 09/27/2021] [Indexed: 11/12/2022] Open
Abstract
The tremendous progress of single-cell sequencing technology has given researchers the opportunity to study cell development and differentiation processes at single-cell resolution. Assay of Transposase-Accessible Chromatin by deep sequencing (ATAC-seq) was proposed for genome-wide analysis of chromatin accessibility. Due to technical limitations or other reasons, dropout events are almost a common occurrence for extremely sparse single-cell ATAC-seq data, leading to confusion in downstream analysis (such as clustering). Although considerable progress has been made in the estimation of scRNA-seq data, there is currently no specific method for the inference of dropout events in single-cell ATAC-seq data. In this paper, we select several state-of-the-art scRNA-seq imputation methods (including MAGIC, SAVER, scImpute, deepImpute, PRIME, bayNorm and knn-smoothing) in recent years to infer dropout peaks in scATAC-seq data, and perform a systematic evaluation of these methods through several downstream analyses. Specifically, we benchmarked these methods in terms of correlation with meta-cell, clustering, subpopulations distance analysis, imputation performance for corruption datasets, identification of TF motifs and computation time. The experimental results indicated that most of the imputed peaks increased the correlation with the reference meta-cell, while the performance of different methods on different datasets varied greatly in different downstream analyses, thus should be used with caution. In general, MAGIC performed better than the other methods most consistently across all assessments. Our source code is freely available at https://github.com/yueyueliu/scATAC-master.
Affiliation(s)
- Yue Liu
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan 410082, China
| | - Junfeng Zhang
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan 410082, China
| | - Shulin Wang
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan 410082, China
| | - Xiangxiang Zeng
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan 410082, China
| | - Wei Zhang
- College of Computer Engineering & Applied Mathematics, Changsha University, Changsha, Hunan 410003, China
| |
|
22
|
Ciortan M, Defrance M. Contrastive self-supervised clustering of scRNA-seq data. BMC Bioinformatics 2021; 22:280. [PMID: 34044773 PMCID: PMC8157426 DOI: 10.1186/s12859-021-04210-8] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 03/11/2021] [Accepted: 05/10/2021] [Indexed: 12/12/2022] Open
Abstract
BACKGROUND Single-cell RNA sequencing (scRNA-seq) has emerged as a main strategy to study transcriptional activity at the cellular level. Clustering analysis is routinely performed on scRNA-seq data to explore, recognize or discover underlying cell identities. The high dimensionality of scRNA-seq data and its significant sparsity accentuated by frequent dropout events, introducing false zero count observations, make the clustering analysis computationally challenging. Even though multiple scRNA-seq clustering techniques have been proposed, there is no consensus on the best performing approach. On a parallel research track, self-supervised contrastive learning recently achieved state-of-the-art results on image clustering and, subsequently, image classification. RESULTS We propose contrastive-sc, a new unsupervised learning method for scRNA-seq data that performs cell clustering. The method consists of two consecutive phases: first, an artificial neural network learns an embedding for each cell through a representation training phase. The embedding is then clustered in the second phase with a general clustering algorithm (i.e. KMeans or Leiden community detection). The proposed representation training phase is a new adaptation of the self-supervised contrastive learning framework, initially proposed for image processing, to scRNA-seq data. contrastive-sc has been compared with ten state-of-the-art techniques. A broad experimental study has been conducted on both simulated and real-world datasets, assessing multiple external and internal clustering performance metrics (i.e. ARI, NMI, Silhouette, Calinski scores). Our experimental analysis shows that contrastive-sc compares favorably with state-of-the-art methods on both simulated and real-world datasets. CONCLUSION On average, our method identifies well-defined clusters in close agreement with ground truth annotations. 
Our method is computationally efficient, being fast to train and having a limited memory footprint. contrastive-sc maintains good performance when only a fraction of input cells is provided and is robust to changes in hyperparameters or network architecture. The decoupling between the creation of the embedding and the clustering phase allows the flexibility to choose a suitable clustering algorithm (i.e. KMeans when the number of expected clusters is known, Leiden otherwise) or to integrate the embedding with other existing techniques.
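Of the external metrics named in this abstract, the Adjusted Rand Index (ARI) can be computed directly from two label lists. The following is a minimal sketch in plain Python, not code from contrastive-sc; the function name adjusted_rand_index and the toy labels are assumptions made for illustration only.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Chance-corrected agreement between two clusterings of the same cells."""
    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        # Fewer than two cells: the two partitions trivially agree.
        return 1.0

    # Contingency counts: cells sharing a (true, predicted) label pair.
    pair_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    # Number of cell pairs placed together in both, in truth, in prediction.
    together_both = sum(comb(c, 2) for c in pair_counts.values())
    together_true = sum(comb(c, 2) for c in true_counts.values())
    together_pred = sum(comb(c, 2) for c in pred_counts.values())

    expected = together_true * together_pred / total_pairs
    maximum = (together_true + together_pred) / 2
    if maximum == expected:
        # Degenerate case, e.g. both clusterings put every cell in one cluster.
        return 1.0
    return (together_both - expected) / (maximum - expected)


# Hypothetical ground-truth annotation for four cells.
truth = [0, 0, 1, 1]
same_partition = adjusted_rand_index(truth, [1, 1, 0, 0])   # labels swapped
split_partition = adjusted_rand_index(truth, [0, 1, 0, 1])  # every true cluster split
print(same_partition, split_partition)
```

In this toy case the relabelled but identical partition scores 1.0, while the partition that splits every true cluster scores about -0.5, which shows that ARI ignores label names and penalises disagreement below chance level.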
Affiliation(s)
- Madalina Ciortan
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles, Brussels, Belgium
| | - Matthieu Defrance
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles, Brussels, Belgium.
| |
|
23
|
Clarke ZA, Andrews TS, Atif J, Pouyabahar D, Innes BT, MacParland SA, Bader GD. Tutorial: guidelines for annotating single-cell transcriptomic maps using automated and manual methods. Nat Protoc 2021; 16:2749-2764. [PMID: 34031612 DOI: 10.1038/s41596-021-00534-0] [Citation(s) in RCA: 111] [Impact Index Per Article: 27.8] [Received: 07/26/2020] [Accepted: 03/12/2021] [Indexed: 11/09/2022]
Abstract
Single-cell transcriptomics can profile thousands of cells in a single experiment and identify novel cell types, states and dynamics in a wide variety of tissues and organisms. Standard experimental protocols and analysis workflows have been developed to create single-cell transcriptomic maps from tissues. This tutorial focuses on how to interpret these data to identify cell types, states and other biologically relevant patterns with the objective of creating an annotated map of cells. We recommend a three-step workflow including automatic cell annotation (wherever possible), manual cell annotation and verification. Frequently encountered challenges are discussed, as well as strategies to address them. Guiding principles and specific recommendations for software tools and resources that can be used for each step are covered, and an R notebook is included to help run the recommended workflow. Basic familiarity with computer software is assumed, and basic knowledge of programming (e.g., in the R language) is recommended.
Affiliation(s)
- Zoe A Clarke
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
- The Donnelly Centre, University of Toronto, Toronto, Ontario, Canada
| | - Tallulah S Andrews
- The Donnelly Centre, University of Toronto, Toronto, Ontario, Canada
- Ajmera Transplant Centre, Toronto General Hospital Research Institute, Toronto, Ontario, Canada
- Department of Immunology, University of Toronto, Toronto, Ontario, Canada
| | - Jawairia Atif
- Ajmera Transplant Centre, Toronto General Hospital Research Institute, Toronto, Ontario, Canada
- Department of Immunology, University of Toronto, Toronto, Ontario, Canada
| | - Delaram Pouyabahar
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
- The Donnelly Centre, University of Toronto, Toronto, Ontario, Canada
| | - Brendan T Innes
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
- The Donnelly Centre, University of Toronto, Toronto, Ontario, Canada
| | - Sonya A MacParland
- Ajmera Transplant Centre, Toronto General Hospital Research Institute, Toronto, Ontario, Canada
- Department of Immunology, University of Toronto, Toronto, Ontario, Canada
- Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, Ontario, Canada
| | - Gary D Bader
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
- The Donnelly Centre, University of Toronto, Toronto, Ontario, Canada
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
- Lunenfeld-Tanenbaum Research Institute, Toronto, Ontario, Canada
| |
|
24
|
Molecular correlates of muscle spindle and Golgi tendon organ afferents. Nat Commun 2021; 12:1451. [PMID: 33649316 PMCID: PMC7977083 DOI: 10.1038/s41467-021-21880-3] [Citation(s) in RCA: 46] [Impact Index Per Article: 11.5] [Received: 04/30/2020] [Accepted: 02/18/2021] [Indexed: 12/16/2022] Open
Abstract
Proprioceptive feedback mainly derives from groups Ia and II muscle spindle (MS) afferents and group Ib Golgi tendon organ (GTO) afferents, but the molecular correlates of these three afferent subtypes remain unknown. We performed single cell RNA sequencing of genetically identified adult proprioceptors and uncovered five molecularly distinct neuronal clusters. Validation of cluster-specific transcripts in dorsal root ganglia and skeletal muscle demonstrates that two of these clusters correspond to group Ia MS afferents and group Ib GTO afferent proprioceptors, respectively, and suggest that the remaining clusters could represent group II MS afferents. Lineage analysis between proprioceptor transcriptomes at different developmental stages provides evidence that proprioceptor subtype identities emerge late in development. Together, our data provide comprehensive molecular signatures for groups Ia and II MS afferents and group Ib GTO afferents, enabling genetic interrogation of the role of individual proprioceptor subtypes in regulating motor output. Coordinated movement critically depends on sensory feedback from muscle spindles (MSs) and Golgi tendon organs (GTOs) but the afferents supplying this proprioceptive feedback have remained genetically inseparable. Here the authors use single cell transcriptome analysis to reveal the molecular basis of MS (groups Ia and II) and GTO (group Ib) afferent identities in the mouse.
|
25
|
Nayak R, Hasija Y. A hitchhiker's guide to single-cell transcriptomics and data analysis pipelines. Genomics 2021; 113:606-619. [PMID: 33485955 DOI: 10.1016/j.ygeno.2021.01.007] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 08/09/2020] [Revised: 12/30/2020] [Accepted: 01/18/2021] [Indexed: 12/20/2022]
Abstract
Single-cell transcriptomics (SCT) is a tour de force in the era of big omics data that has led to the accumulation of massive cellular transcription data at an astounding resolution of single cells. It provides valuable insights into cells previously unachieved by bulk cell analysis and is proving crucial in uncovering cellular heterogeneity, identifying rare cell populations, distinct cell-lineage trajectories, and mechanisms involved in complex cellular processes. SCT data is highly complex and necessitates advanced statistical and computational methods for analysis. This review provides a comprehensive overview of the steps in a typical SCT workflow, starting from experimental protocol to data analysis, deliberating various pipelines used. We discuss recent trends, challenges, machine learning methods for data analysis, and future prospects. We conclude by listing the multitude of scRNA-seq data applications and how it shall revolutionize our understanding of cellular biology and diseases.
Affiliation(s)
- Richa Nayak
- Department of Biotechnology, Delhi Technological University, Delhi 110042, India
| | - Yasha Hasija
- Department of Biotechnology, Delhi Technological University, Delhi 110042, India.
| |
|
26
|
Klimm F, Toledo EM, Monfeuga T, Zhang F, Deane CM, Reinert G. Functional module detection through integration of single-cell RNA sequencing data with protein-protein interaction networks. BMC Genomics 2020; 21:756. [PMID: 33138772 PMCID: PMC7607865 DOI: 10.1186/s12864-020-07144-2] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 11/14/2019] [Accepted: 10/12/2020] [Indexed: 12/14/2022] Open
Abstract
Background Recent advances in single-cell RNA sequencing have allowed researchers to explore transcriptional function at a cellular level. In particular, single-cell RNA sequencing reveals that there exist clusters of cells with similar gene expression profiles, representing different transcriptional states. Results In this study, we present scPPIN, a method for integrating single-cell RNA sequencing data with protein–protein interaction networks that detects active modules in cells of different transcriptional states. We achieve this by clustering RNA-sequencing data, identifying differentially expressed genes, constructing node-weighted protein–protein interaction networks, and finding the maximum-weight connected subgraphs with an exact Steiner-tree approach. As case studies, we investigate two RNA-sequencing data sets from human liver spheroids and human adipose tissue, respectively. With scPPIN we expand the output of differentially expressed gene analysis with information from protein interactions. We find that different transcriptional states have different subnetworks of the protein–protein interaction networks significantly enriched, and that these subnetworks represent biological pathways. In these pathways, scPPIN identifies proteins that are not differentially expressed but have a crucial biological function (e.g., as receptors) and therefore reveals biology beyond a standard differentially expressed gene analysis. Conclusions The introduced scPPIN method can be used to systematically analyse differentially expressed genes in single-cell RNA sequencing data by integrating it with protein interaction data. The detected modules that characterise each cluster help to identify and hypothesise a biological function associated with those cells. Our analysis suggests the participation of unexpected proteins in these pathways that are undetectable from the single-cell RNA sequencing data alone. The techniques described here are applicable to other organisms and tissues. 
Supplementary Information The online version contains supplementary material available at (doi:10.1186/s12864-020-07144-2).
Affiliation(s)
- Florian Klimm
- Department of Mathematics, Imperial College London, London, SW7 2AZ, UK
- Mitochondrial Biology Unit, University of Cambridge, Cambridge, CB2 0XY, UK
| | - Enrique M Toledo
- Discovery Technology and Genomics, Novo Nordisk Research Centre Oxford, Oxford, OX3 7FZ, UK
| | - Thomas Monfeuga
- Discovery Technology and Genomics, Novo Nordisk Research Centre Oxford, Oxford, OX3 7FZ, UK
| | - Fang Zhang
- Discovery Technology and Genomics, Novo Nordisk Research Centre Oxford, Oxford, OX3 7FZ, UK
| | | | - Gesine Reinert
- Department of Statistics, University of Oxford, Oxford, OX1 3LB, UK
| |
|
27
|
Srinivasan S, Leshchyk A, Johnson NT, Korkin D. A hybrid deep clustering approach for robust cell type profiling using single-cell RNA-seq data. RNA (NEW YORK, N.Y.) 2020; 26:1303-1319. [PMID: 32532794 PMCID: PMC7491323 DOI: 10.1261/rna.074427.119] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 01/03/2020] [Accepted: 05/22/2020] [Indexed: 05/07/2023]
Abstract
Single-cell RNA sequencing (scRNA-seq) is a recent technology that enables fine-grained discovery of cellular subtypes and specific cell states. Analysis of scRNA-seq data routinely involves machine learning methods, such as feature learning, clustering, and classification, to assist in uncovering novel information from scRNA-seq data. However, current methods are not well suited to deal with the substantial amount of noise that is created by the experiments or the variation that occurs due to differences in the cells of the same type. To address this, we developed a new hybrid approach, deep unsupervised single-cell clustering (DUSC), which integrates feature generation based on a deep learning architecture by using a new technique to estimate the number of latent features, with a model-based clustering algorithm, to find a compact and informative representation of the single-cell transcriptomic data generating robust clusters. We also include a technique to estimate an efficient number of latent features in the deep learning model. Our method outperforms both classical and state-of-the-art feature learning and clustering methods, approaching the accuracy of supervised learning. We applied DUSC to a single-cell transcriptomics data set obtained from a triple-negative breast cancer tumor to identify potential cancer subclones accentuated by copy-number variation and investigate the role of clonal heterogeneity. Our method is freely available to the community and will hopefully facilitate our understanding of the cellular atlas of living organisms as well as provide the means to improve patient diagnostics and treatment.
Affiliation(s)
- Suhas Srinivasan
- Data Science Program, Worcester Polytechnic Institute, Worcester, Massachusetts 01609, USA
| | - Anastasia Leshchyk
- Bioinformatics and Computational Biology Program, Worcester Polytechnic Institute, Worcester, Massachusetts 01609, USA
| | - Nathan T Johnson
- Laboratory of Systems Pharmacology, Harvard Program in Therapeutic Science, Harvard Medical School, Boston, Massachusetts 02115, USA
- Breast Tumor Immunology Laboratory, Dana Farber Cancer Institute, Boston, Massachusetts 02215, USA
| | - Dmitry Korkin
- Data Science Program, Worcester Polytechnic Institute, Worcester, Massachusetts 01609, USA
- Bioinformatics and Computational Biology Program, Worcester Polytechnic Institute, Worcester, Massachusetts 01609, USA
- Department of Computer Science, Worcester Polytechnic Institute, Worcester, Massachusetts 01609, USA
| |
|
28
|
Kim T, Chen IR, Lin Y, Wang AYY, Yang JYH, Yang P. Impact of similarity metrics on single-cell RNA-seq data clustering. Brief Bioinform 2020; 20:2316-2326. [PMID: 30137247 DOI: 10.1093/bib/bby076] [Citation(s) in RCA: 79] [Impact Index Per Article: 15.8] [Received: 06/12/2018] [Revised: 08/01/2018] [Accepted: 08/02/2018] [Indexed: 12/16/2022] Open
Abstract
Advances in high-throughput sequencing on single-cell gene expressions [single-cell RNA sequencing (scRNA-seq)] have enabled transcriptome profiling on individual cells from complex samples. A common goal in scRNA-seq data analysis is to discover and characterise cell types, typically through clustering methods. The quality of the clustering therefore plays a critical role in biological discovery. While numerous clustering algorithms have been proposed for scRNA-seq data, fundamentally they all rely on a similarity metric for categorising individual cells. Although several studies have compared the performance of various clustering algorithms for scRNA-seq data, currently there is no benchmark of different similarity metrics and their influence on scRNA-seq data clustering. Here, we compared a panel of similarity metrics on clustering a collection of annotated scRNA-seq datasets. Within each dataset, a stratified subsampling procedure was applied and an array of evaluation measures was employed to assess the similarity metrics. This produced a highly reliable and reproducible consensus on their performance assessment. Overall, we found that correlation-based metrics (e.g. Pearson's correlation) outperformed distance-based metrics (e.g. Euclidean distance). To test if the use of correlation-based metrics can benefit the recently published clustering techniques for scRNA-seq data, we modified a state-of-the-art kernel-based clustering algorithm (SIMLR) using Pearson's correlation as a similarity measure and found significant performance improvement over Euclidean distance on scRNA-seq data clustering. These findings demonstrate the importance of similarity metrics in clustering scRNA-seq data and highlight Pearson's correlation as a favourable choice. Further comparison on different scRNA-seq library preparation protocols suggests that they may also affect clustering performance. 
Finally, the benchmarking framework is available at http://www.maths.usyd.edu.au/u/SMS/bioinformatics/software.html.
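The contrast between correlation-based and distance-based metrics described in this abstract can be seen on a toy example. The sketch below is illustrative only and is not part of the authors' benchmarking framework; the three expression profiles and the function names are assumptions.

```python
import math


def euclidean_distance(x, y):
    """Straight-line distance between two expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pearson_correlation(x, y):
    """Pearson's r between two expression profiles of equal length."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical profiles over four genes.
cell_a = [1.0, 2.0, 3.0, 4.0]
cell_b = [10.0, 20.0, 30.0, 40.0]  # same relative pattern as cell_a, ten times the depth
cell_c = [4.0, 3.0, 2.0, 1.0]      # opposite pattern at a similar depth

for name, other in (("cell_b", cell_b), ("cell_c", cell_c)):
    print(
        name,
        "euclidean:", round(euclidean_distance(cell_a, other), 2),
        "pearson:", round(pearson_correlation(cell_a, other), 2),
    )
```

Here Euclidean distance places cell_a closer to cell_c, whose pattern is reversed, because it is sensitive to overall magnitude, whereas Pearson's correlation groups cell_a with cell_b. This is one plausible intuition for the benchmark result, not a claim about why the reported differences arise.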
Affiliation(s)
- Taiyun Kim
- School of Mathematics and Statistics, The University of Sydney, Sydney, NSW 2006, Australia
| | - Irene Rui Chen
- School of Mathematics and Statistics, The University of Sydney, Sydney, NSW 2006, Australia
| | - Yingxin Lin
- School of Mathematics and Statistics, The University of Sydney, Sydney, NSW 2006, Australia
| | - Andy Yi-Yang Wang
- Department of Anaesthesia, The University of Sydney Northern Clinical School, The University of Sydney, Sydney, NSW 2006, Australia
| | - Jean Yee Hwa Yang
- School of Mathematics and Statistics, The University of Sydney, Sydney, NSW 2006, Australia
| | - Pengyi Yang
- School of Mathematics and Statistics, The University of Sydney, Sydney, NSW 2006, Australia
| |
|
29
|
Ye X, Zhang W, Futamura Y, Sakurai T. Detecting Interactive Gene Groups for Single-Cell RNA-Seq Data Based on Co-Expression Network Analysis and Subgraph Learning. Cells 2020; 9:cells9091938. [PMID: 32825786 PMCID: PMC7563496 DOI: 10.3390/cells9091938] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 06/16/2020] [Revised: 07/17/2020] [Accepted: 08/19/2020] [Indexed: 12/22/2022] Open
Abstract
High-throughput sequencing technologies have enabled the generation of single-cell RNA-seq (scRNA-seq) data, which explore both genetic heterogeneity and phenotypic variation between cells. Some methods have been proposed to detect the related genes causing cell-to-cell variability for understanding tumor heterogeneity. However, most existing methods detect the related genes separately, without considering gene interactions. In this paper, we proposed a novel learning framework to detect the interactive gene groups for scRNA-seq data based on co-expression network analysis and subgraph learning. We first utilized spectral clustering to identify the subpopulations of cells. For each cell subpopulation, the differentially expressed genes were then selected to construct a gene co-expression network. Finally, the interactive gene groups were detected by learning the dense subgraphs embedded in the gene co-expression networks. We applied the proposed learning framework on a real cancer scRNA-seq dataset to detect interactive gene groups of different cancer subtypes. Systematic gene ontology enrichment analysis was performed to examine the detected genes groups by summarizing the key biological processes and pathways. Our analysis shows that different subtypes exhibit distinct gene co-expression networks and interactive gene groups with different functional enrichment. The interactive genes are expected to yield important references for understanding tumor heterogeneity.
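The co-expression network step named in this abstract can be sketched as follows. This is a simplified illustration, not the authors' framework: it assumes that genes are linked when the absolute Pearson correlation of their expression across cells reaches a fixed threshold, and the gene names, values and the 0.8 cutoff are all hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson's r; returns 0.0 when either gene has no variance."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def coexpression_edges(expression, threshold=0.8):
    """Return (gene1, gene2, r) for every gene pair with |r| >= threshold.

    expression maps a gene name to its expression values across the cells
    of one subpopulation.
    """
    edges = []
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson(expression[g1], expression[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, r))
    return edges


# Hypothetical expression of three genes across five cells.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.0, 4.0, 6.0, 8.0, 10.0],
    "geneC": [5.0, 1.0, 4.0, 2.0, 3.0],
}
edges = coexpression_edges(expression)
print(edges)
```

With these toy values only geneA and geneB are linked. Dense groups of such edges are what a subsequent subgraph-detection step would search for.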
|
30
|
Zheng X, Huang Y, Zou X. scPADGRN: A preconditioned ADMM approach for reconstructing dynamic gene regulatory network using single-cell RNA sequencing data. PLoS Comput Biol 2020; 16:e1007471. [PMID: 32716923 PMCID: PMC7410337 DOI: 10.1371/journal.pcbi.1007471] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 10/04/2019] [Revised: 08/06/2020] [Accepted: 05/28/2020] [Indexed: 12/23/2022] Open
Abstract
Disease development and cell differentiation both involve dynamic changes; therefore, the reconstruction of dynamic gene regulatory networks (DGRNs) is an important but difficult problem in systems biology. With recent technical advances in single-cell RNA sequencing (scRNA-seq), large volumes of scRNA-seq data are being obtained for various processes. However, most current methods of inferring DGRNs from bulk samples may not be suitable for scRNA-seq data. In this work, we present scPADGRN, a novel DGRN inference method using “time-series” scRNA-seq data. scPADGRN combines the preconditioned alternating direction method of multipliers with cell clustering for DGRN reconstruction. It exhibits advantages in accuracy, robustness and fast convergence. Moreover, a quantitative index called Differentiation Genes’ Interaction Enrichment (DGIE) is presented to quantify the interaction enrichment of genes related to differentiation. From the DGIE scores of relevant subnetworks, we infer that the functions of embryonic stem (ES) cells are most active initially and may gradually fade over time. The communication strength of known contributing genes that facilitate cell differentiation increases from ES cells to terminally differentiated cells. We also identify several genes responsible for the changes in the DGIE scores occurring during cell differentiation based on three real single-cell datasets. Our results demonstrate that single-cell analyses based on network inference coupled with quantitative computations can reveal key transcriptional regulators involved in cell differentiation and disease development. Single-cell RNA sequencing (scRNA-seq) data are gaining popularity for providing access to cell-level measurements. Currently, time-series scRNA-seq data allow researchers to study dynamic changes during biological processes. 
This work proposes a novel method, scPADGRN, for application to time-series scRNA-seq data to construct dynamic gene regulatory networks, which are informative for investigating dynamic changes during disease development and cell differentiation. The proposed method shows satisfactory performance on both simulated data and three real datasets concerning cell differentiation. To quantify network dynamics, we present a quantitative index, DGIE, to measure the degree of activity of a certain set of genes in a regulatory network. Quantitative computations based on dynamic networks identify key regulators in cell differentiation and reveal the activity states of the identified regulators. Specifically, Bhlhe40, Msx2, Foxa2 and Dnmt3l might be important regulatory genes involved in differentiation from mouse ES cells to primitive endoderm (PrE) cells. For differentiation from mouse embryonic fibroblast cells to myocytes, Scx, Fos and Tcf12 are suggested to be key regulators. Sox5, Meis2, Hoxb3, Tcf7l1 and Plagl1 critically contribute during differentiation from human ES cells to definitive endoderm cells. These results may guide further theoretical and experimental efforts to understand cell differentiation processes and explore cell heterogeneity.
Affiliation(s)
- Xiao Zheng
- School of Mathematics and Statistics, Wuhan University, Wuhan, Hubei, China
| | - Yuan Huang
- Department of Biostatistics, Yale University, New Haven, Connecticut, United States of America
| | - Xiufen Zou
- School of Mathematics and Statistics, Wuhan University, Wuhan, Hubei, China
|
31
|
Peyvandipour A, Shafi A, Saberian N, Draghici S. Identification of cell types from single cell data using stable clustering. Sci Rep 2020; 10:12349. [PMID: 32703984 PMCID: PMC7378075 DOI: 10.1038/s41598-020-66848-3] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 09/23/2019] [Accepted: 05/17/2020] [Indexed: 12/26/2022] Open
Abstract
Single-cell RNA-seq (scRNASeq) has become a powerful technique for measuring the transcriptome of individual cells. Unlike the bulk measurements that average the gene expressions over the individual cells, gene measurements at individual cells can be used to study several different tissues and organs at different stages. Identifying the cell types present in the sample from the single cell transcriptome data is a common goal in many single-cell experiments. Several methods have been developed to do this. However, correctly identifying the true cell types remains a challenge. We present a framework that addresses this problem. Our hypothesis is that the meaningful characteristics of the data will remain despite small perturbations of data. We validate the performance of the proposed method on eight publicly available scRNA-seq datasets with known cell types as well as five simulation datasets with different degrees of the cluster separability. We compare the proposed method with five other existing methods: RaceID, SNN-Cliq, SINCERA, SEURAT, and SC3. The results show that the proposed method performs better than the existing methods.
Affiliation(s)
- Azam Peyvandipour
- Department of Computer Science, Wayne State University, Detroit, MI, USA
| | - Adib Shafi
- Department of Computer Science, Wayne State University, Detroit, MI, USA
| | - Nafiseh Saberian
- Department of Computer Science, Wayne State University, Detroit, MI, USA
| | - Sorin Draghici
- Department of Computer Science, Wayne State University, Detroit, MI, USA.
- Department of Obstetrics and Gynecology, Wayne State University, Detroit, MI, USA.
|
32
|
Gao M, Ling M, Tang X, Wang S, Xiao X, Qiao Y, Yang W, Yu R. Comparison of high-throughput single-cell RNA sequencing data processing pipelines. Brief Bioinform 2020; 22:5868074. [PMID: 34020539 DOI: 10.1093/bib/bbaa116] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 02/09/2020] [Revised: 04/23/2020] [Accepted: 05/17/2020] [Indexed: 11/13/2022] Open
Abstract
With the development of single-cell RNA sequencing (scRNA-seq) technology, it has become possible to perform large-scale transcript profiling for tens of thousands of cells in a single experiment. Many analysis pipelines have been developed for data generated from different high-throughput scRNA-seq platforms, bringing a new challenge to users to choose a proper workflow that is efficient, robust and reliable for a specific sequencing platform. Moreover, as the amount of public scRNA-seq data has increased rapidly, integrated analysis of scRNA-seq data from different sources has become increasingly popular. However, it remains unclear whether such integrated analysis would be biased if the data were processed by different upstream pipelines. In this study, we encapsulated seven existing high-throughput scRNA-seq data processing pipelines with Nextflow, a general integrative workflow management framework, and evaluated their performance in terms of running time, computational resource consumption and data analysis consistency using eight public datasets generated from five different high-throughput scRNA-seq platforms. Our work provides a useful guideline for the selection of scRNA-seq data processing pipelines based on their performance on different real datasets. In addition, these guidelines can serve as a performance evaluation framework for future developments in high-throughput scRNA-seq data processing.
Affiliation(s)
- Rongshan Yu
- Digital Fujian Institute of Healthcare and Biomedical Big Data, School of Informatics, Xiamen University
|
33
|
Gupta S, Witas R, Voigt A, Semenova T, Nguyen CQ. Single-Cell Sequencing of T cell Receptors: A Perspective on the Technological Development and Translational Application. ADVANCES IN EXPERIMENTAL MEDICINE AND BIOLOGY 2020; 1255:29-50. [PMID: 32949388 DOI: 10.1007/978-981-15-4494-1_3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 01/04/2023]
Abstract
T cells recognize peptides bound to major histocompatibility complex (MHC) class I and class II molecules at the cell surface. This recognition is accomplished by the expression of T cell receptors (TCR) which are required to be diverse and adaptable in order to accommodate the various and vast number of antigens presented on the MHCs. Thus, determining TCR repertoires of effector T cells is necessary to understand the immunological process in responding to cancer progression, infection, and autoimmune development. Furthermore, understanding the TCR repertoires will provide a solid framework to predict and test the antigen which is more critical in autoimmunity. However, it has been a technical challenge to sequence the TCRs and provide a conceptual context in correlation to the vast number of TCR repertoires in the immunological system. The exploding field of single-cell sequencing has changed how the repertoires are being investigated and analyzed. In this review, we focus on the biology of TCRs, TCR signaling and its implication in autoimmunity. We discuss important methods in bulk sequencing of many cells. Lastly, we explore the most pertinent platforms in single-cell sequencing and its application in autoimmunity.
Affiliation(s)
- Shivai Gupta
- Department of Infectious Diseases and Immunology, College of Veterinary Medicine, Gainesville, FL, USA
| | - Richard Witas
- Department of Oral Biology, College of Dentistry, Gainesville, FL, USA
| | - Alexandria Voigt
- Department of Infectious Diseases and Immunology, College of Veterinary Medicine, Gainesville, FL, USA
| | - Touyana Semenova
- Department of Infectious Diseases and Immunology, College of Veterinary Medicine, Gainesville, FL, USA
| | - Cuong Q Nguyen
- Department of Infectious Diseases and Immunology, College of Veterinary Medicine, Gainesville, FL, USA
- Department of Oral Biology, College of Dentistry, Gainesville, FL, USA
- Center of Orphaned Autoimmune Diseases, University of Florida, Gainesville, FL, USA
|
34
|
Prodromidou K, Matsas R. Species-Specific miRNAs in Human Brain Development and Disease. Front Cell Neurosci 2019; 13:559. [PMID: 31920559 PMCID: PMC6930153 DOI: 10.3389/fncel.2019.00559] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 09/28/2019] [Accepted: 12/04/2019] [Indexed: 12/20/2022] Open
Abstract
Identification of the unique features of human brain development and function can be critical towards the elucidation of intricate processes such as higher cognitive functions and human-specific pathologies like neuropsychiatric and behavioral disorders. The developing primate and human central nervous system (CNS) are distinguished by expanded progenitor zones and a protracted time course of neurogenesis, leading to the expansion in brain size, prominent gyral anatomy, distinctive synaptic properties, and complex neural circuits. Comparative genomic studies have revealed that adaptations of brain capacities may be partly explained by human-specific genetic changes that impact the function of proteins associated with neocortical expansion, synaptic function, and language development. However, the formation of complex gene networks may be most relevant for brain evolution. Indeed, recent studies identified distinct human-specific gene expression patterns across developmental time occurring in brain regions linked to cognition. Interestingly, such modules show species-specific divergence and are enriched in genes associated with neuronal development and synapse formation whilst also being implicated in neuropsychiatric diseases. microRNAs represent a powerful component of gene-regulatory networks by promoting spatiotemporal post-transcriptional control of gene expression in the human and primate brain. It has also been suggested that the divergence in miRNA expression plays an important role in shaping gene expression divergence among species. Primate-specific and human-specific miRNAs are principally involved in progenitor proliferation and neurogenic processes but also associate with human cognition, and neurological disorders. 
Human embryonic or induced pluripotent stem cells and brain organoids, permitting experimental access to neural cells and differentiation stages that are otherwise difficult or impossible to reach in humans, are an essential means for studying species-specific brain miRNAs. Single-cell sequencing approaches can further decode refined miRNA-mRNA interactions during developmental transitions. Elucidating species-specific miRNA regulation will shed new light into the mechanisms that control spatiotemporal events during human brain development and disease, an important step towards fostering novel, holistic and effective therapeutic approaches for neural disorders. In this review, we discuss species-specific regulation of miRNA function, its contribution to the evolving features of the human brain and in neurological disease, with respect also to future therapeutic approaches.
Affiliation(s)
- Kanella Prodromidou
- Laboratory of Cellular and Molecular Neurobiology-Stem Cells, Department of Neurobiology, Hellenic Pasteur Institute, Athens, Greece
| | - Rebecca Matsas
- Laboratory of Cellular and Molecular Neurobiology-Stem Cells, Department of Neurobiology, Hellenic Pasteur Institute, Athens, Greece
|
35
|
Xiong L, Xu K, Tian K, Shao Y, Tang L, Gao G, Zhang M, Jiang T, Zhang QC. SCALE method for single-cell ATAC-seq analysis via latent feature extraction. Nat Commun 2019; 10:4576. [PMID: 31594952 PMCID: PMC6783552 DOI: 10.1038/s41467-019-12630-7] [Citation(s) in RCA: 120] [Impact Index Per Article: 20.0] [Received: 02/26/2019] [Accepted: 09/20/2019] [Indexed: 12/31/2022] Open
Abstract
Single-cell ATAC-seq (scATAC-seq) profiles the chromatin accessibility landscape at single cell level, thus revealing cell-to-cell variability in gene regulation. However, the high dimensionality and sparsity of scATAC-seq data often complicate the analysis. Here, we introduce a method for analyzing scATAC-seq data, called Single-Cell ATAC-seq analysis via Latent feature Extraction (SCALE). SCALE combines a deep generative framework and a probabilistic Gaussian Mixture Model to learn latent features that accurately characterize scATAC-seq data. We validate SCALE on datasets generated on different platforms with different protocols, and having different overall data qualities. SCALE substantially outperforms the other tools in all aspects of scATAC-seq data analysis, including visualization, clustering, and denoising and imputation. Importantly, SCALE also generates interpretable features that directly link to cell populations, and can potentially reveal batch effects in scATAC-seq experiments.
Affiliation(s)
- Lei Xiong
- MOE Key Laboratory of Bioinformatics, Beijing Advanced Innovation Center for Structural Biology, Center for Synthetic and Systems Biology, Tsinghua-Peking Center for Life Sciences, School of Life Sciences, Tsinghua University, 100084, Beijing, China
| | - Kui Xu
- MOE Key Laboratory of Bioinformatics, Beijing Advanced Innovation Center for Structural Biology, Center for Synthetic and Systems Biology, Tsinghua-Peking Center for Life Sciences, School of Life Sciences, Tsinghua University, 100084, Beijing, China
| | - Kang Tian
- MOE Key Laboratory of Bioinformatics, Beijing Advanced Innovation Center for Structural Biology, Center for Synthetic and Systems Biology, Tsinghua-Peking Center for Life Sciences, School of Life Sciences, Tsinghua University, 100084, Beijing, China
| | - Yanqiu Shao
- MOE Key Laboratory of Bioinformatics, Beijing Advanced Innovation Center for Structural Biology, Center for Synthetic and Systems Biology, Tsinghua-Peking Center for Life Sciences, School of Life Sciences, Tsinghua University, 100084, Beijing, China
| | - Lei Tang
- MOE Key Laboratory of Bioinformatics, Beijing Advanced Innovation Center for Structural Biology, Center for Synthetic and Systems Biology, Tsinghua-Peking Center for Life Sciences, School of Life Sciences, Tsinghua University, 100084, Beijing, China
| | - Ge Gao
- Beijing Advanced Innovation Center for Genomics (ICG), Biomedical Pioneering Innovation Center (BIOPIC), Peking University, 100871, Beijing, China
- State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Center for Bioinformatics, Peking University, 100871, Beijing, China
| | - Michael Zhang
- Bioinformatics Division, BNRist, Department of Automation, Tsinghua University, 100084, Beijing, China
- Department of Biological Sciences, Center for Systems Biology, The University of Texas, Dallas 800 West Campbell Road, RL11, Richardson, TX, 75080-3021, USA
- MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, School of Medicine, Tsinghua University, 100084, Beijing, China
| | - Tao Jiang
- Department of Computer Science and Engineering, University of California, Riverside, CA, 92521, USA
- Bioinformatics Division, BNRIST; Department of Computer Science and Technology, Tsinghua University, 100084, Beijing, China
| | - Qiangfeng Cliff Zhang
- MOE Key Laboratory of Bioinformatics, Beijing Advanced Innovation Center for Structural Biology, Center for Synthetic and Systems Biology, Tsinghua-Peking Center for Life Sciences, School of Life Sciences, Tsinghua University, 100084, Beijing, China.
|
36
|
Suner A. Clustering methods for single-cell RNA-sequencing expression data: performance evaluation with varying sample sizes and cell compositions. Stat Appl Genet Mol Biol 2019; 18:/j/sagmb.2019.18.issue-5/sagmb-2019-0004/sagmb-2019-0004.xml. [PMID: 31646845 DOI: 10.1515/sagmb-2019-0004] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 11/15/2022]
Abstract
A number of specialized clustering methods have been developed so far for the accurate analysis of single-cell RNA-sequencing (scRNA-seq) expression data, and several reports have been published documenting the performance measures of these clustering methods under different conditions. However, to date, there are no available studies regarding the systematic evaluation of the performance measures of the clustering methods taking into consideration the sample size and cell composition of a given scRNA-seq dataset. Herein, a comprehensive performance evaluation study of 11 selected scRNA-seq clustering methods was performed using synthetic datasets with known sample sizes and number of subpopulations, as well as varying levels of transcriptome complexity. The results indicate that the overall performance of the clustering methods under study is highly dependent on the sample size and complexity of the scRNA-seq dataset. In most cases, better clustering performance was obtained as the number of cells in a given expression dataset was increased. The findings of this study also highlight the importance of sample size for the successful detection of rare cell subpopulations with an appropriate clustering tool.
Affiliation(s)
- Aslı Suner
- Department of Biostatistics and Medical Informatics, Faculty of Medicine, Ege University, Bornova, İzmir, Turkey
|
37
|
Zhang Q, Caudle WM, Pi J, Bhattacharya S, Andersen ME, Kaminski NE, Conolly RB. Embracing Systems Toxicology at Single-Cell Resolution. CURRENT OPINION IN TOXICOLOGY 2019; 16:49-57. [PMID: 31768481 PMCID: PMC6876623 DOI: 10.1016/j.cotox.2019.04.003] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Indexed: 02/07/2023]
Abstract
As systems biology expands its multi-omic spectrum to increasing resolutions, distinguishing cells based on single-cell profiles becomes feasible. Unlike traditional bulk assays that average cellular responses and blur the distinct identities of responsive cells, single-cell technologies enable sensitive detection of small cellular changes and precise identification of those cells perturbed by toxicants. Among the suite of omic technologies that continue to expand and become affordable, single-cell RNA sequencing (scRNA-seq) is at the cutting edge and leading the way to transform systems toxicology. Single-cell systems toxicology can provide a wealth of information to elucidate cell-specific alterations and response trajectories, detect points-of-departure, map and develop dynamical models of toxicity pathways.
Affiliation(s)
- Qiang Zhang
- Department of Environmental Health, Rollins School of Public Health, Emory University, Atlanta, Georgia, USA
| | - W. Michael Caudle
- Department of Environmental Health, Rollins School of Public Health, Emory University, Atlanta, Georgia, USA
| | - Jingbo Pi
- Program of Environmental Toxicology, School of Public Health, China Medical University, Shenyang, China
| | - Sudin Bhattacharya
- Department of Biomedical Engineering, Department of Pharmacology and Toxicology, Center for Research on Ingredient Safety, Institute for Quantitative Health Science and Engineering, and Institute for Integrative Toxicology, Michigan State University, East Lansing, Michigan, USA
| | - Norbert E. Kaminski
- Departments of Pharmacology and Toxicology and Institute for Integrative Toxicology, Michigan State University, East Lansing, Michigan, USA
| | - Rory B. Conolly
- Integrated Systems Toxicology Division, National Health and Environmental Effects Research Laboratory, United States Environmental Protection Agency, Durham, North Carolina, USA
|
38
|
Single-cell transcriptomics as a framework and roadmap for understanding the brain. J Neurosci Methods 2019; 326:108353. [PMID: 31351971 DOI: 10.1016/j.jneumeth.2019.108353] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Received: 04/14/2019] [Revised: 07/05/2019] [Accepted: 07/07/2019] [Indexed: 12/31/2022]
Abstract
A framework for interpreting and guiding experimental examination of the brain is essential for neuroscience. Recently, single-cell RNA sequencing and single-molecule fluorescent in situ hybridization have emerged as key technologies to generate such a framework at a single-cell resolution. These technologies provide a powerful complement for understanding gene expression in the brain: RNA sequencing enables genome-wide high-throughput quantification of gene expression, and in situ hybridization yields spatial registration of gene expression at a cellular resolution. Here, I discuss the insight that each of these technologies individually provide, and how they can be paired in principle and practice to resolve the cell-type-specific spatial organization of the brain. I further discuss the potential of cutting-edge spatial transcriptomics technologies that leverage the advantages of both techniques within the same assay, as well as how transcriptomic assays can be linked with higher-order features of brain structure and function. Such current and forthcoming transcriptomic technologies will have immense impact in generating an underlying logic of the nervous system, and will guide experiments and interpretations across molecular, cellular, circuit, and behavioural neuroscience.
|
39
|
Abstract
Single-cell RNA sequencing (scRNA-seq) allows researchers to collect large catalogues detailing the transcriptomes of individual cells. Unsupervised clustering is of central importance for the analysis of these data, as it is used to identify putative cell types. However, there are many challenges involved. We discuss why clustering is a challenging problem from a computational point of view and what aspects of the data make it challenging. We also consider the difficulties related to the biological interpretation and annotation of the identified clusters.
|
40
|
Deng C, Daley T, De Sena Brandine G, Smith AD. Molecular Heterogeneity in Large-Scale Biological Data: Techniques and Applications. Annu Rev Biomed Data Sci 2019. [DOI: 10.1146/annurev-biodatasci-072018-021339] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/09/2022]
Abstract
High-throughput sequencing technologies have evolved at a stellar pace for almost a decade and have greatly advanced our understanding of genome biology. In these sampling-based technologies, there is an important detail that is often overlooked in the analysis of the data and the design of the experiments, specifically that the sampled observations often do not give a representative picture of the underlying population. This has long been recognized as a problem in statistical ecology and in the broader statistics literature. In this review, we discuss the connections between these fields, methodological advances that parallel both the needs and opportunities of large-scale data analysis, and specific applications in modern biology. In the process we describe unique aspects of applying these approaches to sequencing technologies, including sequencing error, population and individual heterogeneity, and the design of experiments.
Affiliation(s)
- Chao Deng
- Department of Molecular and Computational Biology, University of Southern California, Los Angeles, California 90089, USA
| | - Timothy Daley
- Department of Statistics and Department of Bioengineering, Stanford University, Stanford, California 94305, USA
| | - Guilherme De Sena Brandine
- Department of Molecular and Computational Biology, University of Southern California, Los Angeles, California 90089, USA
| | - Andrew D. Smith
- Department of Molecular and Computational Biology, University of Southern California, Los Angeles, California 90089, USA
|
41
|
Zeng T, Dai H. Single-Cell RNA Sequencing-Based Computational Analysis to Describe Disease Heterogeneity. Front Genet 2019; 10:629. [PMID: 31354786 PMCID: PMC6640157 DOI: 10.3389/fgene.2019.00629] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 01/17/2019] [Accepted: 06/17/2019] [Indexed: 12/25/2022] Open
Abstract
The trillions of cells in the human body can be viewed as elementary but essential biological units that achieve different body states, but the low resolution of previous cell isolation and measurement approaches limits our understanding of the cell-specific molecular profiles. The recent establishment and rapid growth of single-cell sequencing technology has facilitated the identification of molecular profiles of heterogeneous cells, especially on the transcription level of single cells [single-cell RNA sequencing (scRNA-seq)]. As a novel method, the robustness of scRNA-seq under changing conditions will determine its practical potential in major research programs and clinical applications. In this review, we first briefly presented the scRNA-seq-related methods from the point of view of experiments and computation. Then, we compared several state-of-the-art scRNA-seq analysis frameworks mainly by analyzing their performance robustness on independent scRNA-seq datasets for the same complex disease. Finally, we elaborated on our hypothesis on consensus scRNA-seq analysis and summarized the potential indicative and predictive roles of individual cells in understanding disease heterogeneity by single-cell technologies.
Affiliation(s)
- Tao Zeng
- Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences, Shanghai, China
|
42
|
Mateus ID, Masclaux FG, Aletti C, Rojas EC, Savary R, Dupuis C, Sanders IR. Dual RNA-seq reveals large-scale non-conserved genotype × genotype-specific genetic reprograming and molecular crosstalk in the mycorrhizal symbiosis. THE ISME JOURNAL 2019; 13:1226-1238. [PMID: 30647457 PMCID: PMC6474227 DOI: 10.1038/s41396-018-0342-3] [Citation(s) in RCA: 37] [Impact Index Per Article: 6.2] [Received: 08/16/2018] [Revised: 11/05/2018] [Accepted: 12/11/2018] [Indexed: 01/19/2023]
Abstract
Arbuscular mycorrhizal fungi (AMF) impact plant growth and are a major driver of plant diversity and productivity. We quantified the contribution of intra-specific genetic variability in cassava (Manihot esculenta) and Rhizophagus irregularis to gene reprogramming in symbioses using dual RNA-sequencing. A large number of cassava genes exhibited altered transcriptional responses to the fungus but transcription of most of these plant genes (72%) responded in a different direction or magnitude depending on the plant genotype. Two AMF isolates displayed large differences in their transcription, but the direction and magnitude of the transcriptional responses for a large number of these genes was also strongly influenced by the genotype of the plant host. This indicates that unlike the highly conserved plant genes necessary for the symbiosis establishment, most of the plant and fungal gene transcriptional responses are not conserved and are greatly influenced by plant and fungal genetic differences, even at the within-species level. The transcriptional variability detected allowed us to identify an extensive gene network showing the interplay in plant-fungal reprogramming in the symbiosis. Key genes illustrated that the two organisms jointly program their cytoskeleton organization during growth of the fungus inside roots. Our study reveals that plant and fungal genetic variation has a strong role in shaping the genetic reprograming in response to symbiosis, indicating considerable genotype × genotype interactions in the mycorrhizal symbiosis. Such variation needs to be considered in order to understand the molecular mechanisms between AMF and their plant hosts in natural communities.
Affiliation(s)
- Ivan D Mateus
- Department of Ecology and Evolution, University of Lausanne, Biophore Building, 1015, Lausanne, Switzerland
| | - Frédéric G Masclaux
- Department of Ecology and Evolution, University of Lausanne, Biophore Building, 1015, Lausanne, Switzerland
- Vital-IT, SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Consolée Aletti
- Department of Ecology and Evolution, University of Lausanne, Biophore Building, 1015, Lausanne, Switzerland
| | - Edward C Rojas
- Department of Ecology and Evolution, University of Lausanne, Biophore Building, 1015, Lausanne, Switzerland
| | - Romain Savary
- Department of Ecology and Evolution, University of Lausanne, Biophore Building, 1015, Lausanne, Switzerland
| | - Cindy Dupuis
- Department of Ecology and Evolution, University of Lausanne, Biophore Building, 1015, Lausanne, Switzerland
| | - Ian R Sanders
- Department of Ecology and Evolution, University of Lausanne, Biophore Building, 1015, Lausanne, Switzerland.
|
43
|
Bhaduri A, Nowakowski TJ, Pollen AA, Kriegstein AR. Identification of cell types in a mouse brain single-cell atlas using low sampling coverage. BMC Biol 2018; 16:113. [PMID: 30309354 PMCID: PMC6180488 DOI: 10.1186/s12915-018-0580-x] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 07/05/2018] [Accepted: 09/25/2018] [Indexed: 02/06/2023] Open
Abstract
Background: High-throughput methods for profiling the transcriptomes of single cells have recently emerged as transformative approaches for large-scale population surveys of cellular diversity in heterogeneous primary tissues. However, the efficient generation of such atlases will depend on sufficient sampling of diverse cell types while remaining cost-effective to enable a comprehensive examination of organs, developmental stages, and individuals. Results: To examine the relationship between sampled cell numbers and transcriptional heterogeneity in the context of unbiased cell type classification, we explored the population structure of a publicly available 1.3 million cell dataset from E18.5 mouse brain and validated our findings in published data from adult mice. We propose a computational framework for inferring the saturation point of cluster discovery in a single-cell mRNA-seq experiment, centered around cluster preservation in downsampled datasets. In addition, we introduce a “complexity index,” which characterizes the heterogeneity of cells in a given dataset. Using Cajal-Retzius cells as an example of a limited complexity dataset, we explored whether the detected biological distinctions relate to technical clustering. Surprisingly, we found that clustering distinctions carrying biologically interpretable meaning are achieved with far fewer cells than were originally sampled, though technical saturation of rare populations such as Cajal-Retzius cells is not achieved. We additionally validated these findings with a recently published atlas of cell types across mouse organs and again found, using subsampling, that a much smaller number of cells recapitulates the cluster distinctions of the complete dataset.
Conclusions: Together, these findings suggest that most of the biologically interpretable cell types from the 1.3 million cell database can be recapitulated by analyzing 50,000 randomly selected cells, indicating that instead of profiling few individuals at high “cellular coverage,” cell atlas studies may benefit from profiling more individuals, or many time points, at lower cellular coverage and then further enriching for populations of interest. This strategy is ideal for scenarios where cost and time are limited, though extremely rare populations of interest (< 1%) may be identifiable only with much higher cell numbers. Electronic supplementary material: The online version of this article (10.1186/s12915-018-0580-x) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Aparna Bhaduri
- Department of Neurology, UCSF, San Francisco, USA; The Eli and Edythe Broad Center for Regeneration Medicine and Stem Cell Research at UCSF, San Francisco, USA
| | - Tomasz J Nowakowski
- Department of Neurology, UCSF, San Francisco, USA; The Eli and Edythe Broad Center for Regeneration Medicine and Stem Cell Research at UCSF, San Francisco, USA; Department of Anatomy, UCSF, San Francisco, USA
| | - Alex A Pollen
- Department of Neurology, UCSF, San Francisco, USA; The Eli and Edythe Broad Center for Regeneration Medicine and Stem Cell Research at UCSF, San Francisco, USA
| | - Arnold R Kriegstein
- Department of Neurology, UCSF, San Francisco, USA; The Eli and Edythe Broad Center for Regeneration Medicine and Stem Cell Research at UCSF, San Francisco, USA
| |
|
44
|
Duò A, Robinson MD, Soneson C. A systematic performance evaluation of clustering methods for single-cell RNA-seq data. F1000Res 2018; 7:1141. [PMID: 30271584 DOI: 10.12688/f1000research.15666.1] [Citation(s) in RCA: 49] [Impact Index Per Article: 7.0] [Accepted: 07/20/2018] [Indexed: 12/21/2022] Open
Abstract
Subpopulation identification, usually via some form of unsupervised clustering, is a fundamental step in the analysis of many single-cell RNA-seq data sets. This has motivated the development and application of a broad range of clustering methods, based on various underlying algorithms. Here, we provide a systematic and extensible performance evaluation of 14 clustering algorithms implemented in R, including both methods developed explicitly for scRNA-seq data and more general-purpose methods. The methods were evaluated using nine publicly available scRNA-seq data sets as well as three simulations with varying degrees of cluster separability. The same feature selection approaches were used for all methods, allowing us to focus on the investigation of the performance of the clustering algorithms themselves. We evaluated the ability to recover known subpopulations, the stability, and the run time and scalability of the methods. Additionally, we investigated whether the performance could be improved by generating consensus partitions from multiple individual clustering methods. We found substantial differences in the performance, run time and stability between the methods, with SC3 and Seurat showing the most favorable results. Additionally, we found that consensus clustering typically did not improve the performance compared to the best of the combined methods, but that several of the top-performing methods already perform some type of consensus clustering. All the code used for the evaluation is available on GitHub (https://github.com/markrobinsonuzh/scRNAseq_clustering_comparison). In addition, an R package providing access to data and clustering results, thereby facilitating inclusion of new methods and data sets, is available from Bioconductor (https://bioconductor.org/packages/DuoClustering2018).
Affiliation(s)
- Angelo Duò
- Institute of Molecular Life Sciences, University of Zurich, Zurich, 8057, Switzerland; SIB Swiss Institute of Bioinformatics, Zurich, 8057, Switzerland
| | - Mark D Robinson
- Institute of Molecular Life Sciences, University of Zurich, Zurich, 8057, Switzerland; SIB Swiss Institute of Bioinformatics, Zurich, 8057, Switzerland
| | - Charlotte Soneson
- Institute of Molecular Life Sciences, University of Zurich, Zurich, 8057, Switzerland; SIB Swiss Institute of Bioinformatics, Zurich, 8057, Switzerland
| |
|
45
|
Duò A, Robinson MD, Soneson C. A systematic performance evaluation of clustering methods for single-cell RNA-seq data. F1000Res 2018; 7:1141. [PMID: 30271584 PMCID: PMC6134335 DOI: 10.12688/f1000research.15666.3] [Citation(s) in RCA: 133] [Impact Index Per Article: 19.0] [Accepted: 11/04/2020] [Indexed: 02/05/2023] Open
Abstract
Subpopulation identification, usually via some form of unsupervised clustering, is a fundamental step in the analysis of many single-cell RNA-seq data sets. This has motivated the development and application of a broad range of clustering methods, based on various underlying algorithms. Here, we provide a systematic and extensible performance evaluation of 14 clustering algorithms implemented in R, including both methods developed explicitly for scRNA-seq data and more general-purpose methods. The methods were evaluated using nine publicly available scRNA-seq data sets as well as three simulations with varying degrees of cluster separability. The same feature selection approaches were used for all methods, allowing us to focus on the investigation of the performance of the clustering algorithms themselves. We evaluated the ability to recover known subpopulations, the stability, and the run time and scalability of the methods. Additionally, we investigated whether the performance could be improved by generating consensus partitions from multiple individual clustering methods. We found substantial differences in the performance, run time and stability between the methods, with SC3 and Seurat showing the most favorable results. Additionally, we found that consensus clustering typically did not improve the performance compared to the best of the combined methods, but that several of the top-performing methods already perform some type of consensus clustering. All the code used for the evaluation is available on GitHub (https://github.com/markrobinsonuzh/scRNAseq_clustering_comparison). In addition, an R package providing access to data and clustering results, thereby facilitating inclusion of new methods and data sets, is available from Bioconductor (https://bioconductor.org/packages/DuoClustering2018).
Affiliation(s)
- Angelo Duò
- Institute of Molecular Life Sciences, University of Zurich, Zurich, 8057, Switzerland; SIB Swiss Institute of Bioinformatics, Zurich, 8057, Switzerland
| | - Mark D Robinson
- Institute of Molecular Life Sciences, University of Zurich, Zurich, 8057, Switzerland; SIB Swiss Institute of Bioinformatics, Zurich, 8057, Switzerland
| | - Charlotte Soneson
- Institute of Molecular Life Sciences, University of Zurich, Zurich, 8057, Switzerland; SIB Swiss Institute of Bioinformatics, Zurich, 8057, Switzerland
| |
|
46
|
Duò A, Robinson MD, Soneson C. A systematic performance evaluation of clustering methods for single-cell RNA-seq data. F1000Res 2018; 7:1141. [PMID: 30271584 DOI: 10.12688/f1000research.15666.2] [Citation(s) in RCA: 138] [Impact Index Per Article: 19.7] [Accepted: 08/31/2018] [Indexed: 12/31/2022] Open
Abstract
Subpopulation identification, usually via some form of unsupervised clustering, is a fundamental step in the analysis of many single-cell RNA-seq data sets. This has motivated the development and application of a broad range of clustering methods, based on various underlying algorithms. Here, we provide a systematic and extensible performance evaluation of 14 clustering algorithms implemented in R, including both methods developed explicitly for scRNA-seq data and more general-purpose methods. The methods were evaluated using nine publicly available scRNA-seq data sets as well as three simulations with varying degrees of cluster separability. The same feature selection approaches were used for all methods, allowing us to focus on the investigation of the performance of the clustering algorithms themselves. We evaluated the ability to recover known subpopulations, the stability, and the run time and scalability of the methods. Additionally, we investigated whether the performance could be improved by generating consensus partitions from multiple individual clustering methods. We found substantial differences in the performance, run time and stability between the methods, with SC3 and Seurat showing the most favorable results. Additionally, we found that consensus clustering typically did not improve the performance compared to the best of the combined methods, but that several of the top-performing methods already perform some type of consensus clustering. All the code used for the evaluation is available on GitHub (https://github.com/markrobinsonuzh/scRNAseq_clustering_comparison). In addition, an R package providing access to data and clustering results, thereby facilitating inclusion of new methods and data sets, is available from Bioconductor (https://bioconductor.org/packages/DuoClustering2018).
Affiliation(s)
- Angelo Duò
- Institute of Molecular Life Sciences, University of Zurich, Zurich, 8057, Switzerland; SIB Swiss Institute of Bioinformatics, Zurich, 8057, Switzerland
| | - Mark D Robinson
- Institute of Molecular Life Sciences, University of Zurich, Zurich, 8057, Switzerland; SIB Swiss Institute of Bioinformatics, Zurich, 8057, Switzerland
| | - Charlotte Soneson
- Institute of Molecular Life Sciences, University of Zurich, Zurich, 8057, Switzerland; SIB Swiss Institute of Bioinformatics, Zurich, 8057, Switzerland
| |
|