1
Gao CF, Vaikuntanathan S, Riesenfeld SJ. Dissection and integration of bursty transcriptional dynamics for complex systems. Proc Natl Acad Sci U S A 2024; 121:e2306901121. [PMID: 38669186 PMCID: PMC11067469 DOI: 10.1073/pnas.2306901121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2023] [Accepted: 03/06/2024] [Indexed: 04/28/2024] Open
Abstract
RNA velocity estimation is a potentially powerful tool to reveal the directionality of transcriptional changes in single-cell RNA-sequencing data, but it lacks accuracy, absent advanced metabolic labeling techniques. We developed an approach, TopicVelo, that disentangles simultaneous, yet distinct, dynamics by using a probabilistic topic model, a highly interpretable form of latent space factorization, to infer cells and genes associated with individual processes, thereby capturing cellular pluripotency or multifaceted functionality. Focusing on process-associated cells and genes enables accurate estimation of process-specific velocities via a master equation for a transcriptional burst model accounting for intrinsic stochasticity. The method obtains a global transition matrix by leveraging cell topic weights to integrate process-specific signals. In challenging systems, this method accurately recovers complex transitions and terminal states, while our use of first-passage time analysis provides insights into transient transitions. These results expand the limits of RNA velocity, empowering future studies of cell fate and functional responses.
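The integration step described in this abstract (a global transition matrix built from process-specific signals and cell topic weights) and the first-passage time analysis can be illustrated with a toy sketch. This is not the TopicVelo implementation: the weighted-mixture rule, the function names `mix_transition_matrices` and `mean_first_passage_times`, and the three-cell example are all assumptions made purely for illustration.

```python
def mix_transition_matrices(process_mats, topic_weights):
    """Blend per-process transition matrices using per-cell topic weights.

    ASSUMPTION: a simple weighted mixture; the paper's rule may differ.
    process_mats[k][i][j]: probability of cell i -> cell j under process k.
    topic_weights[i][k]:   weight of process k in cell i (each row sums to 1).
    """
    n = len(topic_weights)
    global_mat = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k, mat in enumerate(process_mats):
            for j in range(n):
                global_mat[i][j] += topic_weights[i][k] * mat[i][j]
    return global_mat


def mean_first_passage_times(mat, target, iters=1000):
    """Expected steps to first reach `target` on a Markov chain.

    Iterates t_i = 1 + sum_{j != target} P_ij * t_j with t_target = 0.
    """
    n = len(mat)
    t = [0.0] * n
    for _ in range(iters):
        t = [0.0 if i == target else
             1.0 + sum(mat[i][j] * t[j] for j in range(n) if j != target)
             for i in range(n)]
    return t


# Hypothetical toy system: process A moves cell 0 toward cell 1,
# process B moves cell 1 toward cell 2 (an absorbing terminal state).
process_a = [[0.5, 0.5, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
process_b = [[1.0, 0.0, 0.0], [0.0, 0.5, 0.5], [0.0, 0.0, 1.0]]
weights = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

global_mat = mix_transition_matrices([process_a, process_b], weights)
times = mean_first_passage_times(global_mat, target=2)
print(global_mat)
print(times)
```

Because each row of the weights sums to 1 and each process matrix is row-stochastic, the blended matrix stays row-stochastic, so it can be analysed as an ordinary Markov chain.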
Affiliation(s)
- Cheng Frank Gao
- Department of Chemistry, University of Chicago, Chicago, IL 60637
- Suriyanarayanan Vaikuntanathan
- Department of Chemistry, University of Chicago, Chicago, IL 60637
- Institute for Biophysical Dynamics, University of Chicago, Chicago, IL 60637
- Samantha J. Riesenfeld
- Institute for Biophysical Dynamics, University of Chicago, Chicago, IL 60637
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, IL 60637
- Department of Medicine, University of Chicago, Chicago, IL 60637
- Committee on Immunology, Biological Sciences Division, University of Chicago, Chicago, IL 60637
2
Brooks TG, Lahens NF, Mrčela A, Sarantopoulou D, Nayak S, Naik A, Sengupta S, Choi PS, Grant GR. BEERS2: RNA-Seq simulation through high fidelity in silico modeling. Brief Bioinform 2024; 25:bbae164. [PMID: 38605641 PMCID: PMC11009461 DOI: 10.1093/bib/bbae164] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2023] [Revised: 01/26/2024] [Accepted: 03/26/2024] [Indexed: 04/13/2024] Open
Abstract
Simulation of RNA-seq reads is critical in the assessment, comparison, benchmarking and development of bioinformatics tools. Yet the field of RNA-seq simulators has progressed little in the last decade. To address this need we have developed BEERS2, which combines a flexible and highly configurable design with detailed simulation of the entire library preparation and sequencing pipeline. BEERS2 takes input transcripts (typically full-length messenger RNA transcripts with polyA tails) from either customizable input or from CAMPAREE simulated RNA samples. It produces realistic reads of these transcripts in FASTQ, SAM or BAM format, with the SAM and BAM output containing the true alignment to the reference genome. It also produces true transcript-level quantification values. BEERS2 is designed to include the effects of polyA selection and RiboZero for ribosomal depletion, hexamer priming sequence biases, GC-content biases in polymerase chain reaction (PCR) amplification, barcode read errors and errors during PCR amplification. These characteristics combine to make BEERS2 the most complete simulation of RNA-seq to date. Finally, we demonstrate the use of BEERS2 by measuring the effect of several settings on the popular Salmon pseudoalignment algorithm.
Affiliation(s)
- Thomas G Brooks
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, PA, USA
- Nicholas F Lahens
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, PA, USA
- Antonijo Mrčela
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, PA, USA
- Dimitra Sarantopoulou
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, PA, USA
- Current address: National Institute on Aging, National Institutes of Health, Baltimore, MD, USA
- Soumyashant Nayak
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, PA, USA
- Current address: Statistics and Mathematics Unit, Indian Statistical Institute, Bengaluru, Karnataka, India
- Amruta Naik
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, PA, USA
- Children’s Hospital of Philadelphia, Philadelphia, PA, USA
- Shaon Sengupta
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, PA, USA
- Children’s Hospital of Philadelphia, Philadelphia, PA, USA
- Department of Pediatrics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA
- Peter S Choi
- Division of Cancer Pathobiology, Children’s Hospital of Philadelphia, Philadelphia, PA, USA
- Department of Pathology & Laboratory Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Gregory R Grant
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, PA, USA
- Department of Genetics, University of Pennsylvania, Philadelphia, PA, USA
3
Chamberlin JT, Lee Y, Marth GT, Quinlan AR. Differences in molecular sampling and data processing explain variation among single-cell and single-nucleus RNA-seq experiments. Genome Res 2024; 34:179-188. [PMID: 38355308 PMCID: PMC10984380 DOI: 10.1101/gr.278253.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2023] [Accepted: 02/01/2024] [Indexed: 02/16/2024]
Abstract
A mechanistic understanding of the biological and technical factors that impact transcript measurements is essential to designing and analyzing single-cell and single-nucleus RNA sequencing experiments. Nuclei contain the same pre-mRNA population as cells, but they contain a small subset of the mRNAs. Nonetheless, early studies argued that single-nucleus analysis yielded results comparable to cellular samples if pre-mRNA measurements were included. However, typical workflows do not distinguish between pre-mRNA and mRNA when estimating gene expression, and variation in their relative abundances across cell types has received limited attention. These gaps are especially important given that incorporating pre-mRNA has become commonplace for both assays, despite known gene length bias in pre-mRNA capture. Here, we reanalyze public data sets from mouse and human to describe the mechanisms and contrasting effects of mRNA and pre-mRNA sampling on gene expression and marker gene selection in single-cell and single-nucleus RNA-seq. We show that pre-mRNA levels vary considerably among cell types, which mediates the degree of gene length bias and limits the generalizability of a recently published normalization method intended to correct for this bias. As an alternative, we repurpose an existing post hoc gene length-based correction method from conventional RNA-seq gene set enrichment analysis. Finally, we show that inclusion of pre-mRNA in bioinformatic processing can impart a larger effect than assay choice itself, which is pivotal to the effective reuse of existing data. These analyses advance our understanding of the sources of variation in single-cell and single-nucleus RNA-seq experiments and provide useful guidance for future studies.
Affiliation(s)
- John T Chamberlin
- Department of Biomedical Informatics, University of Utah, Salt Lake City, Utah 84108, USA
- Younghee Lee
- Department of Biomedical Informatics, University of Utah, Salt Lake City, Utah 84108, USA
- Seoul National University, College of Veterinary Medicine, Seoul, 08826, South Korea
- Gabor T Marth
- Department of Human Genetics, Utah Center for Genetic Discovery, University of Utah, Salt Lake City, Utah 84112, USA
- Aaron R Quinlan
- Department of Biomedical Informatics, University of Utah, Salt Lake City, Utah 84108, USA
- Department of Human Genetics, Utah Center for Genetic Discovery, University of Utah, Salt Lake City, Utah 84112, USA
4
Li J, Pan X, Yuan Y, Shen HB. TFvelo: gene regulation inspired RNA velocity estimation. Nat Commun 2024; 15:1387. [PMID: 38360714 DOI: 10.1038/s41467-024-45661-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2023] [Accepted: 01/30/2024] [Indexed: 02/17/2024] Open
Abstract
RNA velocity is closely related to cell fate and is an important indicator for predicting cell states, with an elegant physical explanation derived from single-cell RNA-seq data. Most existing RNA velocity models aim to extract dynamics from the phase delay between unspliced and spliced mRNA for each individual gene. However, unspliced/spliced mRNA abundance may not provide sufficient signal for dynamic modeling, leading to poor fits in phase portraits. Motivated by the idea that RNA velocity could be driven by transcriptional regulation, we propose TFvelo, which extends the RNA velocity concept to various single-cell datasets without relying on splicing information, by introducing gene regulatory information. Our experiments on synthetic data and multiple scRNA-Seq datasets show that TFvelo can accurately fit gene dynamics on phase portraits, and effectively infer cell pseudo-time and trajectory from RNA abundance data. TFvelo opens a robust and accurate avenue for modeling RNA velocity for single-cell data.
Affiliation(s)
- Jiachen Li
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, China
- Xiaoyong Pan
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, China
- Ye Yuan
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, China
- Hong-Bin Shen
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, China
5
Harkany T, Tretiakov E, Varela L, Jarc J, Rebernik P, Newbold S, Keimpema E, Verkhratsky A, Horvath T, Romanov R. Molecularly stratified hypothalamic astrocytes are cellular foci for obesity. Research Square 2024:rs.3.rs-3748581. [PMID: 38405925 PMCID: PMC10889077 DOI: 10.21203/rs.3.rs-3748581/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/27/2024]
Abstract
Astrocytes safeguard the homeostasis of the central nervous system [1,2]. Despite their prominent morphological plasticity under conditions that challenge the brain's adaptive capacity [3-5], the classification of astrocytes, and relating their molecular make-up to spatially devolved neuronal operations that specify behavior or metabolism, remained mostly futile [6,7]. Although it seems unexpected in the era of single-cell biology, the lack of a major advance in stratifying astrocytes under physiological conditions rests on the incompatibility of 'neurocentric' algorithms that rely on stable developmental endpoints, lifelong transcriptional, neurotransmitter, and neuropeptide signatures for classification [6-8] with the dynamic functional states, anatomic allocation, and allostatic plasticity of astrocytes [1]. Simplistically, therefore, astrocytes are still grouped as 'resting' vs. 'reactive', the latter referring to pathological states marked by various inducible genes [3,9,10]. Here, we introduced a machine learning-based feature recognition algorithm that benefits from the cumulative power of published single-cell RNA-seq data on astrocytes as a reference map to stepwise eliminate pleiotropic and inducible cellular features. For the healthy hypothalamus, this walk-back approach revealed gene regulatory networks (GRNs) that specified subsets of astrocytes, and could be used as landmarking tools for their anatomical assignment. The core molecular censuses retained by astrocyte subsets were sufficient to stratify them by allostatic competence, chiefly their signaling and metabolic interplay with neurons. Particularly, we found differentially expressed mitochondrial genes in insulin-sensing astrocytes and demonstrated their reciprocal signaling with neurons that work antagonistically within the food intake circuitry. As a proof-of-concept, we showed that disrupting Mfn2 expression in astrocytes reduced their ability to support dynamic circuit reorganization, a time-locked feature of satiety in the hypothalamus, thus leading to obesity in mice. Overall, our results suggest that astrocytes in the healthy brain are fundamentally more heterogeneous than previously thought and topologically mirror the specificity of local neurocircuits.
Affiliation(s)
- Tibor Harkany
- Center for Brain Research, Medical University of Vienna
- Jasna Jarc
- Center for Brain Research, Medical University of Vienna
- Erik Keimpema
- Medical University of Vienna, Center for Brain Research
6
He D, Gao Y, Chan SS, Quintana-Parrilla N, Patro R. Forseti: A mechanistic and predictive model of the splicing status of scRNA-seq reads. bioRxiv 2024:2024.02.01.577813. [PMID: 38370848 PMCID: PMC10871212 DOI: 10.1101/2024.02.01.577813] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/20/2024]
Abstract
Motivation: Short-read single-cell RNA-sequencing (scRNA-seq) has been used to study cellular heterogeneity, cellular fate, and transcriptional dynamics. Modeling splicing dynamics in scRNA-seq data is challenging, with inherent difficulty in even the seemingly straightforward task of elucidating the splicing status of the molecules from which sequenced fragments are drawn. This difficulty arises, in part, from the limited read length and positional biases, which substantially reduce the specificity of the sequenced fragments. As a result, the splicing status of many reads in scRNA-seq is ambiguous because of a lack of definitive evidence. We are therefore in need of methods that can recover the splicing status of ambiguous reads which, in turn, can lead to more accuracy and confidence in downstream analyses. Results: We develop Forseti, a predictive model to probabilistically assign a splicing status to scRNA-seq reads. Our model has two key components. First, we train a binding affinity model to assign a probability that a given transcriptomic site is used in fragment generation. Second, we fit a robust fragment length distribution model that generalizes well across datasets deriving from different species and tissue types. Forseti combines these two trained models to predict the splicing status of the molecule of origin of reads by scoring putative fragments that associate each alignment of sequenced reads with proximate potential priming sites. Using both simulated and experimental data, we show that our model can precisely predict the splicing status of reads and identify the true gene origin of multi-gene mapped reads. Availability: Forseti and the code used for producing the results are available at https://github.com/COMBINE-lab/forseti under a BSD 3-clause license.
Affiliation(s)
- Dongze He
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742, USA
- Program in Computational Biology, Bioinformatics and Genomics, University of Maryland, College Park, MD 20742, USA
- Yuan Gao
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742, USA
- Program in Computational Biology, Bioinformatics and Genomics, University of Maryland, College Park, MD 20742, USA
- Spencer Skylar Chan
- Department of Computer Science, University of Maryland, College Park, MD 20742, USA
- Rob Patro
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742, USA
- Department of Computer Science, University of Maryland, College Park, MD 20742, USA
7
Gayoso A, Weiler P, Lotfollahi M, Klein D, Hong J, Streets A, Theis FJ, Yosef N. Deep generative modeling of transcriptional dynamics for RNA velocity analysis in single cells. Nat Methods 2024; 21:50-59. [PMID: 37735568 PMCID: PMC10776389 DOI: 10.1038/s41592-023-01994-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 08/12/2022] [Accepted: 08/08/2023] [Indexed: 09/23/2023]
Abstract
RNA velocity has been rapidly adopted to guide interpretation of transcriptional dynamics in snapshot single-cell data; however, current approaches for estimating RNA velocity lack effective strategies for quantifying uncertainty and determining the overall applicability to the system of interest. Here, we present veloVI (velocity variational inference), a deep generative modeling framework for estimating RNA velocity. veloVI learns a gene-specific dynamical model of RNA metabolism and provides a transcriptome-wide quantification of velocity uncertainty. We show that veloVI compares favorably to previous approaches with respect to goodness of fit, consistency across transcriptionally similar cells and stability across preprocessing pipelines for quantifying RNA abundance. Further, we demonstrate that veloVI's posterior velocity uncertainty can be used to assess whether velocity analysis is appropriate for a given dataset. Finally, we highlight veloVI as a flexible framework for modeling transcriptional dynamics by adapting the underlying dynamical model to use time-dependent transcription rates.
Affiliation(s)
- Adam Gayoso
- Center for Computational Biology, University of California, Berkeley, Berkeley, CA, USA
- Philipp Weiler
- Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
- Department of Mathematics, Technical University of Munich, Munich, Germany
- Mohammad Lotfollahi
- Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
- Wellcome Sanger Institute, Cambridge, UK
- Dominik Klein
- Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
- Department of Mathematics, Technical University of Munich, Munich, Germany
- Justin Hong
- Center for Computational Biology, University of California, Berkeley, Berkeley, CA, USA
- Department of Computer Science, Columbia University, New York, NY, USA
- Aaron Streets
- Center for Computational Biology, University of California, Berkeley, Berkeley, CA, USA
- Department of Bioengineering, University of California, Berkeley, Berkeley, CA, USA
- Chan Zuckerberg Biohub, San Francisco, CA, USA
- Fabian J Theis
- Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
- Department of Mathematics, Technical University of Munich, Munich, Germany
- TUM School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany
- Nir Yosef
- Center for Computational Biology, University of California, Berkeley, Berkeley, CA, USA
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, Berkeley, CA, USA
8
Ford K, Zuin E, Righelli D, Medina E, Schoch H, Singletary K, Muheim C, Frank MG, Hicks SC, Risso D, Peixoto L. A Global Transcriptional Atlas of the Effect of Sleep Deprivation in the Mouse Frontal Cortex. bioRxiv 2023:2023.11.28.569011. [PMID: 38076891 PMCID: PMC10705260 DOI: 10.1101/2023.11.28.569011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/20/2023]
Abstract
Sleep deprivation (SD) has negative effects on brain function. Sleep problems are prevalent in neurodevelopmental, neurodegenerative and psychiatric disorders. Thus, understanding the molecular consequences of SD is of fundamental importance in neuroscience. In this study, we present the first simultaneous bulk and single-nuclear (sn)RNA sequencing characterization of the effects of SD in the mouse frontal cortex. We show that SD predominantly affects glutamatergic neurons, specifically in layers 4 and 5, and produces isoform switching of thousands of transcripts. At both the global and cell-type specific level, SD has a large repressive effect on transcription, down-regulating thousands of genes and transcripts, which underscores the importance of accounting for the effects of sleep loss in transcriptome studies of brain function. As a resource, we provide extensive characterizations of cell types, genes, transcripts and pathways affected by SD, as well as tutorials for data analysis.
Affiliation(s)
- Kaitlyn Ford
- Department of Translational Medicine and Physiology, Sleep and Performance Research Center. Elson S. Floyd College of Medicine. Washington State University, Spokane, WA
- Elena Zuin
- Department of Biology, University of Padova, Italy
- Department of Statistical Sciences, University of Padova, Italy
- Dario Righelli
- Department of Statistical Sciences, University of Padova, Italy
- Elizabeth Medina
- Department of Translational Medicine and Physiology, Sleep and Performance Research Center. Elson S. Floyd College of Medicine. Washington State University, Spokane, WA
- Hannah Schoch
- Department of Translational Medicine and Physiology, Sleep and Performance Research Center. Elson S. Floyd College of Medicine. Washington State University, Spokane, WA
- Kristan Singletary
- Department of Translational Medicine and Physiology, Sleep and Performance Research Center. Elson S. Floyd College of Medicine. Washington State University, Spokane, WA
- Christine Muheim
- Department of Translational Medicine and Physiology, Sleep and Performance Research Center. Elson S. Floyd College of Medicine. Washington State University, Spokane, WA
- Marcos G Frank
- Department of Translational Medicine and Physiology, Sleep and Performance Research Center. Elson S. Floyd College of Medicine. Washington State University, Spokane, WA
- Stephanie C Hicks
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Department of Biomedical Engineering, Johns Hopkins School of Medicine, Baltimore, MD, USA
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
- Malone Center for Engineering in Healthcare, Johns Hopkins University, MD, USA
- Davide Risso
- Department of Statistical Sciences, University of Padova, Italy
- Lucia Peixoto
- Department of Translational Medicine and Physiology, Sleep and Performance Research Center. Elson S. Floyd College of Medicine. Washington State University, Spokane, WA
9
Zheng SC, Stein-O’Brien G, Boukas L, Goff LA, Hansen KD. Pumping the brakes on RNA velocity by understanding and interpreting RNA velocity estimates. Genome Biol 2023; 24:246. [PMID: 37885016 PMCID: PMC10601342 DOI: 10.1186/s13059-023-03065-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2022] [Accepted: 09/19/2023] [Indexed: 10/28/2023] Open
Abstract
Background: RNA velocity analysis of single cells offers the potential to predict temporal dynamics from gene expression. In many systems, RNA velocity has been observed to produce a vector field that qualitatively reflects known features of the system. However, the limitations of RNA velocity estimates are still not well understood. Results: We analyze the impact of different steps in the RNA velocity workflow on direction and speed. We consider both high-dimensional velocity estimates and low-dimensional velocity vector fields mapped onto an embedding. We conclude the transition probability method for mapping velocity estimates onto an embedding is effectively interpolating in the embedding space. Our findings reveal a significant dependence of the RNA velocity workflow on smoothing via the k-nearest-neighbors (k-NN) graph of the observed data. This reliance results in considerable estimation errors for both direction and speed in both high- and low-dimensional settings when the k-NN graph fails to accurately represent the true data structure; this is an unknown feature of real data. RNA velocity performs poorly at estimating speed in both low- and high-dimensional spaces, except in very low noise settings. We introduce a novel quality measure that can identify when RNA velocity should not be used. Conclusions: Our findings emphasize the importance of choices in the RNA velocity workflow and highlight critical limitations of data analysis. We advise against over-interpreting expression dynamics using RNA velocity, particularly in terms of speed. Finally, we emphasize that the use of RNA velocity in assessing the correctness of a low-dimensional embedding is circular.
Affiliation(s)
- Shijie C. Zheng
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD USA
- Genevieve Stein-O’Brien
- Department of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD USA
- Department of Neuroscience, Johns Hopkins School of Medicine, Baltimore, MD USA
- Kavli Neurodiscovery Institute, Johns Hopkins University, Baltimore, MD USA
- Quantitative Sciences Division, Department of Oncology, Johns Hopkins School of Medicine, Baltimore, MD USA
- Leandros Boukas
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD USA
- Department of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD USA
- Loyal A. Goff
- Department of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD USA
- Department of Neuroscience, Johns Hopkins School of Medicine, Baltimore, MD USA
- Kavli Neurodiscovery Institute, Johns Hopkins University, Baltimore, MD USA
- Kasper D. Hansen
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD USA
- Department of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD USA
10
Singh NP, Love MI, Patro R. TreeTerminus - creating transcript trees using inferential replicate counts. iScience 2023; 26:106961. [PMID: 37378336 PMCID: PMC10291472 DOI: 10.1016/j.isci.2023.106961] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/29/2023] [Revised: 04/18/2023] [Accepted: 05/22/2023] [Indexed: 06/29/2023] Open
Abstract
A certain degree of uncertainty is always associated with the transcript abundance estimates. The uncertainty may make many downstream analyses, such as differential testing, difficult for certain transcripts. Conversely, gene-level analysis, though less ambiguous, is often too coarse-grained. We introduce TreeTerminus, a data-driven approach for grouping transcripts into a tree structure where leaves represent individual transcripts and internal nodes represent an aggregation of a transcript set. TreeTerminus constructs trees such that, on average, the inferential uncertainty decreases as we ascend the tree topology. The tree provides the flexibility to analyze data at nodes that are at different levels of resolution in the tree and can be tuned depending on the analysis of interest. We evaluated TreeTerminus on two simulated and two experimental datasets and observed an improved performance compared to transcripts (leaves) and other methods under several different metrics.
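The idea that inferential uncertainty can shrink when transcripts are aggregated can be shown with a toy calculation. This is only a sketch: the coefficient of variation across inferential replicates is an assumed stand-in for whatever uncertainty measure TreeTerminus actually uses, and the replicate counts and the helper name `coefficient_of_variation` are hypothetical.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Spread of inferential replicate counts relative to their mean."""
    m = mean(values)
    return pstdev(values) / m if m else 0.0


# Hypothetical inferential replicates for two transcripts that share reads:
# whenever the quantifier assigns more reads to one, it assigns fewer to the other.
tx_a = [10, 20, 30]
tx_b = [30, 20, 10]

# An internal tree node aggregates its children by summing replicate-wise.
group = [a + b for a, b in zip(tx_a, tx_b)]

cv_a = coefficient_of_variation(tx_a)
cv_b = coefficient_of_variation(tx_b)
cv_group = coefficient_of_variation(group)
print(cv_a, cv_b, cv_group)
```

In this contrived case the two leaves are individually uncertain, but their sum is identical in every replicate, so the aggregated node carries no inferential spread at all.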
Affiliation(s)
- Noor Pratap Singh
- Department of Computer Science, University of Maryland, College Park, MD, USA
- Michael I. Love
- Department of Biostatistics, University of North Carolina, Chapel Hill, NC, USA
- Department of Genetics, University of North Carolina, Chapel Hill, NC, USA
- Rob Patro
- Department of Computer Science, University of Maryland, College Park, MD, USA
11
Kuo A, Hansen KD, Hicks SC. Quantification and statistical modeling of droplet-based single-nucleus RNA-sequencing data. Biostatistics 2023:kxad010. [PMID: 37257175 DOI: 10.1093/biostatistics/kxad010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/20/2022] [Revised: 03/22/2023] [Accepted: 04/19/2023] [Indexed: 06/02/2023] Open
Abstract
In complex tissues containing cells that are difficult to dissociate, single-nucleus RNA-sequencing (snRNA-seq) has become the preferred experimental technology over single-cell RNA-sequencing (scRNA-seq) to measure gene expression. To accurately model these data in downstream analyses, previous work has shown that droplet-based scRNA-seq data are not zero-inflated, but whether droplet-based snRNA-seq data follow the same probability distributions has not been systematically evaluated. Using pseudonegative control data from nuclei in mouse cortex sequenced with the 10x Genomics Chromium system and mouse kidney sequenced with the DropSeq system, we found that droplet-based snRNA-seq data follow a negative binomial distribution, suggesting that parametric statistical models applied to scRNA-seq are transferable to snRNA-seq. Furthermore, we found that the quantification choices in adapting quantification mapping strategies from scRNA-seq to snRNA-seq can play a significant role in downstream analyses and biological interpretation. In particular, reference transcriptomes that do not include intronic regions result in significantly smaller library sizes and incongruous cell type classifications. We also confirmed the presence of a gene length bias in snRNA-seq data, which we show is present in both exonic and intronic reads, and investigate potential causes for the bias.
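One common way to ask whether counts are zero-inflated relative to a negative binomial is to compare the observed fraction of zeros with the fraction the fitted distribution predicts. The sketch below illustrates that comparison only; it is not the authors' analysis. The method-of-moments fit, the function names, and the example counts are assumptions chosen for brevity.

```python
import math
from statistics import mean, variance


def nb_zero_probability(mu, size):
    """P(X = 0) for a negative binomial with mean `mu` and dispersion `size`."""
    return (size / (size + mu)) ** size


def expected_zero_fraction(counts):
    """Zero fraction predicted by a method-of-moments negative binomial fit.

    Uses var = mu + mu^2 / size; falls back to the Poisson limit exp(-mu)
    when the data show no overdispersion.
    """
    mu = mean(counts)
    var = variance(counts)
    if var <= mu:
        return math.exp(-mu)
    size = mu ** 2 / (var - mu)
    return nb_zero_probability(mu, size)


# Hypothetical UMI counts for one gene across ten nuclei.
counts = [0, 0, 1, 0, 3, 0, 2, 0, 0, 4]
observed = counts.count(0) / len(counts)
expected = expected_zero_fraction(counts)
print(f"observed zeros: {observed:.2f}, NB-expected zeros: {expected:.2f}")
```

A large excess of observed over expected zeros across many genes would point toward zero inflation; close agreement is consistent with the plain negative binomial described in the abstract.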
Affiliation(s)
- Albert Kuo
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, 615 N Wolfe St, Baltimore, MD 21205, USA
- Kasper D Hansen
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, 615 N Wolfe St, Baltimore, MD 21205, USA
- Department of Genetic Medicine, Johns Hopkins School of Medicine, 733 N Broadway, Baltimore, MD 21205, USA
| | - Stephanie C Hicks
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, 615 N Wolfe St, Baltimore, MD 21205, USA
| |
|
12
|
Carilli M, Gorin G, Choi Y, Chari T, Pachter L. Biophysical modeling with variational autoencoders for bimodal, single-cell RNA sequencing data. bioRxiv 2023:2023.01.13.523995. [PMID: 36712140 PMCID: PMC9882246 DOI: 10.1101/2023.01.13.523995] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 01/15/2023]
Abstract
We motivate and present biVI, which combines the variational autoencoder framework of scVI with biophysically motivated, bivariate models for nascent and mature RNA distributions. While previous approaches to integrate bimodal data via the variational autoencoder framework ignore the causal relationship between measurements, biVI models the biophysical processes that give rise to observations. We demonstrate through simulated benchmarking that biVI captures cell type structure in a low-dimensional space and accurately recapitulates parameter values and copy number distributions. On biological data, biVI provides a scalable route for identifying the biophysical mechanisms underlying gene expression. This analytical approach outlines a generalizable strategy for treating multimodal datasets generated by high-throughput, single-cell genomic assays.
Affiliation(s)
- Maria Carilli
- Division of Biology and Biological Engineering, California Institute of Technology
| | - Gennady Gorin
- Division of Chemistry and Chemical Engineering, California Institute of Technology
| | - Yongin Choi
- Biomedical Engineering Graduate Group, University of California, Davis
- Genome Center, University of California, Davis
| | - Tara Chari
- Division of Biology and Biological Engineering, California Institute of Technology
| | - Lior Pachter
- Division of Biology and Biological Engineering, California Institute of Technology
- Department of Computing and Mathematical Sciences, California Institute of Technology
| |
|
13
|
Riemondy K, Henriksen JC, Rissland OS. Intron dynamics reveal principles of gene regulation during the maternal-to-zygotic transition. RNA (New York, N.Y.) 2023; 29:596-608. [PMID: 36764816 PMCID: PMC10158999 DOI: 10.1261/rna.079168.122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/17/2022] [Accepted: 01/29/2023] [Indexed: 05/06/2023]
Abstract
The maternal-to-zygotic transition (MZT) is a conserved embryonic process in animals where developmental control shifts from the maternal to zygotic genome. A key step in this transition is zygotic transcription, and deciphering the MZT requires classifying newly transcribed genes. However, due to current technological limitations, this starting point remains a challenge for studying many species. Here, we present an alternative approach that characterizes transcriptome changes based solely on RNA-seq data. By combining intron-mapping reads and transcript-level quantification, we characterized transcriptome dynamics during the Drosophila melanogaster MZT. Our approach provides an accessible platform to investigate transcriptome dynamics that can be applied to the MZT in nonmodel organisms. In addition to classifying zygotically transcribed genes, our analysis revealed that over 300 genes express different maternal and zygotic transcript isoforms due to alternative splicing, polyadenylation, and promoter usage. The vast majority of these zygotic isoforms have the potential to be subject to different regulatory control, and over two-thirds encode different proteins. Thus, our analysis reveals an additional layer of regulation during the MZT, where new zygotic transcripts can generate additional proteome diversity.
Affiliation(s)
- Kent Riemondy
- RNA Bioscience Initiative, University of Colorado School of Medicine, Aurora, Colorado 80045, USA
| | - Jesslyn C Henriksen
- RNA Bioscience Initiative, University of Colorado School of Medicine, Aurora, Colorado 80045, USA
- Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Aurora, Colorado 80045, USA
| | - Olivia S Rissland
- RNA Bioscience Initiative, University of Colorado School of Medicine, Aurora, Colorado 80045, USA
- Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Aurora, Colorado 80045, USA
| |
|
14
|
Brooks TG, Lahens NF, Mrčela A, Sarantopoulou D, Nayak S, Naik A, Sengupta S, Choi PS, Grant GR. BEERS2: RNA-Seq simulation through high fidelity in silico modeling. bioRxiv 2023:2023.04.21.537847. [PMID: 37162982 PMCID: PMC10168222 DOI: 10.1101/2023.04.21.537847] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/11/2023]
Abstract
Simulation of RNA-seq reads is critical in the assessment, comparison, benchmarking, and development of bioinformatics tools. Yet the field of RNA-seq simulators has progressed little in the last decade. To address this need we have developed BEERS2, which combines a flexible and highly configurable design with detailed simulation of the entire library preparation and sequencing pipeline. BEERS2 takes input transcripts (typically full-length mRNA transcripts with polyA tails) from either customizable input or from CAMPAREE simulated RNA samples. It produces realistic reads of these transcripts in FASTQ, SAM, or BAM format, with the SAM and BAM outputs containing the true alignment to the reference genome. It also produces true transcript-level quantification values. BEERS2 is designed to include the effects of polyA selection and RiboZero for ribosomal depletion, hexamer priming sequence biases, GC-content biases in PCR amplification, barcode read errors, and errors during PCR amplification. These characteristics combine to make BEERS2 the most complete simulation of RNA-seq to date. Finally, we demonstrate the use of BEERS2 by measuring the effect of several settings on the popular Salmon pseudoalignment algorithm.
Affiliation(s)
- Thomas G Brooks
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, PA, USA
| | - Nicholas F Lahens
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, PA, USA
| | - Antonijo Mrčela
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, PA, USA
| | - Dimitra Sarantopoulou
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, PA, USA
- Current address: National Institute on Aging, National Institutes of Health, Baltimore, MD, USA
| | - Soumyashant Nayak
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, PA, USA
- Current address: Statistics and Mathematics Unit, Indian Statistical Institute, Bengaluru, Karnataka, India
| | - Amruta Naik
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, PA, USA
- Children's Hospital of Philadelphia, Philadelphia, PA, USA
| | - Shaon Sengupta
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, PA, USA
- Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Department of Pediatrics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA
| | - Peter S Choi
- Division of Cancer Pathobiology, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Department of Pathology & Laboratory Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
| | - Gregory R Grant
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, PA, USA
- Department of Genetics, University of Pennsylvania, Philadelphia, PA, USA
| |
|
15
|
Păun O, Tan YX, Patel H, Strohbuecker S, Ghanate A, Cobolli-Gigli C, Llorian Sopena M, Gerontogianni L, Goldstone R, Ang SL, Guillemot F, Dias C. Pioneer factor ASCL1 cooperates with the mSWI/SNF complex at distal regulatory elements to regulate human neural differentiation. Genes Dev 2023; 37:218-242. [PMID: 36931659 PMCID: PMC10111863 DOI: 10.1101/gad.350269.122] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 11/18/2022] [Accepted: 02/28/2023] [Indexed: 03/19/2023]
Abstract
Pioneer transcription factors are thought to play pivotal roles in developmental processes by binding nucleosomal DNA to activate gene expression, though mechanisms through which pioneer transcription factors remodel chromatin remain unclear. Here, using single-cell transcriptomics, we show that endogenous expression of neurogenic transcription factor ASCL1, considered a classical pioneer factor, defines a transient population of progenitors in human neural differentiation. Testing ASCL1's pioneer function using a knockout model to define the unbound state, we found that endogenous expression of ASCL1 drives progenitor differentiation by cis-regulation both as a classical pioneer factor and as a nonpioneer remodeler, where ASCL1 binds permissive chromatin to induce chromatin conformation changes. ASCL1 interacts with BAF SWI/SNF chromatin remodeling complexes, primarily at targets where it acts as a nonpioneer factor, and we provide evidence for codependent DNA binding and remodeling at a subset of ASCL1 and SWI/SNF cotargets. Our findings provide new insights into ASCL1 function regulating activation of long-range regulatory elements in human neurogenesis and uncover a novel mechanism of its chromatin remodeling function codependent on partner ATPase activity.
Affiliation(s)
- Oana Păun
- Neural Stem Cell Biology Laboratory, the Francis Crick Institute, London NW1 1AT, United Kingdom
| | - Yu Xuan Tan
- Neural Stem Cell Biology Laboratory, the Francis Crick Institute, London NW1 1AT, United Kingdom
| | - Harshil Patel
- Bioinformatics and Biostatistics Science and Technology Platform, the Francis Crick Institute, London NW1 1AT, United Kingdom
| | - Stephanie Strohbuecker
- Bioinformatics and Biostatistics Science and Technology Platform, the Francis Crick Institute, London NW1 1AT, United Kingdom
| | - Avinash Ghanate
- Bioinformatics and Biostatistics Science and Technology Platform, the Francis Crick Institute, London NW1 1AT, United Kingdom
| | - Clementina Cobolli-Gigli
- Neural Stem Cell Biology Laboratory, the Francis Crick Institute, London NW1 1AT, United Kingdom
| | - Miriam Llorian Sopena
- Bioinformatics and Biostatistics Science and Technology Platform, the Francis Crick Institute, London NW1 1AT, United Kingdom
| | - Lina Gerontogianni
- Bioinformatics and Biostatistics Science and Technology Platform, the Francis Crick Institute, London NW1 1AT, United Kingdom
| | - Robert Goldstone
- Bioinformatics and Biostatistics Science and Technology Platform, the Francis Crick Institute, London NW1 1AT, United Kingdom
| | - Siew-Lan Ang
- Neural Stem Cell Biology Laboratory, the Francis Crick Institute, London NW1 1AT, United Kingdom
| | - François Guillemot
- Neural Stem Cell Biology Laboratory, the Francis Crick Institute, London NW1 1AT, United Kingdom
| | - Cristina Dias
- Neural Stem Cell Biology Laboratory, the Francis Crick Institute, London NW1 1AT, United Kingdom
- Medical and Molecular Genetics, School of Basic and Medical Biosciences, Faculty of Life Sciences and Medicine, King's College London, London SE1 9RT, United Kingdom
| |
|
16
|
Lazure F, Farouni R, Sahinyan K, Blackburn DM, Hernández-Corchado A, Perron G, Lu T, Osakwe A, Ragoussis J, Crist C, Perkins TJ, Jahani-Asl A, Najafabadi HS, Soleimani VD. Transcriptional reprogramming of skeletal muscle stem cells by the niche environment. Nat Commun 2023; 14:535. [PMID: 36726011 PMCID: PMC9892560 DOI: 10.1038/s41467-023-36265-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/26/2021] [Accepted: 01/23/2023] [Indexed: 02/03/2023] Open
Abstract
Adult stem cells are indispensable for tissue regeneration, but their function declines with age. The niche environment in which the stem cells reside plays a critical role in their function. However, quantification of the niche effect on stem cell function is lacking. Using muscle stem cells (MuSC) as a model, we show that aging leads to a significant transcriptomic shift in their subpopulations accompanied by locus-specific gain and loss of chromatin accessibility and DNA methylation. By combining in vivo MuSC transplantation and computational methods, we show that the expression of approximately half of all age-altered genes in MuSCs from aged male mice can be restored by exposure to a young niche environment. While there is a correlation between gene reversibility and epigenetic alterations, restoration of gene expression occurs primarily at the level of transcription. The stem cell niche environment therefore represents an important therapeutic target to enhance tissue regeneration in aging.
Affiliation(s)
- Felicia Lazure
- Department of Human Genetics, McGill University, 3640 rue University, Montréal, QC, H3A 0C7, Canada
- Lady Davis Institute for Medical Research, Jewish General Hospital, 3755 Chemin de la Côte-Sainte-Catherine, Montréal, QC, H3T 1E2, Canada
| | - Rick Farouni
- Department of Human Genetics, McGill University, 3640 rue University, Montréal, QC, H3A 0C7, Canada
- McGill Genome Centre, Victor Phillip Dahdaleh Institute of Genomic Medicine, 740 Dr Penfield Avenue, Montreal, QC, H3A 0G1, Canada
| | - Korin Sahinyan
- Department of Human Genetics, McGill University, 3640 rue University, Montréal, QC, H3A 0C7, Canada
- Lady Davis Institute for Medical Research, Jewish General Hospital, 3755 Chemin de la Côte-Sainte-Catherine, Montréal, QC, H3T 1E2, Canada
| | - Darren M Blackburn
- Department of Human Genetics, McGill University, 3640 rue University, Montréal, QC, H3A 0C7, Canada
- Lady Davis Institute for Medical Research, Jewish General Hospital, 3755 Chemin de la Côte-Sainte-Catherine, Montréal, QC, H3T 1E2, Canada
| | - Aldo Hernández-Corchado
- Department of Human Genetics, McGill University, 3640 rue University, Montréal, QC, H3A 0C7, Canada
- McGill Genome Centre, Victor Phillip Dahdaleh Institute of Genomic Medicine, 740 Dr Penfield Avenue, Montreal, QC, H3A 0G1, Canada
| | - Gabrielle Perron
- Department of Human Genetics, McGill University, 3640 rue University, Montréal, QC, H3A 0C7, Canada
- McGill Genome Centre, Victor Phillip Dahdaleh Institute of Genomic Medicine, 740 Dr Penfield Avenue, Montreal, QC, H3A 0G1, Canada
| | - Tianyuan Lu
- Lady Davis Institute for Medical Research, Jewish General Hospital, 3755 Chemin de la Côte-Sainte-Catherine, Montréal, QC, H3T 1E2, Canada
- Quantitative Life Sciences, McGill University, Montreal, Canada
| | - Adrien Osakwe
- Quantitative Life Sciences, McGill University, Montreal, Canada
| | - Jiannis Ragoussis
- Department of Human Genetics, McGill University, 3640 rue University, Montréal, QC, H3A 0C7, Canada
- McGill Genome Centre, Victor Phillip Dahdaleh Institute of Genomic Medicine, 740 Dr Penfield Avenue, Montreal, QC, H3A 0G1, Canada
| | - Colin Crist
- Department of Human Genetics, McGill University, 3640 rue University, Montréal, QC, H3A 0C7, Canada
- Lady Davis Institute for Medical Research, Jewish General Hospital, 3755 Chemin de la Côte-Sainte-Catherine, Montréal, QC, H3T 1E2, Canada
| | - Theodore J Perkins
- Sprott Center for Stem Cell Research, Ottawa Hospital Research Institute, 501 Smyth Road, Ottawa, ON, K1H 8L6, Canada
- Department of Biochemistry, Microbiology and Immunology, University of Ottawa, Ottawa, ON, K1H 8M5, Canada
| | - Arezu Jahani-Asl
- Department of Cellular and Molecular Medicine and University of Ottawa Brain and Mind Research Institute, University of Ottawa, 451 Smyth Road, Ottawa, ON, K1H 8M5, Canada
| | - Hamed S Najafabadi
- Department of Human Genetics, McGill University, 3640 rue University, Montréal, QC, H3A 0C7, Canada
- McGill Genome Centre, Victor Phillip Dahdaleh Institute of Genomic Medicine, 740 Dr Penfield Avenue, Montreal, QC, H3A 0G1, Canada
- Quantitative Life Sciences, McGill University, Montreal, Canada
| | - Vahab D Soleimani
- Department of Human Genetics, McGill University, 3640 rue University, Montréal, QC, H3A 0C7, Canada
- Lady Davis Institute for Medical Research, Jewish General Hospital, 3755 Chemin de la Côte-Sainte-Catherine, Montréal, QC, H3T 1E2, Canada
| |
|
17
|
He D, Soneson C, Patro R. Understanding and evaluating ambiguity in single-cell and single-nucleus RNA-sequencing. bioRxiv 2023:2023.01.04.522742. [PMID: 36711921 PMCID: PMC9881993 DOI: 10.1101/2023.01.04.522742] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/06/2023]
Abstract
Recently, a new modification has been proposed by Hjörleifsson and Sullivan et al. to the model used to classify the splicing status of reads (as spliced (mature), unspliced (nascent), or ambiguous) in single-cell and single-nucleus RNA-seq data. Here, we evaluate both the theoretical basis and practical implementation of the proposed method. The proposed method is highly conservative, and therefore unlikely to mischaracterize reads as spliced (mature) or unspliced (nascent) when they are not. However, we find that it leaves a large fraction of reads classified as ambiguous, and, in practice, allocates these ambiguous reads in an all-or-nothing manner, and differently between single-cell and single-nucleus RNA-seq data. Further, as implemented in practice, the ambiguous classification is implicit and based on the index against which the reads are mapped, which leads to several drawbacks compared to methods that consider both spliced (mature) and unspliced (nascent) mapping targets simultaneously - for example, the loss of the ability to use confidently assigned reads to rescue ambiguous reads based on shared UMIs and gene targets. Nonetheless, we show that these conservative assignment rules can be obtained directly in existing approaches simply by altering the set of targets that are indexed. To this end, we introduce the spliceu reference and show that its use with alevin-fry recapitulates the more conservative proposed classification. We also observe that, on experimental data, and under the proposed allocation rules for ambiguous UMIs, the difference between the proposed classification scheme and existing conventions appears much smaller than previously reported. We demonstrate the use of the new piscem index for mapping simultaneously against spliced (mature) and unspliced (nascent) targets, allowing classification against the full nascent and mature transcriptome in human or mouse in <3GB of memory. Finally, we discuss the potential of incorporating probabilistic evidence into the inference of splicing status, and suggest that it may provide benefits beyond what can be obtained from discrete classification of UMIs as splicing-ambiguous.
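The three-way classification and the UMI-based rescue mentioned in the abstract can be sketched in a few lines. This is a simplified illustration under assumed rules, not the logic of alevin-fry, piscem, or the method under evaluation; `classify_read`, `resolve_umi`, and the `(target_id, kind)` input format are hypothetical names introduced for the example.

```python
def classify_read(compatible_targets):
    """Classify one read from the targets it maps to.

    `compatible_targets` is an iterable of (target_id, kind) pairs, where
    kind is "spliced" or "unspliced" (a hypothetical input format).
    """
    kinds = {kind for _, kind in compatible_targets}
    if not kinds:
        return "unmapped"
    if kinds == {"spliced"}:
        return "spliced"
    if kinds == {"unspliced"}:
        return "unspliced"
    # Compatible with both mature and nascent targets.
    return "ambiguous"


def resolve_umi(read_labels):
    """Assign a splicing status to a UMI from the labels of its reads.

    Assumed rule: if all confidently labeled reads agree, that label
    rescues the ambiguous reads; otherwise the UMI stays ambiguous.
    """
    confident = {lab for lab in read_labels if lab in ("spliced", "unspliced")}
    if len(confident) == 1:
        return confident.pop()
    return "ambiguous"
```

For example, a UMI with one ambiguous read and one confidently spliced read for the same gene would be resolved as spliced under this assumed rule, which is only possible when both target types are considered at mapping time.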
Affiliation(s)
- Dongze He
- Department of Cell Biology and Molecular Genetics and Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD, USA
| | - Charlotte Soneson
- Friedrich Miescher Institute for Biomedical Research, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel, Switzerland
| | - Rob Patro
- Department of Computer Science and Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD, USA
| |
|
18
|
Dong X, Bacher R. Analysis of Single-Cell RNA-seq Data. Methods Mol Biol 2023; 2629:95-114. [PMID: 36929075 DOI: 10.1007/978-1-0716-2986-4_6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/18/2023]
Abstract
As single-cell RNA sequencing experiments continue to advance scientific discoveries across biological disciplines, an increasing number of analysis tools and workflows for analyzing the data have been developed. In this chapter, we describe a standard workflow and elaborate on relevant data analysis tools for analyzing single-cell RNA sequencing data. We provide recommendations for the appropriate use of commonly used methods, with code examples and analysis interpretations.
Affiliation(s)
- Xiaoru Dong
- Department of Biostatistics, University of Florida, Gainesville, Florida, USA
| | - Rhonda Bacher
- Department of Biostatistics, University of Florida, Gainesville, Florida, USA
| |
|
19
|
Virdi GS, Choi ML, Evans JR, Yao Z, Athauda D, Strohbuecker S, Nirujogi RS, Wernick AI, Pelegrina-Hidalgo N, Leighton C, Saleeb RS, Kopach O, Alrashidi H, Melandri D, Perez-Lloret J, Angelova PR, Sylantyev S, Eaton S, Heales S, Rusakov DA, Alessi DR, Kunath T, Horrocks MH, Abramov AY, Patani R, Gandhi S. Protein aggregation and calcium dysregulation are hallmarks of familial Parkinson's disease in midbrain dopaminergic neurons. NPJ Parkinsons Dis 2022; 8:162. [PMID: 36424392 PMCID: PMC9691718 DOI: 10.1038/s41531-022-00423-7] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 03/16/2022] [Accepted: 10/27/2022] [Indexed: 11/27/2022] Open
Abstract
Mutations in the SNCA gene cause autosomal dominant Parkinson's disease (PD), with loss of dopaminergic neurons in the substantia nigra, and aggregation of α-synuclein. The sequence of molecular events that proceed from an SNCA mutation during development, to end-stage pathology is unknown. Utilising human-induced pluripotent stem cells (hiPSCs), we resolved the temporal sequence of SNCA-induced pathophysiological events in order to discover early, and likely causative, events. Our small molecule-based protocol generates highly enriched midbrain dopaminergic (mDA) neurons: molecular identity was confirmed using single-cell RNA sequencing and proteomics, and functional identity was established through dopamine synthesis, and measures of electrophysiological activity. At the earliest stage of differentiation, prior to maturation to mDA neurons, we demonstrate the formation of small β-sheet-rich oligomeric aggregates, in SNCA-mutant cultures. Aggregation persists and progresses, ultimately resulting in the accumulation of phosphorylated α-synuclein aggregates. Impaired intracellular calcium signalling, increased basal calcium, and impairments in mitochondrial calcium handling occurred early at day 34-41 post differentiation. Once midbrain identity fully developed, at day 48-62 post differentiation, SNCA-mutant neurons exhibited mitochondrial dysfunction, oxidative stress, lysosomal swelling and increased autophagy. Ultimately these multiple cellular stresses lead to abnormal excitability, altered neuronal activity, and cell death. Our differentiation paradigm generates an efficient model for studying disease mechanisms in PD and highlights that protein misfolding to generate intraneuronal oligomers is one of the earliest critical events driving disease in human neurons, rather than a late-stage hallmark of the disease.
Affiliation(s)
- Gurvir S Virdi
- The Francis Crick Institute, 1 Midland Road, London, NW1 1AT, UK
- Department of Clinical and Movement Neurosciences, UCL Queen Square Institute of Neurology, Queen Square, London, WC1N 3BG, UK
- Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, 20815, USA
| | - Minee L Choi
- The Francis Crick Institute, 1 Midland Road, London, NW1 1AT, UK
- Department of Clinical and Movement Neurosciences, UCL Queen Square Institute of Neurology, Queen Square, London, WC1N 3BG, UK
- Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, 20815, USA
| | - James R Evans
- The Francis Crick Institute, 1 Midland Road, London, NW1 1AT, UK
- Department of Clinical and Movement Neurosciences, UCL Queen Square Institute of Neurology, Queen Square, London, WC1N 3BG, UK
- Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, 20815, USA
| | - Zhi Yao
- The Francis Crick Institute, 1 Midland Road, London, NW1 1AT, UK
- Department of Clinical and Movement Neurosciences, UCL Queen Square Institute of Neurology, Queen Square, London, WC1N 3BG, UK
| | - Dilan Athauda
- The Francis Crick Institute, 1 Midland Road, London, NW1 1AT, UK
- Department of Clinical and Movement Neurosciences, UCL Queen Square Institute of Neurology, Queen Square, London, WC1N 3BG, UK
| | | | - Raja S Nirujogi
- Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, 20815, USA
- Medical Research Council (MRC) Protein Phosphorylation and Ubiquitylation Unit, School of Life Sciences, University of Dundee, Dow Street, Dundee, DD1 5EH, UK
| | - Anna I Wernick
- The Francis Crick Institute, 1 Midland Road, London, NW1 1AT, UK
- Department of Clinical and Movement Neurosciences, UCL Queen Square Institute of Neurology, Queen Square, London, WC1N 3BG, UK
- Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, 20815, USA
| | - Noelia Pelegrina-Hidalgo
- EaStCHEM School of Chemistry, University of Edinburgh, Edinburgh, EH9 3FJ, UK
- Center for Regenerative Medicine, University of Edinburgh, Edinburgh, EH16 4UU, UK
| | - Craig Leighton
- EaStCHEM School of Chemistry, University of Edinburgh, Edinburgh, EH9 3FJ, UK
- Center for Regenerative Medicine, University of Edinburgh, Edinburgh, EH16 4UU, UK
| | - Rebecca S Saleeb
- EaStCHEM School of Chemistry, University of Edinburgh, Edinburgh, EH9 3FJ, UK
| | - Olga Kopach
- Department of Clinical and Experimental Epilepsy, UCL Queen Square Institute of Neurology, London, WC1N 3BG, UK
| | - Haya Alrashidi
- UCL Great Ormond Street Institute of Child Health, London, WC1N 1EH, UK
| | - Daniela Melandri
- Department of Neurodegenerative Diseases, UCL Queen Square Institute of Neurology, Queen Square, London, WC1N 3BG, UK
| | | | - Plamena R Angelova
- Department of Clinical and Movement Neurosciences, UCL Queen Square Institute of Neurology, Queen Square, London, WC1N 3BG, UK
| | - Sergiy Sylantyev
- Rowett Institute, University of Aberdeen, Ashgrove Rd West, Aberdeen, AB25 2ZD, UK
| | - Simon Eaton
- UCL Great Ormond Street Institute of Child Health, London, WC1N 1EH, UK
| | - Simon Heales
- UCL Great Ormond Street Institute of Child Health, London, WC1N 1EH, UK
| | - Dmitri A Rusakov
- Department of Clinical and Experimental Epilepsy, UCL Queen Square Institute of Neurology, London, WC1N 3BG, UK
| | - Dario R Alessi
- Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, 20815, USA
- Medical Research Council (MRC) Protein Phosphorylation and Ubiquitylation Unit, School of Life Sciences, University of Dundee, Dow Street, Dundee, DD1 5EH, UK
| | - Tilo Kunath
- Center for Regenerative Medicine, University of Edinburgh, Edinburgh, EH16 4UU, UK
| | - Mathew H Horrocks
- EaStCHEM School of Chemistry, University of Edinburgh, Edinburgh, EH9 3FJ, UK
| | - Andrey Y Abramov
- Department of Clinical and Movement Neurosciences, UCL Queen Square Institute of Neurology, Queen Square, London, WC1N 3BG, UK
| | - Rickie Patani
- The Francis Crick Institute, 1 Midland Road, London, NW1 1AT, UK
- Department of Neuromuscular Disease, UCL Queen Square Institute of Neurology, Queen Square, London, WC1N 3BG, UK
| | - Sonia Gandhi
- The Francis Crick Institute, 1 Midland Road, London, NW1 1AT, UK
- Department of Clinical and Movement Neurosciences, UCL Queen Square Institute of Neurology, Queen Square, London, WC1N 3BG, UK
- Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, 20815, USA
| |
|
20
|
Gao M, Qiao C, Huang Y. UniTVelo: temporally unified RNA velocity reinforces single-cell trajectory inference. Nat Commun 2022; 13:6586. [PMID: 36329018 PMCID: PMC9633790 DOI: 10.1038/s41467-022-34188-7] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 04/11/2022] [Accepted: 10/18/2022] [Indexed: 11/06/2022] Open
Abstract
The recent breakthrough of single-cell RNA velocity methods promises to reveal directed trajectories of cell differentiation, state transitions, and responses to perturbations. However, existing RNA velocity methods often return erroneous results, partly due to model violations or a lack of temporal regularization. Here, we present UniTVelo, a statistical framework of RNA velocity that models the dynamics of spliced and unspliced RNAs via flexible transcription activities. Uniquely, it also supports the inference of a unified latent time across the transcriptome. With ten datasets, we demonstrate that UniTVelo returns the expected trajectory in different biological systems, including hematopoietic differentiation and even those with weak kinetics or complex branches.
Affiliation(s)
- Mingze Gao
- School of Biomedical Sciences, University of Hong Kong, Hong Kong SAR, China
| | - Chen Qiao
- School of Biomedical Sciences, University of Hong Kong, Hong Kong SAR, China
| | - Yuanhua Huang
- School of Biomedical Sciences, University of Hong Kong, Hong Kong SAR, China
- Department of Statistics and Actuarial Science, University of Hong Kong, Hong Kong SAR, China
| |
|
21
|
Gorin G, Fang M, Chari T, Pachter L. RNA velocity unraveled. PLoS Comput Biol 2022; 18:e1010492. [PMID: 36094956 PMCID: PMC9499228 DOI: 10.1371/journal.pcbi.1010492] [Citation(s) in RCA: 40] [Impact Index Per Article: 20.0] [Received: 03/17/2022] [Revised: 09/22/2022] [Accepted: 08/14/2022] [Indexed: 11/24/2022] Open
Abstract
We perform a thorough analysis of RNA velocity methods, with a view towards understanding the suitability of the various assumptions underlying popular implementations. In addition to providing a self-contained exposition of the underlying mathematics, we undertake simulations and perform controlled experiments on biological datasets to assess workflow sensitivity to parameter choices and underlying biology. Finally, we argue for a more rigorous approach to RNA velocity, and present a framework for Markovian analysis that points to directions for improvement and mitigation of current problems. Single-cell sequencing data are snapshots of biological processes, making it challenging to infer dynamic relationships between cell types. RNA velocity attempts to bypass this challenge by treating the unspliced RNA content as a proxy for spliced RNA content in the near future, and using this “extrapolation” to build directional relationships. However, the method, as implemented in several software packages, is not yet reliable enough to be actionable, in part due to the large number of arbitrary, user-set hyperparameters, as well as fundamental incompatibilities between the biophysics of transcription in the living cell and the models used throughout the velocity workflows. In this study, we review these issues, and use existing results from the fields of stochastic modeling and fluorescence transcriptomics to develop an alternative theoretical framework. We show that our framework can facilitate the development and inference of physically consistent models for sequencing data, as well as the unification of single-cell analyses to self-consistently treat variation due to cell type dynamics and identities, the stochasticity inherent to single-molecule processes, and the uncertainty introduced by sequencing experiments.
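The "extrapolation" idea described in the abstract can be illustrated with the commonly used linear splicing model, in which spliced abundance changes as ds/dt = beta * u - gamma * s. The sketch below is a minimal example under that assumption and is not the framework proposed in the paper; the function names, the rate values, and the step size `dt` are hypothetical and chosen only for illustration.

```python
def velocity(u: float, s: float, beta: float, gamma: float) -> float:
    """Rate of change of spliced RNA under the linear model ds/dt = beta*u - gamma*s.

    u: unspliced abundance, s: spliced abundance,
    beta: splicing rate, gamma: degradation rate.
    """
    return beta * u - gamma * s


def extrapolate(s: float, v: float, dt: float) -> float:
    """First-order estimate of spliced abundance a short time `dt` ahead.

    The result is clipped at zero because abundances cannot be negative.
    """
    return max(s + v * dt, 0.0)


# Example: more unspliced RNA than the steady state implies, so the
# velocity is positive and the gene appears to be ramping up.
v = velocity(u=4.0, s=2.0, beta=1.0, gamma=0.5)
s_future = extrapolate(s=2.0, v=v, dt=0.1)
```

Note that `dt`, `beta`, and `gamma` are exactly the kind of user-set or weakly identified quantities whose arbitrariness the abstract criticizes.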
Affiliation(s)
- Gennady Gorin
  - Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California, United States of America
- Meichen Fang
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California, United States of America
- Tara Chari
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California, United States of America
- Lior Pachter
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California, United States of America
  - Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, California, United States of America
22
Ranek JS, Stanley N, Purvis JE. Integrating temporal single-cell gene expression modalities for trajectory inference and disease prediction. Genome Biol 2022; 23:186. [PMID: 36064614 PMCID: PMC9442962 DOI: 10.1186/s13059-022-02749-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/01/2022] [Accepted: 08/16/2022] [Indexed: 02/04/2023] Open
Abstract
BACKGROUND: Current methods for analyzing single-cell datasets have relied primarily on static gene expression measurements to characterize the molecular state of individual cells. However, capturing temporal changes in cell state is crucial for the interpretation of dynamic phenotypes such as the cell cycle, development, or disease progression. RNA velocity infers the direction and speed of transcriptional changes in individual cells, yet it is unclear how these temporal gene expression modalities may be leveraged for predictive modeling of cellular dynamics.
RESULTS: Here, we present the first task-oriented benchmarking study that investigates integration of temporal sequencing modalities for dynamic cell state prediction. We benchmark ten integration approaches on ten datasets spanning different biological contexts, sequencing technologies, and species. We find that integrated data more accurately infers biological trajectories and achieves increased performance on classifying cells according to perturbation and disease states. Furthermore, we show that simple concatenation of spliced and unspliced molecules performs consistently well on classification tasks and can be used over more memory intensive and computationally expensive methods.
CONCLUSIONS: This work illustrates how integrated temporal gene expression modalities may be leveraged for predicting cellular trajectories and sample-associated perturbation and disease phenotypes. Additionally, this study provides users with practical recommendations for task-specific integration of single-cell gene expression modalities.
Affiliation(s)
- Jolene S. Ranek
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, USA
  - Computational Medicine Program, University of North Carolina at Chapel Hill, Chapel Hill, USA
- Natalie Stanley
  - Computational Medicine Program, University of North Carolina at Chapel Hill, Chapel Hill, USA
  - Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, USA
- Jeremy E. Purvis
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, USA
  - Computational Medicine Program, University of North Carolina at Chapel Hill, Chapel Hill, USA
23
Gorin G, Pachter L. Modeling bursty transcription and splicing with the chemical master equation. Biophys J 2022; 121:1056-1069. [PMID: 35143775 PMCID: PMC8943761 DOI: 10.1016/j.bpj.2022.02.004] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 07/30/2021] [Revised: 11/29/2021] [Accepted: 02/03/2022] [Indexed: 11/16/2022] Open
Abstract
Splicing cascades that alter gene products posttranscriptionally also affect expression dynamics. We study a class of processes and associated distributions that emerge from models of bursty promoters coupled to directed acyclic graphs of splicing. These solutions provide full time-dependent joint distributions for an arbitrary number of species with general noise behaviors and transient phenomena, offering qualitative and quantitative insights about how splicing can regulate expression dynamics. Finally, we derive a set of quantitative constraints on the minimum complexity necessary to reproduce gene coexpression patterns using synchronized burst models. We validate these findings by analyzing long-read sequencing data, where we find evidence of expression patterns largely consistent with these constraints.
Affiliation(s)
- Gennady Gorin
  - Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California
- Lior Pachter
  - Division of Biology and Biological Engineering & Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, California
24
Alevin-fry unlocks rapid, accurate and memory-frugal quantification of single-cell RNA-seq data. Nat Methods 2022; 19:316-322. [PMID: 35277707 PMCID: PMC8933848 DOI: 10.1038/s41592-022-01408-3] [Citation(s) in RCA: 24] [Impact Index Per Article: 12.0] [Received: 07/09/2021] [Accepted: 01/27/2022] [Indexed: 01/19/2023]
Abstract
The rapid growth of high-throughput single-cell and single-nucleus RNA-sequencing (sc/snRNA-seq) technologies has produced a wealth of data over the past few years. The size, volume, and distinctive characteristics of these data necessitate the development of new computational methods to accurately and efficiently quantify sc/snRNA-seq data into count matrices that constitute the input to downstream analyses. We introduce the alevin-fry framework for quantifying sc/snRNA-seq data. In addition to being faster and more memory frugal than other accurate quantification approaches, alevin-fry ameliorates the memory scalability and false-positive expression issues that are exhibited by other lightweight tools. We demonstrate how alevin-fry can be effectively used to quantify sc/snRNA-seq data, and also how the spliced and unspliced molecule quantification required as input for RNA velocity analyses can be seamlessly extracted from the same preprocessed data used to generate regular gene expression count matrices.
25
Winkler EA, Kim CN, Ross JM, Garcia JH, Gil E, Oh I, Chen LQ, Wu D, Catapano JS, Raygor K, Narsinh K, Kim H, Weinsheimer S, Cooke DL, Walcott BP, Lawton MT, Gupta N, Zlokovic BV, Chang EF, Abla AA, Lim DA, Nowakowski TJ. A single-cell atlas of the normal and malformed human brain vasculature. Science 2022; 375:eabi7377. [PMID: 35084939 PMCID: PMC8995178 DOI: 10.1126/science.abi7377] [Citation(s) in RCA: 105] [Impact Index Per Article: 52.5] [Indexed: 12/11/2022]
Abstract
Cerebrovascular diseases are a leading cause of death and neurologic disability. Further understanding of disease mechanisms and therapeutic strategies requires a deeper knowledge of cerebrovascular cells in humans. We profiled transcriptomes of 181,388 cells to define a cell atlas of the adult human cerebrovasculature, including endothelial cell molecular signatures with arteriovenous segmentation and expanded perivascular cell diversity. By leveraging this reference, we investigated cellular and molecular perturbations in brain arteriovenous malformations, which are a leading cause of stroke in young people, and identified pathologic endothelial transformations with abnormal vascular patterning and the ontology of vascularly derived inflammation. We illustrate the interplay between vascular and immune cells that contributes to brain hemorrhage and catalog opportunities for targeting angiogenic and inflammatory programs in vascular malformations.
Affiliation(s)
- Ethan A Winkler
  - Department of Neurological Surgery, University of California, San Francisco, CA, USA
  - Eli and Edythe Broad Center for Regeneration Medicine and Stem Cell Research, University of California, San Francisco, CA, USA
  - Weill Institute for Neurosciences, University of California, San Francisco, CA, USA
  - Department of Neurosurgery, Barrow Neurological Institute, Phoenix, AZ, USA
- Chang N Kim
  - Eli and Edythe Broad Center for Regeneration Medicine and Stem Cell Research, University of California, San Francisco, CA, USA
  - Weill Institute for Neurosciences, University of California, San Francisco, CA, USA
  - Department of Anatomy, University of California, San Francisco, CA, USA
  - Department of Psychiatry and Behavioral Sciences, University of California, San Francisco, CA, USA
- Jayden M Ross
  - Department of Neurological Surgery, University of California, San Francisco, CA, USA
  - Eli and Edythe Broad Center for Regeneration Medicine and Stem Cell Research, University of California, San Francisco, CA, USA
  - Weill Institute for Neurosciences, University of California, San Francisco, CA, USA
  - Department of Anatomy, University of California, San Francisco, CA, USA
  - Department of Psychiatry and Behavioral Sciences, University of California, San Francisco, CA, USA
- Joseph H Garcia
  - Department of Neurological Surgery, University of California, San Francisco, CA, USA
- Eugene Gil
  - Department of Neurological Surgery, University of California, San Francisco, CA, USA
  - Eli and Edythe Broad Center for Regeneration Medicine and Stem Cell Research, University of California, San Francisco, CA, USA
- Irene Oh
  - Rebus Biosystems, Santa Clara, CA, USA
- David Wu
  - Department of Neurological Surgery, University of California, San Francisco, CA, USA
  - Eli and Edythe Broad Center for Regeneration Medicine and Stem Cell Research, University of California, San Francisco, CA, USA
- Joshua S Catapano
  - Department of Neurosurgery, Barrow Neurological Institute, Phoenix, AZ, USA
- Kunal Raygor
  - Department of Neurological Surgery, University of California, San Francisco, CA, USA
- Kazim Narsinh
  - Department of Radiology and Biomedical Imaging, University of California, San Francisco, CA, USA
- Helen Kim
  - Center for Cerebrovascular Research, Department of Anesthesia and Perioperative Care, University of California, San Francisco, CA, USA
- Shantel Weinsheimer
  - Center for Cerebrovascular Research, Department of Anesthesia and Perioperative Care, University of California, San Francisco, CA, USA
- Daniel L Cooke
  - Weill Institute for Neurosciences, University of California, San Francisco, CA, USA
  - Department of Radiology and Biomedical Imaging, University of California, San Francisco, CA, USA
- Brian P Walcott
  - Department of Neurosurgery, NorthShore University HealthSystem, Evanston, IL, USA
- Michael T Lawton
  - Department of Neurosurgery, Barrow Neurological Institute, Phoenix, AZ, USA
- Nalin Gupta
  - Department of Neurological Surgery, University of California, San Francisco, CA, USA
- Berislav V Zlokovic
  - Department of Physiology and Neuroscience, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
  - Zilkha Neurogenetic Institute, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Edward F Chang
  - Department of Neurological Surgery, University of California, San Francisco, CA, USA
  - Weill Institute for Neurosciences, University of California, San Francisco, CA, USA
- Adib A Abla
  - Department of Neurological Surgery, University of California, San Francisco, CA, USA
  - Weill Institute for Neurosciences, University of California, San Francisco, CA, USA
- Daniel A Lim
  - Department of Neurological Surgery, University of California, San Francisco, CA, USA
  - Eli and Edythe Broad Center for Regeneration Medicine and Stem Cell Research, University of California, San Francisco, CA, USA
  - Weill Institute for Neurosciences, University of California, San Francisco, CA, USA
  - San Francisco Veterans Affairs Medical Center, San Francisco, CA, USA
- Tomasz J Nowakowski
  - Department of Neurological Surgery, University of California, San Francisco, CA, USA
  - Eli and Edythe Broad Center for Regeneration Medicine and Stem Cell Research, University of California, San Francisco, CA, USA
  - Weill Institute for Neurosciences, University of California, San Francisco, CA, USA
  - Department of Anatomy, University of California, San Francisco, CA, USA
  - Department of Psychiatry and Behavioral Sciences, University of California, San Francisco, CA, USA
  - Chan Zuckerberg Biohub, San Francisco, CA, USA
26
Lange M, Bergen V, Klein M, Setty M, Reuter B, Bakhti M, Lickert H, Ansari M, Schniering J, Schiller HB, Pe'er D, Theis FJ. CellRank for directed single-cell fate mapping. Nat Methods 2022; 19:159-170. [PMID: 35027767 PMCID: PMC8828480 DOI: 10.1038/s41592-021-01346-6] [Citation(s) in RCA: 201] [Impact Index Per Article: 100.5] [Received: 10/19/2020] [Accepted: 11/07/2021] [Indexed: 12/20/2022]
Abstract
Computational trajectory inference enables the reconstruction of cell state dynamics from single-cell RNA sequencing experiments. However, trajectory inference requires that the direction of a biological process is known, largely limiting its application to differentiating systems in normal development. Here, we present CellRank (https://cellrank.org) for single-cell fate mapping in diverse scenarios, including regeneration, reprogramming and disease, for which direction is unknown. Our approach combines the robustness of trajectory inference with directional information from RNA velocity, taking into account the gradual and stochastic nature of cellular fate decisions, as well as uncertainty in velocity vectors. On pancreas development data, CellRank automatically detects initial, intermediate and terminal populations, predicts fate potentials and visualizes continuous gene expression trends along individual lineages. Applied to lineage-traced cellular reprogramming data, predicted fate probabilities correctly recover reprogramming outcomes. CellRank also predicts a new dedifferentiation trajectory during postinjury lung regeneration, including previously unknown intermediate cell states, which we confirm experimentally. CellRank infers directed cell state transitions and cell fates incorporating RNA velocity information into a graph based Markov process.
Affiliation(s)
- Marius Lange
  - Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
  - Department of Mathematics, Technical University of Munich, Munich, Germany
- Volker Bergen
  - Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
  - Department of Mathematics, Technical University of Munich, Munich, Germany
- Michal Klein
  - Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
- Manu Setty
  - Program for Computational and Systems Biology, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA
  - Basic Sciences Division and Translational Data Science IRC, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Bernhard Reuter
  - Department of Computer Science, University of Tübingen, Tübingen, Germany
  - Zuse Institute Berlin (ZIB), Berlin, Germany
- Mostafa Bakhti
  - Institute of Diabetes and Regeneration Research, Helmholtz Center Munich, Munich, Germany
  - German Center for Diabetes Research (DZD), Neuherberg, Germany
- Heiko Lickert
  - Institute of Diabetes and Regeneration Research, Helmholtz Center Munich, Munich, Germany
  - German Center for Diabetes Research (DZD), Neuherberg, Germany
- Meshal Ansari
  - Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
  - Comprehensive Pneumology Center (CPC) / Institute of Lung Biology and Disease (ILBD), Helmholtz Zentrum München, Member of the German Center for Lung Research (DZL), Munich, Germany
- Janine Schniering
  - Comprehensive Pneumology Center (CPC) / Institute of Lung Biology and Disease (ILBD), Helmholtz Zentrum München, Member of the German Center for Lung Research (DZL), Munich, Germany
- Herbert B Schiller
  - Comprehensive Pneumology Center (CPC) / Institute of Lung Biology and Disease (ILBD), Helmholtz Zentrum München, Member of the German Center for Lung Research (DZL), Munich, Germany
- Dana Pe'er
  - Program for Computational and Systems Biology, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Fabian J Theis
  - Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
  - Department of Mathematics, Technical University of Munich, Munich, Germany
  - TUM School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany
27
Raabe FJ, Stephan M, Waldeck JB, Huber V, Demetriou D, Kannaiyan N, Galinski S, Glaser LV, Wehr MC, Ziller MJ, Schmitt A, Falkai P, Rossner MJ. Expression of Lineage Transcription Factors Identifies Differences in Transition States of Induced Human Oligodendrocyte Differentiation. Cells 2022; 11:241. [PMID: 35053357 PMCID: PMC8773672 DOI: 10.3390/cells11020241] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/21/2021] [Revised: 01/04/2022] [Accepted: 01/07/2022] [Indexed: 02/05/2023] Open
Abstract
Oligodendrocytes (OLs) are critical for myelination and are implicated in several brain disorders. Directed differentiation of human-induced OLs (iOLs) from pluripotent stem cells can be achieved by forced expression of different combinations of the transcription factors SOX10 (S), OLIG2 (O), and NKX6.2 (N). Here, we applied quantitative image analysis and single-cell transcriptomics to compare different transcription factor (TF) combinations for their efficacy towards robust OL lineage conversion. Compared with S alone, the combination of SON increases the number of iOLs and generates iOLs with a more complex morphology and higher expression levels of myelin-marker genes. RNA velocity analysis of individual cells reveals that S generates a population of oligodendrocyte-precursor cells (OPCs) that appear to be more immature than those generated by SON and to display distinct molecular properties. Our work highlights that TFs for generating iOPCs or iOLs should be chosen depending on the intended application or research question, and that SON might be beneficial to study more mature iOLs while S might be better suited to investigate iOPC biology.
Affiliation(s)
- Florian J. Raabe
  - Department of Psychiatry and Psychotherapy, University Hospital, LMU Munich, 80336 Munich, Germany
  - International Max Planck Research School for Translational Psychiatry (IMPRS-TP), 80804 Munich, Germany
- Marius Stephan
  - Department of Psychiatry and Psychotherapy, University Hospital, LMU Munich, 80336 Munich, Germany
  - International Max Planck Research School for Translational Psychiatry (IMPRS-TP), 80804 Munich, Germany
  - Systasy Bioscience GmbH, 81669 Munich, Germany
- Jan Benedikt Waldeck
  - Department of Psychiatry and Psychotherapy, University Hospital, LMU Munich, 80336 Munich, Germany
- Verena Huber
  - Department of Psychiatry and Psychotherapy, University Hospital, LMU Munich, 80336 Munich, Germany
- Damianos Demetriou
  - Department of Psychiatry and Psychotherapy, University Hospital, LMU Munich, 80336 Munich, Germany
- Nirmal Kannaiyan
  - Department of Psychiatry and Psychotherapy, University Hospital, LMU Munich, 80336 Munich, Germany
  - Systasy Bioscience GmbH, 81669 Munich, Germany
- Sabrina Galinski
  - Department of Psychiatry and Psychotherapy, University Hospital, LMU Munich, 80336 Munich, Germany
  - Systasy Bioscience GmbH, 81669 Munich, Germany
- Laura V. Glaser
  - Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany
- Michael C. Wehr
  - Department of Psychiatry and Psychotherapy, University Hospital, LMU Munich, 80336 Munich, Germany
  - Systasy Bioscience GmbH, 81669 Munich, Germany
- Michael J. Ziller
  - Max Planck Institute of Psychiatry, 80804 Munich, Germany
  - Department of Psychiatry, University of Münster, 48149 Münster, Germany
- Andrea Schmitt
  - Department of Psychiatry and Psychotherapy, University Hospital, LMU Munich, 80336 Munich, Germany
  - Laboratory of Neurosciences (LIM-27), Institute of Psychiatry, University of São Paulo (USP), São Paulo 05403-903, Brazil
- Peter Falkai
  - Department of Psychiatry and Psychotherapy, University Hospital, LMU Munich, 80336 Munich, Germany
- Moritz J. Rossner
  - Department of Psychiatry and Psychotherapy, University Hospital, LMU Munich, 80336 Munich, Germany
  - Systasy Bioscience GmbH, 81669 Munich, Germany
28
You Y, Tian L, Su S, Dong X, Jabbari JS, Hickey PF, Ritchie ME. Benchmarking UMI-based single-cell RNA-seq preprocessing workflows. Genome Biol 2021; 22:339. [PMID: 34906205 PMCID: PMC8672463 DOI: 10.1186/s13059-021-02552-3] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 06/17/2021] [Accepted: 11/22/2021] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND: Single-cell RNA-sequencing (scRNA-seq) technologies and associated analysis methods have rapidly developed in recent years. This includes preprocessing methods, which assign sequencing reads to genes to create count matrices for downstream analysis. While several packaged preprocessing workflows have been developed to provide users with convenient tools for handling this process, how they compare to one another and how they influence downstream analysis have not been well studied.
RESULTS: Here, we systematically benchmark the performance of 10 end-to-end preprocessing workflows (Cell Ranger, Optimus, salmon alevin, alevin-fry, kallisto bustools, dropSeqPipe, scPipe, zUMIs, celseq2, and scruff) using datasets yielding different biological complexity levels generated by CEL-Seq2 and 10x Chromium platforms. We compare these workflows in terms of their quantification properties directly and their impact on normalization and clustering by evaluating the performance of different method combinations. While the scRNA-seq preprocessing workflows compared vary in their detection and quantification of genes across datasets, after downstream analysis with performant normalization and clustering methods, almost all combinations produce clustering results that agree well with the known cell type labels that provided the ground truth in our analysis.
CONCLUSIONS: In summary, the choice of preprocessing method was found to be less important than other steps in the scRNA-seq analysis process. Our study comprehensively compares common scRNA-seq preprocessing workflows and summarizes their characteristics to guide workflow users.
Affiliation(s)
- Yue You
  - Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Australia
  - Department of Medical Biology, The University of Melbourne, Parkville, Australia
- Luyi Tian
  - Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Australia
  - Department of Medical Biology, The University of Melbourne, Parkville, Australia
- Shian Su
  - Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Australia
  - Department of Medical Biology, The University of Melbourne, Parkville, Australia
- Xueyi Dong
  - Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Australia
  - Department of Medical Biology, The University of Melbourne, Parkville, Australia
- Jafar S. Jabbari
  - Australian Genome Research Facility, Victorian Comprehensive Cancer Centre, Melbourne, Australia
  - Microbiological Diagnostic Unit Public Health Laboratory, Department of Microbiology and Immunology, The University of Melbourne at The Peter Doherty Institute for Infection and Immunity, Melbourne, Australia
- Peter F. Hickey
  - Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Australia
  - Department of Medical Biology, The University of Melbourne, Parkville, Australia
  - Single-Cell Open Research Endeavour (SCORE), The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Australia
- Matthew E. Ritchie
  - Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Australia
  - Department of Medical Biology, The University of Melbourne, Parkville, Australia
  - School of Mathematics and Statistics, The University of Melbourne, Parkville, Australia
29
Wang L, Zhang Q, Qin Q, Trasanidis N, Vinyard M, Chen H, Pinello L. Current progress and potential opportunities to infer single-cell developmental trajectory and cell fate. Curr Opin Syst Biol 2021; 26:1-11. [PMID: 33997529 PMCID: PMC8117397 DOI: 10.1016/j.coisb.2021.03.006] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 12/19/2022]
Abstract
Rapid technological advances in transcriptomics and lineage tracing technologies provide new opportunities to understand organismal development at the single-cell level. Building on these advances, various computational methods have been proposed to infer developmental trajectories and to predict cell fate. These methods have unveiled previously uncharacterized transitional cell types and differentiation processes. Importantly, the ability to recover cell states and trajectories has been evolving hand-in-hand with new technologies and diverse experimental designs; more recent methods can capture complex trajectory topologies and infer short- and long-term cell fate dynamics. Here, we summarize and categorize the most recent and popular computational approaches for trajectory inference based on the information they leverage and describe future challenges and opportunities for the development of new methods for reconstructing differentiation trajectories and inferring cell fates.
Affiliation(s)
- Lingfei Wang
  - Molecular Pathology Unit and Center for Cancer Research, Massachusetts General Hospital Research Institute, Charlestown, USA
  - Department of Pathology, Harvard Medical School, Boston, USA
  - Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Qian Zhang
  - Molecular Pathology Unit and Center for Cancer Research, Massachusetts General Hospital Research Institute, Charlestown, USA
  - Department of Pathology, Harvard Medical School, Boston, USA
  - Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Qian Qin
  - Molecular Pathology Unit and Center for Cancer Research, Massachusetts General Hospital Research Institute, Charlestown, USA
  - Department of Pathology, Harvard Medical School, Boston, USA
  - Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Nikolaos Trasanidis
  - Molecular Pathology Unit and Center for Cancer Research, Massachusetts General Hospital Research Institute, Charlestown, USA
  - Department of Pathology, Harvard Medical School, Boston, USA
  - Centre for Haematology, Department of Immunology and Inflammation, Imperial College London, UK
- Michael Vinyard
  - Molecular Pathology Unit and Center for Cancer Research, Massachusetts General Hospital Research Institute, Charlestown, USA
  - Department of Pathology, Harvard Medical School, Boston, USA
  - Broad Institute of Harvard and MIT, Cambridge, MA, USA
  - Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA, USA
- Huidong Chen
  - Molecular Pathology Unit and Center for Cancer Research, Massachusetts General Hospital Research Institute, Charlestown, USA
  - Department of Pathology, Harvard Medical School, Boston, USA
  - Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Luca Pinello
  - Molecular Pathology Unit and Center for Cancer Research, Massachusetts General Hospital Research Institute, Charlestown, USA
  - Department of Pathology, Harvard Medical School, Boston, USA
  - Broad Institute of Harvard and MIT, Cambridge, MA, USA
30
Weng G, Kim J, Won KJ. VeTra: a tool for trajectory inference based on RNA velocity. Bioinformatics 2021; 37:3509-3513. [PMID: 33974009 PMCID: PMC8545348 DOI: 10.1093/bioinformatics/btab364] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 02/16/2021] [Revised: 04/11/2021] [Accepted: 05/10/2021] [Indexed: 11/20/2022] Open
Abstract
Motivation: Trajectory inference (TI) for single cell RNA sequencing (scRNAseq) data is a powerful approach to interpret dynamic cellular processes such as the cell cycle and development. However, accurate inference of trajectories remains challenging. The recent development of RNA velocity provides an approach to visualize cell state transitions without relying on prior knowledge.
Results: To perform TI and group cells based on RNA velocity, we developed VeTra. By applying cosine similarity and merging weakly connected components, VeTra identifies cell groups from the direction of cell transition. In addition, VeTra suggests key regulators from the inferred trajectory. VeTra is a useful tool for TI and subsequent analysis.
Availability and implementation: VeTra is available at https://github.com/wgzgithub/VeTra.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Guangzheng Weng
  - Department of Biology, The Bioinformatics Centre, University of Copenhagen, 2200 Copenhagen N, Denmark
  - Biotech Research and Innovation Centre (BRIC), University of Copenhagen, 2200 Copenhagen N, Denmark
- Junil Kim
  - Biotech Research and Innovation Centre (BRIC), University of Copenhagen, 2200 Copenhagen N, Denmark
  - Novo Nordisk Foundation Center for Stem Cell Biology, DanStem, Faculty of Health and Medical Sciences, University of Copenhagen, Ole Maaløes Vej 5, 2200 Copenhagen N, Denmark
  - Department of Bioinformatics, School of Systems Biomedical Science, Soongsil University, 369 Sangdo-Ro, Dongjak-Gu, 06978 Seoul, South Korea
- Kyoung Jae Won
  - Biotech Research and Innovation Centre (BRIC), University of Copenhagen, 2200 Copenhagen N, Denmark
  - Novo Nordisk Foundation Center for Stem Cell Biology, DanStem, Faculty of Health and Medical Sciences, University of Copenhagen, Ole Maaløes Vej 5, 2200 Copenhagen N, Denmark