101
|
Wilks C, Zheng SC, Chen FY, Charles R, Solomon B, Ling JP, Imada EL, Zhang D, Joseph L, Leek JT, Jaffe AE, Nellore A, Collado-Torres L, Hansen KD, Langmead B. recount3: summaries and queries for large-scale RNA-seq expression and splicing. Genome Biol 2021; 22:323. [PMID: 34844637 PMCID: PMC8628444 DOI: 10.1186/s13059-021-02533-6] [Citation(s) in RCA: 137] [Impact Index Per Article: 34.3] [Received: 07/12/2021] [Accepted: 10/29/2021] [Indexed: 12/12/2022] Open
Abstract
We present recount3, a resource consisting of over 750,000 publicly available human and mouse RNA sequencing (RNA-seq) samples uniformly processed by our new Monorail analysis pipeline. To facilitate access to the data, we provide the recount3 and snapcount R/Bioconductor packages as well as complementary web resources. Using these tools, data can be downloaded as study-level summaries or queried for specific exon-exon junctions, genes, samples, or other features. Monorail can be used to process local and/or private data, allowing results to be directly compared to any study in recount3. Taken together, our tools help biologists maximize the utility of publicly available RNA-seq data, especially to improve their understanding of newly collected data. recount3 is available from http://rna.recount.bio.
Affiliation(s)
- Christopher Wilks
- Department of Computer Science, Johns Hopkins University, Baltimore, USA
| | - Shijie C Zheng
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, USA
| | | | - Rone Charles
- Department of Computer Science, Johns Hopkins University, Baltimore, USA
| | - Brad Solomon
- Thomas M. Siebel Center for Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA
| | - Jonathan P Ling
- Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, USA
| | - Eddie Luidy Imada
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, USA
| | - David Zhang
- Institute of Child Health, University College London (UCL), London, UK
| | | | - Jeffrey T Leek
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, USA
| | - Andrew E Jaffe
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, USA
- Lieber Institute for Brain Development, Baltimore, USA
- Department of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, USA
- Department of Mental Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, USA
| | - Abhinav Nellore
- Department of Biomedical Engineering, Oregon Health & Science University, Portland, OR, USA
- Department of Surgery, Oregon Health & Science University, Portland, OR, USA
| | | | - Kasper D Hansen
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, USA.
- Department of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, USA.
| | - Ben Langmead
- Department of Computer Science, Johns Hopkins University, Baltimore, USA.
| |
|
102
|
Kim MC, Jin Z, Kolb R, Borcherding N, Chatzkel JA, Falzarano SM, Zhang W. Updates on Immunotherapy and Immune Landscape in Renal Clear Cell Carcinoma. Cancers (Basel) 2021; 13:5856. [PMID: 34831009 PMCID: PMC8616149 DOI: 10.3390/cancers13225856] [Citation(s) in RCA: 50] [Impact Index Per Article: 12.5] [Received: 09/29/2021] [Revised: 11/15/2021] [Accepted: 11/19/2021] [Indexed: 12/24/2022] Open
Abstract
Several clinicopathological features of clear cell renal cell carcinomas (ccRCC) contribute to making it an "atypical" cancer, including resistance to chemotherapy, sensitivity to anti-angiogenesis therapy and ICIs despite a low mutational burden, and CD8+ T cell infiltration being a predictor of poor prognosis, whereas CD8+ T cell infiltration is normally a good prognostic factor in cancer patients. These "atypical" features have led researchers to investigate the molecular and immunological mechanisms that lead to the increased T cell infiltrates despite relatively low molecular burdens, as well as to decipher the immune landscape that leads to better response to ICIs. In the present study, we summarize the past and ongoing pivotal clinical trials of immunotherapies for ccRCC, emphasizing the potential molecular and cellular mechanisms that lead to the success or failure of ICI therapy. Single-cell analysis of ccRCC has provided a more thorough and detailed understanding of the tumor immune microenvironment and has facilitated the discovery of molecular biomarkers from the tumor-infiltrating immune cells. Herein, we focus on some major immune cells in ccRCC, including T cells and tumor-associated macrophages (TAM). We further provide some perspectives on using molecular and cellular biomarkers derived from these immune cell types to potentially improve the response rate to ICIs in ccRCC patients.
Affiliation(s)
- Myung-Chul Kim
- Department of Pathology, Immunology and Laboratory Medicine, University of Florida, Gainesville, FL 32610, USA; (M.-C.K.); (Z.J.); (R.K.); (S.M.F.)
- UF Health Cancer Center, University of Florida, Gainesville, FL 32610, USA
| | - Zeng Jin
- Department of Pathology, Immunology and Laboratory Medicine, University of Florida, Gainesville, FL 32610, USA; (M.-C.K.); (Z.J.); (R.K.); (S.M.F.)
- UF Health Cancer Center, University of Florida, Gainesville, FL 32610, USA
| | - Ryan Kolb
- Department of Pathology, Immunology and Laboratory Medicine, University of Florida, Gainesville, FL 32610, USA; (M.-C.K.); (Z.J.); (R.K.); (S.M.F.)
- UF Health Cancer Center, University of Florida, Gainesville, FL 32610, USA
| | - Nicholas Borcherding
- Department of Pathology and Immunology, Washington University, St. Louis, MO 63110, USA;
| | | | - Sara Moscovita Falzarano
- Department of Pathology, Immunology and Laboratory Medicine, University of Florida, Gainesville, FL 32610, USA; (M.-C.K.); (Z.J.); (R.K.); (S.M.F.)
- UF Health Cancer Center, University of Florida, Gainesville, FL 32610, USA
| | - Weizhou Zhang
- Department of Pathology, Immunology and Laboratory Medicine, University of Florida, Gainesville, FL 32610, USA; (M.-C.K.); (Z.J.); (R.K.); (S.M.F.)
- UF Health Cancer Center, University of Florida, Gainesville, FL 32610, USA
| |
|
103
|
Swamy VS, Fufa TD, Hufnagel RB, McGaughey DM. Building the mega single-cell transcriptome ocular meta-atlas. Gigascience 2021; 10:giab061. [PMID: 34651173 PMCID: PMC8514335 DOI: 10.1093/gigascience/giab061] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 04/13/2021] [Revised: 07/27/2021] [Accepted: 08/24/2021] [Indexed: 11/13/2022] Open
Abstract
BACKGROUND The development of highly scalable single-cell transcriptome technology has resulted in the creation of thousands of datasets, >30 in the retina alone. Analyzing the transcriptomes between different projects is highly desirable because this would allow for better assessment of which biological effects are consistent across independent studies. However, it is difficult to compare and contrast data across different projects because there are substantial batch effects from computational processing, the single-cell technology utilized, and natural biological variation. While many single-cell transcriptome-specific batch correction methods purport to remove the technical noise, it is difficult to ascertain which method functions best. RESULTS We developed a lightweight R package (scPOP, single-cell Pick Optimal Parameters) that brings in batch integration methods and uses a simple heuristic to balance batch merging and cell type/cluster purity. We use this package along with a Snakefile-based workflow system to demonstrate how to optimally merge 766,615 cells from 33 retina datasets and 3 species to create a massive ocular single-cell transcriptome meta-atlas. CONCLUSIONS This provides a model for how to efficiently create meta-atlases for tissues and cells of interest.
Affiliation(s)
- Vinay S Swamy
- Bioinformatics Group, Ophthalmic Genetics & Visual Function Branch, National Eye Institute, National Institutes of Health, 20892, Bethesda, Maryland, USA
| | - Temesgen D Fufa
- Medical Genetics and Ophthalmic Genomics Unit, National Eye Institute, National Institutes of Health, 20892, Bethesda, Maryland, USA
| | - Robert B Hufnagel
- Medical Genetics and Ophthalmic Genomics Unit, National Eye Institute, National Institutes of Health, 20892, Bethesda, Maryland, USA
| | - David M McGaughey
- Bioinformatics Group, Ophthalmic Genetics & Visual Function Branch, National Eye Institute, National Institutes of Health, 20892, Bethesda, Maryland, USA
| |
|
104
|
Weber LM, Hippen AA, Hickey PF, Berrett KC, Gertz J, Doherty JA, Greene CS, Hicks SC. Genetic demultiplexing of pooled single-cell RNA-sequencing samples in cancer facilitates effective experimental design. Gigascience 2021; 10:giab062. [PMID: 34553212 PMCID: PMC8458035 DOI: 10.1093/gigascience/giab062] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 03/03/2021] [Revised: 07/19/2021] [Accepted: 08/26/2021] [Indexed: 11/23/2022] Open
Abstract
BACKGROUND Pooling cells from multiple biological samples prior to library preparation within the same single-cell RNA sequencing experiment provides several advantages, including lower library preparation costs and reduced unwanted technological variation, such as batch effects. Computational demultiplexing tools based on natural genetic variation between individuals provide a simple approach to demultiplex samples, which does not require complex additional experimental procedures. However, to our knowledge these tools have not been evaluated in cancer, where somatic variants, which could differ between cells from the same sample, may obscure the signal in natural genetic variation. RESULTS Here, we performed in silico benchmark evaluations by combining raw sequencing reads from multiple single-cell samples in high-grade serous ovarian cancer, which has a high copy number burden, and lung adenocarcinoma, which has a high tumor mutational burden. Our results confirm that genetic demultiplexing tools can be effectively deployed on cancer tissue using a pooled experimental design, although high proportions of ambient RNA from cell debris reduce performance. CONCLUSIONS This strategy provides significant cost savings through pooled library preparation. To facilitate similar analyses at the experimental design phase, we provide freely accessible code and a reproducible Snakemake workflow built around the best-performing tools found in our in silico benchmark evaluations, available at https://github.com/lmweber/snp-dmx-cancer.
Affiliation(s)
- Lukas M Weber
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, USA
| | - Ariel A Hippen
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Peter F Hickey
- Advanced Technology & Biology Division, Walter and Eliza Hall Institute of Medical Research, Parkville, VIC 3052, Australia
| | - Kristofer C Berrett
- Huntsman Cancer Institute and Department of Population Health Sciences, University of Utah, Salt Lake City, UT 84108, USA
| | - Jason Gertz
- Huntsman Cancer Institute and Department of Population Health Sciences, University of Utah, Salt Lake City, UT 84108, USA
| | - Jennifer Anne Doherty
- Huntsman Cancer Institute and Department of Population Health Sciences, University of Utah, Salt Lake City, UT 84108, USA
| | - Casey S Greene
- Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Aurora, CO 80045, USA
| | - Stephanie C Hicks
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, USA
| |
|
105
|
Di Persio S, Tekath T, Siebert-Kuss LM, Cremers JF, Wistuba J, Li X, Meyer Zu Hörste G, Drexler HCA, Wyrwoll MJ, Tüttelmann F, Dugas M, Kliesch S, Schlatt S, Laurentino S, Neuhaus N. Single-cell RNA-seq unravels alterations of the human spermatogonial stem cell compartment in patients with impaired spermatogenesis. Cell Reports Medicine 2021; 2:100395. [PMID: 34622232 PMCID: PMC8484693 DOI: 10.1016/j.xcrm.2021.100395] [Citation(s) in RCA: 49] [Impact Index Per Article: 12.3] [Received: 03/10/2021] [Revised: 07/01/2021] [Accepted: 08/17/2021] [Indexed: 02/06/2023]
Abstract
Despite the high incidence of male infertility, only 30% of infertile men receive a causative diagnosis. To explore the regulatory mechanisms governing human germ cell function in normal and impaired spermatogenesis (crypto), we performed single-cell RNA sequencing (>30,000 cells). We find major alterations in the crypto spermatogonial compartment with increased numbers of the most undifferentiated spermatogonia (PIWIL4+). We also observe a transcriptional switch within the spermatogonial compartment driven by increased and prolonged expression of the transcription factor EGR4. Intriguingly, the EGR4-regulated chromatin-associated transcriptional repressor UTF1 is downregulated at transcriptional and protein levels. This is associated with changes in spermatogonial chromatin structure and fewer Adark spermatogonia, characterized by tightly compacted chromatin and serving as reserve stem cells. These findings suggest that crypto patients are disadvantaged, as fewer cells safeguard their germline’s genetic integrity. These identified spermatogonial regulators will be highly interesting targets to uncover genetic causes of male infertility.
- Crypto(zoospermic) men show increased number of PIWIL4+/EGR4+ spermatogonia
- Crypto undifferentiated spermatogonia over-activate the EGR4 regulatory network
- The predicted EGR4 target UTF1 is downregulated in crypto spermatogonia
- Crypto testes show reduced numbers of UTF1+ Adark reserve spermatogonia
Affiliation(s)
- Sara Di Persio
- Centre of Reproductive Medicine and Andrology, University Hospital of Münster, 48149 Münster, Germany
| | - Tobias Tekath
- Institute of Medical Informatics, University Hospital of Münster, 48149 Münster, Germany
| | - Lara Marie Siebert-Kuss
- Centre of Reproductive Medicine and Andrology, University Hospital of Münster, 48149 Münster, Germany
| | - Jann-Frederik Cremers
- Centre of Reproductive Medicine and Andrology, Department of Clinical and Surgical Andrology, University Hospital of Münster, 48149 Münster, Germany
| | - Joachim Wistuba
- Centre of Reproductive Medicine and Andrology, University Hospital of Münster, 48149 Münster, Germany
| | - Xiaolin Li
- Department of Neurology with Institute of Translational Neurology, University Hospital of Münster, 48149 Münster, Germany
| | - Gerd Meyer Zu Hörste
- Department of Neurology with Institute of Translational Neurology, University Hospital of Münster, 48149 Münster, Germany
| | - Hannes C A Drexler
- Bioanalytical Mass Spectrometry Unit, Max Planck Institute for Molecular Biomedicine, 48149 Münster, Germany
| | - Margot Julia Wyrwoll
- Centre of Reproductive Medicine and Andrology, Department of Clinical and Surgical Andrology, University Hospital of Münster, 48149 Münster, Germany.,Institute of Reproductive Genetics, University of Münster, 48149 Münster, Germany
| | - Frank Tüttelmann
- Institute of Reproductive Genetics, University of Münster, 48149 Münster, Germany
| | - Martin Dugas
- Institute of Medical Informatics, University Hospital of Münster, 48149 Münster, Germany
| | - Sabine Kliesch
- Centre of Reproductive Medicine and Andrology, Department of Clinical and Surgical Andrology, University Hospital of Münster, 48149 Münster, Germany
| | - Stefan Schlatt
- Centre of Reproductive Medicine and Andrology, University Hospital of Münster, 48149 Münster, Germany
| | - Sandra Laurentino
- Centre of Reproductive Medicine and Andrology, University Hospital of Münster, 48149 Münster, Germany
| | - Nina Neuhaus
- Centre of Reproductive Medicine and Andrology, University Hospital of Münster, 48149 Münster, Germany
| |
|
106
|
Shiga M, Seno S, Onizuka M, Matsuda H. SC-JNMF: single-cell clustering integrating multiple quantification methods based on joint non-negative matrix factorization. PeerJ 2021; 9:e12087. [PMID: 34532161 PMCID: PMC8404576 DOI: 10.7717/peerj.12087] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 03/10/2021] [Accepted: 08/07/2021] [Indexed: 11/20/2022] Open
Abstract
Single-cell RNA-sequencing is a rapidly evolving technology that enables us to understand biological processes at unprecedented resolution. Single-cell expression analysis requires a complex data processing pipeline, and the pipeline is divided into two main parts: The quantification part, which converts the sequence information into gene-cell matrix data; the analysis part, which analyzes the matrix data using statistics and/or machine learning techniques. In the analysis part, unsupervised cell clustering plays an important role in identifying cell types and discovering cell diversity and subpopulations. Identified cell clusters are also used for subsequent analysis, such as finding differentially expressed genes and inferring cell trajectories. However, single-cell clustering using gene expression profiles shows different results depending on the quantification methods. Clustering results are greatly affected by the quantification method used in the upstream process. In other words, even if the original RNA-sequence data is the same, gene expression profiles processed by different quantification methods will produce different clusters. In this article, we propose a robust and highly accurate clustering method based on joint non-negative matrix factorization (joint-NMF) by utilizing the information from multiple gene expression profiles quantified using different methods from the same RNA-sequence data. Our joint-NMF can extract common factors among multiple gene expression profiles by applying each NMF under the constraint that one of the factorized matrices is shared among multiple NMFs. The joint-NMF determines more robust and accurate cell clustering results by leveraging multiple quantification methods compared to conventional clustering methods, which use only a single gene expression profile. Additionally, we showed the usefulness of discovering marker genes with the extracted features using our method.
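The shared-factor constraint described in this abstract can be illustrated with a minimal sketch. It assumes a plain sum-of-squared-errors objective and Lee-Seung style multiplicative updates; the function names `joint_nmf` and `joint_loss` are hypothetical, and this is not the SC-JNMF implementation. Each expression matrix X_k (genes x cells) is factorized as W_k H, with the cell factor H shared across all matrices.

```python
import numpy as np

def joint_nmf(matrices, rank, n_iter=200, eps=1e-9, seed=0):
    """Factorize each X_k (genes_k x cells) as W_k @ H with one shared H.

    Illustrative only: assumes an unweighted squared-error objective,
    which may differ from the objective used in the cited paper.
    """
    matrices = [np.asarray(X, dtype=float) for X in matrices]
    rng = np.random.default_rng(seed)
    n_cells = matrices[0].shape[1]
    Ws = [rng.random((X.shape[0], rank)) for X in matrices]
    H = rng.random((rank, n_cells))
    for _ in range(n_iter):
        # Update each matrix-specific factor W_k with H held fixed.
        for k, X in enumerate(matrices):
            W = Ws[k]
            Ws[k] = W * (X @ H.T) / (W @ (H @ H.T) + eps)
        # Update the shared factor H using contributions from every matrix.
        numer = sum(W.T @ X for W, X in zip(Ws, matrices))
        denom = sum(W.T @ W for W in Ws) @ H + eps
        H = H * numer / denom
    return Ws, H

def joint_loss(matrices, Ws, H):
    """Sum of squared reconstruction errors over all matrices."""
    return sum(
        np.linalg.norm(np.asarray(X, dtype=float) - W @ H) ** 2
        for X, W in zip(matrices, Ws)
    )
```

One simple way to read clusters off the result, again as an assumption rather than the paper's procedure, is to assign each cell to its largest component with `H.argmax(axis=0)`.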
Affiliation(s)
- Mikio Shiga
- Graduate School of Information Science and Technology, Osaka University, Osaka, Japan
| | - Shigeto Seno
- Graduate School of Information Science and Technology, Osaka University, Osaka, Japan
| | - Makoto Onizuka
- Graduate School of Information Science and Technology, Osaka University, Osaka, Japan
| | - Hideo Matsuda
- Graduate School of Information Science and Technology, Osaka University, Osaka, Japan
| |
|
107
|
Shainer I, Stemmer M. Choice of pre-processing pipeline influences clustering quality of scRNA-seq datasets. BMC Genomics 2021; 22:661. [PMID: 34521337 PMCID: PMC8439043 DOI: 10.1186/s12864-021-07930-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/21/2020] [Accepted: 08/11/2021] [Indexed: 01/22/2023] Open
Abstract
BACKGROUND Single-cell RNA sequencing (scRNA-seq) has quickly become one of the most dominant techniques in modern transcriptome assessment. In particular, 10X Genomics' Chromium system, with its high-throughput approach, turnkey operation, and thorough user guide, has made this cutting-edge technique accessible to many laboratories using diverse animal models. However, standard pre-processing, including the alignment and cell filtering pipelines, might not be ideal for every organism or tissue. Here we applied an alternative strategy, based on the pseudoaligner kallisto, to twenty-two publicly available single-cell sequencing datasets from a wide range of tissues of eight organisms and compared the results with the standard 10X Genomics' Cell Ranger pipeline. RESULTS In most of the tested samples, kallisto produced higher sequencing read alignment rates and total gene detection rates in comparison to Cell Ranger. Although datasets processed with Cell Ranger had higher cell counts, outside of human and mouse datasets, these additional cells were routinely of low quality, with low gene detection rates. Thorough downstream analysis of one kallisto-processed dataset, obtained from the zebrafish pineal gland, revealed clearer clustering, allowing the identification of an additional photoreceptor cell type that previously went undetected. The finding of the new cluster suggests that the photoreceptive pineal gland is essentially a bi-chromatic tissue containing both green and red cone-like photoreceptors and implies that the alignment and pre-processing pipeline can affect the discovery of biologically relevant cell types. CONCLUSION While Cell Ranger favors higher cell numbers, using kallisto results in datasets with higher median gene detection per cell. We could demonstrate that cell type identification was not hampered by the lower cell count, but in fact improved as a result of the high gene detection rate and the more stringent filtering. Depending on the acquired dataset, it can be beneficial to favor high quality cells and accept a lower cell count, leading to an improved classification of cell types.
Affiliation(s)
- Inbal Shainer
- Max Planck Institute of Neurobiology, Am Klopferspitz 18, 82152, Martinsried, Germany
| | - Manuel Stemmer
- Max Planck Institute of Neurobiology, Am Klopferspitz 18, 82152, Martinsried, Germany.
| |
|
108
|
Tekath T, Dugas M. Differential transcript usage analysis of bulk and single-cell RNA-seq data with DTUrtle. Bioinformatics 2021; 37:3781-3787. [PMID: 34469510 PMCID: PMC8570804 DOI: 10.1093/bioinformatics/btab629] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 03/25/2021] [Revised: 08/17/2021] [Accepted: 08/30/2021] [Indexed: 11/22/2022] Open
Abstract
Motivation Each year, the number of published bulk and single-cell RNA-seq datasets is growing exponentially. Studies analyzing such data are commonly looking at gene-level differences, while the collected RNA-seq data inherently represents reads of transcript isoform sequences. Utilizing transcriptomic quantifiers, RNA-seq reads can be attributed to specific isoforms, allowing for analysis of transcript-level differences. A differential transcript usage (DTU) analysis is testing for proportional differences in a gene’s transcript composition, and has been of rising interest for many research questions, such as analysis of differential splicing or cell-type identification. Results We present the R package DTUrtle, the first DTU analysis workflow for both bulk and single-cell RNA-seq datasets, and the first package to conduct a ‘classical’ DTU analysis in a single-cell context. DTUrtle extends established statistical frameworks, offers various result aggregation and visualization options and a novel detection probability score for tagged-end data. It has been successfully applied to bulk and single-cell RNA-seq data of human and mouse, confirming and extending key results. In addition, we present novel potential DTU applications like the identification of cell-type specific transcript isoforms as biomarkers. Availability and implementation The R package DTUrtle is available at https://github.com/TobiTekath/DTUrtle with extensive vignettes and documentation at https://tobitekath.github.io/DTUrtle/. Supplementary information Supplementary data are available at Bioinformatics online.
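The quantity a DTU analysis compares, the proportional composition of a gene's transcripts, can be shown with a small worked sketch. The helper names `transcript_proportions` and `usage_shift` are hypothetical and not part of DTUrtle (which is an R package), and no statistical test is performed here; the sketch only computes the proportions and their difference between two conditions.

```python
import numpy as np

def transcript_proportions(counts):
    """Return each transcript's share of the gene's total read count."""
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    if total == 0:
        # No reads for this gene: proportions are undefined, return zeros.
        return np.zeros_like(counts)
    return counts / total

def usage_shift(counts_a, counts_b):
    """Difference in transcript proportions (condition B minus condition A)."""
    return transcript_proportions(counts_b) - transcript_proportions(counts_a)
```

For a gene with two transcripts and counts of 90 and 10 in condition A versus 50 and 50 in condition B, the proportions change from (0.9, 0.1) to (0.5, 0.5), so the usage shift is (-0.4, +0.4) even if the gene's total expression is unchanged.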
Affiliation(s)
- Tobias Tekath
- Institute of Medical Informatics, University Hospital of Münster, Münster, 48149, Germany
| | - Martin Dugas
- Institute of Medical Informatics, Heidelberg University Hospital, Heidelberg, 69120, Germany
| |
|
109
|
Cagnin S, Alessio E, Bonadio RS, Sales G. Single-Cell RNAseq Analysis of lncRNAs. Methods Mol Biol 2021; 2348:71-90. [PMID: 34160800 DOI: 10.1007/978-1-0716-1581-2_5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/18/2023]
Abstract
Mammalian genomes are pervasively transcribed, and only a small fraction of the RNAs produced code for proteins. The importance of noncoding RNAs for the maintenance of cell functions is well known (e.g., rRNAs, tRNAs), but only recently was the involvement of microRNAs (miRNAs) in posttranscriptional regulation demonstrated, followed by the activity of long noncoding RNAs (lncRNAs) in the regulation of miRNAs, DNA structure, and protein function. LncRNA expression is more cell specific than that of other RNAs, and lncRNAs exert different functions depending on their subcellular localization. In this book chapter, we consider different protocols to evaluate the expression of lncRNAs at the single-cell level using genome-wide approaches. We use skeletal muscle as an example because it is the most abundant tissue in mammals and is involved in the regulation of metabolism and body movement. We first describe how to isolate myofibers, the smallest complete contractile system responsible for muscle metabolic and contractile traits. We then consider how to separate long and short RNAs to allow sequencing of full-length transcripts using the SMART technique for reverse transcription. Because myofibers are multinucleated cells, and because single-cell sequencing is best performed on fresh tissue, we also describe single-nucleus sequencing, which can be applied to frozen tissues. The chapter concludes with a description of bioinformatics approaches to evaluate differential expression from single-cell or single-nucleus RNA sequencing.
Affiliation(s)
- Stefano Cagnin
- Department of Biology, University of Padova, Padova, Italy.
- CRIBI Biotechnology Center, University of Padova, Padova, Italy.
- CIR-Myo Myology Center, University of Padova, Padova, Italy.
| | - Enrico Alessio
- Department of Biology, University of Padova, Padova, Italy
| | | | - Gabriele Sales
- Department of Biology, University of Padova, Padova, Italy
| |
|
110
|
Risbridger GP, Clark AK, Porter LH, Toivanen R, Bakshi A, Lister NL, Pook D, Pezaro CJ, Sandhu S, Keerthikumar S, Quezada Urban R, Papargiris M, Kraska J, Madsen HB, Wang H, Richards MG, Niranjan B, O'Dea S, Teng L, Wheelahan W, Li Z, Choo N, Ouyang JF, Thorne H, Devereux L, Hicks RJ, Sengupta S, Harewood L, Iddawala M, Azad AA, Goad J, Grummet J, Kourambas J, Kwan EM, Moon D, Murphy DG, Pedersen J, Clouston D, Norden S, Ryan A, Furic L, Goode DL, Frydenberg M, Lawrence MG, Taylor RA. The MURAL collection of prostate cancer patient-derived xenografts enables discovery through preclinical models of uro-oncology. Nat Commun 2021; 12:5049. [PMID: 34413304 PMCID: PMC8376965 DOI: 10.1038/s41467-021-25175-5] [Citation(s) in RCA: 47] [Impact Index Per Article: 11.8] [Received: 02/15/2021] [Accepted: 07/26/2021] [Indexed: 02/06/2023] Open
Abstract
Preclinical testing is a crucial step in evaluating cancer therapeutics. We aimed to establish a significant resource of patient-derived xenografts (PDXs) of prostate cancer for rapid and systematic evaluation of candidate therapies. The PDX collection comprises 59 tumors collected from 30 patients between 2012 and 2020, coinciding with the availability of abiraterone and enzalutamide. The PDXs represent the clinico-pathological and genomic spectrum of prostate cancer, from treatment-naïve primary tumors to castration-resistant metastases. Inter- and intra-tumor heterogeneity in adenocarcinoma and neuroendocrine phenotypes is evident from bulk and single-cell RNA sequencing data. Organoids can be cultured from PDXs, providing further capabilities for preclinical studies. Using a 1 x 1 x 1 design, we rapidly identify tumors with exceptional responses to combination treatments. To govern the distribution of PDXs, we formed the Melbourne Urological Research Alliance (MURAL). This PDX collection is a substantial resource, expanding the capacity to test and prioritize effective treatments for prospective clinical trials in prostate cancer.
Affiliation(s)
- Gail P Risbridger
- Prostate Cancer Research Group, Monash Biomedicine Discovery Institute, Cancer Program, Department of Anatomy and Developmental Biology, Monash University, Clayton, VIC, Australia.,Cancer Research Division, Peter MacCallum Cancer Centre, Melbourne, VIC, Australia.
| | - Ashlee K Clark
- Prostate Cancer Research Group, Monash Biomedicine Discovery Institute, Cancer Program, Department of Anatomy and Developmental Biology, Monash University, Clayton, VIC, Australia
| | - Laura H Porter
- Prostate Cancer Research Group, Monash Biomedicine Discovery Institute, Cancer Program, Department of Anatomy and Developmental Biology, Monash University, Clayton, VIC, Australia
| | - Roxanne Toivanen
- Prostate Cancer Research Group, Monash Biomedicine Discovery Institute, Cancer Program, Department of Anatomy and Developmental Biology, Monash University, Clayton, VIC, Australia.,Cancer Research Division, Peter MacCallum Cancer Centre, Melbourne, VIC, Australia.,Sir Peter MacCallum Department of Oncology, The University of Melbourne, Parkville, VIC, Australia
| | - Andrew Bakshi
- Prostate Cancer Research Group, Monash Biomedicine Discovery Institute, Cancer Program, Department of Anatomy and Developmental Biology, Monash University, Clayton, VIC, Australia.,Cancer Research Division, Peter MacCallum Cancer Centre, Melbourne, VIC, Australia.,Sir Peter MacCallum Department of Oncology, The University of Melbourne, Parkville, VIC, Australia.,Computational Cancer Biology Program, Peter MacCallum Cancer Centre, Melbourne, VIC, Australia
| | - Natalie L Lister
- Prostate Cancer Research Group, Monash Biomedicine Discovery Institute, Cancer Program, Department of Anatomy and Developmental Biology, Monash University, Clayton, VIC, Australia
| | - David Pook
- Prostate Cancer Research Group, Monash Biomedicine Discovery Institute, Cancer Program, Department of Anatomy and Developmental Biology, Monash University, Clayton, VIC, Australia.,Department of Medicine, School of Clinical Sciences, Monash University, Clayton, VIC, Australia.,Department of Medical Oncology, Monash Health, Clayton, VIC, Australia
| | - Carmel J Pezaro
- Prostate Cancer Research Group, Monash Biomedicine Discovery Institute, Cancer Program, Department of Anatomy and Developmental Biology, Monash University, Clayton, VIC, Australia.,Eastern Health and Monash University Eastern Health Clinical School, Box Hill, VIC, Australia.,Sheffield Teaching Hospitals NHS Foundation Trust, Sheffield, England
| | - Shahneen Sandhu
- Sir Peter MacCallum Department of Oncology, The University of Melbourne, Parkville, VIC, Australia.,Department of Medical Oncology, Peter MacCallum Cancer Centre, Melbourne, VIC, Australia.,Cancer Tissue Collection After Death (CASCADE) Program, Melbourne, VIC, Australia
| | - Shivakumar Keerthikumar
- Cancer Research Division, Peter MacCallum Cancer Centre, Melbourne, VIC, Australia.,Sir Peter MacCallum Department of Oncology, The University of Melbourne, Parkville, VIC, Australia.,Computational Cancer Biology Program, Peter MacCallum Cancer Centre, Melbourne, VIC, Australia
| | - Rosalia Quezada Urban
- Cancer Research Division, Peter MacCallum Cancer Centre, Melbourne, VIC, Australia.,Sir Peter MacCallum Department of Oncology, The University of Melbourne, Parkville, VIC, Australia.,Computational Cancer Biology Program, Peter MacCallum Cancer Centre, Melbourne, VIC, Australia
| | - Melissa Papargiris
- Prostate Cancer Research Group, Monash Biomedicine Discovery Institute, Cancer Program, Department of Anatomy and Developmental Biology, Monash University, Clayton, VIC, Australia.,Australian Prostate Cancer Bioresource, VIC Node, Monash University, Clayton, VIC, Australia
| | - Jenna Kraska
- Prostate Cancer Research Group, Monash Biomedicine Discovery Institute, Cancer Program, Department of Anatomy and Developmental Biology, Monash University, Clayton, VIC, Australia.,Cancer Research Division, Peter MacCallum Cancer Centre, Melbourne, VIC, Australia.,Australian Prostate Cancer Bioresource, VIC Node, Monash University, Clayton, VIC, Australia
| | - Heather B Madsen
- Prostate Cancer Research Group, Monash Biomedicine Discovery Institute, Cancer Program, Department of Anatomy and Developmental Biology, Monash University, Clayton, VIC, Australia.,Cancer Research Division, Peter MacCallum Cancer Centre, Melbourne, VIC, Australia.,Australian Prostate Cancer Bioresource, VIC Node, Monash University, Clayton, VIC, Australia
| | - Hong Wang
- Prostate Cancer Research Group, Monash Biomedicine Discovery Institute, Cancer Program, Department of Anatomy and Developmental Biology, Monash University, Clayton, VIC, Australia
| | - Michelle G Richards
- Prostate Cancer Research Group, Monash Biomedicine Discovery Institute, Cancer Program, Department of Anatomy and Developmental Biology, Monash University, Clayton, VIC, Australia
| | - Birunthi Niranjan
- Prostate Cancer Research Group, Monash Biomedicine Discovery Institute, Cancer Program, Department of Anatomy and Developmental Biology, Monash University, Clayton, VIC, Australia
| | - Samantha O'Dea
- Prostate Cancer Research Group, Monash Biomedicine Discovery Institute, Cancer Program, Department of Anatomy and Developmental Biology, Monash University, Clayton, VIC, Australia
| | - Linda Teng
- Prostate Cancer Research Group, Monash Biomedicine Discovery Institute, Cancer Program, Department of Anatomy and Developmental Biology, Monash University, Clayton, VIC, Australia
| | - William Wheelahan
- Prostate Cancer Research Group, Monash Biomedicine Discovery Institute, Cancer Program, Department of Anatomy and Developmental Biology, Monash University, Clayton, VIC, Australia
| | - Zhuoer Li
- Prostate Cancer Research Group, Monash Biomedicine Discovery Institute, Cancer Program, Department of Physiology, Monash University, Clayton, VIC, Australia
| | - Nicholas Choo
- Prostate Cancer Research Group, Monash Biomedicine Discovery Institute, Cancer Program, Department of Anatomy and Developmental Biology, Monash University, Clayton, VIC, Australia
| | - John F Ouyang
- Program in Cardiovascular and Metabolic Disorders, Duke-National University of Singapore Medical School, Singapore, Singapore
| | - Heather Thorne
- Cancer Research Division, Peter MacCallum Cancer Centre, Melbourne, VIC, Australia.,Sir Peter MacCallum Department of Oncology, The University of Melbourne, Parkville, VIC, Australia
| | - Lisa Devereux
- Cancer Research Division, Peter MacCallum Cancer Centre, Melbourne, VIC, Australia.,Sir Peter MacCallum Department of Oncology, The University of Melbourne, Parkville, VIC, Australia
| | - Rodney J Hicks
- Center for Molecular Imaging, Peter MacCallum Cancer Center, Melbourne, VIC, Australia
| | - Shomik Sengupta
- Eastern Health and Monash University Eastern Health Clinical School, Box Hill, VIC, Australia.,Department of Urology, Austin Hospital, The University of Melbourne, Heidelberg, VIC, Australia.,Department of Surgery, Austin Health, The University of Melbourne, Heidelberg, VIC, Australia.,Epworth Healthcare, Melbourne, VIC, Australia.,Epworth Freemasons, Epworth Health, East Melbourne, VIC, Australia
| | - Laurence Harewood
- Epworth Healthcare, Melbourne, VIC, Australia.,Department of Surgery, The University of Melbourne, Parkville, VIC, Australia
| | - Mahesh Iddawala
- Prostate Cancer Research Group, Monash Biomedicine Discovery Institute, Cancer Program, Department of Anatomy and Developmental Biology, Monash University, Clayton, VIC, Australia.,Department of Medicine, School of Clinical Sciences, Monash University, Clayton, VIC, Australia
| | - Arun A Azad
- Sir Peter MacCallum Department of Oncology, The University of Melbourne, Parkville, VIC, Australia.,Department of Medical Oncology, Peter MacCallum Cancer Centre, Melbourne, VIC, Australia
| | - Jeremy Goad
- Sir Peter MacCallum Department of Oncology, The University of Melbourne, Parkville, VIC, Australia.,Epworth Healthcare, Melbourne, VIC, Australia.,Division of Cancer Surgery, Peter MacCallum Cancer Centre, The University of Melbourne, Melbourne, VIC, Australia
| | - Jeremy Grummet
- Epworth Healthcare, Melbourne, VIC, Australia.,Department of Surgery, Central Clinical School, Monash University, Clayton, VIC, Australia.,Australian Urology Associates, Melbourne, VIC, Australia
| | - John Kourambas
- Department of Medicine, Monash Health, Casey Hospital, Berwick, VIC, Australia
| | - Edmond M Kwan
- Department of Medicine, School of Clinical Sciences, Monash University, Clayton, VIC, Australia.,Department of Medical Oncology, Monash Health, Clayton, VIC, Australia
| | - Daniel Moon
- Epworth Healthcare, Melbourne, VIC, Australia.,Division of Cancer Surgery, Peter MacCallum Cancer Centre, The University of Melbourne, Melbourne, VIC, Australia.,Australian Urology Associates, Melbourne, VIC, Australia.,Central Clinical School, Monash University, Clayton, VIC, Australia.,The Epworth Prostate Centre, Epworth Hospital, Richmond, VIC, Australia
| | - Declan G Murphy
- Sir Peter MacCallum Department of Oncology, The University of Melbourne, Parkville, VIC, Australia.,Epworth Healthcare, Melbourne, VIC, Australia.,Division of Cancer Surgery, Peter MacCallum Cancer Centre, The University of Melbourne, Melbourne, VIC, Australia
| | - John Pedersen
- Prostate Cancer Research Group, Monash Biomedicine Discovery Institute, Cancer Program, Department of Anatomy and Developmental Biology, Monash University, Clayton, VIC, Australia.,TissuPath, Mount Waverley, VIC, Australia
| | | | - Sam Norden
- TissuPath, Mount Waverley, VIC, Australia
| | | | - Luc Furic
- Prostate Cancer Research Group, Monash Biomedicine Discovery Institute, Cancer Program, Department of Anatomy and Developmental Biology, Monash University, Clayton, VIC, Australia.,Cancer Research Division, Peter MacCallum Cancer Centre, Melbourne, VIC, Australia.,Sir Peter MacCallum Department of Oncology, The University of Melbourne, Parkville, VIC, Australia
| | - David L Goode
- Cancer Research Division, Peter MacCallum Cancer Centre, Melbourne, VIC, Australia.,Sir Peter MacCallum Department of Oncology, The University of Melbourne, Parkville, VIC, Australia.,Computational Cancer Biology Program, Peter MacCallum Cancer Centre, Melbourne, VIC, Australia
| | - Mark Frydenberg
- Prostate Cancer Research Group, Monash Biomedicine Discovery Institute, Cancer Program, Department of Anatomy and Developmental Biology, Monash University, Clayton, VIC, Australia.,Epworth Healthcare, Melbourne, VIC, Australia.,Australian Urology Associates, Melbourne, VIC, Australia.,Department of Surgery, Monash University, Clayton, VIC, Australia.,Department of Urology, Cabrini Institute, Cabrini Health, Melbourne, VIC, Australia
| | - Mitchell G Lawrence
- Prostate Cancer Research Group, Monash Biomedicine Discovery Institute, Cancer Program, Department of Anatomy and Developmental Biology, Monash University, Clayton, VIC, Australia.,Cancer Research Division, Peter MacCallum Cancer Centre, Melbourne, VIC, Australia.,Sir Peter MacCallum Department of Oncology, The University of Melbourne, Parkville, VIC, Australia
| | - Renea A Taylor
- Cancer Research Division, Peter MacCallum Cancer Centre, Melbourne, VIC, Australia.,Sir Peter MacCallum Department of Oncology, The University of Melbourne, Parkville, VIC, Australia.,Prostate Cancer Research Group, Monash Biomedicine Discovery Institute, Cancer Program, Department of Physiology, Monash University, Clayton, VIC, Australia
111
Das A, Begum K, Akhtar S, Ahmed R, Kulkarni R, Banu S. Genome-wide detection and classification of terpene synthase genes in Aquilaria agallochum. Physiol Mol Biol Plants 2021; 27:1711-1729. [PMID: 34539112 PMCID: PMC8405786 DOI: 10.1007/s12298-021-01040-z] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 05/03/2021] [Revised: 06/28/2021] [Accepted: 07/23/2021] [Indexed: 06/05/2023]
Abstract
Agarwood, one of the most precious woods in the world, is produced by Aquilaria plant species as a result of wounding and infection. Produced as a defence response, the dark, fragrant resin is secreted in the plant's duramen, which becomes impregnated with fragrant molecules in due course. Agarwood has gained worldwide popularity due to its high aromatic oil, fragrance, and pharmaceutical value, which make it highly sought after by numerous industries. The predominant chemical constituents of agarwood, sesquiterpenoids and 2-(2-phenylethyl) chromones, have been scrutinized to understand the nature of the fragrant wood and to develop novel products. However, the genes involved in the biosynthesis of these aromatic compounds are still not comprehensively studied in Aquilaria. In this study, publicly available genomic and transcriptomic data of Aquilaria agallochum were integrated to identify putative functional terpene synthase genes (TPSs). The in silico study enabled us to identify ninety-six TPSs, of which thirty-nine full-length genes were systematically classified into the TPS-a, TPS-b, TPS-c, TPS-e, TPS-f, and TPS-g subfamilies based on their gene structure, conserved motifs, and phylogenetic comparison with TPSs from other plant species. Analysis of the cis-regulatory elements present upstream of AaTPSs revealed their association with hormone, stress and light responses. In silico expression studies detected their up-regulation in stress-induced tissue. This study provides a basic understanding of the terpene synthase gene repertoire in Aquilaria agallochum and opens up opportunities for the biochemical characterization and biotechnological exploration of these genes. Supplementary information: The online version contains supplementary material available at 10.1007/s12298-021-01040-z.
Affiliation(s)
- Ankur Das
- Department of Bioengineering and Technology, Gauhati University, Guwahati, Assam 781014 India
- Khaleda Begum
- Department of Bioengineering and Technology, Gauhati University, Guwahati, Assam 781014 India
- Suraiya Akhtar
- Department of Bioengineering and Technology, Gauhati University, Guwahati, Assam 781014 India
- Raja Ahmed
- Department of Bioengineering and Technology, Gauhati University, Guwahati, Assam 781014 India
- Ram Kulkarni
- Symbiosis School of Biological Sciences, Symbiosis International (Deemed University), Lavale, Pune, India
- Sofia Banu
- Department of Bioengineering and Technology, Gauhati University, Guwahati, Assam 781014 India
112
Hippen AA, Falco MM, Weber LM, Erkan EP, Zhang K, Doherty JA, Vähärautio A, Greene CS, Hicks SC. miQC: An adaptive probabilistic framework for quality control of single-cell RNA-sequencing data. PLoS Comput Biol 2021; 17:e1009290. [PMID: 34428202 PMCID: PMC8415599 DOI: 10.1371/journal.pcbi.1009290] [Citation(s) in RCA: 47] [Impact Index Per Article: 11.8] [Received: 03/06/2021] [Revised: 09/03/2021] [Accepted: 07/20/2021] [Indexed: 12/23/2022] Open
Abstract
Single-cell RNA-sequencing (scRNA-seq) has made it possible to profile gene expression in tissues at high resolution. An important preprocessing step prior to performing downstream analyses is to identify and remove cells with poor or degraded sample quality using quality control (QC) metrics. Two widely used QC metrics to identify a 'low-quality' cell are (i) if the cell includes a high proportion of reads that map to mitochondrial DNA (mtDNA) encoded genes and (ii) if a small number of genes are detected. Current best practices use these QC metrics independently with either arbitrary, uniform thresholds (e.g. 5%) or biological context-dependent (e.g. species) thresholds, and fail to jointly model these metrics in a data-driven manner. Current practices are often overly stringent and especially untenable on certain types of tissues, such as archived tumor tissues, or tissues associated with mitochondrial function, such as kidney tissue [1]. We propose a data-driven QC metric (miQC) that jointly models both the proportion of reads mapping to mtDNA genes and the number of detected genes with mixture models in a probabilistic framework to predict the low-quality cells in a given dataset. We demonstrate how our QC metric easily adapts to different types of single-cell datasets to remove low-quality cells while preserving high-quality cells that can be used for downstream analyses. Our software package is available at https://bioconductor.org/packages/miQC.
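The joint modelling idea in the abstract above can be illustrated with a small sketch. It covers only the final classification step: given two already-fitted mixture components, each a linear model of mitochondrial read percentage on the number of detected genes, it computes the posterior probability that a cell belongs to the compromised component and filters on that value. The component parameters, the function names, and the 0.75 cutoff are hypothetical placeholders chosen for illustration; this is not the miQC package API, and the model-fitting step is omitted.

```python
from math import exp, pi, sqrt


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd at x."""
    return exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * sqrt(2 * pi))


# Hypothetical, already-fitted mixture components (illustrative values only).
# Each models mitochondrial percentage as a linear function of detected genes.
INTACT = {"weight": 0.8, "intercept": 6.0, "slope": -0.001, "sd": 2.0}
COMPROMISED = {"weight": 0.2, "intercept": 45.0, "slope": -0.008, "sd": 10.0}


def posterior_compromised(genes_detected, mito_percent):
    """Posterior probability that a cell comes from the compromised component."""

    def joint(component):
        mean = component["intercept"] + component["slope"] * genes_detected
        return component["weight"] * normal_pdf(mito_percent, mean, component["sd"])

    intact, compromised = joint(INTACT), joint(COMPROMISED)
    return compromised / (intact + compromised)


def keep_cell(genes_detected, mito_percent, cutoff=0.75):
    """Keep a cell unless its posterior of being compromised exceeds the cutoff."""
    return posterior_compromised(genes_detected, mito_percent) <= cutoff
```

Because the decision depends on both metrics at once, a cell with a moderately high mitochondrial percentage can still be retained when its gene count is consistent with the intact component, which is the behaviour a single fixed threshold cannot express.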
Affiliation(s)
- Ariel A. Hippen
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Matias M. Falco
- Research Program in Systems Oncology, Research Programs Unit, Faculty of Medicine, University of Helsinki, Helsinki, Finland
- Lukas M. Weber
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, United States of America
- Erdogan Pekcan Erkan
- Research Program in Systems Oncology, Research Programs Unit, Faculty of Medicine, University of Helsinki, Helsinki, Finland
- Kaiyang Zhang
- Research Program in Systems Oncology, Research Programs Unit, Faculty of Medicine, University of Helsinki, Helsinki, Finland
- Jennifer Anne Doherty
- Huntsman Cancer Institute and Department of Population Health Sciences, University of Utah, Salt Lake City, Utah, United States of America
- Anna Vähärautio
- Research Program in Systems Oncology, Research Programs Unit, Faculty of Medicine, University of Helsinki, Helsinki, Finland
- Casey S. Greene
- Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Aurora, Colorado, United States of America
- Stephanie C. Hicks
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, United States of America
113
Squair JW, Skinnider MA, Gautier M, Foster LJ, Courtine G. Prioritization of cell types responsive to biological perturbations in single-cell data with Augur. Nat Protoc 2021; 16:3836-3873. [PMID: 34172974 DOI: 10.1038/s41596-021-00561-x] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 10/14/2020] [Accepted: 04/14/2021] [Indexed: 02/06/2023]
Abstract
Advances in single-cell genomics now enable large-scale comparisons of cell states across two or more experimental conditions. Numerous statistical tools are available to identify individual genes, proteins or chromatin regions that differ between conditions, but many experiments require inferences at the level of cell types, as opposed to individual analytes. We developed Augur to prioritize the cell types within a complex tissue that are most responsive to an experimental perturbation. In this protocol, we outline the application of Augur to single-cell RNA-seq data, proceeding from a genes-by-cells count matrix to a list of cell types ranked on the basis of their separability following a perturbation. We provide detailed instructions to enable investigators with limited experience in computational biology to perform cell-type prioritization within their own datasets and visualize the results. Moreover, we demonstrate the application of Augur in several more specialized workflows, including the use of RNA velocity for acute perturbations, experimental designs with multiple conditions, differential prioritization between two comparisons, and single-cell transcriptome imaging data. For a dataset containing on the order of 20,000 genes and 20 cell types, this protocol typically takes 1-4 h to complete.
Affiliation(s)
- Jordan W Squair
- Center for Neuroprosthetics and Brain Mind Institute, Faculty of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland.,NeuroRestore, Department of Clinical Neuroscience, Lausanne University Hospital (CHUV) and University of Lausanne (UNIL), Lausanne, Switzerland.,International Collaboration on Repair Discoveries (ICORD), University of British Columbia, Vancouver, British Columbia, Canada
- Michael A Skinnider
- Center for Neuroprosthetics and Brain Mind Institute, Faculty of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland.,NeuroRestore, Department of Clinical Neuroscience, Lausanne University Hospital (CHUV) and University of Lausanne (UNIL), Lausanne, Switzerland.,Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, Canada
- Matthieu Gautier
- Center for Neuroprosthetics and Brain Mind Institute, Faculty of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Leonard J Foster
- Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, Canada.,Department of Biochemistry and Molecular Biology, University of British Columbia, Vancouver, British Columbia, Canada
- Grégoire Courtine
- Center for Neuroprosthetics and Brain Mind Institute, Faculty of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland.,NeuroRestore, Department of Clinical Neuroscience, Lausanne University Hospital (CHUV) and University of Lausanne (UNIL), Lausanne, Switzerland
114
Galanti L, Shasha D, Gunsalus KC. Pheniqs 2.0: accurate, high-performance Bayesian decoding and confidence estimation for combinatorial barcode indexing. BMC Bioinformatics 2021; 22:359. [PMID: 34215187 PMCID: PMC8254269 DOI: 10.1186/s12859-021-04267-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 03/04/2021] [Accepted: 06/14/2021] [Indexed: 11/24/2022] Open
Abstract
Background: Systems biology increasingly relies on deep sequencing with combinatorial index tags to associate biological sequences with their sample, cell, or molecule of origin. Accurate data interpretation depends on the ability to classify sequences based on correct decoding of these combinatorial barcodes. The probability of correct decoding is influenced by both sequence quality and the number and arrangement of barcodes. The rising complexity of experimental designs calls for a probability model that accounts for both sequencing errors and random noise, generalizes to multiple combinatorial tags, and can handle any barcoding scheme. The needs for reproducibility and community benchmark standards demand a peer-reviewed tool that preserves decoding quality scores and provides tunable control over classification confidence that balances precision and recall. Moreover, continuous improvements in sequencing throughput require a fast, parallelized and scalable implementation.
Results and discussion: We developed flexible, robustly engineered software that performs probabilistic decoding and supports arbitrarily complex barcoding designs. Pheniqs computes the full posterior decoding error probability of observed barcodes by consulting basecalling quality scores and prior distributions, and reports sequences and confidence scores in Sequence Alignment/Map (SAM) fields. The product of posteriors for multiple independent barcodes provides an overall confidence score for each read. Pheniqs achieves greater accuracy than minimum edit distance or simple maximum likelihood estimation, and it scales linearly with core count to enable the classification of > 11 billion reads in 1 h 15 m using < 50 megabytes of memory. Pheniqs has been in production use for seven years in our genomics core facility.
Conclusion: We introduce computationally efficient software that implements both probabilistic and minimum distance decoders and show that decoding barcodes using posterior probabilities is more accurate than available methods. Pheniqs allows fine-tuning of decoding sensitivity using intuitive confidence thresholds and is extensible with alternative decoders and new error models. Any arbitrary arrangement of barcodes is easily configured, enabling computation of combinatorial confidence scores for any barcoding strategy. An optimized multithreaded implementation assures that Pheniqs is faster and scales better with complex barcode sets than existing tools. Support for POSIX streams and multiple sequencing formats enables easy integration with automated analysis pipelines.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-021-04267-5.
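The posterior decoding described in the abstract above can be sketched in a few lines. The example below is a simplified illustration, not the Pheniqs implementation: it assumes a uniform prior over the known barcodes, a hypothetical `noise_prior` for reads that match no barcode, and a simple substitution-only error model in which a Phred score Q gives an error probability of 10^(-Q/10) spread evenly over the three other bases. All function names are made up for this sketch.

```python
from math import prod


def phred_to_error(q):
    """Convert a Phred quality score to a base-call error probability."""
    return 10 ** (-q / 10)


def barcode_likelihood(observed, quals, barcode):
    """P(observed | barcode) under an independent substitution-only error model."""
    terms = []
    for base, q, expected in zip(observed, quals, barcode):
        error = phred_to_error(q)
        terms.append(1 - error if base == expected else error / 3)
    return prod(terms)


def decode(observed, quals, barcodes, noise_prior=0.01):
    """Return the most probable barcode and its posterior probability.

    noise_prior is an assumed prior that the read is random noise; noise is
    modelled as a uniformly random sequence over the four bases.
    """
    prior = (1 - noise_prior) / len(barcodes)
    joint = {b: prior * barcode_likelihood(observed, quals, b) for b in barcodes}
    noise = noise_prior * 0.25 ** len(observed)
    total = sum(joint.values()) + noise
    best = max(joint, key=joint.get)
    return best, joint[best] / total


def combined_confidence(posteriors):
    """Overall read confidence as the product of independent barcode posteriors."""
    return prod(posteriors)
```

A caller would decode each barcode segment of a read separately, multiply the resulting posteriors with `combined_confidence`, and discard reads whose combined value falls below a chosen confidence threshold.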
Affiliation(s)
- Lior Galanti
- Department of Biology, Center for Genomics and System Biology, New York University, New York, USA.,NYU Abu Dhabi Center for Genomics and System Biology, New York University, Abu Dhabi, United Arab Emirates
- Dennis Shasha
- Department of Computer Science, Courant Institute, New York University, New York, USA
- Kristin C Gunsalus
- Department of Biology, Center for Genomics and System Biology, New York University, New York, USA.,NYU Abu Dhabi Center for Genomics and System Biology, New York University, Abu Dhabi, United Arab Emirates
115
Melsted P, Booeshaghi AS, Liu L, Gao F, Lu L, Min KHJ, da Veiga Beltrame E, Hjörleifsson KE, Gehring J, Pachter L. Modular, efficient and constant-memory single-cell RNA-seq preprocessing. Nat Biotechnol 2021; 39:813-818. [PMID: 33795888 DOI: 10.1038/s41587-021-00870-2] [Citation(s) in RCA: 227] [Impact Index Per Article: 56.8] [Received: 08/07/2019] [Accepted: 02/09/2021] [Indexed: 11/08/2022]
Abstract
We describe a workflow for preprocessing of single-cell RNA-sequencing data that balances efficiency and accuracy. Our workflow is based on the kallisto and bustools programs, and is near optimal in speed with a constant memory requirement providing scalability for arbitrarily large datasets. The workflow is modular, and we demonstrate its flexibility by showing how it can be used for RNA velocity analyses.
Affiliation(s)
- Páll Melsted
- Faculty of Industrial Engineering, Mechanical Engineering and Computer Science, University of Iceland, Reykjavik, Iceland
- A Sina Booeshaghi
- Department of Mechanical Engineering, California Institute of Technology, Pasadena, CA, USA
- Lauren Liu
- Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA, USA
- Fan Gao
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
- Bioinformatics Resource Center, Beckman Institute, California Institute of Technology, Pasadena, CA, USA
- Lambda Lu
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
- Kyung Hoi Joseph Min
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA
- Eduardo da Veiga Beltrame
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
- Jase Gehring
- Department of Genome Science, University of Washington, Seattle, WA, USA
- Lior Pachter
- Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA, USA
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
116
Sun S, Xu L, Zou Q, Wang G. BP4RNAseq: a babysitter package for retrospective and newly generated RNA-seq data analyses using both alignment-based and alignment-free quantification method. Bioinformatics 2021; 37:1319-1321. [PMID: 32976573 DOI: 10.1093/bioinformatics/btaa832] [Citation(s) in RCA: 58] [Impact Index Per Article: 14.5] [Received: 05/27/2020] [Revised: 08/25/2020] [Accepted: 09/10/2020] [Indexed: 12/13/2022] Open
Abstract
Summary: Processing raw reads of RNA-sequencing (RNA-seq) data, whether public or newly sequenced, involves many specialized tools and technical configurations that are often unfamiliar and time-consuming to learn for non-bioinformatics researchers. Here, we develop the R package BP4RNAseq, which integrates state-of-the-art tools from both alignment-based and alignment-free quantification workflows. The BP4RNAseq package is a highly automated tool using an optimized pipeline to improve the sensitivity and accuracy of RNA-seq analyses. It takes only two non-technical parameters and outputs six formatted gene expression quantifications at gene and transcript levels. The package applies to both retrospective and newly generated bulk RNA-seq data analyses and is also applicable to single-cell RNA-seq analyses. It therefore greatly facilitates the application of RNA-seq. Availability and implementation: The BP4RNAseq package for R and its documentation are freely available at https://github.com/sunshanwen/BP4RNAseq. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Shanwen Sun
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, 610054 China
- Lei Xu
- School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen, 518055 China
- Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, 610054 China.,State Key Laboratory of Tree Genetics and Breeding, Northeast Forestry University, Harbin, 150040, China
- Guohua Wang
- State Key Laboratory of Tree Genetics and Breeding, Northeast Forestry University, Harbin, 150040, China.,College of Information and Computer Engineering, Northeast Forestry University, Harbin, 150040 China
117
Zhang J, Lv C, Mo C, Liu M, Wan Y, Li J, Wang Y. Single-Cell RNA Sequencing Analysis of Chicken Anterior Pituitary: A Bird's-Eye View on Vertebrate Pituitary. Front Physiol 2021; 12:562817. [PMID: 34267669 PMCID: PMC8276247 DOI: 10.3389/fphys.2021.562817] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 05/16/2020] [Accepted: 05/21/2021] [Indexed: 01/08/2023] Open
Abstract
It is well established that the anterior pituitary contains multiple endocrine cell populations, each of which can secrete one or two hormones to regulate vital physiological processes of vertebrates. However, the gene expression profiles of each pituitary cell population remain poorly characterized in most vertebrate groups. Here we analyzed the transcriptome of each cell population in adult chicken anterior pituitaries using single-cell RNA sequencing technology. The results showed that: (1) four out of five known endocrine cell clusters were identified and designated as the lactotrophs, thyrotrophs, corticotrophs, and gonadotrophs, respectively. Somatotrophs were not analyzed in the current study. Each cell cluster can express at least one known endocrine hormone, and novel marker genes (e.g., CD24 and HSPB1 in lactotrophs; NPBWR2 and NDRG1 in corticotrophs; DIO2 and SOUL in thyrotrophs; C5H11ORF96 and HPGDS in gonadotrophs) were identified. Interestingly, gonadotrophs were shown to abundantly express five peptide hormones: FSH, LH, GRP, CART and RLN3; (2) four non-endocrine/secretory cell types, including endothelial cells (expressing IGFBP7 and CFD) and folliculo-stellate cells (FS-cells, expressing S100A6 and S100A10), were identified in chicken anterior pituitaries. Among them, FS-cells can express many growth factors, peptides (e.g., WNT5A, HBEGF, Activins, VEGFC, NPY, and BMP4), and progenitor/stem cell-associated genes (e.g., Notch signaling components, CDH1), implying that the FS-cell cluster may act as a paracrine/autocrine signaling center and enrich pituitary progenitor/stem cells; (3) sexually dimorphic expression of many genes was identified in most cell clusters, including gonadotrophs and lactotrophs.
Taken together, our data provide a bird's-eye view on the diverse aspects of anterior pituitaries, including cell composition, heterogeneity, cell-to-cell communication, and gene expression profiles, which facilitates our comprehensive understanding of vertebrate pituitary biology.
Affiliation(s)
- Jiannan Zhang
- Key Laboratory of Bio-Resources and Eco-Environment, Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, China
- Can Lv
- Key Laboratory of Bio-Resources and Eco-Environment, Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, China
- Chunheng Mo
- Key Laboratory of Bio-Resources and Eco-Environment, Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, China
- Key Laboratory of Birth Defects and Related Diseases of Women and Children, Ministry of Education, West China Second University Hospital, Sichuan University, Chengdu, China
- Meng Liu
- Key Laboratory of Bio-Resources and Eco-Environment, Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, China
- Yiping Wan
- Key Laboratory of Bio-Resources and Eco-Environment, Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, China
- Juan Li
- Key Laboratory of Bio-Resources and Eco-Environment, Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, China
- Yajun Wang
- Key Laboratory of Bio-Resources and Eco-Environment, Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, China
118
Yasumizu Y, Hara A, Sakaguchi S, Ohkura N. VIRTUS: a pipeline for comprehensive virus analysis from conventional RNA-seq data. Bioinformatics 2021; 37:1465-1467. [PMID: 33017003 PMCID: PMC7745649 DOI: 10.1093/bioinformatics/btaa859] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 05/08/2020] [Revised: 09/03/2020] [Accepted: 09/21/2020] [Indexed: 12/30/2022] Open
Abstract
Summary: The possibility that RNA transcripts from clinical samples contain plenty of virus RNAs has not been pursued actively so far. We here developed a new tool for analyzing virus-transcribed mRNAs, not virus copy numbers, in the data of bulk and single-cell RNA-sequencing of human cells. Our pipeline, named VIRTUS (VIRal Transcript Usage Sensor), was able to detect 762 viruses including herpesviruses, retroviruses and even SARS-CoV-2 (COVID-19), and quantify their transcripts in the sequence data. This tool thus enabled simultaneously detecting infected cells, the composition of multiple viruses within the cell, and the endogenous host-gene expression profile of the cell. This bioinformatics method would be instrumental in addressing the possible effects of covertly infecting viruses on certain diseases and developing new treatments to target such viruses. Availability and implementation: VIRTUS is implemented using Common Workflow Language and Docker under a CC-NC license. VIRTUS is freely available at https://github.com/yyoshiaki/VIRTUS. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yoshiaki Yasumizu
- Department of Experimental Immunology, Immunology Frontier Research Center, Osaka University, Osaka 565-0871, Japan
| | - Atsushi Hara
- Department of Immunology, Nara Medical University, Kashihara, Nara 634-8521, Japan
| | - Shimon Sakaguchi
- Department of Experimental Immunology, Immunology Frontier Research Center, Osaka University, Osaka 565-0871, Japan
| | - Naganari Ohkura
- Department of Experimental Immunology, Immunology Frontier Research Center, Osaka University, Osaka 565-0871, Japan
| |
|
119
|
Hao Y, Hao S, Andersen-Nissen E, Mauck WM, Zheng S, Butler A, Lee MJ, Wilk AJ, Darby C, Zager M, Hoffman P, Stoeckius M, Papalexi E, Mimitou EP, Jain J, Srivastava A, Stuart T, Fleming LM, Yeung B, Rogers AJ, McElrath JM, Blish CA, Gottardo R, Smibert P, Satija R. Integrated analysis of multimodal single-cell data. Cell 2021; 184:3573-3587.e29. [PMID: 34062119 PMCID: PMC8238499 DOI: 10.1016/j.cell.2021.04.048] [Citation(s) in RCA: 7763] [Impact Index Per Article: 1940.8] [Received: 11/03/2020] [Revised: 03/03/2021] [Accepted: 04/28/2021] [Indexed: 02/08/2023]
Abstract
The simultaneous measurement of multiple modalities represents an exciting frontier for single-cell genomics and necessitates computational methods that can define cellular states based on multimodal data. Here, we introduce "weighted-nearest neighbor" analysis, an unsupervised framework to learn the relative utility of each data type in each cell, enabling an integrative analysis of multiple modalities. We apply our procedure to a CITE-seq dataset of 211,000 human peripheral blood mononuclear cells (PBMCs) with panels extending to 228 antibodies to construct a multimodal reference atlas of the circulating immune system. Multimodal analysis substantially improves our ability to resolve cell states, allowing us to identify and validate previously unreported lymphoid subpopulations. Moreover, we demonstrate how to leverage this reference to rapidly map new datasets and to interpret immune responses to vaccination and coronavirus disease 2019 (COVID-19). Our approach represents a broadly applicable strategy to analyze single-cell multimodal datasets and to look beyond the transcriptome toward a unified and multimodal definition of cellular identity.
Affiliation(s)
- Yuhan Hao
- Center for Genomics and Systems Biology, New York University, New York, NY 10003, USA; New York Genome Center, New York, NY 10013, USA
| | - Stephanie Hao
- Technology Innovation Lab, New York Genome Center, New York, NY 10013, USA
| | - Erica Andersen-Nissen
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA; Cape Town HVTN Immunology Lab, Hutchinson Cancer Research Institute of South Africa, Cape Town 8001, South Africa
| | - William M Mauck
- Center for Genomics and Systems Biology, New York University, New York, NY 10003, USA
| | - Shiwei Zheng
- Center for Genomics and Systems Biology, New York University, New York, NY 10003, USA; New York Genome Center, New York, NY 10013, USA
| | - Andrew Butler
- Center for Genomics and Systems Biology, New York University, New York, NY 10003, USA; New York Genome Center, New York, NY 10013, USA
| | - Maddie J Lee
- Department of Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Aaron J Wilk
- Department of Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Charlotte Darby
- Center for Genomics and Systems Biology, New York University, New York, NY 10003, USA
| | - Michael Zager
- Center for Data Visualization, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
| | - Paul Hoffman
- Center for Genomics and Systems Biology, New York University, New York, NY 10003, USA
| | - Marlon Stoeckius
- Technology Innovation Lab, New York Genome Center, New York, NY 10013, USA
| | - Efthymia Papalexi
- Center for Genomics and Systems Biology, New York University, New York, NY 10003, USA; New York Genome Center, New York, NY 10013, USA
| | - Eleni P Mimitou
- Technology Innovation Lab, New York Genome Center, New York, NY 10013, USA
| | - Jaison Jain
- Center for Genomics and Systems Biology, New York University, New York, NY 10003, USA
| | - Avi Srivastava
- Center for Genomics and Systems Biology, New York University, New York, NY 10003, USA
| | - Tim Stuart
- Center for Genomics and Systems Biology, New York University, New York, NY 10003, USA
| | - Lamar M Fleming
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
| | | | - Angela J Rogers
- Department of Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Juliana M McElrath
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
| | - Catherine A Blish
- Department of Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA; Chan Zuckerberg Biohub, San Francisco, CA 94063, USA
| | - Raphael Gottardo
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
| | - Peter Smibert
- Technology Innovation Lab, New York Genome Center, New York, NY 10013, USA
| | - Rahul Satija
- Center for Genomics and Systems Biology, New York University, New York, NY 10003, USA; New York Genome Center, New York, NY 10013, USA
| |
|
120
|
Statistical Modeling of High Dimensional Counts. METHODS IN MOLECULAR BIOLOGY (CLIFTON, N.J.) 2021; 2284:97-134. [PMID: 33835440 DOI: 10.1007/978-1-0716-1307-8_7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/13/2022]
Abstract
Statistical modeling of count data from RNA sequencing (RNA-seq) experiments is important for proper interpretation of results. Here I will describe how count data can be modeled using count distributions, or alternatively analyzed using nonparametric methods. I will focus on basic routines for performing data input, scaling/normalization, visualization, and statistical testing to determine sets of features where the counts reflect differences in gene expression across samples. Finally, I discuss limitations and possible extensions to the models presented here.
|
121
|
Tsagiopoulou M, Maniou MC, Pechlivanis N, Togkousidis A, Kotrová M, Hutzenlaub T, Kappas I, Chatzidimitriou A, Psomopoulos F. UMIc: A Preprocessing Method for UMI Deduplication and Reads Correction. Front Genet 2021; 12:660366. [PMID: 34122513 PMCID: PMC8193862 DOI: 10.3389/fgene.2021.660366] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 01/29/2021] [Accepted: 04/08/2021] [Indexed: 11/17/2022] Open
Abstract
A recent refinement in high-throughput sequencing involves the incorporation of unique molecular identifiers (UMIs), which are random oligonucleotide barcodes, on the library preparation steps. A UMI adds a unique identity to different DNA/RNA input molecules through polymerase chain reaction (PCR) amplification, thus reducing bias of this step. Here, we propose an alignment free framework serving as a preprocessing step of fastq files, called UMIc, for deduplication and correction of reads building consensus sequences from each UMI. Our approach takes into account the frequency and the Phred quality of nucleotides and the distances between the UMIs and the actual sequences. We have tested the tool using different scenarios of UMI-tagged library data, having in mind the aspect of a wide application. UMIc is an open-source tool implemented in R and is freely available from https://github.com/BiodataAnalysisGroup/UMIc.
Affiliation(s)
- Maria Tsagiopoulou
- Institute of Applied Biosciences, Centre for Research and Technology Hellas, Thessaloniki, Greece
| | - Maria Christina Maniou
- Institute of Applied Biosciences, Centre for Research and Technology Hellas, Thessaloniki, Greece
| | - Nikolaos Pechlivanis
- Institute of Applied Biosciences, Centre for Research and Technology Hellas, Thessaloniki, Greece
- Department of Genetics, Development and Molecular Biology, School of Biology, Aristotle University of Thessaloniki, Thessaloniki, Greece
| | - Anastasis Togkousidis
- Institute of Applied Biosciences, Centre for Research and Technology Hellas, Thessaloniki, Greece
| | - Michaela Kotrová
- Unit for Hematological Diagnostics, Department of Internal Medicine II, University Medical Center Schleswig-Holstein, Kiel, Germany
| | - Tobias Hutzenlaub
- Laboratory for MEMS Applications, IMTEK-Department of Microsystems Engineering, University of Freiburg, Freiburg, Germany
- Hahn-Schickard, Freiburg, Germany
| | - Ilias Kappas
- Department of Genetics, Development and Molecular Biology, School of Biology, Aristotle University of Thessaloniki, Thessaloniki, Greece
| | | | - Fotis Psomopoulos
- Institute of Applied Biosciences, Centre for Research and Technology Hellas, Thessaloniki, Greece
| |
|
122
|
Gilis J, Vitting-Seerup K, Van den Berge K, Clement L. satuRn: Scalable analysis of differential transcript usage for bulk and single-cell RNA-sequencing applications. F1000Res 2021; 10:374. [PMID: 36762203 PMCID: PMC9892655 DOI: 10.12688/f1000research.51749.2] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Accepted: 07/26/2022] [Indexed: 11/20/2022] Open
Abstract
Alternative splicing produces multiple functional transcripts from a single gene. Dysregulation of splicing is known to be associated with disease and is a hallmark of cancer. Existing tools for differential transcript usage (DTU) analysis either lack performance, cannot account for complex experimental designs, or do not scale to massive single-cell transcriptome sequencing (scRNA-seq) datasets. We introduce satuRn, a fast and flexible quasi-binomial generalized linear modelling framework that is on par with the best performing DTU methods from the bulk RNA-seq realm, while providing good false discovery rate control, addressing complex experimental designs, and scaling to scRNA-seq applications.
Affiliation(s)
- Jeroen Gilis
- Applied Mathematics, Computer science and Statistics, Ghent University, Ghent, 9000, Belgium
- Data Mining and Modeling for Biomedicine, VIB Flemish Institute for Biotechnology, Ghent, 9000, Belgium
- Bioinformatics Institute, Ghent University, Ghent, 9000, Belgium
| | - Kristoffer Vitting-Seerup
- Department of Biology, Kobenhavns Universitet, Copenhagen, 2200, Denmark
- Biotech Research and Innovation Centre (BRIC), Kobenhavns Universitet, Copenhagen, 2200, Denmark
- Danish Cancer Society Research Center, Copenhagen, 2100, Denmark
- Department of Health Technology, Danish Technical University, Kongens Lyngby, 2800, Denmark
| | - Koen Van den Berge
- Applied Mathematics, Computer science and Statistics, Ghent University, Ghent, 9000, Belgium
- Bioinformatics Institute, Ghent University, Ghent, 9000, Belgium
- Department of Statistics, University of California, Berkeley, Berkeley, California, USA
| | - Lieven Clement
- Applied Mathematics, Computer science and Statistics, Ghent University, Ghent, 9000, Belgium
- Bioinformatics Institute, Ghent University, Ghent, 9000, Belgium
| |
|
123
|
Wilson GW, Derouet M, Darling GE, Yeung JC. scSNV: accurate dscRNA-seq SNV co-expression analysis using duplicate tag collapsing. Genome Biol 2021; 22:144. [PMID: 33962667 PMCID: PMC8103760 DOI: 10.1186/s13059-021-02364-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 07/08/2020] [Accepted: 04/23/2021] [Indexed: 12/21/2022] Open
Abstract
Identifying single nucleotide variants has become common practice for droplet-based single-cell RNA-seq experiments; however, presently, a pipeline does not exist to maximize variant calling accuracy. Furthermore, molecular duplicates generated in these experiments have not been utilized to optimally detect variant co-expression. Herein, we introduce scSNV designed from the ground up to "collapse" molecular duplicates and accurately identify variants and their co-expression. We demonstrate that scSNV is fast, with a reduced false-positive variant call rate, and enables the co-detection of genetic variants and A>G RNA edits across twenty-two samples.
Affiliation(s)
- Gavin W Wilson
- Latner Thoracic Surgery Research Laboratories, University Health Network, 101 College St., 2-501, Toronto, ON, M5G 2C4, Canada
| | - Mathieu Derouet
- Latner Thoracic Surgery Research Laboratories, University Health Network, 101 College St., 2-501, Toronto, ON, M5G 2C4, Canada
| | - Gail E Darling
- Latner Thoracic Surgery Research Laboratories, University Health Network, 101 College St., 2-501, Toronto, ON, M5G 2C4, Canada; Division of Thoracic Surgery, Department of Surgery, University of Toronto, Toronto, M5G 2C4, Canada
| | - Jonathan C Yeung
- Latner Thoracic Surgery Research Laboratories, University Health Network, 101 College St., 2-501, Toronto, ON, M5G 2C4, Canada; Division of Thoracic Surgery, Department of Surgery, University of Toronto, Toronto, M5G 2C4, Canada; Toronto General Hospital, 200 Elizabeth St, 9N-983, Toronto, ON, M5G 2C4, Canada
| |
|
124
|
Cell-level metadata are indispensable for documenting single-cell sequencing datasets. PLoS Biol 2021; 19:e3001077. [PMID: 33945522 PMCID: PMC8121533 DOI: 10.1371/journal.pbio.3001077] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Revised: 05/14/2021] [Indexed: 11/19/2022] Open
Abstract
Single-cell RNA sequencing (scRNA-seq) provides an unprecedented view of cellular diversity of biological systems. However, across the thousands of publications and datasets generated using this technology, we estimate that only a minority (<25%) of studies provide cell-level metadata information containing identified cell types and related findings of the published dataset. Metadata omission hinders reproduction, exploration, validation, and knowledge transfer and is a common problem across journals, data repositories, and publication dates. We encourage investigators, reviewers, journals, and data repositories to improve their standards and ensure proper documentation of these valuable datasets. Most Gene Expression Omnibus (GEO) depositions of single-cell mRNA sequencing data do not include cell-level metadata generated by typical analysis pipelines; this Essay maintains that this omission greatly hinders reproduction, exploration, validation, and knowledge transfer and is a common problem across journals, data repositories, and publication dates.
|
125
|
Gillen AE, Goering R, Taliaferro JM. Quantifying alternative polyadenylation in RNAseq data with LABRAT. Methods Enzymol 2021; 655:245-263. [PMID: 34183124 DOI: 10.1016/bs.mie.2021.03.018] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/17/2022]
Abstract
Alternative polyadenylation (APA) generates transcript isoforms that differ in their 3' UTR content and may therefore be subject to different regulatory fates. Although the existence of APA has been known for decades, quantification of APA isoforms from high-throughput RNA sequencing data has been difficult. To facilitate the study of APA in large datasets, we developed an APA quantification technique called LABRAT (Lightweight Alignment-Based Reckoning of Alternative Three-prime ends). LABRAT leverages modern transcriptome quantification approaches to determine the relative abundances of APA isoforms. In this manuscript we describe how LABRAT produces its calculations, provide a step-by-step protocol for its use, and demonstrate its ability to quantify APA in single-cell RNAseq data.
Affiliation(s)
- Austin E Gillen
- Division of Hematology, University of Colorado School of Medicine, Aurora, CO, United States
| | - Raeann Goering
- Department of Biochemistry and Molecular Genetics, University of Colorado Anschutz Medical Campus, Aurora, CO, United States; RNA Bioscience Initiative, University of Colorado Anschutz Medical Campus, Aurora, CO, United States
| | - J Matthew Taliaferro
- Department of Biochemistry and Molecular Genetics, University of Colorado Anschutz Medical Campus, Aurora, CO, United States; RNA Bioscience Initiative, University of Colorado Anschutz Medical Campus, Aurora, CO, United States
| |
|
126
|
Zheng H, Rao AM, Dermadi D, Toh J, Murphy Jones L, Donato M, Liu Y, Su Y, Dai CL, Kornilov SA, Karagiannis M, Marantos T, Hasin-Brumshtein Y, He YD, Giamarellos-Bourboulis EJ, Heath JR, Khatri P. Multi-cohort analysis of host immune response identifies conserved protective and detrimental modules associated with severity across viruses. Immunity 2021; 54:753-768.e5. [PMID: 33765435 PMCID: PMC7988739 DOI: 10.1016/j.immuni.2021.03.002] [Citation(s) in RCA: 44] [Impact Index Per Article: 11.0] [Received: 09/23/2020] [Revised: 12/03/2020] [Accepted: 03/01/2021] [Indexed: 02/08/2023]
Abstract
Viral infections induce a conserved host response distinct from bacterial infections. We hypothesized that the conserved response is associated with disease severity and is distinct between patients with different outcomes. To test this, we integrated 4,780 blood transcriptome profiles from patients aged 0 to 90 years infected with one of 16 viruses, including SARS-CoV-2, Ebola, chikungunya, and influenza, across 34 cohorts from 18 countries, and single-cell RNA sequencing profiles of 702,970 immune cells from 289 samples across three cohorts. Severe viral infection was associated with increased hematopoiesis, myelopoiesis, and myeloid-derived suppressor cells. We identified protective and detrimental gene modules that defined distinct trajectories associated with mild versus severe outcomes. The interferon response was decoupled from the protective host response in patients with severe outcomes. These findings were consistent, irrespective of age and virus, and provide insights to accelerate the development of diagnostics and host-directed therapies to improve global pandemic preparedness.
Affiliation(s)
- Hong Zheng
- Institute for Immunity, Transplantation and Infection, School of Medicine, Stanford University, CA 94305, USA; Center for Biomedical Informatics Research, Department of Medicine, School of Medicine, Stanford University, CA 94305, USA
| | - Aditya M Rao
- Institute for Immunity, Transplantation and Infection, School of Medicine, Stanford University, CA 94305, USA; Immunology program, Stanford University, CA 94305, USA
| | - Denis Dermadi
- Institute for Immunity, Transplantation and Infection, School of Medicine, Stanford University, CA 94305, USA; Center for Biomedical Informatics Research, Department of Medicine, School of Medicine, Stanford University, CA 94305, USA
| | - Jiaying Toh
- Institute for Immunity, Transplantation and Infection, School of Medicine, Stanford University, CA 94305, USA; Immunology program, Stanford University, CA 94305, USA
| | - Lara Murphy Jones
- Institute for Immunity, Transplantation and Infection, School of Medicine, Stanford University, CA 94305, USA; Center for Biomedical Informatics Research, Department of Medicine, School of Medicine, Stanford University, CA 94305, USA; Division of Critical Care Medicine, Department of Pediatrics, School of Medicine, Stanford University, CA 94305, USA
| | - Michele Donato
- Institute for Immunity, Transplantation and Infection, School of Medicine, Stanford University, CA 94305, USA; Center for Biomedical Informatics Research, Department of Medicine, School of Medicine, Stanford University, CA 94305, USA
| | - Yiran Liu
- Institute for Immunity, Transplantation and Infection, School of Medicine, Stanford University, CA 94305, USA; Cancer Biology program, Stanford University, CA 94305, USA
| | - Yapeng Su
- Institute for Systems Biology, Seattle, WA, USA
| | - Cheng L Dai
- Institute for Systems Biology, Seattle, WA, USA
| | | | - Minas Karagiannis
- 4th Department of Internal Medicine, National and Kapodistrian University of Athens, Medical School, 124 62 Athens, Greece
| | - Theodoros Marantos
- 4th Department of Internal Medicine, National and Kapodistrian University of Athens, Medical School, 124 62 Athens, Greece
| | | | | | | | - James R Heath
- Institute for Systems Biology, Seattle, WA, USA; Department of Bioengineering, University of Washington, Seattle, WA 98195
| | - Purvesh Khatri
- Institute for Immunity, Transplantation and Infection, School of Medicine, Stanford University, CA 94305, USA; Center for Biomedical Informatics Research, Department of Medicine, School of Medicine, Stanford University, CA 94305, USA
| |
|
127
|
Prokop JW, Bupp CP, Frisch A, Bilinovich SM, Campbell DB, Vogt D, Schultz CR, Uhl KL, VanSickle E, Rajasekaran S, Bachmann AS. Emerging Role of ODC1 in Neurodevelopmental Disorders and Brain Development. Genes (Basel) 2021; 12:genes12040470. [PMID: 33806076 PMCID: PMC8064465 DOI: 10.3390/genes12040470] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 02/21/2021] [Revised: 03/15/2021] [Accepted: 03/22/2021] [Indexed: 01/18/2023] Open
Abstract
Ornithine decarboxylase 1 (ODC1 gene) has been linked through gain-of-function variants to a rare disease featuring developmental delay, alopecia, macrocephaly, and structural brain anomalies. ODC1 has been linked to additional diseases like cancer, with growing evidence for neurological contributions to schizophrenia, mood disorders, anxiety, epilepsy, learning, and suicidal behavior. The evidence of ODC1 connection to neural disorders highlights the need for a systematic analysis of ODC1 genotype-to-phenotype associations. An analysis of variants from ClinVar, Geno2MP, TOPMed, gnomAD, and COSMIC revealed an intellectual disability and seizure connected loss-of-function variant, ODC G84R (rs138359527, NC_000002.12:g.10444500C>T). The missense variant is found in ~1% of South Asian individuals and results in a 2.5-fold decrease in enzyme function. Expression quantitative trait loci (eQTLs) reveal multiple functionally annotated, non-coding variants regulating ODC1 that associate with psychiatric/neurological phenotypes. Further dissection of RNA-Seq during fetal brain development and within cerebral organoids showed an association of ODC1 expression with cell proliferation of neural progenitor cells, suggesting gain-of-function variants with neural over-proliferation and loss-of-function variants with neural depletion. The linkage from the expression data of ODC1 in early neural progenitor proliferation to phenotypes of neurodevelopmental delay and to the connection of polyamine metabolites in brain function establishes ODC1 as a bona fide neurodevelopmental disorder gene.
Affiliation(s)
- Jeremy W. Prokop
- Department of Pediatrics and Human Development, Michigan State University, Grand Rapids, MI 49503, USA
- Department of Pharmacology and Toxicology, Michigan State University, East Lansing, MI 48824, USA
- Center for Research in Autism, Intellectual, and Other Neurodevelopmental Disabilities, Michigan State University, East Lansing, MI 48824, USA
- Correspondence: (J.W.P.); (A.S.B.)
| | - Caleb P. Bupp
- Department of Pediatrics and Human Development, Michigan State University, Grand Rapids, MI 49503, USA
- Spectrum Health Medical Genetics, Grand Rapids, MI 49503, USA
| | - Austin Frisch
- Department of Pediatrics and Human Development, Michigan State University, Grand Rapids, MI 49503, USA
| | - Stephanie M. Bilinovich
- Department of Pediatrics and Human Development, Michigan State University, Grand Rapids, MI 49503, USA
| | - Daniel B. Campbell
- Department of Pediatrics and Human Development, Michigan State University, Grand Rapids, MI 49503, USA
- Center for Research in Autism, Intellectual, and Other Neurodevelopmental Disabilities, Michigan State University, East Lansing, MI 48824, USA
- Neuroscience Program, Michigan State University, East Lansing, MI 48824, USA
| | - Daniel Vogt
- Department of Pediatrics and Human Development, Michigan State University, Grand Rapids, MI 49503, USA
- Center for Research in Autism, Intellectual, and Other Neurodevelopmental Disabilities, Michigan State University, East Lansing, MI 48824, USA
- Neuroscience Program, Michigan State University, East Lansing, MI 48824, USA
| | - Chad R. Schultz
- Department of Pediatrics and Human Development, Michigan State University, Grand Rapids, MI 49503, USA
| | - Katie L. Uhl
- Department of Pediatrics and Human Development, Michigan State University, Grand Rapids, MI 49503, USA
| | | | - Surender Rajasekaran
- Department of Pediatrics and Human Development, Michigan State University, Grand Rapids, MI 49503, USA
- Pediatric Intensive Care Unit, Helen DeVos Children’s Hospital, Grand Rapids, MI 49503, USA
- Office of Research, Spectrum Health, Grand Rapids, MI 49503, USA
| | - André S. Bachmann
- Department of Pediatrics and Human Development, Michigan State University, Grand Rapids, MI 49503, USA
- Correspondence: (J.W.P.); (A.S.B.)
| |
|
128
|
Bilinovich SM, Uhl KL, Lewis K, Soehnlen X, Williams M, Vogt D, Prokop JW, Campbell DB. Integrated RNA Sequencing Reveals Epigenetic Impacts of Diesel Particulate Matter Exposure in Human Cerebral Organoids. Dev Neurosci 2021; 42:195-207. [PMID: 33657557 DOI: 10.1159/000513536] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 09/08/2020] [Accepted: 12/02/2020] [Indexed: 12/25/2022] Open
Abstract
Autism spectrum disorder (ASD) manifests early in childhood. While genetic variants increase risk for ASD, a growing body of literature has established that in utero chemical exposures also contribute to ASD risk. These chemicals include air-based pollutants like diesel particulate matter (DPM). A combination of single-cell and direct transcriptomics of DPM-exposed human-induced pluripotent stem cell-derived cerebral organoids revealed toxicogenomic effects of DPM exposure during fetal brain development. Direct transcriptomics, sequencing RNA bases via Nanopore, revealed that cerebral organoids contain extensive RNA modifications, with DPM altering cytosine methylation in oxidative mitochondrial transcripts expressed in outer radial glia cells. Single-cell transcriptomics further confirmed an oxidative phosphorylation change in cell groups such as outer radial glia upon DPM exposure. This approach highlights how DPM exposure perturbs normal mitochondrial function and cellular respiration during early brain development, which may contribute to developmental disorders like ASD by altering neurodevelopment.
Affiliation(s)
- Stephanie M Bilinovich
- Department of Pediatrics & Human Development, Michigan State University, Grand Rapids, Michigan, USA
| | - Katie L Uhl
- Department of Pediatrics & Human Development, Michigan State University, Grand Rapids, Michigan, USA
| | - Kristy Lewis
- Department of Pediatrics & Human Development, Michigan State University, Grand Rapids, Michigan, USA
| | - Xavier Soehnlen
- Department of Pediatrics & Human Development, Michigan State University, Grand Rapids, Michigan, USA
| | - Michael Williams
- Department of Pediatrics & Human Development, Michigan State University, Grand Rapids, Michigan, USA; Center for Research in Autism, Intellectual, and other Neurodevelopmental Disabilities, Michigan State University, East Lansing, Michigan, USA; Neuroscience Program, Michigan State University, East Lansing, Michigan, USA
| | - Daniel Vogt
- Department of Pediatrics & Human Development, Michigan State University, Grand Rapids, Michigan, USA; Center for Research in Autism, Intellectual, and other Neurodevelopmental Disabilities, Michigan State University, East Lansing, Michigan, USA; Neuroscience Program, Michigan State University, East Lansing, Michigan, USA
| | - Jeremy W Prokop
- Department of Pediatrics & Human Development, Michigan State University, Grand Rapids, Michigan, USA; Center for Research in Autism, Intellectual, and other Neurodevelopmental Disabilities, Michigan State University, East Lansing, Michigan, USA; Department of Pharmacology and Toxicology, Michigan State University, East Lansing, Michigan, USA
| | - Daniel B Campbell
- Department of Pediatrics & Human Development, Michigan State University, Grand Rapids, Michigan, USA; Center for Research in Autism, Intellectual, and other Neurodevelopmental Disabilities, Michigan State University, East Lansing, Michigan, USA; Neuroscience Program, Michigan State University, East Lansing, Michigan, USA
| |
|
129
|
Cribbs AP, Filippakopoulos P, Philpott M, Wells G, Penn H, Oerum H, Valge-Archer V, Feldmann M, Oppermann U. Dissecting the Role of BET Bromodomain Proteins BRD2 and BRD4 in Human NK Cell Function. Front Immunol 2021; 12:626255. [PMID: 33717143 PMCID: PMC7953504 DOI: 10.3389/fimmu.2021.626255] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 11/05/2020] [Accepted: 01/13/2021] [Indexed: 12/19/2022] Open
Abstract
Natural killer (NK) cells are innate lymphocytes that play a pivotal role in the immune surveillance and elimination of transformed or virally infected cells. Using a chemo-genetic approach, we identify BET bromodomain-containing proteins BRD2 and BRD4 as central regulators of NK cell functions, including direct cytokine secretion, NK cell contact-dependent inflammatory cytokine secretion from monocytes, and NK cell cytolytic functions. We show that both BRD2 and BRD4 control inflammatory cytokine production in NK cells isolated from healthy volunteers and from rheumatoid arthritis patients. In contrast, knockdown of BRD4 but not of BRD2 impairs NK cell cytolytic responses, suggesting BRD4 as a critical regulator of NK cell-mediated tumor cell elimination. This is supported by pharmacological targeting, where the first-generation pan-BET bromodomain inhibitor JQ1(+) displays anti-inflammatory effects and inhibits tumor cell eradication, while the novel bivalent BET bromodomain inhibitor AZD5153, which shows differential activity towards BET family members, does not. Given the important role of both cytokine-mediated inflammatory microenvironment and cytolytic NK cell activities in immune-oncology therapies, our findings present a compelling argument for further clinical investigation.
Affiliation(s)
- Adam P Cribbs
- Botnar Research Center, Nuffield Department of Orthopedics, Rheumatology and Musculoskeletal Sciences, National Institute of Health Research Oxford Biomedical Research Unit (BRU), University of Oxford, Oxford, United Kingdom
| | | | - Martin Philpott
- Botnar Research Center, Nuffield Department of Orthopedics, Rheumatology and Musculoskeletal Sciences, National Institute of Health Research Oxford Biomedical Research Unit (BRU), University of Oxford, Oxford, United Kingdom
| | - Graham Wells
- Botnar Research Center, Nuffield Department of Orthopedics, Rheumatology and Musculoskeletal Sciences, National Institute of Health Research Oxford Biomedical Research Unit (BRU), University of Oxford, Oxford, United Kingdom
| | - Henry Penn
- Arthritis Centre, Northwick Park Hospital, Harrow, United Kingdom
| | - Henrik Oerum
- Roche Innovation Center Copenhagen A/S, Hørsholm, Denmark
| | - Viia Valge-Archer
- Bioscience, Research and Early Development, Oncology R&D, AstraZeneca, Cambridge, United Kingdom
| | - Marc Feldmann
- Kennedy Institute of Rheumatology Nuffield Department of Orthopedics, Rheumatology and Musculoskeletal Sciences, Botnar Research Centre, Oxford, United Kingdom
| | - Udo Oppermann
- Botnar Research Center, Nuffield Department of Orthopedics, Rheumatology and Musculoskeletal Sciences, National Institute of Health Research Oxford Biomedical Research Unit (BRU), University of Oxford, Oxford, United Kingdom.,Freiburg Institute of Advanced Studies, Freiburg, Germany.,Oxford Centre for Translational Myeloma Research, Oxford, United Kingdom
|
130
|
Mukherjee K, Xue L, Planutis A, Gnanapragasam MN, Chess A, Bieker JJ. EKLF/KLF1 expression defines a unique macrophage subset during mouse erythropoiesis. eLife 2021; 10:61070. [PMID: 33570494 PMCID: PMC7932694 DOI: 10.7554/elife.61070] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 07/14/2020] [Accepted: 02/10/2021] [Indexed: 12/17/2022] Open
Abstract
Erythroblastic islands are a specialized niche that contains a central macrophage surrounded by erythroid cells at various stages of maturation. However, identifying the precise genetic and transcriptional control mechanisms in the island macrophage remains difficult due to macrophage heterogeneity. Using unbiased global sequencing and directed genetic approaches focused on early mammalian development, we find that fetal liver macrophages exhibit a unique expression signature that differentiates them from erythroid and adult macrophage cells. The importance of erythroid Krüppel-like factor (EKLF)/KLF1 in this identity is shown by expression analyses in EKLF-/- and in EKLF-marked macrophage cells. Single-cell sequence analysis simplifies heterogeneity and identifies clusters of genes important for EKLF-dependent macrophage function and novel cell surface biomarkers. Remarkably, this singular set of macrophage island cells appears transiently during embryogenesis. Together, these studies provide a detailed perspective on the importance of EKLF in the establishment of the dynamic gene expression network within erythroblastic islands in the developing embryo and provide the means for their efficient isolation.
Affiliation(s)
- Kaustav Mukherjee
- Department of Cell, Developmental, and Regenerative Biology, Mount Sinai School of Medicine, New York, NY, United States
- Black Family Stem Cell Institute, New York, NY, United States
| | - Li Xue
- Department of Cell, Developmental, and Regenerative Biology, Mount Sinai School of Medicine, New York, NY, United States
| | - Antanas Planutis
- Department of Cell, Developmental, and Regenerative Biology, Mount Sinai School of Medicine, New York, NY, United States
| | - Merlin Nithya Gnanapragasam
- Department of Cell, Developmental, and Regenerative Biology, Mount Sinai School of Medicine, New York, NY, United States
| | - Andrew Chess
- Department of Cell, Developmental, and Regenerative Biology, Mount Sinai School of Medicine, New York, NY, United States
| | - James J Bieker
- Department of Cell, Developmental, and Regenerative Biology, Mount Sinai School of Medicine, New York, NY, United States
- Black Family Stem Cell Institute, New York, NY, United States
- Tisch Cancer Institute, New York, NY, United States
- Mindich Child Health and Development Institute, Mount Sinai School of Medicine, New York, NY, United States
|
131
|
Wang YXR, Li L, Li JJ, Huang H. Network Modeling in Biology: Statistical Methods for Gene and Brain Networks. Stat Sci 2021; 36:89-108. [PMID: 34305304 PMCID: PMC8296984 DOI: 10.1214/20-sts792] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 12/14/2022]
Abstract
The rise of network data in many different domains has offered researchers new insight into the problem of modeling complex systems and propelled the development of numerous innovative statistical methodologies and computational tools. In this paper, we primarily focus on two types of biological networks, gene networks and brain networks, where statistical network modeling has found both fruitful and challenging applications. Unlike other network examples such as social networks, where network edges can be directly observed, both gene and brain networks require careful estimation of edges using covariates as a first step. We provide a discussion of existing statistical and computational methods for edge estimation and subsequent statistical inference problems in these two types of biological networks.
Affiliation(s)
- Y X Rachel Wang
- School of Mathematics and Statistics, University of Sydney, Australia
| | - Lexin Li
- Department of Biostatistics and Epidemiology, School of Public Health, University of California, Berkeley
| | | | - Haiyan Huang
- Department of Statistics, University of California, Berkeley
|
132
|
Van Buren S, Sarkar H, Srivastava A, Rashid NU, Patro R, Love MI. Compression of quantification uncertainty for scRNA-seq counts. Bioinformatics 2021; 37:1699-1707. [PMID: 33471073 PMCID: PMC8289386 DOI: 10.1093/bioinformatics/btab001] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 07/08/2020] [Revised: 11/16/2020] [Accepted: 01/04/2021] [Indexed: 11/13/2022] Open
Abstract
Motivation: Quantification estimates of gene expression from single-cell RNA-seq (scRNA-seq) data have inherent uncertainty due to reads that map to multiple genes. Many existing scRNA-seq quantification pipelines ignore multi-mapping reads and therefore underestimate expected read counts for many genes. alevin accounts for multi-mapping reads and allows for the generation of ‘inferential replicates’, which reflect quantification uncertainty. Previous methods have shown improved performance when incorporating these replicates into statistical analyses, but storage and use of these replicates increases computation time and memory requirements. Results: We demonstrate that storing only the mean and variance from a set of inferential replicates (‘compression’) is sufficient to capture gene-level quantification uncertainty, while reducing disk storage to as low as 9% of original storage, and memory usage when loading data to as low as 6%. Using these values, we generate ‘pseudo-inferential’ replicates from a negative binomial distribution and propose a general procedure for incorporating these replicates into a proposed statistical testing framework. When applying this procedure to trajectory-based differential expression analyses, we show false positives are reduced by more than a third for genes with high levels of quantification uncertainty. We additionally extend the Swish method to incorporate pseudo-inferential replicates and demonstrate improvements in computation time and memory usage without any loss in performance. Lastly, we show that discarding multi-mapping reads can result in significant underestimation of counts for functionally important genes in a real dataset. Availability and implementation: makeInfReps and splitSwish are implemented in the R/Bioconductor fishpond package available at https://bioconductor.org/packages/fishpond. Analyses and simulated datasets can be found in the paper’s GitHub repo at https://github.com/skvanburen/scUncertaintyPaperCode. Supplementary information: Supplementary data are available at Bioinformatics online.
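The compression idea in this abstract can be illustrated with a minimal Python sketch. It is not the fishpond implementation (which is in R); the function names `compress` and `pseudo_replicates` are hypothetical, the moment-matching parameterization is one standard way to fit a negative binomial to a mean and variance, and the Poisson fallback for entries whose variance does not exceed the mean is an assumption made here so the draw is always defined.

```python
import numpy as np

def compress(inf_reps):
    """Keep only the mean and variance across the last (replicate) axis."""
    mean = inf_reps.mean(axis=-1)
    var = inf_reps.var(axis=-1, ddof=1)
    return mean, var

def pseudo_replicates(mean, var, n_reps, rng):
    """Draw count replicates whose mean and variance match the stored values."""
    mean = np.asarray(mean, dtype=float)
    var = np.asarray(var, dtype=float)
    # A negative binomial needs variance > mean > 0 (overdispersion).
    nb = (var > mean) & (mean > 0)
    # Moment matching for NumPy's (n, p) form: mean = n(1-p)/p, var = n(1-p)/p^2.
    size = np.where(nb, mean**2 / np.where(nb, var - mean, 1.0), 1.0)
    prob = np.where(nb, mean / np.where(nb, var, 1.0), 0.5)
    shape = mean.shape + (n_reps,)
    nb_draws = rng.negative_binomial(size[..., None], prob[..., None], size=shape)
    # Assumed fallback: Poisson when the stored variance is not above the mean.
    pois_draws = rng.poisson(mean[..., None], size=shape)
    return np.where(nb[..., None], nb_draws, pois_draws)

rng = np.random.default_rng(0)
# Toy input: 3 genes x 4 cells x 50 inferential replicates.
reps = rng.negative_binomial(5, 0.2, size=(3, 4, 50)).astype(float)
mean, var = compress(reps)                       # two 3 x 4 arrays are stored
pseudo = pseudo_replicates(mean, var, 20, rng)   # 3 x 4 x 20 regenerated counts
```

In this toy case the stored summary is two values per gene and cell instead of 50, which is the source of the storage savings the abstract reports; the exact ratios depend on the number of replicates and the file format.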
Affiliation(s)
- Scott Van Buren
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27516, USA
| | - Hirak Sarkar
- Department of Computer Science, University of Maryland College Park, MD 20742, USA.,Center for Bioinformatics and Computational Biology, University of Maryland College Park, MD 20742, USA
| | - Avi Srivastava
- New York Genome Center, New York, NY 10013, USA.,Center for Genomics and Systems Biology, New York University, New York, NY 10003, USA
| | - Naim U Rashid
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27516, USA.,Lineberger Comprehensive Cancer Center University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Rob Patro
- Department of Computer Science, University of Maryland College Park, MD 20742, USA.,Center for Bioinformatics and Computational Biology, University of Maryland College Park, MD 20742, USA
| | - Michael I Love
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27516, USA.,Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27514, USA
|
133
|
Acosta J, Ssozi D, van Galen P. Single-Cell RNA Sequencing to Disentangle the Blood System. Arterioscler Thromb Vasc Biol 2021; 41:1012-1018. [PMID: 33441024 PMCID: PMC7901535 DOI: 10.1161/atvbaha.120.314654] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 12/13/2022]
Abstract
The blood system is often represented as a tree-like structure with stem cells that give rise to mature blood cell types through a series of demarcated steps. Although this representation has served as a model of hierarchical tissue organization for decades, single-cell technologies are shedding new light on the abundance of cell type intermediates and the molecular mechanisms that ensure balanced replenishment of differentiated cells. In this Brief Review, we exemplify new insights into blood cell differentiation generated by single-cell RNA sequencing, summarize considerations for the application of this technology, and highlight innovations that are leading the way to understand hematopoiesis at the resolution of single cells. Graphic Abstract: A graphic abstract is available for this article.
Affiliation(s)
- Jean Acosta
- Division of Hematology, Brigham and Women's Hospital, Boston, MA. Department of Medicine, Harvard Medical School, Boston, MA. Broad Institute of MIT and Harvard, Cambridge, MA
| | - Daniel Ssozi
- Division of Hematology, Brigham and Women's Hospital, Boston, MA. Department of Medicine, Harvard Medical School, Boston, MA. Broad Institute of MIT and Harvard, Cambridge, MA
| | - Peter van Galen
- Division of Hematology, Brigham and Women's Hospital, Boston, MA. Department of Medicine, Harvard Medical School, Boston, MA. Broad Institute of MIT and Harvard, Cambridge, MA
|
134
|
Soneson C, Srivastava A, Patro R, Stadler MB. Preprocessing choices affect RNA velocity results for droplet scRNA-seq data. PLoS Comput Biol 2021; 17:e1008585. [PMID: 33428615 PMCID: PMC7822509 DOI: 10.1371/journal.pcbi.1008585] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Received: 04/06/2020] [Revised: 01/22/2021] [Accepted: 11/30/2020] [Indexed: 12/25/2022] Open
Abstract
Experimental single-cell approaches are becoming widely used for many purposes, including investigation of the dynamic behaviour of developing biological systems. Consequently, a large number of computational methods for extracting dynamic information from such data have been developed. One example is RNA velocity analysis, in which spliced and unspliced RNA abundances are jointly modeled in order to infer a 'direction of change' and thereby a future state for each cell in the gene expression space. Naturally, the accuracy and interpretability of the inferred RNA velocities depend crucially on the correctness of the estimated abundances. Here, we systematically compare five widely used quantification tools, in total yielding thirteen different quantification approaches, in terms of their estimates of spliced and unspliced RNA abundances in five experimental droplet scRNA-seq data sets. We show that there are substantial differences between the quantifications obtained from different tools, and identify typical genes for which such discrepancies are observed. We further show that these abundance differences propagate to the downstream analysis, and can have a large effect on estimated velocities as well as the biological interpretation. Our results highlight that abundance quantification is a crucial aspect of the RNA velocity analysis workflow, and that both the definition of the genomic features of interest and the quantification algorithm itself require careful consideration.
Affiliation(s)
- Charlotte Soneson
- Friedrich Miescher Institute for Biomedical Research, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel, Switzerland
| | - Avi Srivastava
- New York Genome Center, New York, United States of America
- Center for Genomics and Systems Biology, New York University, New York, United States of America
| | - Rob Patro
- Department of Computer Science, University of Maryland, College Park, Maryland, United States of America
| | - Michael B. Stadler
- Friedrich Miescher Institute for Biomedical Research, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- University of Basel, Basel, Switzerland
|
135
|
Abstract
Kidney fibrosis is the hallmark of chronic kidney disease progression; however, at present no antifibrotic therapies exist [1-3]. The origin, functional heterogeneity and regulation of scar-forming cells that occur during human kidney fibrosis remain poorly understood [1,2,4]. Here, using single-cell RNA sequencing, we profiled the transcriptomes of cells from the proximal and non-proximal tubules of healthy and fibrotic human kidneys to map the entire human kidney. This analysis enabled us to map all matrix-producing cells at high resolution, and to identify distinct subpopulations of pericytes and fibroblasts as the main cellular sources of scar-forming myofibroblasts during human kidney fibrosis. We used genetic fate-tracing, time-course single-cell RNA sequencing and ATAC-seq (assay for transposase-accessible chromatin using sequencing) experiments in mice, and spatial transcriptomics in human kidney fibrosis, to shed light on the cellular origins and differentiation of human kidney myofibroblasts and their precursors at high resolution. Finally, we used this strategy to detect potential therapeutic targets, and identified NKD2 as a myofibroblast-specific target in human kidney fibrosis.
|
136
|
Oh Y, Yang S, Liu X, Jana S, Izaddoustdar F, Gao X, Debi R, Kim DK, Kim KH, Yang P, Kassiri Z, Lakin R, Backx PH. Transcriptomic Bioinformatic Analyses of Atria Uncover Involvement of Pathways Related to Strain and Post-translational Modification of Collagen in Increased Atrial Fibrillation Vulnerability in Intensely Exercised Mice. Front Physiol 2020; 11:605671. [PMID: 33424629 PMCID: PMC7793719 DOI: 10.3389/fphys.2020.605671] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 09/12/2020] [Accepted: 11/26/2020] [Indexed: 02/06/2023] Open
Abstract
Atrial Fibrillation (AF) is the most common supraventricular tachyarrhythmia and is typically associated with cardiovascular disease (CVD) and poor cardiovascular health. Paradoxically, endurance athletes are also at risk for AF. While it is well-established that persistent AF is associated with atrial fibrosis, hypertrophy and inflammation, intensely exercised mice showed similar adverse atrial changes and increased AF vulnerability, which required tumor necrosis factor (TNF) signaling, even though ventricular structure and function improved. To identify some of the molecular factors underlying the chamber-specific and TNF-dependent atrial changes induced by exercise, we performed transcriptome analyses of hearts from wild-type and TNF-knockout mice following 2 days, 2 weeks or 6 weeks of exercise. Consistent with the central role of atrial stretch arising from elevated venous pressure in AF promotion, all 3 time points were associated with differential regulation of genes in atria linked to mechanosensing (focal adhesion kinase, integrins and cell-cell communications), extracellular matrix (ECM) and TNF pathways, with TNF appearing to play a permissive, rather than causal, role in gene changes. Importantly, mechanosensing/ECM genes were only enriched, along with tubulin- and hypertrophy-related genes, after 2 days of exercise while being downregulated at 2 and 6 weeks, suggesting that early reactive strain-dependent remodeling with exercise yields to compensatory adjustments. Moreover, at the later time points, there was also downregulation of both collagen genes and genes involved in collagen turnover, a pattern mirroring aging-related fibrosis. By comparison, twofold fewer genes were differentially regulated in ventricles vs. atria, independently of TNF. Our findings reveal that exercise promotes TNF-dependent atrial transcriptome remodeling of ECM/mechanosensing pathways, consistent with increased preload and atrial stretch seen with exercise. We propose that similar preload-dependent mechanisms are responsible for atrial changes and AF in both CVD patients and athletes.
Affiliation(s)
- Yena Oh
- Department of Biology, York University, Toronto, ON, Canada.,Department of Physiology, University of Toronto, Toronto, ON, Canada.,Department of Cellular and Molecular Medicine, Faculty of Medicine, University of Ottawa, Ottawa, ON, Canada.,University of Ottawa Heart Institute, Ottawa, ON, Canada
| | - Sibao Yang
- Department of Biology, York University, Toronto, ON, Canada.,Department of Cardiology, China-Japan Union Hospital of Jilin University, Changchun, China
| | - Xueyan Liu
- Department of Biology, York University, Toronto, ON, Canada.,Department of Cardiology, China-Japan Union Hospital of Jilin University, Changchun, China
| | - Sayantan Jana
- Department of Physiology, Cardiovascular Research Center, University of Alberta, Edmonton, AB, Canada
| | | | - Xiaodong Gao
- Department of Biology, York University, Toronto, ON, Canada
| | - Ryan Debi
- Department of Biology, York University, Toronto, ON, Canada
| | - Dae-Kyum Kim
- Donnelly Centre, University of Toronto, Toronto, ON, Canada.,Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, Canada
| | - Kyoung-Han Kim
- Department of Cellular and Molecular Medicine, Faculty of Medicine, University of Ottawa, Ottawa, ON, Canada.,University of Ottawa Heart Institute, Ottawa, ON, Canada
| | - Ping Yang
- Department of Cardiology, China-Japan Union Hospital of Jilin University, Changchun, China
| | - Zamaneh Kassiri
- Department of Physiology, Cardiovascular Research Center, University of Alberta, Edmonton, AB, Canada
| | - Robert Lakin
- Department of Biology, York University, Toronto, ON, Canada
| | - Peter H Backx
- Department of Biology, York University, Toronto, ON, Canada.,Department of Physiology, University of Toronto, Toronto, ON, Canada
|
137
|
Zhang Z, Cui F, Wang C, Zhao L, Zou Q. Goals and approaches for each processing step for single-cell RNA sequencing data. Brief Bioinform 2020; 22:6034054. [PMID: 33316046 DOI: 10.1093/bib/bbaa314] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 08/15/2020] [Revised: 10/10/2020] [Accepted: 10/16/2020] [Indexed: 12/12/2022] Open
Abstract
Single-cell RNA sequencing (scRNA-seq) has enabled researchers to study gene expression at the cellular level. However, due to the extremely low levels of transcripts in a single cell and technical losses during reverse transcription, gene expression at a single-cell resolution is usually noisy and highly dimensional; thus, statistical analyses of single-cell data are a challenge. Although many scRNA-seq data analysis tools are currently available, a gold standard pipeline is not available for all datasets. Therefore, a general understanding of bioinformatics and associated computational issues would facilitate the selection of appropriate tools for a given set of data. In this review, we provide an overview of the goals and most popular computational analysis tools for the quality control, normalization, imputation, feature selection and dimension reduction of scRNA-seq data.
Affiliation(s)
- Zilong Zhang
- University of Electronic Science and Technology of China
| | | | - Chunyu Wang
- School of Computer Science and Technology, Harbin Institute of Technology
| | - Lingling Zhao
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China
|
138
|
Tekman M, Batut B, Ostrovsky A, Antoniewski C, Clements D, Ramirez F, Etherington GJ, Hotz HR, Scholtalbers J, Manning JR, Bellenger L, Doyle MA, Heydarian M, Huang N, Soranzo N, Moreno P, Mautner S, Papatheodorou I, Nekrutenko A, Taylor J, Blankenberg D, Backofen R, Grüning B. A single-cell RNA-sequencing training and analysis suite using the Galaxy framework. Gigascience 2020; 9:5931798. [PMID: 33079170 PMCID: PMC7574357 DOI: 10.1093/gigascience/giaa102] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 06/22/2020] [Revised: 08/30/2020] [Indexed: 11/25/2022] Open
Abstract
Background: The vast ecosystem of single-cell RNA-sequencing tools has until recently been plagued by an excess of diverging analysis strategies, inconsistent file formats, and compatibility issues between different software suites. The uptake of 10x Genomics datasets has begun to calm this diversity, and the bioinformatics community leans once more towards the large computing requirements and the statistically driven methods needed to process and understand these ever-growing datasets. Results: Here we outline several Galaxy workflows and learning resources for single-cell RNA-sequencing, with the aim of providing a comprehensive analysis environment paired with a thorough user learning experience that bridges the knowledge gap between the computational methods and the underlying cell biology. The Galaxy reproducible bioinformatics framework provides tools, workflows, and trainings that not only enable users to perform 1-click 10x preprocessing but also empower them to demultiplex raw sequencing from custom tagged and full-length sequencing protocols. The downstream analysis supports a range of high-quality interoperable suites separated into common stages of analysis: inspection, filtering, normalization, confounder removal, and clustering. The teaching resources cover concepts from computer science to cell biology. Access to all resources is provided at the singlecell.usegalaxy.eu portal. Conclusions: The reproducible and training-oriented Galaxy framework provides a sustainable high-performance computing environment for users to run flexible analyses on both 10x and alternative platforms. The tutorials from the Galaxy Training Network along with the frequent training workshops hosted by the Galaxy community provide a means for users to learn, publish, and teach single-cell RNA-sequencing analysis.
Affiliation(s)
- Mehmet Tekman
- Department of Bioinformatics, University of Freiburg, Georges-Köhler-Allee 106, 79110 Freiburg, Germany
| | - Bérénice Batut
- Department of Bioinformatics, University of Freiburg, Georges-Köhler-Allee 106, 79110 Freiburg, Germany
| | - Alexander Ostrovsky
- Department of Biology, Johns Hopkins University, Mudd Hall 144, 3400 N. Charles Street, Baltimore, MD 21218, USA
| | - Christophe Antoniewski
- ARTbio, Sorbonne Université, CNRS FR 3631, Inserm US 037, Paris, France.,Institut de Biologie Paris Seine, 9 Quai Saint-Bernard Université Pierre et Marie Curie, Campus Jussieu, Bâtiments A-B-C, 75005 Paris, France
| | - Dave Clements
- Department of Biology, Johns Hopkins University, Mudd Hall 144, 3400 N. Charles Street, Baltimore, MD 21218, USA
| | - Fidel Ramirez
- Boehringer Ingelheim International GmbH, Binger Strasse 173, 55216 Ingelheim am Rhein, Biberach, Germany
| | | | - Hans-Rudolf Hotz
- Friedrich Miescher Institute for Biomedical Research, Maulbeerstrasse 66, 4058 Basel, Switzerland.,SIB Swiss Institute of Bioinformatics, Maulbeerstrasse 66, 4058 Basel, Switzerland
| | - Jelle Scholtalbers
- European Molecular Biology Laboratory, Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
| | - Jonathan R Manning
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
| | - Lea Bellenger
- ARTbio, Sorbonne Université, CNRS FR 3631, Inserm US 037, Paris, France
| | - Maria A Doyle
- Research Computing Facility, Peter MacCallum Cancer Centre, Melbourne, 305 Grattan Street, Victoria 3000, Australia.,Sir Peter MacCallum Department of Oncology, The University of Melbourne, Victoria 3010, Australia
| | - Mohammad Heydarian
- Department of Biology, Johns Hopkins University, Mudd Hall 144, 3400 N. Charles Street, Baltimore, MD 21218, USA
| | - Ni Huang
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK.,Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK
| | - Nicola Soranzo
- Earlham Institute, Norwich Research Park, Norwich NR4 7UZ, UK
| | - Pablo Moreno
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
| | - Stefan Mautner
- Department of Bioinformatics, University of Freiburg, Georges-Köhler-Allee 106, 79110 Freiburg, Germany
| | - Irene Papatheodorou
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
| | - Anton Nekrutenko
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
| | - James Taylor
- Department of Biology, Johns Hopkins University, Mudd Hall 144, 3400 N. Charles Street, Baltimore, MD 21218, USA
| | - Daniel Blankenberg
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, 9500 Euclid Avenue, NB21 Cleveland, OH 44195, USA
| | - Rolf Backofen
- Department of Bioinformatics, University of Freiburg, Georges-Köhler-Allee 106, 79110 Freiburg, Germany
| | - Björn Grüning
- Department of Bioinformatics, University of Freiburg, Georges-Köhler-Allee 106, 79110 Freiburg, Germany
|
139
|
Li B, Gould J, Yang Y, Sarkizova S, Tabaka M, Ashenberg O, Rosen Y, Slyper M, Kowalczyk MS, Villani AC, Tickle T, Hacohen N, Rozenblatt-Rosen O, Regev A. Cumulus provides cloud-based data analysis for large-scale single-cell and single-nucleus RNA-seq. Nat Methods 2020; 17:793-798. [PMID: 32719530 PMCID: PMC7437817 DOI: 10.1038/s41592-020-0905-x] [Citation(s) in RCA: 130] [Impact Index Per Article: 26.0] [Received: 10/29/2019] [Accepted: 06/18/2020] [Indexed: 11/10/2022]
Abstract
Massively parallel single-cell and single-nucleus RNA sequencing has opened the way to systematic tissue atlases in health and disease, but as the scale of data generation is growing, so is the need for computational pipelines for scaled analysis. Here we developed Cumulus, a cloud-based framework for analyzing large-scale single-cell and single-nucleus RNA sequencing datasets. Cumulus combines the power of cloud computing with improvements in algorithm and implementation to achieve high scalability, low cost, user-friendliness and integrated support for a comprehensive set of features. We benchmark Cumulus on the Human Cell Atlas Census of Immune Cells dataset of bone marrow cells and show that it substantially improves efficiency over conventional frameworks, while maintaining or improving the quality of results, enabling large-scale studies.
Affiliation(s)
- Bo Li
- Klarman Cell Observatory, Broad Institute of Harvard and MIT, Cambridge, MA, USA.
- Division of Rheumatology, Allergy, and Immunology, Center for Immunology and Inflammatory Diseases, Massachusetts General Hospital, Boston, MA, USA.
- Department of Medicine, Harvard Medical School, Boston, MA, USA.
| | - Joshua Gould
- Klarman Cell Observatory, Broad Institute of Harvard and MIT, Cambridge, MA, USA
| | - Yiming Yang
- Klarman Cell Observatory, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Division of Rheumatology, Allergy, and Immunology, Center for Immunology and Inflammatory Diseases, Massachusetts General Hospital, Boston, MA, USA
| | - Siranush Sarkizova
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Marcin Tabaka
- Klarman Cell Observatory, Broad Institute of Harvard and MIT, Cambridge, MA, USA
| | - Orr Ashenberg
- Klarman Cell Observatory, Broad Institute of Harvard and MIT, Cambridge, MA, USA
| | - Yanay Rosen
- Klarman Cell Observatory, Broad Institute of Harvard and MIT, Cambridge, MA, USA
| | - Michal Slyper
- Klarman Cell Observatory, Broad Institute of Harvard and MIT, Cambridge, MA, USA
| | - Monika S Kowalczyk
- Klarman Cell Observatory, Broad Institute of Harvard and MIT, Cambridge, MA, USA
| | - Alexandra-Chloé Villani
- Division of Rheumatology, Allergy, and Immunology, Center for Immunology and Inflammatory Diseases, Massachusetts General Hospital, Boston, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
| | - Timothy Tickle
- Klarman Cell Observatory, Broad Institute of Harvard and MIT, Cambridge, MA, USA
| | - Nir Hacohen
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
| | | | - Aviv Regev
- Klarman Cell Observatory, Broad Institute of Harvard and MIT, Cambridge, MA, USA.
- Howard Hughes Medical Institute, Massachusetts Institute of Technology, Cambridge, MA, USA.
- Koch Institute of Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA, USA.
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA.
| |
|
140
|
Hie B, Peters J, Nyquist SK, Shalek AK, Berger B, Bryson BD. Computational Methods for Single-Cell RNA Sequencing. Annu Rev Biomed Data Sci 2020. [DOI: 10.1146/annurev-biodatasci-012220-100601] [Citation(s) in RCA: 46] [Impact Index Per Article: 9.2] [Indexed: 11/09/2022]
Abstract
Single-cell RNA sequencing (scRNA-seq) has provided a high-dimensional catalog of millions of cells across species and diseases. These data have spurred the development of hundreds of computational tools to derive novel biological insights. Here, we outline the components of scRNA-seq analytical pipelines and the computational methods that underlie these steps. We describe available methods, highlight well-executed benchmarking studies, and identify opportunities for additional benchmarking studies and computational methods. As the biochemical approaches for single-cell omics advance, we propose coupled development of robust analytical pipelines suited for the challenges that new data present and principled selection of analytical methods that are suited for the biological questions to be addressed.
Affiliation(s)
- Brian Hie
- Computer Science and Artificial Intelligence Laboratory (CSAIL), Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
| | - Joshua Peters
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Ragon Institute of MGH, MIT, and Harvard, Cambridge, Massachusetts 02139, USA
| | - Sarah K. Nyquist
- Computer Science and Artificial Intelligence Laboratory (CSAIL), Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Ragon Institute of MGH, MIT, and Harvard, Cambridge, Massachusetts 02139, USA
- Program in Computational and Systems Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
| | - Alex K. Shalek
- Ragon Institute of MGH, MIT, and Harvard, Cambridge, Massachusetts 02139, USA
- Department of Chemistry, Institute for Medical Engineering & Science (IMES), and Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
| | - Bonnie Berger
- Computer Science and Artificial Intelligence Laboratory (CSAIL), Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Department of Mathematics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
| | - Bryan D. Bryson
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Ragon Institute of MGH, MIT, and Harvard, Cambridge, Massachusetts 02139, USA
| |
|
141
|
Niebler S, Müller A, Hankeln T, Schmidt B. RainDrop: Rapid activation matrix computation for droplet-based single-cell RNA-seq reads. BMC Bioinformatics 2020; 21:274. [PMID: 32611394 PMCID: PMC7329424 DOI: 10.1186/s12859-020-03593-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 01/16/2020] [Accepted: 06/09/2020] [Indexed: 12/19/2022] Open
Abstract
Background: Obtaining data from single-cell transcriptomic sequencing allows for the investigation of cell-specific gene expression patterns, which could not be addressed a few years ago. With the advancement of droplet-based protocols, the number of studied cells continues to increase rapidly. This establishes the need for software tools for efficient processing of the large-scale datasets produced. We address this need by presenting RainDrop for fast gene-cell count matrix computation from single-cell RNA-seq data produced by 10x Genomics Chromium technology. Results: RainDrop can process single-cell transcriptomic datasets consisting of 784 million reads sequenced from around 8,000 cells in less than 40 minutes on a standard workstation. It significantly outperforms the established Cell Ranger pipeline and the recently introduced Alevin tool in terms of runtime, by a maximal (average) speedup of 30.4 (22.6) and 3.5 (2.4), respectively, while keeping high agreement of the generated results. Conclusions: RainDrop is a software tool, written in C++, for highly efficient processing of large-scale droplet-based single-cell RNA-seq datasets on standard workstations. It is available at https://gitlab.rlp.net/stnieble/raindrop.
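To make the notion of a gene-cell count matrix concrete, the following is a minimal Python sketch of the general idea only. It is not RainDrop's algorithm or code: it assumes each read has already been assigned a cell barcode, a UMI and a gene, and the function name `count_matrix` and the toy reads are hypothetical.

```python
from collections import defaultdict


def count_matrix(records):
    """Build a sparse gene-by-cell count matrix.

    records: iterable of (cell_barcode, umi, gene) tuples, one per read that
    has already been assigned to a gene (an assumption of this sketch).
    Reads sharing barcode, UMI and gene are treated as PCR duplicates and
    counted once.
    """
    seen = set()
    counts = defaultdict(lambda: defaultdict(int))
    for barcode, umi, gene in records:
        key = (barcode, umi, gene)
        if key in seen:
            continue  # duplicate of a molecule that was already counted
        seen.add(key)
        counts[gene][barcode] += 1
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {gene: dict(cells) for gene, cells in counts.items()}


# Toy input: the first two reads are duplicates of the same molecule.
reads = [
    ("AAAC", "TTG", "GeneA"),
    ("AAAC", "TTG", "GeneA"),
    ("AAAC", "CCA", "GeneA"),
    ("GGGT", "TTG", "GeneB"),
]
matrix = count_matrix(reads)
```

Here `matrix["GeneA"]["AAAC"]` is 2, because two distinct UMIs support that gene in that cell.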
Affiliation(s)
- Stefan Niebler
- Department of Computer Science, Johannes Gutenberg University, Mainz, 55099, Germany
| | - André Müller
- Department of Computer Science, Johannes Gutenberg University, Mainz, 55099, Germany
| | - Thomas Hankeln
- Molecular Genetics and Genome Analysis, Institute of Organismal and Molecular Evolution, Johannes Gutenberg University, Mainz, 55099, Germany
| | - Bertil Schmidt
- Department of Computer Science, Johannes Gutenberg University, Mainz, 55099, Germany.
| |
|
142
|
Srivastava A, Malik L, Sarkar H, Patro R. A Bayesian framework for inter-cellular information sharing improves dscRNA-seq quantification. Bioinformatics 2020; 36:i292-i299. [PMID: 32657394 PMCID: PMC7355277 DOI: 10.1093/bioinformatics/btaa450] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Indexed: 11/30/2022] Open
Abstract
Motivation: Droplet-based single-cell RNA-seq (dscRNA-seq) data are being generated at an unprecedented pace, and the accurate estimation of gene-level abundances for each cell is a crucial first step in most dscRNA-seq analyses. When pre-processing the raw dscRNA-seq data to generate a count matrix, care must be taken to account for the potentially large number of multi-mapping locations per read. The sparsity of dscRNA-seq data and the strong 3’ sampling bias make it difficult to disambiguate cases where there is no uniquely mapping read to any of the candidate target genes. Results: We introduce a Bayesian framework for information sharing across cells within a sample, or across multiple modalities of data using the same sample, to improve gene quantification estimates for dscRNA-seq data. We use an anchor-based approach to connect cells with similar gene-expression patterns, and learn informative, empirical priors which we provide to alevin’s gene multi-mapping resolution algorithm. This improves the quantification estimates for genes with no uniquely mapping reads (i.e. when there is no unique intra-cellular information). We show our new model improves the per cell gene-level estimates and provides a principled framework for information sharing across multiple modalities. We test our method on a combination of simulated and real datasets under various setups. Availability and implementation: The information sharing model is included in alevin and is implemented in C++14. It is available as open-source software, under GPL v3, at https://github.com/COMBINE-lab/salmon as of version 1.1.0.
Affiliation(s)
- Avi Srivastava
- Department of Computer Science, Stony Brook University, Stony Brook 11794, NY, USA
| | - Laraib Malik
- Department of Computer Science, Stony Brook University, Stony Brook 11794, NY, USA
| | - Hirak Sarkar
- Computer Science Department, University of Maryland, College Park 20742, MD, USA
| | - Rob Patro
- Computer Science Department, University of Maryland, College Park 20742, MD, USA
| |
|
143
|
Qian H, Kang X, Hu J, Zhang D, Liang Z, Meng F, Zhang X, Xue Y, Maimon R, Dowdy SF, Devaraj NK, Zhou Z, Mobley WC, Cleveland DW, Fu XD. Reversing a model of Parkinson's disease with in situ converted nigral neurons. Nature 2020; 582:550-556. [PMID: 32581380 PMCID: PMC7521455 DOI: 10.1038/s41586-020-2388-4] [Citation(s) in RCA: 343] [Impact Index Per Article: 68.6] [Received: 11/12/2018] [Accepted: 05/13/2020] [Indexed: 12/21/2022]
Abstract
Parkinson's disease is characterized by loss of dopamine neurons in the substantia nigra. Similar to other major neurodegenerative disorders, there are no disease-modifying treatments for Parkinson's disease. While most treatment strategies aim to prevent neuronal loss or protect vulnerable neuronal circuits, a potential alternative is to replace lost neurons to reconstruct disrupted circuits. Here we report an efficient one-step conversion of isolated mouse and human astrocytes to functional neurons by depleting the RNA-binding protein PTB (also known as PTBP1). Applying this approach to the mouse brain, we demonstrate progressive conversion of astrocytes to new neurons that innervate into and repopulate endogenous neural circuits. Astrocytes from different brain regions are converted to different neuronal subtypes. Using a chemically induced model of Parkinson's disease in mouse, we show conversion of midbrain astrocytes to dopaminergic neurons, which provide axons to reconstruct the nigrostriatal circuit. Notably, re-innervation of striatum is accompanied by restoration of dopamine levels and rescue of motor deficits. A similar reversal of disease phenotype is also accomplished by converting astrocytes to neurons using antisense oligonucleotides to transiently suppress PTB. These findings identify a potentially powerful and clinically feasible approach to treating neurodegeneration by replacing lost neurons.
Affiliation(s)
- Hao Qian
- Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA, USA
| | - Xinjiang Kang
- State Key Laboratory of Membrane Biology and Peking-Tsinghua Center for Life Sciences, Institute of Molecular Medicine, Peking University, Beijing, China
- MOE Key Lab of Medical Electrophysiology, ICR, Southwest Medical University, Luzhou, China
| | - Jing Hu
- Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA, USA
- Sichuan Provincial Key Laboratory for Human Disease Gene Study, Sichuan Provincial People's Hospital, University of Electronic Science and Technology of China, Chengdu, China
| | - Dongyang Zhang
- Department of Chemistry and Biochemistry, University of California, San Diego, La Jolla, CA, USA
| | - Zhengyu Liang
- Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA, USA
| | - Fan Meng
- Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA, USA
| | - Xuan Zhang
- Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA, USA
| | - Yuanchao Xue
- Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA, USA
- Key Laboratory of RNA Biology, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China
| | - Roy Maimon
- Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA, USA
- Ludwig Institute for Cancer Research, University of California, San Diego, La Jolla, CA, USA
| | - Steven F Dowdy
- Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA, USA
| | - Neal K Devaraj
- Department of Chemistry and Biochemistry, University of California, San Diego, La Jolla, CA, USA
| | - Zhuan Zhou
- State Key Laboratory of Membrane Biology and Peking-Tsinghua Center for Life Sciences, Institute of Molecular Medicine, Peking University, Beijing, China
| | - William C Mobley
- Department of Neurosciences and Center for Neural Circuits and Behavior, University of California, San Diego, La Jolla, CA, USA
| | - Don W Cleveland
- Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA, USA
- Ludwig Institute for Cancer Research, University of California, San Diego, La Jolla, CA, USA
| | - Xiang-Dong Fu
- Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA, USA
- Institute of Genomic Medicine, University of California, San Diego, La Jolla, CA, USA
| |
|
144
|
Van de Sande B, Flerin C, Davie K, De Waegeneer M, Hulselmans G, Aibar S, Seurinck R, Saelens W, Cannoodt R, Rouchon Q, Verbeiren T, De Maeyer D, Reumers J, Saeys Y, Aerts S. A scalable SCENIC workflow for single-cell gene regulatory network analysis. Nat Protoc 2020; 15:2247-2276. [PMID: 32561888 DOI: 10.1038/s41596-020-0336-2] [Citation(s) in RCA: 700] [Impact Index Per Article: 140.0] [Received: 09/25/2019] [Accepted: 04/17/2020] [Indexed: 11/09/2022]
Abstract
This protocol explains how to perform a fast SCENIC analysis alongside standard best-practice steps on single-cell RNA-sequencing data using software containers and Nextflow pipelines. SCENIC reconstructs regulons (i.e., transcription factors and their target genes), assesses the activity of these discovered regulons in individual cells and uses these cellular activity patterns to find meaningful clusters of cells. Here we present an improved version of SCENIC with several advances. SCENIC has been refactored and reimplemented in Python (pySCENIC), resulting in a tenfold increase in speed, and has been packaged into containers for ease of use. It is now also possible to use epigenomic track databases, as well as motifs, to refine regulons. In this protocol, we explain the different steps of SCENIC: the workflow starts from the count matrix depicting the gene abundances for all cells and consists of three stages. First, coexpression modules are inferred using a regression per-target approach (GRNBoost2). Next, the indirect targets are pruned from these modules using cis-regulatory motif discovery (cisTarget). Lastly, the activity of these regulons is quantified via an enrichment score for the regulon's target genes (AUCell). Nonlinear projection methods can be used to display visual groupings of cells based on the cellular activity patterns of these regulons. The results can be exported as a loom file and visualized in the SCope web application. This protocol is illustrated on two use cases: a peripheral blood mononuclear cell data set and a panel of single-cell RNA-sequencing cancer experiments. For a data set of 10,000 genes and 50,000 cells, the pipeline runs in <2 h.
Affiliation(s)
- Bram Van de Sande
- VIB Center for Brain & Disease Research, KU Leuven, Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
| | - Christopher Flerin
- VIB Center for Brain & Disease Research, KU Leuven, Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
| | - Kristofer Davie
- VIB Center for Brain & Disease Research, KU Leuven, Leuven, Belgium
| | - Maxime De Waegeneer
- VIB Center for Brain & Disease Research, KU Leuven, Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
| | - Gert Hulselmans
- VIB Center for Brain & Disease Research, KU Leuven, Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
| | - Sara Aibar
- VIB Center for Brain & Disease Research, KU Leuven, Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
| | - Ruth Seurinck
- Data Mining and Modelling for Biomedicine, VIB Center for Inflammation Research, Ghent, Belgium
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium
| | - Wouter Saelens
- Data Mining and Modelling for Biomedicine, VIB Center for Inflammation Research, Ghent, Belgium
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium
| | - Robrecht Cannoodt
- Data Mining and Modelling for Biomedicine, VIB Center for Inflammation Research, Ghent, Belgium
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium
- Center for Medical Genetics, Ghent University Hospital, Ghent, Belgium
| | - Quentin Rouchon
- Data Mining and Modelling for Biomedicine, VIB Center for Inflammation Research, Ghent, Belgium
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium
| | - Toni Verbeiren
- Janssen Pharmaceutica, Beerse, Belgium
- Data Intuitive, Ghent, Belgium
| | | | | | - Yvan Saeys
- Data Mining and Modelling for Biomedicine, VIB Center for Inflammation Research, Ghent, Belgium
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium
| | - Stein Aerts
- VIB Center for Brain & Disease Research, KU Leuven, Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
| |
|
145
|
Abstract
Background: Analysis of scATAC-seq data has recently been scaled to thousands of cells. While processing of other types of single-cell data was boosted by the implementation of alignment-free techniques, the pipelines available to process scATAC-seq data still require large computational resources. We propose here an approach based on pseudoalignment, which reduces execution times and hardware needs at little cost in precision. Methods: Public data for 10k PBMC were downloaded from the 10x Genomics website. Reads were aligned to various references derived from DNase I Hypersensitive Sites (DHS) using kallisto and quantified with bustools. We compared our results with the publicly available ones derived by cellranger-atac. We subsequently tested our approach on scATAC-seq data for the K562 cell line. Results: We found that kallisto does not introduce biases in the quantification of known peaks; the cell groups identified are consistent with those identified by the standard method. We also found that cell identification is robust when analysis is performed using a DHS-derived reference in place of de novo identification of ATAC peaks. Lastly, we found that our approach is suitable for reliable quantification of gene activity based on scATAC-seq signal, and thus allows for efficient labelling of cell groups based on marker genes. Conclusions: Analysis of scATAC-seq data by means of kallisto produces results in line with standard pipelines while being considerably faster; using a set of known DHS sites as reference does not affect the ability to characterize the cell populations.
Affiliation(s)
- Valentina Giansanti
- Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milan, Italy
- Center for Omics Sciences, IRCCS San Raffaele Institute, Milan, Italy
| | - Ming Tang
- FAS informatics, Harvard University, Cambridge, MA, USA
| | - Davide Cittaro
- Center for Omics Sciences, IRCCS San Raffaele Institute, Milan, Italy
| |
|
146
|
Love MI, Soneson C, Hickey PF, Johnson LK, Pierce NT, Shepherd L, Morgan M, Patro R. Tximeta: Reference sequence checksums for provenance identification in RNA-seq. PLoS Comput Biol 2020; 16:e1007664. [PMID: 32097405 PMCID: PMC7059966 DOI: 10.1371/journal.pcbi.1007664] [Citation(s) in RCA: 193] [Impact Index Per Article: 38.6] [Received: 09/27/2019] [Revised: 03/06/2020] [Accepted: 01/18/2020] [Indexed: 11/19/2022] Open
Abstract
Correct annotation metadata is critical for reproducible and accurate RNA-seq analysis. When files are shared publicly or among collaborators with incorrect or missing annotation metadata, it becomes difficult or impossible to reproduce bioinformatic analyses from raw data. It also makes it more difficult to locate the transcriptomic features, such as transcripts or genes, in their proper genomic context, which is necessary for overlapping expression data with other datasets. We provide a solution in the form of an R/Bioconductor package tximeta that performs numerous annotation and metadata gathering tasks automatically on behalf of users during the import of transcript quantification files. The correct reference transcriptome is identified via a hashed checksum stored in the quantification output, and key transcript databases are downloaded and cached locally. The computational paradigm of automatically adding annotation metadata based on reference sequence checksums can greatly facilitate genomic workflows, by helping to reduce overhead during bioinformatic analyses, preventing costly bioinformatic mistakes, and promoting computational reproducibility. The tximeta package is available at https://bioconductor.org/packages/tximeta.
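The checksum idea described in this abstract can be illustrated with a short Python sketch. It is a hedged illustration of the general technique, not tximeta's implementation: the choice of SHA-256, the newline separator, the function names and the in-memory `registry` are all assumptions made for the example.

```python
import hashlib


def reference_checksum(sequences):
    """Hash an ordered collection of reference sequences into one digest.

    Sequences are upper-cased and separated by newlines so that sequence
    boundaries affect the digest (both choices are assumptions of this sketch).
    """
    digest = hashlib.sha256()
    for seq in sequences:
        digest.update(seq.upper().encode("ascii"))
        digest.update(b"\n")
    return digest.hexdigest()


def identify_reference(checksum, registry):
    """Return the annotation label registered for a checksum, or None."""
    return registry.get(checksum)


# Hypothetical registry mapping known checksums to annotation labels.
reference = ["ACGT", "GGCC"]
registry = {reference_checksum(reference): "example_transcriptome_v1"}

# The same sequences (here in lower case) resolve to the registered label.
label = identify_reference(reference_checksum(["acgt", "ggcc"]), registry)

# A reference that differs by one base is not recognised.
unknown = identify_reference(reference_checksum(["ACGT", "GGCA"]), registry)
```

Because the digest is computed from the sequences themselves rather than from a file name, a renamed or re-shared file still resolves to the same annotation.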
Affiliation(s)
- Michael I. Love
- Department of Biostatistics, University of North Carolina-Chapel Hill, Chapel Hill, North Carolina, United States of America
- Department of Genetics, University of North Carolina-Chapel Hill, Chapel Hill, North Carolina, United States of America
| | - Charlotte Soneson
- Friedrich Miescher Institute for Biomedical Research, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel, Switzerland
| | - Peter F. Hickey
- Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia
- The Department of Medical Biology, University of Melbourne, Parkville, Victoria, Australia
| | - Lisa K. Johnson
- Department of Population Health and Reproduction, University of California, Davis, Davis, California, United States of America
| | - N. Tessa Pierce
- Department of Population Health and Reproduction, University of California, Davis, Davis, California, United States of America
| | - Lori Shepherd
- Roswell Park Comprehensive Cancer Center, Buffalo, New York, United States of America
| | - Martin Morgan
- Roswell Park Comprehensive Cancer Center, Buffalo, New York, United States of America
| | - Rob Patro
- Department of Computer Science, University of Maryland, College Park, Maryland, United States of America
| |
|
147
|
Amezquita RA, Lun ATL, Becht E, Carey VJ, Carpp LN, Geistlinger L, Marini F, Rue-Albrecht K, Risso D, Soneson C, Waldron L, Pagès H, Smith ML, Huber W, Morgan M, Gottardo R, Hicks SC. Orchestrating single-cell analysis with Bioconductor. Nat Methods 2020; 17:137-145. [PMID: 31792435 PMCID: PMC7358058 DOI: 10.1038/s41592-019-0654-x] [Citation(s) in RCA: 498] [Impact Index Per Article: 99.6] [Received: 03/26/2019] [Revised: 09/13/2019] [Accepted: 10/14/2019] [Indexed: 12/24/2022]
Abstract
Recent technological advancements have enabled the profiling of a large number of genome-wide features in individual cells. However, single-cell data present unique challenges that require the development of specialized methods and software infrastructure to successfully derive biological insights. The Bioconductor project has rapidly grown to meet these demands, hosting community-developed open-source software distributed as R packages. Featuring state-of-the-art computational methods, standardized data infrastructure and interactive data visualization tools, we present an overview and online book (https://osca.bioconductor.org) of single-cell methods for prospective users.
Affiliation(s)
| | - Aaron T L Lun
- Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
- Bioinformatics and Computational Biology, Genentech Inc., San Francisco, CA, USA
| | - Etienne Becht
- Fred Hutchinson Cancer Research Center, Seattle, WA, USA
| | - Vince J Carey
- Channing Division of Network Medicine, Brigham And Women's Hospital, Boston, MA, USA
| | | | - Ludwig Geistlinger
- Graduate School of Public Health and Health Policy, City University of New York, New York, NY, USA
- Institute for Implementation Science in Population Health, City University of New York, New York, NY, USA
| | - Federico Marini
- Center for Thrombosis and Hemostasis, Mainz, Germany
- Institute of Medical Biostatistics, Epidemiology and Informatics, Mainz, Germany
| | | | - Davide Risso
- Department of Statistical Sciences, University of Padua, Padua, Italy
- Division of Biostatistics and Epidemiology, Department of Healthcare Policy and Research, Weill Cornell Medicine, New York, NY, USA
| | - Charlotte Soneson
- Friedrich Miescher Institute for Biomedical Research, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel, Switzerland
| | - Levi Waldron
- Graduate School of Public Health and Health Policy, City University of New York, New York, NY, USA
- Institute for Implementation Science in Population Health, City University of New York, New York, NY, USA
| | - Hervé Pagès
- Fred Hutchinson Cancer Research Center, Seattle, WA, USA
| | - Mike L Smith
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
| | - Wolfgang Huber
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
| | - Martin Morgan
- Biostatistics and Bioinformatics, Roswell Park Comprehensive Cancer Center, Buffalo, NY, USA
| | | | - Stephanie C Hicks
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA.
| |
|
148
|
Papatheodorou I, Moreno P, Manning J, Fuentes AMP, George N, Fexova S, Fonseca NA, Füllgrabe A, Green M, Huang N, Huerta L, Iqbal H, Jianu M, Mohammed S, Zhao L, Jarnuczak AF, Jupp S, Marioni J, Meyer K, Petryszak R, Prada Medina CA, Talavera-López C, Teichmann S, Vizcaino JA, Brazma A. Expression Atlas update: from tissues to single cells. Nucleic Acids Res 2020; 48:D77-D83. [PMID: 31665515 PMCID: PMC7145605 DOI: 10.1093/nar/gkz947] [Citation(s) in RCA: 226] [Impact Index Per Article: 45.2] [Received: 09/15/2019] [Revised: 10/07/2019] [Accepted: 10/16/2019] [Indexed: 12/16/2022] Open
Abstract
Expression Atlas is EMBL-EBI's resource for gene and protein expression. It sources and compiles data on the abundance and localisation of RNA and proteins in various biological systems and contexts and provides open access to this data for the research community. With the increased availability of single cell RNA-Seq datasets in the public archives, we have now extended Expression Atlas with a new added-value service to display gene expression in single cells. Single Cell Expression Atlas was launched in 2018 and currently includes 123 single cell RNA-Seq studies from 12 species. The website can be searched by genes within or across species to reveal experiments, tissues and cell types where this gene is expressed or under which conditions it is a marker gene. Within each study, cells can be visualized using a pre-calculated t-SNE plot and can be coloured by different features or by cell clusters based on gene expression. Within each experiment, there are links to downloadable files, such as RNA quantification matrices, clustering results, reports on protocols and associated metadata, such as assigned cell types.
Affiliation(s)
- Irene Papatheodorou
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK
| | - Pablo Moreno
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK
| | - Jonathan Manning
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK
| | | | - Nancy George
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK
| | - Silvie Fexova
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK
| | - Nuno A Fonseca
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK
| | - Anja Füllgrabe
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK
| | - Matthew Green
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK
| | - Ni Huang
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, UK
| | - Laura Huerta
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK
| | - Haider Iqbal
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK
| | - Monica Jianu
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK
| | - Suhaib Mohammed
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK
| | - Lingyun Zhao
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK
| | - Andrew F Jarnuczak
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK
| | - Simon Jupp
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK
| | - John Marioni
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, UK
- Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
| | - Kerstin Meyer
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, UK
| | - Robert Petryszak
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK
| | | | | | - Sarah Teichmann
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, UK
| | - Juan Antonio Vizcaino
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK
| | - Alvis Brazma
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK
| |
|
149
|
Liu D. Algorithms for efficiently collapsing reads with Unique Molecular Identifiers. PeerJ 2019; 7:e8275. [PMID: 31871845 PMCID: PMC6921982 DOI: 10.7717/peerj.8275] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 07/23/2019] [Accepted: 11/22/2019] [Indexed: 11/20/2022] Open
Abstract
BACKGROUND Unique Molecular Identifiers (UMIs) are used in many experiments to find and remove PCR duplicates. Many tools deduplicate reads by finding reads with the same alignment coordinates and UMIs. However, many of these tools either cannot handle substitution errors or require expensive pairwise UMI comparisons that do not scale efficiently to larger datasets. RESULTS We reformulate the problem of deduplicating UMIs in a manner that enables optimizations to be made and more efficient data structures to be used. We implement our data structures and optimizations in a tool called UMICollapse, which is able to deduplicate over one million unique UMIs of length 9 at a single alignment position in around 26 s, using only a single thread and much less than 10 GB of memory. CONCLUSIONS We present a new formulation of the UMI deduplication problem and show that it can be solved faster with more sophisticated data structures.
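To make the deduplication problem in this abstract concrete, the following is a minimal sketch of the naive baseline it describes: greedy clustering of the UMIs seen at one alignment position using pairwise Hamming-distance comparisons. It is not UMICollapse and does not use the data structures from the paper; the function names `hamming` and `collapse_umis`, the `max_mismatches` parameter, and the example reads are all hypothetical choices made for illustration.

```python
from collections import Counter


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length UMIs."""
    if len(a) != len(b):
        raise ValueError("UMIs must have equal length")
    return sum(x != y for x, y in zip(a, b))


def collapse_umis(umis, max_mismatches=1):
    """Greedily collapse UMIs observed at ONE alignment position.

    Illustrative baseline only (an assumption, not the paper's method):
    UMIs are visited from most to least frequent, and each one is merged
    into the first existing representative within `max_mismatches`
    substitutions. Worst-case cost is quadratic in the number of
    unique UMIs, which is the scaling problem the abstract refers to.
    Returns a dict: representative UMI -> total reads absorbed.
    """
    counts = Counter(umis)
    # Most frequent first; ties broken alphabetically for determinism.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    representatives = {}
    for umi, n in ordered:
        for rep in representatives:
            if hamming(umi, rep) <= max_mismatches:
                representatives[rep] += n
                break
        else:
            # No representative is close enough: start a new cluster.
            representatives[umi] = n
    return representatives


# Hypothetical reads at a single position, using length-9 UMIs.
reads = (
    ["ACGTACGTA"] * 5
    + ["ACGTACGTT"]      # one substitution away from ACGTACGTA
    + ["TTTTGGGGC"] * 3
    + ["TTTTGGGGA"]      # one substitution away from TTTTGGGGC
)
print(collapse_umis(reads))
```

With `max_mismatches=0` the same function reduces to exact-match deduplication, which is the case that cannot handle substitution errors.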
Affiliation(s)
- Daniel Liu
- Torrey Pines High School, San Diego, CA, United States of America
- Department of Psychiatry, University of California, San Diego, La Jolla, CA, United States of America
| |
|
150
|
Zhu A, Srivastava A, Ibrahim JG, Patro R, Love MI. Nonparametric expression analysis using inferential replicate counts. Nucleic Acids Res 2019; 47:e105. [PMID: 31372651 PMCID: PMC6765120 DOI: 10.1093/nar/gkz622] [Citation(s) in RCA: 59] [Impact Index Per Article: 9.8] [Received: 05/01/2019] [Revised: 06/11/2019] [Accepted: 07/11/2019] [Indexed: 11/13/2022] Open
Abstract
A primary challenge in the analysis of RNA-seq data is to identify differentially expressed genes or transcripts while controlling for technical biases. Ideally, a statistical testing procedure should incorporate the inherent uncertainty of the abundance estimates arising from the quantification step. Most popular methods for RNA-seq differential expression analysis fit a parametric model to the counts for each gene or transcript, and a subset of methods can incorporate uncertainty. Previous work has shown that nonparametric models for RNA-seq differential expression may have better control of the false discovery rate, and adapt well to new data types without requiring reformulation of a parametric model. Existing nonparametric models do not take into account inferential uncertainty, leading to an inflated false discovery rate, in particular at the transcript level. We propose a nonparametric model for differential expression analysis using inferential replicate counts, extending the existing SAMseq method to account for inferential uncertainty. We compare our method, Swish, with popular differential expression analysis methods. Swish has improved control of the false discovery rate, in particular for transcripts with high inferential uncertainty. We apply Swish to a single-cell RNA-seq dataset, assessing differential expression between sub-populations of cells, and compare its performance to the Wilcoxon test.
Affiliation(s)
- Anqi Zhu
- Department of Biostatistics, University of North Carolina-Chapel Hill, 135 Dauer Drive, Chapel Hill, NC 27599, USA
| | - Avi Srivastava
- Department of Computer Science, Stony Brook University, Computer Science Building, Engineering Dr, Stony Brook, NY 11794, USA
| | - Joseph G Ibrahim
- Department of Biostatistics, University of North Carolina-Chapel Hill, 135 Dauer Drive, Chapel Hill, NC 27599, USA
| | - Rob Patro
- Department of Computer Science, Stony Brook University, Computer Science Building, Engineering Dr, Stony Brook, NY 11794, USA
| | - Michael I Love
- Department of Biostatistics, University of North Carolina-Chapel Hill, 135 Dauer Drive, Chapel Hill, NC 27599, USA
- Department of Genetics, University of North Carolina-Chapel Hill, 120 Mason Farm Rd, Chapel Hill, NC 27514, USA
| |
|