1
Coombes B, Lux T, Akhunov E, Hall A. Introgressions lead to reference bias in wheat RNA-seq analysis. BMC Biol 2024; 22:56. [PMID: 38454464 PMCID: PMC10921782 DOI: 10.1186/s12915-024-01853-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2023] [Accepted: 02/21/2024] [Indexed: 03/09/2024] Open
Abstract
BACKGROUND RNA-seq is a fundamental technique in genomics, yet reference bias, where transcripts derived from non-reference alleles are quantified less accurately, can undermine the accuracy of RNA-seq quantification and thus the conclusions made downstream. Reference bias in RNA-seq analysis has yet to be explored in complex polyploid genomes despite evidence that they are often a complex mosaic of wild relative introgressions, which introduce blocks of highly divergent genes. RESULTS Here we use hexaploid wheat as a model complex polyploid, using both simulated and experimental data to show that RNA-seq alignment in wheat suffers from widespread reference bias which is largely driven by divergent introgressed genes. This leads to underestimation of gene expression and incorrect assessment of homoeologue expression balance. By incorporating gene models from ten wheat genome assemblies into a pantranscriptome reference, we present a novel method to reduce reference bias, which can be readily scaled to capture more variation as new genome and transcriptome data becomes available. CONCLUSIONS This study shows that the presence of introgressions can lead to reference bias in wheat RNA-seq analysis. Caution should be exercised by researchers using non-sample reference genomes for RNA-seq alignment and novel methods, such as the one presented here, should be considered.
Affiliation(s)
- Thomas Lux
- Plant Genome and Systems Biology, Helmholtz Zentrum München, Neuherberg, Germany
- Eduard Akhunov
- Department of Plant Pathology, Kansas State University, Manhattan, KS, USA
- Anthony Hall
- Earlham Institute, Norwich, Norfolk, NR4 7UZ, UK.
2
Lyu A, Humphrey RS, Nam SH, Durham TA, Hu Z, Arasappan D, Horton TM, Ehrlich LIR. Integrin signaling is critical for myeloid-mediated support of T-cell acute lymphoblastic leukemia. Nat Commun 2023; 14:6270. [PMID: 37805579 PMCID: PMC10560206 DOI: 10.1038/s41467-023-41925-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2022] [Accepted: 09/21/2023] [Indexed: 10/09/2023] Open
Abstract
We previously found that T-cell acute lymphoblastic leukemia (T-ALL) requires support from tumor-associated myeloid cells, which activate Insulin Like Growth Factor 1 Receptor (IGF1R) signaling in leukemic blasts. However, IGF1 is not sufficient to sustain T-ALL in vitro, implicating additional myeloid-mediated signals in leukemia progression. Here, we find that T-ALL cells require close contact with myeloid cells to survive. Transcriptional profiling and in vitro assays demonstrate that integrin-mediated cell adhesion activates downstream focal adhesion kinase (FAK)/ proline-rich tyrosine kinase 2 (PYK2), which are required for myeloid-mediated T-ALL support, partly through activation of IGF1R. Blocking integrin ligands or inhibiting FAK/PYK2 signaling diminishes leukemia burden in multiple organs and confers a survival advantage in a mouse model of T-ALL. Inhibiting integrin-mediated adhesion or FAK/PYK2 also reduces survival of primary patient T-ALL cells co-cultured with myeloid cells. Furthermore, elevated integrin pathway gene signatures correlate with higher FAK signaling and myeloid gene signatures and are associated with an inferior prognosis in pediatric T-ALL patients. Together, these findings demonstrate that integrin activation and downstream FAK/PYK2 signaling are important mechanisms underlying myeloid-mediated support of T-ALL progression.
Affiliation(s)
- Aram Lyu
- Department of Molecular Biosciences, The University of Texas at Austin, Austin, TX, USA
- Ryan S Humphrey
- Department of Molecular Biosciences, The University of Texas at Austin, Austin, TX, USA
- Seo Hee Nam
- Department of Molecular Biosciences, The University of Texas at Austin, Austin, TX, USA
- Tyler A Durham
- Department of Molecular Biosciences, The University of Texas at Austin, Austin, TX, USA
- Zicheng Hu
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
- Dhivya Arasappan
- Center for Biomedical Research Support, The University of Texas at Austin, Austin, TX, USA
- Terzah M Horton
- Department of Pediatrics, Baylor College of Medicine/Dan L. Duncan Cancer Center and Texas Children's Cancer Center, Houston, TX, USA
- Lauren I R Ehrlich
- Department of Molecular Biosciences, The University of Texas at Austin, Austin, TX, USA.
- Department of Oncology, Livestrong Cancer Institutes, The University of Texas at Austin Dell Medical School, Austin, TX, USA.
3
Stokes T, Cen HH, Kapranov P, Gallagher IJ, Pitsillides AA, Volmar C, Kraus WE, Johnson JD, Phillips SM, Wahlestedt C, Timmons JA. Transcriptomics for Clinical and Experimental Biology Research: Hang on a Seq. Advanced Genetics (Hoboken, N.J.) 2023; 4:2200024. [PMID: 37288167 PMCID: PMC10242409 DOI: 10.1002/ggn2.202200024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2022] [Indexed: 06/09/2023]
Abstract
Sequencing the human genome empowers translational medicine, facilitating transcriptome-wide molecular diagnosis, pathway biology, and drug repositioning. Initially, microarrays were used to study the bulk transcriptome, but now short-read RNA sequencing (RNA-seq) predominates. Although positioned as a superior technology that makes the discovery of novel transcripts routine, most RNA-seq analyses are in fact modeled on the known transcriptome. Limitations of the RNA-seq methodology have emerged, while the design of, and the analysis strategies applied to, arrays have matured. An equitable comparison between these technologies is provided, highlighting advantages that modern arrays hold over RNA-seq. Array protocols more accurately quantify constitutively expressed protein coding genes across tissue replicates, and are more reliable for studying lower expressed genes. Arrays reveal that long noncoding RNAs (lncRNA) are neither sparsely nor lower expressed than protein coding genes. Heterogeneous coverage of constitutively expressed genes observed with RNA-seq undermines the validity and reproducibility of pathway analyses. The factors driving these observations, many of which are relevant to long-read or single-cell sequencing, are discussed. As proposed herein, a reappreciation of bulk transcriptomic methods is required, including wider use of modern high-density array data, to urgently revise existing anatomical RNA reference atlases and assist with more accurate study of lncRNAs.
Affiliation(s)
- Tanner Stokes
- Faculty of Science, McMaster University, Hamilton, L8S 4L8, Canada
- Haoning Howard Cen
- Life Sciences Institute, University of British Columbia, Vancouver, V6T 1Z3, Canada
- Iain J Gallagher
- School of Applied Sciences, Edinburgh Napier University, Edinburgh, EH11 4BN, UK
- James D. Johnson
- Life Sciences Institute, University of British Columbia, Vancouver, V6T 1Z3, Canada
- James A. Timmons
- Miller School of Medicine, University of Miami, Miami, FL 33136, USA
- William Harvey Research Institute, Queen Mary University London, London, EC1M 6BQ, UK
- Augur Precision Medicine LTD, Stirling, FK9 5NF, UK
4
De Bastiani MA, Bellaver B, Brum WS, Souza DG, Ferreira PCL, Rocha AS, Povala G, Ferrari-Souza JP, Benedet AL, Ashton NJ, Karikari TK, Zetterberg H, Blennow K, Rosa-Neto P, Pascoal TA, Zimmer ER. Hippocampal GFAP-positive astrocyte responses to amyloid and tau pathologies. Brain Behav Immun 2023; 110:175-184. [PMID: 36878332 DOI: 10.1016/j.bbi.2023.03.001] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 09/08/2022] [Revised: 01/10/2023] [Accepted: 03/01/2023] [Indexed: 03/08/2023] Open
Abstract
INTRODUCTION In Alzheimer's disease clinical research, glial fibrillary acidic protein (GFAP) released/leaked into the cerebrospinal fluid and blood is widely measured and perceived as a biomarker of reactive astrogliosis. However, it was demonstrated that GFAP levels differ in individuals presenting with amyloid-β (Aβ) or tau pathologies. The molecular underpinnings behind this specificity are little explored. Here we investigated biomarker and transcriptomic associations of hippocampal GFAP-positive astrocytes with Aβ and tau pathologies in humans and mouse models. METHODS We studied 90 individuals with plasma GFAP, Aβ- and Tau-PET to investigate the association between biomarkers. Then, transcriptomic analysis in hippocampal GFAP-positive astrocytes isolated from mouse models presenting Aβ (PS2APP) or tau (P301S) pathologies was conducted to explore differentially expressed genes (DEGs), Gene Ontology terms, and protein-protein interaction networks associated with each phenotype. RESULTS In humans, we found that plasma GFAP associates with Aβ but not tau pathology. Unveiling the unique nature of hippocampal GFAP-positive astrocytic responses to Aβ or tau pathologies, mouse transcriptomics showed scarce overlap of DEGs between the Aβ and tau mouse models. While Aβ GFAP-positive astrocytes were overrepresented with DEGs associated with proteostasis and exocytosis-related processes, tau hippocampal GFAP-positive astrocytes presented greater abnormalities in functions related to DNA/RNA processing and cytoskeleton dynamics. CONCLUSION Our results offer insights into Aβ- and tau-driven specific signatures in hippocampal GFAP-positive astrocytes. Characterizing how different underlying pathologies distinctly influence astrocyte responses is critical for the biological interpretation of astrocyte biomarkers and suggests the need to develop context-specific astrocyte targets to study AD.
FUNDING This study was supported by Instituto Serrapilheira, Alzheimer's Association, CAPES, CNPq and FAPERGS.
Affiliation(s)
- Marco Antônio De Bastiani
- Graduate Program in Biological Sciences: Biochemistry, Universidade Federal do Rio Grande do Sul, Porto Alegre, RS, Brazil
- Bruna Bellaver
- Graduate Program in Biological Sciences: Biochemistry, Universidade Federal do Rio Grande do Sul, Porto Alegre, RS, Brazil; Department of Psychiatry, University of Pittsburgh, Pittsburgh, PA, USA
- Wagner S Brum
- Graduate Program in Biological Sciences: Biochemistry, Universidade Federal do Rio Grande do Sul, Porto Alegre, RS, Brazil; Department of Psychiatry and Neurochemistry, The Sahlgrenska Academy at the University of Gothenburg, Mölndal, Sweden
- Debora G Souza
- Graduate Program in Biological Sciences: Biochemistry, Universidade Federal do Rio Grande do Sul, Porto Alegre, RS, Brazil; Brain Institute of Rio Grande do Sul, PUCRS, Porto Alegre, RS, Brazil
- Andreia S Rocha
- Graduate Program in Biological Sciences: Biochemistry, Universidade Federal do Rio Grande do Sul, Porto Alegre, RS, Brazil
- Guilherme Povala
- Graduate Program in Biological Sciences: Biochemistry, Universidade Federal do Rio Grande do Sul, Porto Alegre, RS, Brazil; Department of Psychiatry, University of Pittsburgh, Pittsburgh, PA, USA
- João Pedro Ferrari-Souza
- Graduate Program in Biological Sciences: Biochemistry, Universidade Federal do Rio Grande do Sul, Porto Alegre, RS, Brazil; Department of Psychiatry, University of Pittsburgh, Pittsburgh, PA, USA
- Andrea L Benedet
- Department of Psychiatry and Neurochemistry, The Sahlgrenska Academy at the University of Gothenburg, Mölndal, Sweden
- Nicholas J Ashton
- Department of Psychiatry and Neurochemistry, The Sahlgrenska Academy at the University of Gothenburg, Mölndal, Sweden; Clinical Neurochemistry Laboratory, Sahlgrenska University Hospital, Gothenburg, Sweden; Wallenberg Centre for Molecular and Translational Medicine, University of Gothenburg, Gothenburg, Sweden; Department of Old Age Psychiatry, Institute of Psychiatry, Psychology & Neuroscience, King's College London, London, UK
- Thomas K Karikari
- Department of Psychiatry, University of Pittsburgh, Pittsburgh, PA, USA; Department of Psychiatry and Neurochemistry, The Sahlgrenska Academy at the University of Gothenburg, Mölndal, Sweden; Clinical Neurochemistry Laboratory, Sahlgrenska University Hospital, Gothenburg, Sweden
- Henrik Zetterberg
- Department of Psychiatry and Neurochemistry, The Sahlgrenska Academy at the University of Gothenburg, Mölndal, Sweden; Clinical Neurochemistry Laboratory, Sahlgrenska University Hospital, Gothenburg, Sweden; Department of Neurodegenerative Disease, UCL Queen Square Institute of Neurology, London, UK; UK Dementia Research Institute at UCL, London, UK; Hong Kong Center for Neurodegenerative Diseases, Hong Kong, China
- Kaj Blennow
- Department of Psychiatry and Neurochemistry, The Sahlgrenska Academy at the University of Gothenburg, Mölndal, Sweden; Clinical Neurochemistry Laboratory, Sahlgrenska University Hospital, Gothenburg, Sweden
- Pedro Rosa-Neto
- Translational Neuroimaging Laboratory (TNL), McGill Center for Studies in Aging (MCSA), Douglas Mental Health University Institute, Departments of Neurology and Neurosurgery, Psychiatry, and Pharmacology, McGill University, Montreal, Canada
- Tharick A Pascoal
- Department of Psychiatry, University of Pittsburgh, Pittsburgh, PA, USA
- Eduardo R Zimmer
- Graduate Program in Biological Sciences: Biochemistry, Universidade Federal do Rio Grande do Sul, Porto Alegre, RS, Brazil; Department of Pharmacology, Universidade Federal do Rio Grande do Sul, Porto Alegre, RS, Brazil; Graduate Program in Biological Sciences: Pharmacology and Therapeutics, Universidade Federal do Rio Grande do Sul, Porto Alegre, RS, Brazil; Brain Institute of Rio Grande do Sul, PUCRS, Porto Alegre, RS, Brazil.
5
Cui H, Diedrich JK, Wu DC, Lim JJ, Nottingham RM, Moresco JJ, Yates JR, Blencowe BJ, Lambowitz AM, Schimmel P. Arg-tRNA synthetase links inflammatory metabolism to RNA splicing and nuclear trafficking via SRRM2. Nat Cell Biol 2023; 25:592-603. [PMID: 37059883 DOI: 10.1038/s41556-023-01118-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 03/28/2022] [Accepted: 02/27/2023] [Indexed: 04/16/2023]
Abstract
Cells respond to perturbations such as inflammation by sensing changes in metabolite levels. Especially prominent is arginine, which has known connections to the inflammatory response. Aminoacyl-tRNA synthetases, enzymes that catalyse the first step of protein synthesis, can also mediate cell signalling. Here we show that depletion of arginine during inflammation decreased levels of nuclear-localized arginyl-tRNA synthetase (ArgRS). Surprisingly, we found that nuclear ArgRS interacts and co-localizes with serine/arginine repetitive matrix protein 2 (SRRM2), a spliceosomal and nuclear speckle protein, and that decreased levels of nuclear ArgRS correlated with changes in condensate-like nuclear trafficking of SRRM2 and splice-site usage in certain genes. These splice-site usage changes cumulated in the synthesis of different protein isoforms that altered cellular metabolism and peptide presentation to immune cells. Our findings uncover a mechanism whereby an aminoacyl-tRNA synthetase cognate to a key amino acid that is metabolically controlled during inflammation modulates the splicing machinery.
Affiliation(s)
- Haissi Cui
- Department of Molecular Medicine, The Scripps Research Institute, La Jolla, CA, USA
- Jolene K Diedrich
- Department of Molecular Medicine, The Scripps Research Institute, La Jolla, CA, USA
- Douglas C Wu
- Institute for Cellular and Molecular Biology and Departments of Molecular Biosciences and Oncology, University of Texas at Austin, Austin, TX, USA
- Justin J Lim
- The Donnelly Centre, University of Toronto, Toronto, Ontario, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
- Ryan M Nottingham
- Institute for Cellular and Molecular Biology and Departments of Molecular Biosciences and Oncology, University of Texas at Austin, Austin, TX, USA
- James J Moresco
- Department of Molecular Medicine, The Scripps Research Institute, La Jolla, CA, USA
- Center for the Genetics of Host Defense, UT Southwestern Medical Center, Dallas, TX, USA
- John R Yates
- Department of Molecular Medicine, The Scripps Research Institute, La Jolla, CA, USA
- Benjamin J Blencowe
- The Donnelly Centre, University of Toronto, Toronto, Ontario, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
- Alan M Lambowitz
- Institute for Cellular and Molecular Biology and Departments of Molecular Biosciences and Oncology, University of Texas at Austin, Austin, TX, USA.
- Paul Schimmel
- Department of Molecular Medicine, The Scripps Research Institute, La Jolla, CA, USA.
6
Chen JW, Shrestha L, Green G, Leier A, Marquez-Lago TT. The hitchhikers' guide to RNA sequencing and functional analysis. Brief Bioinform 2023; 24:bbac529. [PMID: 36617463 PMCID: PMC9851315 DOI: 10.1093/bib/bbac529] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 07/11/2022] [Revised: 10/18/2022] [Accepted: 11/07/2022] [Indexed: 01/10/2023] Open
Abstract
DNA and RNA sequencing technologies have revolutionized biology and biomedical sciences, sequencing full genomes and transcriptomes at very high speeds and reasonably low costs. RNA sequencing (RNA-Seq) enables transcript identification and quantification, but once sequencing has concluded researchers can be easily overwhelmed with questions such as how to go from raw data to differential expression (DE), pathway analysis and interpretation. Several pipelines and procedures have been developed to this effect. Even though there is no unique way to perform RNA-Seq analysis, it usually follows these steps: 1) raw reads quality check, 2) alignment of reads to a reference genome, 3) aligned reads' summarization according to an annotation file, 4) DE analysis and 5) gene set analysis and/or functional enrichment analysis. Each step requires researchers to make decisions, and the wide variety of options and resulting large volumes of data often lead to interpretation challenges. There also seems to be insufficient guidance on how best to obtain relevant information and derive actionable knowledge from transcription experiments. In this paper, we explain RNA-Seq steps in detail and outline differences and similarities of different popular options, as well as advantages and disadvantages. We also discuss non-coding RNA analysis, multi-omics, meta-transcriptomics and the use of artificial intelligence methods complementing the arsenal of tools available to researchers. Lastly, we perform a complete analysis from raw reads to DE and functional enrichment analysis, visually illustrating how results are not absolute truths and how algorithmic decisions can greatly impact results and interpretation.
Affiliation(s)
- Jiung-Wen Chen
- Department of Biology, University of Alabama at Birmingham, Birmingham, AL, USA
- Lisa Shrestha
- Department of Genetics, University of Alabama at Birmingham, School of Medicine, Birmingham, AL, USA
- George Green
- Department of Biology, University of Alabama at Birmingham, Birmingham, AL, USA
- André Leier
- Department of Genetics, University of Alabama at Birmingham, School of Medicine, Birmingham, AL, USA
- Department of Cell, Developmental and Integrative Biology, University of Alabama at Birmingham, School of Medicine, Birmingham, AL, USA
- Tatiana T Marquez-Lago
- Department of Genetics, University of Alabama at Birmingham, School of Medicine, Birmingham, AL, USA
- Department of Cell, Developmental and Integrative Biology, University of Alabama at Birmingham, School of Medicine, Birmingham, AL, USA
- Department of Microbiology, University of Alabama at Birmingham, School of Medicine, Birmingham, AL, USA
7
Fang S, Chen B, Zhang Y, Sun H, Liu L, Liu S, Li Y, Xu X. Computational Approaches and Challenges in Spatial Transcriptomics. Genomics, Proteomics & Bioinformatics 2022:S1672-0229(22)00129-2. [PMID: 36252814 PMCID: PMC10372921 DOI: 10.1016/j.gpb.2022.10.001] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 02/06/2022] [Revised: 09/08/2022] [Accepted: 10/09/2022] [Indexed: 01/19/2023]
Abstract
The development of spatial transcriptomics (ST) technologies has transformed genetic research from a single-cell data level to a two-dimensional spatial coordinate system and facilitated the study of the composition and function of various cell subsets in different environments and organs. The large-scale data generated by these ST technologies, which contain spatial gene expression information, have elicited the need for spatially resolved approaches to meet the requirements of computational and biological data interpretation. These requirements include dealing with the explosive growth of data to determine the cell-level and gene-level expression, correcting the inner batch effect and loss of expression to improve the data quality, conducting efficient interpretation and in-depth knowledge mining both at the single-cell and tissue-wide levels, and conducting multi-omics integration analysis to provide an extensible framework toward the in-depth understanding of biological processes. However, algorithms designed specifically for ST technologies to meet these requirements are still in their infancy. Here, we review computational approaches to these problems in light of corresponding issues and challenges, and present forward-looking insights into algorithm development.
8
Kubota N, Suyama M. Mapping of promoter usage QTL using RNA-seq data reveals their contributions to complex traits. PLoS Comput Biol 2022; 18:e1010436. [PMID: 36037215 PMCID: PMC9462676 DOI: 10.1371/journal.pcbi.1010436] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2022] [Revised: 09/09/2022] [Accepted: 07/25/2022] [Indexed: 11/18/2022] Open
Abstract
Genomic variations are associated with gene expression levels, which are called expression quantitative trait loci (eQTL). Most eQTL may affect the total gene expression levels by regulating transcriptional activities of a specific promoter. However, the direct exploration of genomic loci associated with promoter activities using RNA-seq data has been challenging because eQTL analyses treat the total expression levels estimated by summing those of all isoforms transcribed from distinct promoters. Here we propose a new method for identifying genomic loci associated with promoter activities, called promoter usage quantitative trait loci (puQTL), using conventional RNA-seq data. By leveraging public RNA-seq datasets from the lymphoblastoid cell lines of 438 individuals from the GEUVADIS project, we obtained promoter activity estimates and mapped 2,592 puQTL at the 10% FDR level. The results of puQTL mapping enabled us to interpret the manner in which genomic variations regulate gene expression. We found that 310 puQTL genes (16.1%) were not detected by eQTL analysis, suggesting that our pipeline can identify novel variant–gene associations. Furthermore, we identified genomic loci associated with the activity of “hidden” promoters, which the standard eQTL studies have ignored. We found that most puQTL signals were concordant with at least one genome-wide association study (GWAS) signal, enabling novel interpretations of the molecular mechanisms of complex traits. Our results emphasize the importance of the re-analysis of public RNA-seq datasets to obtain novel insights into gene regulation by genomic variations and their contributions to complex traits.
Many variations exist in the human genome, creating phenotypic diversity among individuals. It is well known that they are associated with the risk of disease development by affecting the expression levels of genes. Genes are transcribed from regulatory elements called promoters. Although some genes are transcribed from multiple promoters and translated into proteins with different functions, the relationship between genomic variations and promoter activities has not been investigated in depth compared to the relationship between genomic variations and gene expression levels. In this study, we proposed a new method to detect the association between genomic variations and promoter activities. Our method identified the associations between many variations and promoters using genomic and promoter activity data from blood cells of 438 individuals. This study allowed us to identify new functional associations between genomic variations and genes. Furthermore, we identified previously undiscovered variation-gene-disease associations. Our results will help to elucidate the molecular mechanisms of diseases in which genetic factors are involved.
Affiliation(s)
- Naoto Kubota
- Division of Bioinformatics, Medical Institute of Bioregulation, Kyushu University, Fukuoka, Japan
- Mikita Suyama
- Division of Bioinformatics, Medical Institute of Bioregulation, Kyushu University, Fukuoka, Japan
9
Surachat K, Taylor TD, Wattanamatiphot W, Sukpisit S, Jeenkeawpiam K. aTAP: automated transcriptome analysis platform for processing RNA-seq data by de novo assembly. Heliyon 2022; 8:e10255. [PMID: 36033257 PMCID: PMC9404342 DOI: 10.1016/j.heliyon.2022.e10255] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2021] [Revised: 04/27/2022] [Accepted: 08/05/2022] [Indexed: 11/05/2022] Open
Abstract
RNA-seq is a sequencing technique that uses next-generation sequencing (NGS) to explore and study the entire transcriptome of a biological sample. NGS-based analyses are mostly performed via command-line interfaces, which is an obstacle for molecular biologists and researchers. Therefore, the higher throughputs from NGS can only be accessed with the help of bioinformatics and computer science expertise. As the cost of sequencing is continuously falling, the use of RNA-seq seems certain to increase. To minimize the problems encountered by biologists and researchers in RNA-seq data analysis, we propose an automated platform with a web application that integrates various bioinformatics pipelines. The platform is intended to enable academic users to more easily analyze transcriptome datasets. Our automated Transcriptome Analysis Platform (aTAP) offers comprehensive bioinformatics workflows, including quality control of raw reads, trimming of low-quality reads, de novo transcriptome assembly, transcript expression quantification, differential expression analysis, and transcript annotation. aTAP has a user-friendly graphical interface, allowing researchers to interact with and visualize results in the web browser. This project offers an alternative way to analyze transcriptome data, by integrating efficient and well-known tools, that is simpler and more accessible to research communities. aTAP is freely available to academic users at https://atap.psu.ac.th/.
Affiliation(s)
- Komwit Surachat
- Department of Biomedical Sciences and Biomedical Engineering, Faculty of Medicine, Prince of Songkla University, Hat Yai, Songkhla 90110, Thailand; Translational Medicine Research Center, Faculty of Medicine, Prince of Songkla University, Hat Yai, Songkhla 90110, Thailand; Molecular Evolution and Computational Biology Research Unit, Faculty of Science, Prince of Songkla University, Hat Yai, Songkhla 90110, Thailand
- Todd Duane Taylor
- RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa 230-0045, Japan
- Wanicbut Wattanamatiphot
- Division of Computational Science, Faculty of Science, Prince of Songkla University, Hat Yai, Songkhla 90110, Thailand
- Sukgamon Sukpisit
- Division of Computational Science, Faculty of Science, Prince of Songkla University, Hat Yai, Songkhla 90110, Thailand
- Kongpop Jeenkeawpiam
- Molecular Evolution and Computational Biology Research Unit, Faculty of Science, Prince of Songkla University, Hat Yai, Songkhla 90110, Thailand
10
Mishto M, Horokhovskyi Y, Cormican JA, Yang X, Lynham S, Urlaub H, Liepe J. Database search engines and target database features impinge upon the identification of post-translationally cis-spliced peptides in HLA class I immunopeptidomes. Proteomics 2022; 22:e2100226. [PMID: 35184383 PMCID: PMC9286349 DOI: 10.1002/pmic.202100226] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/15/2021] [Revised: 01/19/2022] [Accepted: 02/10/2022] [Indexed: 11/08/2022]
Abstract
Unconventional epitopes presented by HLA class I complexes are emerging targets for T cell targeted immunotherapies. Their identification by mass spectrometry (MS) required development of novel methods to cope with the large number of theoretical candidates. Methods to identify post-translationally spliced peptides led to a broad range of outcomes. We here investigated the impact of three common database search engines (Mascot, Mascot+Percolator, and PEAKS DB) as the final identification step, as well as the features of the target database, on the ability to correctly identify non-spliced and cis-spliced peptides. We used ground truth datasets measured by MS to benchmark the methods' performance and extended the analysis to HLA class I immunopeptidomes. PEAKS DB showed better precision and recall of cis-spliced peptides and a larger number of identified peptides in HLA class I immunopeptidomes than the other search engine strategies. The better performance of PEAKS DB appears to result from better discrimination between target and decoy hits, and hence a more robust FDR estimation, and seems independent of the peptide and spectrum features investigated here.
Affiliation(s)
- Michele Mishto
- Centre for Inflammation Biology and Cancer Immunology (CIBCI) & Peter Gorer Department of Immunobiology, King's College London, London, UK
- Francis Crick Institute, London, UK
- John A. Cormican
- Max-Planck-Institute for Multidisciplinary Sciences, Göttingen, Germany
- Xiaoping Yang
- Proteomics Core Facility, James Black Centre, King's College, London, UK
- Steven Lynham
- Proteomics Core Facility, James Black Centre, King's College, London, UK
- Henning Urlaub
- Max-Planck-Institute for Multidisciplinary Sciences, Göttingen, Germany
- Institute of Clinical Chemistry, University Medical Center Göttingen, Göttingen, Germany
- Juliane Liepe
- Max-Planck-Institute for Multidisciplinary Sciences, Göttingen, Germany
11
|
Wu F, Liu YZ, Ling B. MTD: a unique pipeline for host and meta-transcriptome joint and integrative analyses of RNA-seq data. Brief Bioinform 2022; 23:6563416. [PMID: 35380623 PMCID: PMC9116375 DOI: 10.1093/bib/bbac111] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/04/2021] [Revised: 02/22/2022] [Accepted: 03/06/2022] [Indexed: 11/13/2022] Open
Abstract
Ribonucleic acid (RNA)-seq data contain not only host transcriptomes but also nonhost information that comprises transcripts from active microbiota in the host cells. Therefore, joint and integrative analyses of both host and meta-transcriptome can reveal gene expression of the microbial community in a given sample as well as the correlative and interactive dynamics of the host response to the microbiome. However, there are no convenient tools that can systematically analyze host-microbiota interactions by simultaneously quantifying the host and meta-transcriptome in the same sample at the tissue and the single-cell level. This poses a challenge for interested researchers with limited expertise in bioinformatics. Here, we developed a software pipeline that can comprehensively and synergistically analyze and correlate the host and meta-transcriptome in a single sample using bulk and single-cell RNA-seq data. This pipeline, named meta-transcriptome detector (MTD), can extensively identify and quantify the microbiome, including viruses, bacteria, protozoa, fungi, plasmids and vectors, in the host cells and correlate the microbiome with the host transcriptome. MTD is easy to install and run, involving only a few lines of simple commands. It offers researchers unique genomic insights into host responses to microorganisms.
Affiliation(s)
- Fei Wu
- Host-Pathogen Interaction Program, Texas Biomedical Research Institute, 8715 W Military Dr, San Antonio, TX 78227, USA; Tulane Center for Aging, Tulane University School of Medicine, New Orleans, LA 70112, USA
- Yao-Zhong Liu
- Tulane University School of Public Health and Tropical Medicine, New Orleans, LA 70112, USA
- Binhua Ling
- Host-Pathogen Interaction Program, Texas Biomedical Research Institute, 8715 W Military Dr, San Antonio, TX 78227, USA
|
12
|
Brüning RS, Tombor L, Schulz MH, Dimmeler S, John D. Comparative analysis of common alignment tools for single-cell RNA sequencing. Gigascience 2022; 11:6515741. [PMID: 35084033 PMCID: PMC8848315 DOI: 10.1093/gigascience/giac001] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/30/2021] [Revised: 10/07/2021] [Accepted: 12/27/2021] [Indexed: 02/06/2023] Open
Abstract
BACKGROUND With the rise of single-cell RNA sequencing new bioinformatic tools have been developed to handle specific demands, such as quantifying unique molecular identifiers and correcting cell barcodes. Here, we benchmarked several datasets with the most common alignment tools for single-cell RNA sequencing data. We evaluated differences in the whitelisting, gene quantification, overall performance, and potential variations in clustering or detection of differentially expressed genes. We compared the tools Cell Ranger version 6, STARsolo, Kallisto, Alevin, and Alevin-fry on 3 published datasets for human and mouse, sequenced with different versions of the 10X sequencing protocol. RESULTS Striking differences were observed in the overall runtime of the mappers. Besides that, Kallisto and Alevin showed variances in the number of valid cells and detected genes per cell. Kallisto reported the highest number of cells; however, we observed an overrepresentation of cells with low gene content and unknown cell type. Conversely, Alevin rarely reported such low-content cells. Further variations were detected in the set of expressed genes. While STARsolo, Cell Ranger 6, Alevin-fry, and Alevin produced similar gene sets, Kallisto detected additional genes from the Vmn and Olfr gene family, which are likely mapping artefacts. We also observed differences in the mitochondrial content of the resulting cells when comparing a prefiltered annotation set to the full annotation set that includes pseudogenes and other biotypes. CONCLUSION Overall, this study provides a detailed comparison of common single-cell RNA sequencing mappers and shows their specific properties on 10X Genomics data.
Affiliation(s)
- Ralf Schulze Brüning
- Institute of Cardiovascular Regeneration, Theodor-Stern-Kai 7, 60590 Frankfurt, Germany; Cardio-Pulmonary Institute (CPI), Theodor-Stern-Kai 7, 60590 Frankfurt, Germany
- Lukas Tombor
- Institute of Cardiovascular Regeneration, Theodor-Stern-Kai 7, 60590 Frankfurt, Germany; German Center for Cardiovascular Research (DZHK), Potsdamer Str. 58, 10785 Berlin, Germany
- Marcel H Schulz
- Institute of Cardiovascular Regeneration, Theodor-Stern-Kai 7, 60590 Frankfurt, Germany; Cardio-Pulmonary Institute (CPI), Theodor-Stern-Kai 7, 60590 Frankfurt, Germany; German Center for Cardiovascular Research (DZHK), Potsdamer Str. 58, 10785 Berlin, Germany
- Stefanie Dimmeler
- Institute of Cardiovascular Regeneration, Theodor-Stern-Kai 7, 60590 Frankfurt, Germany; Cardio-Pulmonary Institute (CPI), Theodor-Stern-Kai 7, 60590 Frankfurt, Germany; German Center for Cardiovascular Research (DZHK), Potsdamer Str. 58, 10785 Berlin, Germany
- David John
- Institute of Cardiovascular Regeneration, Theodor-Stern-Kai 7, 60590 Frankfurt, Germany; Cardio-Pulmonary Institute (CPI), Theodor-Stern-Kai 7, 60590 Frankfurt, Germany
|
13
|
Raghavan V, Kraft L, Mesny F, Rigerte L. A simple guide to de novo transcriptome assembly and annotation. Brief Bioinform 2022; 23:6514404. [PMID: 35076693 PMCID: PMC8921630 DOI: 10.1093/bib/bbab563] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Received: 09/09/2021] [Revised: 12/03/2021] [Accepted: 12/09/2021] [Indexed: 12/13/2022] Open
Abstract
A transcriptome constructed from short-read RNA sequencing (RNA-seq) is an easily attainable proxy catalog of protein-coding genes when genome assembly is unnecessary, expensive or difficult. In the absence of a sequenced genome to guide the reconstruction process, the transcriptome must be assembled de novo using only the information available in the RNA-seq reads. Subsequently, the sequences must be annotated in order to identify sequence-intrinsic and evolutionary features in them (for example, protein-coding regions). Although straightforward at first glance, de novo transcriptome assembly and annotation can quickly prove to be challenging undertakings. In addition to familiarizing themselves with the conceptual and technical intricacies of the tasks at hand and the numerous pre- and post-processing steps involved, those interested must also grapple with an overwhelmingly large choice of tools. The lack of standardized workflows, fast pace of development of new tools and techniques and paucity of authoritative literature have served to exacerbate the difficulty of the task even further. Here, we present a comprehensive overview of de novo transcriptome assembly and annotation. We discuss the procedures involved, including pre- and post-processing steps, and present a compendium of corresponding tools.
Affiliation(s)
- Venket Raghavan
- Quantitative and Computational Biology, Max Planck Institute for Biophysical Chemistry, 37077 Göttingen, Germany (corresponding author)
- Louis Kraft
- Quantitative and Computational Biology, Max Planck Institute for Biophysical Chemistry, 37077 Göttingen, Germany (corresponding author)
|
14
|
Wang K, Patkar S, Lee JS, Gertz EM, Robinson W, Schischlik F, Crawford DR, Schaffer AA, Ruppin E. Deconvolving clinically relevant cellular immune crosstalk from bulk gene expression using CODEFACS and LIRICS stratifies melanoma patients to anti-PD-1 therapy. Cancer Discov 2022; 12:1088-1105. [PMID: 34983745 DOI: 10.1158/2159-8290.cd-21-0887] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 07/11/2021] [Revised: 11/09/2021] [Accepted: 12/22/2021] [Indexed: 11/16/2022]
Abstract
The tumor microenvironment (TME) is a complex mixture of cell types whose interactions affect tumor growth and clinical outcome. To discover such interactions, we developed CODEFACS (COnfident DEconvolution For All Cell Subsets), a tool deconvolving cell-type-specific gene expression in each sample from bulk expression, and LIRICS (LIgand Receptor Interactions between Cell Subsets), a statistical framework prioritizing clinically relevant ligand-receptor interactions between cell types from the deconvolved data. We first demonstrate the superiority of CODEFACS versus the state-of-the-art deconvolution method, CIBERSORTx. Second, analyzing the TCGA, we uncover cell-type-specific ligand-receptor interactions uniquely associated with mismatch repair deficiency across different cancer types, providing additional insights into their enhanced sensitivity to anti-PD1 therapy compared to other tumors with high neoantigen burden. Finally, we identify a subset of cell-type-specific ligand-receptor interactions in the melanoma TME that stratify survival of patients receiving anti-PD1 therapy better than some recently published bulk transcriptomics-based methods.
Affiliation(s)
- Kun Wang
- NCI, National Institutes of Health
- Sushant Patkar
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park
- Joo Sang Lee
- Cancer Data Science Lab, National Cancer Institute/National Institute of Health
- E Michael Gertz
- Cancer Data Science Laboratory, Center for Cancer Research, NCI/NIH
|
15
|
You Y, Tian L, Su S, Dong X, Jabbari JS, Hickey PF, Ritchie ME. Benchmarking UMI-based single-cell RNA-seq preprocessing workflows. Genome Biol 2021; 22:339. [PMID: 34906205 PMCID: PMC8672463 DOI: 10.1186/s13059-021-02552-3] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 06/17/2021] [Accepted: 11/22/2021] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND Single-cell RNA-sequencing (scRNA-seq) technologies and associated analysis methods have rapidly developed in recent years. This includes preprocessing methods, which assign sequencing reads to genes to create count matrices for downstream analysis. While several packaged preprocessing workflows have been developed to provide users with convenient tools for handling this process, how they compare to one another and how they influence downstream analysis have not been well studied. RESULTS Here, we systematically benchmark the performance of 10 end-to-end preprocessing workflows (Cell Ranger, Optimus, salmon alevin, alevin-fry, kallisto bustools, dropSeqPipe, scPipe, zUMIs, celseq2, and scruff) using datasets yielding different biological complexity levels generated by CEL-Seq2 and 10x Chromium platforms. We compare these workflows in terms of their quantification properties directly and their impact on normalization and clustering by evaluating the performance of different method combinations. While the scRNA-seq preprocessing workflows compared vary in their detection and quantification of genes across datasets, after downstream analysis with performant normalization and clustering methods, almost all combinations produce clustering results that agree well with the known cell type labels that provided the ground truth in our analysis. CONCLUSIONS In summary, the choice of preprocessing method was found to be less important than other steps in the scRNA-seq analysis process. Our study comprehensively compares common scRNA-seq preprocessing workflows and summarizes their characteristics to guide workflow users.
Affiliation(s)
- Yue You
- Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Australia
- Department of Medical Biology, The University of Melbourne, Parkville, Australia
- Luyi Tian
- Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Australia
- Department of Medical Biology, The University of Melbourne, Parkville, Australia
- Shian Su
- Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Australia
- Department of Medical Biology, The University of Melbourne, Parkville, Australia
- Xueyi Dong
- Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Australia
- Department of Medical Biology, The University of Melbourne, Parkville, Australia
- Jafar S. Jabbari
- Australian Genome Research Facility, Victorian Comprehensive Cancer Centre, Melbourne, Australia
- Microbiological Diagnostic Unit Public Health Laboratory, Department of Microbiology and Immunology, The University of Melbourne at The Peter Doherty Institute for Infection and Immunity, Melbourne, Australia
- Peter F. Hickey
- Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Australia
- Department of Medical Biology, The University of Melbourne, Parkville, Australia
- Single-Cell Open Research Endeavour (SCORE), The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Australia
- Matthew E. Ritchie
- Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Australia
- Department of Medical Biology, The University of Melbourne, Parkville, Australia
- School of Mathematics and Statistics, The University of Melbourne, Parkville, Australia
|
16
|
Combining Multiple RNA-Seq Data Analysis Algorithms Using Machine Learning Improves Differential Isoform Expression Analysis. Methods Protoc 2021; 4:mps4040068. [PMID: 34698224 PMCID: PMC8544431 DOI: 10.3390/mps4040068] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/23/2021] [Revised: 08/22/2021] [Accepted: 09/24/2021] [Indexed: 12/13/2022] Open
Abstract
RNA sequencing has become the standard technique for high resolution genome-wide monitoring of gene expression. As such, it often comprises the first step towards understanding complex molecular mechanisms driving various phenotypes, spanning organ development to disease genesis, monitoring and progression. An advantage of RNA sequencing is its ability to capture complex transcriptomic events such as alternative splicing, which results in alternate isoform abundance. At the same time, this advantage remains algorithmically and computationally challenging, especially with the emergence of even higher resolution technologies such as single-cell RNA sequencing. Although several algorithms have been proposed for the effective detection of differential isoform expression from RNA-Seq data, no widely accepted gold standards have been established. This fact is further compounded by the significant differences in the output of different algorithms when applied to the same data. In addition, many of the proposed algorithms remain scarce and poorly maintained. Driven by these challenges, we developed a novel integrative approach that effectively combines the most widely used algorithms for differential transcript and isoform analysis using state-of-the-art machine learning techniques. We demonstrate its usability by applying it to simulated data based on several organisms, and using several performance metrics; we conclude that our strategy outperforms the application of the individual algorithms. Finally, our approach is implemented as an R Shiny application, with the underlying data analysis pipelines also available as docker containers.
|
17
|
Shiga M, Seno S, Onizuka M, Matsuda H. SC-JNMF: single-cell clustering integrating multiple quantification methods based on joint non-negative matrix factorization. PeerJ 2021; 9:e12087. [PMID: 34532161 PMCID: PMC8404576 DOI: 10.7717/peerj.12087] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/10/2021] [Accepted: 08/07/2021] [Indexed: 11/20/2022] Open
Abstract
Single-cell RNA-sequencing is a rapidly evolving technology that enables us to understand biological processes at unprecedented resolution. Single-cell expression analysis requires a complex data processing pipeline, and the pipeline is divided into two main parts: The quantification part, which converts the sequence information into gene-cell matrix data; the analysis part, which analyzes the matrix data using statistics and/or machine learning techniques. In the analysis part, unsupervised cell clustering plays an important role in identifying cell types and discovering cell diversity and subpopulations. Identified cell clusters are also used for subsequent analysis, such as finding differentially expressed genes and inferring cell trajectories. However, single-cell clustering using gene expression profiles shows different results depending on the quantification methods. Clustering results are greatly affected by the quantification method used in the upstream process. In other words, even if the original RNA-sequence data is the same, gene expression profiles processed by different quantification methods will produce different clusters. In this article, we propose a robust and highly accurate clustering method based on joint non-negative matrix factorization (joint-NMF) by utilizing the information from multiple gene expression profiles quantified using different methods from the same RNA-sequence data. Our joint-NMF can extract common factors among multiple gene expression profiles by applying each NMF under the constraint that one of the factorized matrices is shared among multiple NMFs. The joint-NMF determines more robust and accurate cell clustering results by leveraging multiple quantification methods compared to conventional clustering methods, which use only a single gene expression profile. Additionally, we showed the usefulness of discovering marker genes with the extracted features using our method.
Affiliation(s)
- Mikio Shiga
- Graduate School of Information Science and Technology, Osaka University, Osaka, Japan
- Shigeto Seno
- Graduate School of Information Science and Technology, Osaka University, Osaka, Japan
- Makoto Onizuka
- Graduate School of Information Science and Technology, Osaka University, Osaka, Japan
- Hideo Matsuda
- Graduate School of Information Science and Technology, Osaka University, Osaka, Japan
|
18
|
Fomina AF. Neglected wardens: T lymphocyte ryanodine receptors. J Physiol 2021; 599:4415-4426. [PMID: 34411300 DOI: 10.1113/jp281722] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/05/2021] [Accepted: 06/22/2021] [Indexed: 12/14/2022] Open
Abstract
Ryanodine receptors (RyRs) are intracellular Ca²⁺ release channels ubiquitously expressed in various cell types. RyRs were extensively studied in striated muscle cells due to their crucial role in muscle contraction. In contrast, the role of RyRs in Ca²⁺ signalling and functions in non-excitable cells, such as T lymphocytes, remains poorly understood. Expression of different isoforms of RyRs was shown in primary T cells and T cell lines. In T cells, RyRs co-localize with the plasmalemmal store-operated Ca²⁺ channels of the Orai family and endoplasmic reticulum Ca²⁺ sensing Stim family proteins and are activated by store-operated Ca²⁺ entry and pyridine nucleotide metabolites, the intracellular second messengers generated upon stimulation of T cell receptors. Experimental data indicate that together with d-myo-inositol 1,4,5-trisphosphate receptors, RyRs regulate intracellular Ca²⁺ dynamics by controlling Ca²⁺ concentration within the lumen of the endoplasmic reticulum and, consequently, store-operated Ca²⁺ entry. Gain-of-function mutations, genetic deletion or pharmacological inhibition of RyRs alters T cell Ca²⁺ signalling and effector functions. The picture emerging from the collective data shows that RyRs are essential regulators of T cell Ca²⁺ signalling and can potentially be used as molecular targets for immunomodulation or T cell-based diagnostics of disorders associated with RyR dysregulation.
Affiliation(s)
- Alla F Fomina
- Department of Physiology and Membrane Biology, University of California, Davis, CA, USA
|
19
|
Krappinger JC, Bonstingl L, Pansy K, Sallinger K, Wreglesworth NI, Grinninger L, Deutsch A, El-Heliebi A, Kroneis T, Mcfarlane RJ, Sensen CW, Feichtinger J. Non-coding Natural Antisense Transcripts: Analysis and Application. J Biotechnol 2021; 340:75-101. [PMID: 34371054 DOI: 10.1016/j.jbiotec.2021.08.005] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 01/29/2021] [Revised: 06/30/2021] [Accepted: 08/04/2021] [Indexed: 12/12/2022]
Abstract
Non-coding natural antisense transcripts (ncNATs) are regulatory RNA sequences that are transcribed in the opposite direction to protein-coding or non-coding transcripts. These transcripts are implicated in a broad variety of biological and pathological processes, including tumorigenesis and oncogenic progression. With this complex field still in its infancy, annotations, expression profiling and functional characterisations of ncNATs are far less comprehensive than those for protein-coding genes, pointing out substantial gaps in the analysis and characterisation of these regulatory transcripts. In this review, we discuss ncNATs from an analysis perspective, in particular regarding the use of high-throughput sequencing strategies, such as RNA-sequencing, and summarize the unique challenges of investigating the antisense transcriptome. Finally, we elaborate on their potential as biomarkers and future targets for treatment, focusing on cancer.
Affiliation(s)
- Julian C Krappinger
- Division of Cell Biology, Histology and Embryology, Gottfried Schatz Research Center for Cell Signalling, Metabolism and Aging, Medical University of Graz, Neue Stiftingtalstraße 6/II, 8010 Graz, Austria; Christian Doppler Laboratory for innovative Pichia pastoris host and vector systems, Division of Cell Biology, Histology and Embryology, Medical University of Graz, Neue Stiftingtalstraße 6/II, 8010 Graz, Austria
- Lilli Bonstingl
- Division of Cell Biology, Histology and Embryology, Gottfried Schatz Research Center for Cell Signalling, Metabolism and Aging, Medical University of Graz, Neue Stiftingtalstraße 6/II, 8010 Graz, Austria; Center for Biomarker Research in Medicine, Stiftingtalstraße 5, 8010 Graz, Austria
- Katrin Pansy
- Division of Haematology, Medical University of Graz, Stiftingtalstrasse 24, 8010 Graz, Austria
- Katja Sallinger
- Division of Cell Biology, Histology and Embryology, Gottfried Schatz Research Center for Cell Signalling, Metabolism and Aging, Medical University of Graz, Neue Stiftingtalstraße 6/II, 8010 Graz, Austria; Center for Biomarker Research in Medicine, Stiftingtalstraße 5, 8010 Graz, Austria
- Nick I Wreglesworth
- North West Cancer Research Institute, School of Medical Sciences, Bangor University, LL57 2UW Bangor, United Kingdom
- Lukas Grinninger
- Division of Cell Biology, Histology and Embryology, Gottfried Schatz Research Center for Cell Signalling, Metabolism and Aging, Medical University of Graz, Neue Stiftingtalstraße 6/II, 8010 Graz, Austria; Austrian Biotech University of Applied Sciences, Konrad Lorenz-Straße 10, 3430 Tulln an der Donau, Austria
- Alexander Deutsch
- Division of Haematology, Medical University of Graz, Stiftingtalstrasse 24, 8010 Graz, Austria; BioTechMed-Graz, Mozartgasse 12/II, 8010 Graz, Austria
- Amin El-Heliebi
- Division of Cell Biology, Histology and Embryology, Gottfried Schatz Research Center for Cell Signalling, Metabolism and Aging, Medical University of Graz, Neue Stiftingtalstraße 6/II, 8010 Graz, Austria; Center for Biomarker Research in Medicine, Stiftingtalstraße 5, 8010 Graz, Austria
- Thomas Kroneis
- Division of Cell Biology, Histology and Embryology, Gottfried Schatz Research Center for Cell Signalling, Metabolism and Aging, Medical University of Graz, Neue Stiftingtalstraße 6/II, 8010 Graz, Austria; Center for Biomarker Research in Medicine, Stiftingtalstraße 5, 8010 Graz, Austria
- Ramsay J Mcfarlane
- North West Cancer Research Institute, School of Medical Sciences, Bangor University, LL57 2UW Bangor, United Kingdom
- Christoph W Sensen
- BioTechMed-Graz, Mozartgasse 12/II, 8010 Graz, Austria; Institute of Computational Biotechnology, Graz University of Technology, Petersgasse 14/V, 8010 Graz, Austria; HCEMM Kft., Római blvd. 21, 6723 Szeged, Hungary
- Julia Feichtinger
- Division of Cell Biology, Histology and Embryology, Gottfried Schatz Research Center for Cell Signalling, Metabolism and Aging, Medical University of Graz, Neue Stiftingtalstraße 6/II, 8010 Graz, Austria; Christian Doppler Laboratory for innovative Pichia pastoris host and vector systems, Division of Cell Biology, Histology and Embryology, Medical University of Graz, Neue Stiftingtalstraße 6/II, 8010 Graz, Austria; BioTechMed-Graz, Mozartgasse 12/II, 8010 Graz, Austria.
|
20
|
Flores-León M, Alcaraz N, Pérez-Domínguez M, Torres-Arciga K, Rebollar-Vega R, De la Rosa-Velázquez IA, Arriaga-Canon C, Herrera LA, Arias C, González-Barrios R. Transcriptional Profiles Reveal Deregulation of Lipid Metabolism and Inflammatory Pathways in Neurons Exposed to Palmitic Acid. Mol Neurobiol 2021; 58:4639-4651. [PMID: 34155583 DOI: 10.1007/s12035-021-02434-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/11/2021] [Accepted: 05/18/2021] [Indexed: 12/13/2022]
Abstract
The effects of the consumption of high-fat diets (HFD) have been studied to unravel the molecular pathways they alter, in order to understand the link between increased caloric intake, metabolic diseases, and the risk of cognitive dysfunction. The saturated fatty acid palmitic acid (PA) is the main component of HFD and has been found at increased levels in the circulation of obese and diabetic people. In the central nervous system, PA has been associated with inflammatory responses in astrocytes, but its effects on neurons have not been extensively investigated. Given that PA affects a variety of metabolic pathways, we aimed to analyze the transcriptomic profile activated by this fatty acid to shed light on the mechanisms of neuronal dysfunction. In the current study, we profiled the transcriptome response after PA exposure at non-toxic doses in primary hippocampal neurons. Gene ontology and Reactome pathway analysis revealed a pattern of gene expression that is associated with inflammatory pathways and, importantly, with the activation of lipid metabolism, which is considered not very active in neurons. Validation by quantitative RT-PCR (qRT-PCR) of Hmgcs2, Angptl4, Ugt8, and Rnf145 supports the results obtained by RNA-seq. Overall, these findings suggest that neurons are able to respond to saturated fatty acids by changing the expression pattern of genes associated with inflammatory response and lipid utilization, which may be involved in the neuronal damage associated with metabolic diseases.
Affiliation(s)
- M Flores-León
- Departamento de Medicina Genómica y Toxicología Ambiental, Instituto de Investigaciones Biomédicas, Universidad Nacional Autónoma de México, Ciudad de México, México
- N Alcaraz
- The Bioinformatics Centre. Department of Biology, University of Copenhagen, Ole Maaloes Vej 5, DK-2200, Copenhagen N, Denmark
- Instituto Nacional de Medicina Genómica, Periférico Sur 4809, Arenal Tepepan, Tlalpan, CP 14610, Mexico City, Mexico
- M Pérez-Domínguez
- Departamento de Medicina Genómica y Toxicología Ambiental, Instituto de Investigaciones Biomédicas, Universidad Nacional Autónoma de México, Ciudad de México, México
- K Torres-Arciga
- Unidad de Investigación Biomédica en Cáncer, Instituto Nacional de Cancerología-Instituto de Investigaciones Biomédicas, UNAM, Avenida San Fernando No. 22, Colonia Sección XVI, Tlalpan, CP 14080, Mexico City, Mexico
- R Rebollar-Vega
- Genomics Laboratory, Red de Apoyo a La Investigación - CIC, Universidad Nacional Autónoma de México, INMCNSZ, Vasco de Quiroga 15, Belisario Domínguez Secc. 16, Tlalpan, 14080, Mexico City, Mexico
- I A De la Rosa-Velázquez
- Genomics Laboratory, Red de Apoyo a La Investigación - CIC, Universidad Nacional Autónoma de México, INMCNSZ, Vasco de Quiroga 15, Belisario Domínguez Secc. 16, Tlalpan, 14080, Mexico City, Mexico
- Next Generation Sequencing Core Facility, Helmholtz Zentrum Muenchen, Ingolstaedter Landstr 1, 85754, Neuherberg, Germany
- C Arriaga-Canon
- Unidad de Investigación Biomédica en Cáncer, Instituto Nacional de Cancerología-Instituto de Investigaciones Biomédicas, UNAM, Avenida San Fernando No. 22, Colonia Sección XVI, Tlalpan, CP 14080, Mexico City, Mexico
- L A Herrera
- Instituto Nacional de Medicina Genómica, Periférico Sur 4809, Arenal Tepepan, Tlalpan, CP 14610, Mexico City, Mexico
- Unidad de Investigación Biomédica en Cáncer, Instituto Nacional de Cancerología-Instituto de Investigaciones Biomédicas, UNAM, Avenida San Fernando No. 22, Colonia Sección XVI, Tlalpan, CP 14080, Mexico City, Mexico
- Clorinda Arias
- Departamento de Medicina Genómica y Toxicología Ambiental, Instituto de Investigaciones Biomédicas, Universidad Nacional Autónoma de México, Ciudad de México, México.
- Rodrigo González-Barrios
- Unidad de Investigación Biomédica en Cáncer, Instituto Nacional de Cancerología-Instituto de Investigaciones Biomédicas, UNAM, Avenida San Fernando No. 22, Colonia Sección XVI, Tlalpan, CP 14080, Mexico City, Mexico.
|
21
|
Transcriptome-Wide Identification and Quantification of Caffeoylquinic Acid Biosynthesis Pathway and Prediction of Its Putative BAHDs Gene Complex in A. spathulifolius. Int J Mol Sci 2021; 22:ijms22126333. [PMID: 34199260 PMCID: PMC8231772 DOI: 10.3390/ijms22126333] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/21/2021] [Revised: 06/08/2021] [Accepted: 06/11/2021] [Indexed: 11/17/2022] Open
Abstract
The phenylpropanoid pathway (PPP) is a major secondary metabolite pathway that helps plants overcome biotic and abiotic stress and produces various byproducts that promote human health. Its byproduct caffeoylquinic acid (CQA) is a soluble phenolic compound present in many angiosperms. Hydroxycinnamate-CoA shikimate/quinate transferase is a significant enzyme that plays a role in accumulating CQA biosynthesis. This study performed a transcriptome-wide identification of the phenylpropanoid to caffeoylquinic acid biosynthesis candidate genes in A. spathulifolius flowers and leaves. Transcriptomic analyses of the flowers and leaves showed a differential expression of the PPP and CQA biosynthesis regulated unigenes. An analysis of PPP-captive unigenes revealed a major duplication in the following genes: PAL, 120 unigenes in leaves and 76 in flowers; C3′H, 169 unigenes in leaves and 140 in flowers; 4CL, 41 unigenes in leaves and 27 in flowers; and C4H, 12 unigenes in leaves and 4 in flowers. The phylogenetic analysis revealed 82 BAHDs superfamily members in leaves and 72 in flowers, among which five unigenes encode for HQT and three for HCT. The three HQT are common to both leaves and flowers, whereas the two HQT were specialized for leaves. The pattern of HQT synthesis was upregulated in flowers, whereas HCT was expressed strongly in the leaves of A. spathulifolius. Overall, 4CL, C4H, and HQT are expressed strongly in flowers and CAA and HCT show more expression in leaves. As a result, the quantification of HQT and HCT indicates that CQA biosynthesis is more abundant in the flowers and synthesis of caffeic acid in the leaves of A. spathulifolius.
|
22
|
Overbey EG, Saravia-Butler AM, Zhang Z, Rathi KS, Fogle H, da Silveira WA, Barker RJ, Bass JJ, Beheshti A, Berrios DC, Blaber EA, Cekanaviciute E, Costa HA, Davin LB, Fisch KM, Gebre SG, Geniza M, Gilbert R, Gilroy S, Hardiman G, Herranz R, Kidane YH, Kruse CPS, Lee MD, Liefeld T, Lewis NG, McDonald JT, Meller R, Mishra T, Perera IY, Ray S, Reinsch SS, Rosenthal SB, Strong M, Szewczyk NJ, Tahimic CGT, Taylor DM, Vandenbrink JP, Villacampa A, Weging S, Wolverton C, Wyatt SE, Zea L, Costes SV, Galazka JM. NASA GeneLab RNA-seq consensus pipeline: standardized processing of short-read RNA-seq data. iScience 2021; 24:102361. [PMID: 33870146 PMCID: PMC8044432 DOI: 10.1016/j.isci.2021.102361] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 09/08/2020] [Revised: 10/30/2020] [Accepted: 03/23/2021] [Indexed: 12/15/2022] Open
Abstract
With the development of transcriptomic technologies, we are able to quantify precise changes in gene expression profiles from astronauts and other organisms exposed to spaceflight. Members of NASA GeneLab and GeneLab-associated analysis working groups (AWGs) have developed a consensus pipeline for analyzing short-read RNA-sequencing data from spaceflight-associated experiments. The pipeline includes quality control, read trimming, mapping, and gene quantification steps, culminating in the detection of differentially expressed genes. This data analysis pipeline and the results of its execution using data submitted to GeneLab are now all publicly available through the GeneLab database. We present here the full details and rationale for the construction of this pipeline in order to promote transparency, reproducibility, and reusability of pipeline data; to provide a template for data processing of future spaceflight-relevant datasets; and to encourage cross-analysis of data from other databases with the data available in GeneLab. Analysis of omics data from different spaceflight studies presents unique challenges. A standardized pipeline for RNA-seq analysis eliminates data processing variation. The GeneLab RNA-seq pipeline includes QC, trimming, mapping, quantification, and DGE. Space-relevant data processed with this pipeline are available at genelab.nasa.gov.
Affiliation(s)
- Eliah G Overbey
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
| | - Amanda M Saravia-Butler
- Logyx, LLC, Mountain View, CA 94043, USA.,Space Biosciences Division, NASA Ames Research Center, Moffett Field, CA 94035, USA
| | - Zhe Zhang
- Department of Biomedical and Health Informatics, The Children's Hospital of Philadelphia, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Komal S Rathi
- Department of Biomedical and Health Informatics, The Children's Hospital of Philadelphia, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Homer Fogle
- The Bionetics Corporation, NASA Ames Research Center, Moffett Field, CA 94035, USA.,Space Biosciences Division, NASA Ames Research Center, Moffett Field, CA 94035, USA
| | - Willian A da Silveira
- Institute for Global Food Security (IGFS) & School of Biological Sciences, Queen's University Belfast, Belfast, UK
| | - Richard J Barker
- Department of Botany, University of Wisconsin, Madison, WI 53706, USA
| | - Joseph J Bass
- MRC Versus Arthritis Centre for Musculoskeletal Ageing Research, Royal Derby Hospital, University of Nottingham & National Institute for Health Research Nottingham Biomedical Research Centre, Derby DE22 3DT, UK
| | - Afshin Beheshti
- KBR, NASA Ames Research Center, Moffett Field, CA 94035, USA.,Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Daniel C Berrios
- Space Biosciences Division, NASA Ames Research Center, Moffett Field, CA 94035, USA
| | - Elizabeth A Blaber
- Center for Biotechnology and Interdisciplinary Studies, Department of Biomedical Engineering, Rensselaer Polytechnic Institute, Troy, NY 12180, USA
| | - Egle Cekanaviciute
- Space Biosciences Division, NASA Ames Research Center, Moffett Field, CA 94035, USA
| | - Helio A Costa
- Departments of Pathology, and of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Laurence B Davin
- Institute of Biological Chemistry, Washington State University, Pullman, WA 99164, USA
| | - Kathleen M Fisch
- Center for Computational Biology & Bioinformatics, Department of Medicine, University of California, San Diego, La Jolla, CA 92093, USA
| | - Samrawit G Gebre
- Space Biosciences Division, NASA Ames Research Center, Moffett Field, CA 94035, USA.,KBR, NASA Ames Research Center, Moffett Field, CA 94035, USA
| | | | - Rachel Gilbert
- NASA Postdoctoral Program, Universities Space Research Association, NASA Ames Research Center, Moffett Field, CA 94035, USA
| | - Simon Gilroy
- Department of Botany, University of Wisconsin, Madison, WI 53706, USA
| | - Gary Hardiman
- Institute for Global Food Security (IGFS) & School of Biological Sciences, Queen's University Belfast, Belfast, UK.,Medical University of South Carolina, Charleston, SC, USA
| | - Raúl Herranz
- Centro de Investigaciones Biológicas Margarita Salas (CSIC), Ramiro de Maeztu 9, 28040 Madrid, Spain
| | - Yared H Kidane
- Center for Pediatric Bone Biology and Translational Research, Texas Scottish Rite Hospital for Children, 2222 Welborn St., Dallas, TX 75219, USA
| | - Colin P S Kruse
- Los Alamos National Laboratory, Bioscience Division, Los Alamos, NM 87545, USA
| | - Michael D Lee
- Exobiology Branch, NASA Ames Research Center, Mountain View, CA 94035, USA.,Blue Marble Space Institute of Science, Seattle, WA 98154, USA
| | - Ted Liefeld
- Department of Medicine, University of California San Diego, San Diego, CA 92093, USA
| | - Norman G Lewis
- Institute of Biological Chemistry, Washington State University, Pullman, WA 99164, USA
| | - J Tyson McDonald
- Department of Radiation Medicine, Georgetown University Medical Center, Washington, DC 20007, USA
| | - Robert Meller
- Department of Neurobiology and Pharmacology, Morehouse School of Medicine, Atlanta, GA 30310, USA
| | - Tejaswini Mishra
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Imara Y Perera
- Department of Plant and Microbial Biology, North Carolina State University, Raleigh, NC 27695, USA
| | - Shayoni Ray
- NGM Biopharmaceuticals, South San Francisco, CA 94080, USA
| | - Sigrid S Reinsch
- Space Biosciences Division, NASA Ames Research Center, Moffett Field, CA 94035, USA
| | - Sara Brin Rosenthal
- Center for Computational Biology & Bioinformatics, Department of Medicine, University of California, San Diego, La Jolla, CA 92093, USA
| | - Michael Strong
- National Jewish Health, Center for Genes, Environment, and Health, 1400 Jackson Street, Denver, CO 80206, USA
| | - Nathaniel J Szewczyk
- Ohio Musculoskeletal and Neurological Institute and Department of Biomedical Sciences, Ohio University, Athens, OH 43147, USA
| | - Candice G T Tahimic
- Department of Biology, University of North Florida, Jacksonville, FL 32224, USA
| | - Deanne M Taylor
- Department of Biomedical and Health Informatics, Children's Hospital of Philadelphia and the Department of Pediatrics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| | | | - Alicia Villacampa
- Centro de Investigaciones Biológicas Margarita Salas (CSIC), Ramiro de Maeztu 9, 28040 Madrid, Spain
| | - Silvio Weging
- Institute of Computer Science, Martin-Luther University Halle-Wittenberg, Von-Seckendorff-Platz 1, Halle 06120, Germany
| | - Chris Wolverton
- Department of Botany and Microbiology, Ohio Wesleyan University, Delaware, OH, USA
| | - Sarah E Wyatt
- Department of Environmental and Plant Biology, Ohio University, Athens, OH 45701, USA.,Interdisciplinary Program in Molecular and Cellular Biology, Ohio University, Athens, OH 45701, USA
| | - Luis Zea
- BioServe Space Technologies, Aerospace Engineering Sciences Department, University of Colorado Boulder, Boulder 80303 USA
| | - Sylvain V Costes
- Space Biosciences Division, NASA Ames Research Center, Moffett Field, CA 94035, USA
| | - Jonathan M Galazka
- Space Biosciences Division, NASA Ames Research Center, Moffett Field, CA 94035, USA
| |
|
23
|
Geles K, Palumbo D, Sellitto A, Giurato G, Cianflone E, Marino F, Torella D, Mirici Cappa V, Nassa G, Tarallo R, Weisz A, Rizzo F. WIND (Workflow for pIRNAs aNd beyonD): a strategy for in-depth analysis of small RNA-seq data. F1000Res 2021; 10:1. [PMID: 34316353 PMCID: PMC8276195 DOI: 10.12688/f1000research.27868.3] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Accepted: 07/02/2021] [Indexed: 12/15/2022] Open
Abstract
Current bioinformatics workflows for PIWI-interacting RNA (piRNA) analysis focus primarily on germline-derived piRNAs and piRNA-clusters. Frequently, they suffer from outdated piRNA databases, questionable quantification methods, and lack of reproducibility. Often, pipelines specific to miRNA analysis are used for the piRNA research in silico. Furthermore, the absence of a well-established database for piRNA annotation, as for miRNA, leads to uniformity issues between studies and generates confusion for data analysts and biologists. For these reasons, we have developed WIND (Workflow for pIRNAs aNd beyonD), a bioinformatics workflow that addresses the crucial issue of piRNA annotation, thereby allowing a reliable analysis of small RNA sequencing data for the identification of piRNAs and other small non-coding RNAs (sncRNAs) that in the past have been incorrectly classified as piRNAs. WIND allows the creation of a comprehensive annotation track of sncRNAs combining information available in RNAcentral, with piRNA sequences from piRNABank, the first database dedicated to piRNA annotation. WIND was built with Docker containers for reproducibility and integrates widely used bioinformatics tools for sequence alignment and quantification. In addition, it includes Bioconductor packages for exploratory data and differential expression analysis. Moreover, WIND implements a "dual" approach for the evaluation of sncRNAs expression level quantifying the aligned reads to the annotated genome and carrying out an alignment-free transcript quantification using reads mapped to the transcriptome. Therefore, a broader range of piRNAs can be annotated, improving their quantification and easing the subsequent downstream analysis. WIND performance has been tested with several small RNA-seq datasets, demonstrating how our approach can be a useful and comprehensive resource to analyse piRNAs and other classes of sncRNAs.
Affiliation(s)
- Konstantinos Geles
- Laboratory of Molecular Medicine and Genomics, Department of Medicine, Surgery and Dentistry 'Scuola Medica Salernitana', University of Salerno, Baronissi, Salerno (SA), 84081, Italy.,Genomix4Life, via S. Allende 43/L, Baronissi, Salerno (SA), 84081, Italy
| | - Domenico Palumbo
- Laboratory of Molecular Medicine and Genomics, Department of Medicine, Surgery and Dentistry 'Scuola Medica Salernitana', University of Salerno, Baronissi, Salerno (SA), 84081, Italy.,Clinical Research and Innovation, Clinica Montevergine S.p.A., Mercogliano, Mercogliano, 83013, Italy
| | - Assunta Sellitto
- Laboratory of Molecular Medicine and Genomics, Department of Medicine, Surgery and Dentistry 'Scuola Medica Salernitana', University of Salerno, Baronissi, Salerno (SA), 84081, Italy
| | - Giorgio Giurato
- Laboratory of Molecular Medicine and Genomics, Department of Medicine, Surgery and Dentistry 'Scuola Medica Salernitana', University of Salerno, Baronissi, Salerno (SA), 84081, Italy.,Genomix4Life, via S. Allende 43/L, Baronissi, Salerno (SA), 84081, Italy.,CRGS (Genome Research Center for Health), University of Salerno Campus of Medicine, Baronissi, Salerno (SA), 84081, Italy
| | - Eleonora Cianflone
- Department of Medical and Surgical Sciences, Magna Graecia University, Viale Europa, Catanzaro, 88100, Italy
| | - Fabiola Marino
- Department of Experimental and Clinical Medicine, Molecular and Cellular Cardiology, Magna Graecia University, Viale Europa, Catanzaro, 88100, Italy
| | - Daniele Torella
- Department of Experimental and Clinical Medicine, Molecular and Cellular Cardiology, Magna Graecia University, Viale Europa, Catanzaro, 88100, Italy
| | - Valeria Mirici Cappa
- Laboratory of Molecular Medicine and Genomics, Department of Medicine, Surgery and Dentistry 'Scuola Medica Salernitana', University of Salerno, Baronissi, Salerno (SA), 84081, Italy.,Genomix4Life, via S. Allende 43/L, Baronissi, Salerno (SA), 84081, Italy
| | - Giovanni Nassa
- Laboratory of Molecular Medicine and Genomics, Department of Medicine, Surgery and Dentistry 'Scuola Medica Salernitana', University of Salerno, Baronissi, Salerno (SA), 84081, Italy.,Genomix4Life, via S. Allende 43/L, Baronissi, Salerno (SA), 84081, Italy.,CRGS (Genome Research Center for Health), University of Salerno Campus of Medicine, Baronissi, Salerno (SA), 84081, Italy
| | - Roberta Tarallo
- Laboratory of Molecular Medicine and Genomics, Department of Medicine, Surgery and Dentistry 'Scuola Medica Salernitana', University of Salerno, Baronissi, Salerno (SA), 84081, Italy.,CRGS (Genome Research Center for Health), University of Salerno Campus of Medicine, Baronissi, Salerno (SA), 84081, Italy
| | - Alessandro Weisz
- Laboratory of Molecular Medicine and Genomics, Department of Medicine, Surgery and Dentistry 'Scuola Medica Salernitana', University of Salerno, Baronissi, Salerno (SA), 84081, Italy.,CRGS (Genome Research Center for Health), University of Salerno Campus of Medicine, Baronissi, Salerno (SA), 84081, Italy
| | - Francesca Rizzo
- Laboratory of Molecular Medicine and Genomics, Department of Medicine, Surgery and Dentistry 'Scuola Medica Salernitana', University of Salerno, Baronissi, Salerno (SA), 84081, Italy.,Genomix4Life, via S. Allende 43/L, Baronissi, Salerno (SA), 84081, Italy.,CRGS (Genome Research Center for Health), University of Salerno Campus of Medicine, Baronissi, Salerno (SA), 84081, Italy
| |
|
24
|
Uyar B, Palmer D, Kowald A, Murua Escobar H, Barrantes I, Möller S, Akalin A, Fuellen G. Single-cell analyses of aging, inflammation and senescence. Ageing Res Rev 2020; 64:101156. [PMID: 32949770 PMCID: PMC7493798 DOI: 10.1016/j.arr.2020.101156] [Citation(s) in RCA: 76] [Impact Index Per Article: 19.0] [Received: 06/26/2020] [Revised: 08/23/2020] [Accepted: 08/25/2020] [Indexed: 02/09/2023]
Abstract
Single-cell gene expression (transcriptomics) data are becoming robust and abundant, and are increasingly used to track organisms along their life-course. This allows investigation into how aging affects cellular transcriptomes, and how changes in transcriptomes may underlie aging, including chronic inflammation (inflammaging), immunosenescence and cellular senescence. We compiled and tabulated aging-related single-cell datasets published to date, collected and discussed relevant findings, and inspected some of these datasets ourselves. We specifically note insights that cannot (or not easily) be based on bulk data. For example, in some datasets, the fraction of cells expressing p16 (CDKN2A), one of the most prominent markers of cellular senescence, was reported to increase, in addition to its upregulated mean expression over all cells. Moreover, we found evidence for inflammatory processes in most datasets, some of these driven by specific cells of the immune system. Further, single-cell data are specifically useful to investigate whether transcriptional heterogeneity (also called noise or variability) increases with age, and many (but not all) studies in our review report an increase in such heterogeneity. Finally, we demonstrate some stability of marker gene expression patterns across closely similar studies and suggest that single-cell experiments may hold the key to provide detailed insights whenever interventions (countering aging, inflammation, senescence, disease, etc.) are affecting cells depending on cell type.
Affiliation(s)
- Bora Uyar
- Bioinformatics and Omics Data Science Platform, Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine, Berlin, Germany
| | - Daniel Palmer
- Rostock University Medical Center, Institute for Biostatistics and Informatics in Medicine and Aging Research, Rostock, Germany
| | - Axel Kowald
- Rostock University Medical Center, Institute for Biostatistics and Informatics in Medicine and Aging Research, Rostock, Germany
| | - Hugo Murua Escobar
- Rostock University Medical Center, Department of Hematology, Oncology and Palliative Medicine, Department of Medicine III, Rostock, Germany
| | - Israel Barrantes
- Rostock University Medical Center, Institute for Biostatistics and Informatics in Medicine and Aging Research, Rostock, Germany
| | - Steffen Möller
- Rostock University Medical Center, Institute for Biostatistics and Informatics in Medicine and Aging Research, Rostock, Germany
| | - Altuna Akalin
- Bioinformatics and Omics Data Science Platform, Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine, Berlin, Germany
| | - Georg Fuellen
- Rostock University Medical Center, Institute for Biostatistics and Informatics in Medicine and Aging Research, Rostock, Germany.
| |
|
25
|
Puccio S, Grillo G, Consiglio A, Soluri MF, Sblattero D, Cotella D, Santoro C, Liuni S, Bellis GD, Lugli E, Peano C, Licciulli F. InteractomeSeq: a web server for the identification and profiling of domains and epitopes from phage display and next generation sequencing data. Nucleic Acids Res 2020; 48:W200-W207. [PMID: 32402076 PMCID: PMC7319578 DOI: 10.1093/nar/gkaa363] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 03/15/2020] [Revised: 04/16/2020] [Accepted: 05/05/2020] [Indexed: 01/03/2023] Open
Abstract
High-Throughput Sequencing technologies are transforming many research fields, including the analysis of phage display libraries. The phage display technology coupled with deep sequencing was introduced more than a decade ago and holds the potential to circumvent the traditional laborious picking and testing of individual phage rescued clones. However, from a bioinformatics point of view, the analysis of this kind of data was always performed by adapting tools designed for other purposes, thus not considering the noise background typical of the 'interactome sequencing' approach and the heterogeneity of the data. InteractomeSeq is a web server allowing data analysis of protein domains ('domainome') or epitopes ('epitome') from either Eukaryotic or Prokaryotic genomic phage libraries generated and selected by following an Interactome sequencing approach. InteractomeSeq allows users to upload raw sequencing data and to obtain an accurate characterization of domainome/epitome profiles after setting the parameters required to tune the analysis. The release of this tool is relevant for the scientific and clinical community, because InteractomeSeq will fill an existing gap in the field of large-scale biomarkers profiling, reverse vaccinology, and structural/functional studies, thus contributing essential information for gene annotation or antigen identification. InteractomeSeq is freely available at https://InteractomeSeq.ba.itb.cnr.it/.
Affiliation(s)
- Simone Puccio
- Laboratory of Translational Immunology, Humanitas Clinical and Research Center, IRCCS, Rozzano (Milan), 20089, Italy
| | - Giorgio Grillo
- Institute for Biomedical Technologies, National Research Council, Bari 70100, Italy
| | - Arianna Consiglio
- Institute for Biomedical Technologies, National Research Council, Bari 70100, Italy
| | - Maria Felicia Soluri
- Department of Health Sciences & Center for Translational Research on Autoimmune and Allergic Disease (CAAD), Università del Piemonte Orientale, Novara 28100, Italy
| | - Daniele Sblattero
- Department of Life Sciences, University of Trieste, Trieste 34100, Italy
| | - Diego Cotella
- Department of Health Sciences & Center for Translational Research on Autoimmune and Allergic Disease (CAAD), Università del Piemonte Orientale, Novara 28100, Italy
| | - Claudio Santoro
- Department of Health Sciences & Center for Translational Research on Autoimmune and Allergic Disease (CAAD), Università del Piemonte Orientale, Novara 28100, Italy
| | - Sabino Liuni
- Institute for Biomedical Technologies, National Research Council, Bari 70100, Italy
| | - Gianluca De Bellis
- Institute for Biomedical Technologies, National Research Council, Segrate (Milan) 20090, Italy
| | - Enrico Lugli
- Laboratory of Translational Immunology, Humanitas Clinical and Research Center, IRCCS, Rozzano (Milan), 20089, Italy.,Humanitas Flow Cytometry Core, Humanitas Clinical and Research Center, IRCCS, Rozzano (Milan) 20089, Italy
| | - Clelia Peano
- Institute of Genetic and Biomedical Research, UoS Milan, National Research Council, Rozzano (Milan) 20089, Italy.,Genomic Unit, Humanitas Clinical and Research Center, IRCCS, Rozzano (Milan) 20089, Italy
| | - Flavio Licciulli
- Institute for Biomedical Technologies, National Research Council, Bari 70100, Italy
| |
|
26
|
Srivastava A, Malik L, Sarkar H, Zakeri M, Almodaresi F, Soneson C, Love MI, Kingsford C, Patro R. Alignment and mapping methodology influence transcript abundance estimation. Genome Biol 2020; 21:239. [PMID: 32894187 PMCID: PMC7487471 DOI: 10.1186/s13059-020-02151-8] [Citation(s) in RCA: 62] [Impact Index Per Article: 15.5] [Received: 10/31/2019] [Accepted: 08/19/2020] [Indexed: 01/23/2023] Open
Abstract
Background The accuracy of transcript quantification using RNA-seq data depends on many factors, such as the choice of alignment or mapping method and the quantification model being adopted. While the choice of quantification model has been shown to be important, considerably less attention has been given to comparing the effect of various read alignment approaches on quantification accuracy. Results We investigate the influence of mapping and alignment on the accuracy of transcript quantification in both simulated and experimental data, as well as the effect on subsequent differential expression analysis. We observe that, even when the quantification model itself is held fixed, the effect of choosing a different alignment methodology, or aligning reads using different parameters, on quantification estimates can sometimes be large and can affect downstream differential expression analyses as well. These effects can go unnoticed when assessment is focused too heavily on simulated data, where the alignment task is often simpler than in experimentally acquired samples. We also introduce a new alignment methodology, called selective alignment, to overcome the shortcomings of lightweight approaches without incurring the computational cost of traditional alignment. Conclusion We observe that, on experimental datasets, the performance of lightweight mapping and alignment-based approaches varies significantly, and highlight some of the underlying factors. We show this variation both in terms of quantification and downstream differential expression analysis. In all comparisons, we also show the improved performance of our proposed selective alignment method and suggest best practices for performing RNA-seq quantification.
Affiliation(s)
- Avi Srivastava
- Department of Computer Science, Stony Brook University, Stony Brook, USA
| | - Laraib Malik
- Department of Computer Science, Stony Brook University, Stony Brook, USA
| | - Hirak Sarkar
- Department of Computer Science, University of Maryland, College Park, USA
| | - Mohsen Zakeri
- Department of Computer Science, University of Maryland, College Park, USA
| | - Fatemeh Almodaresi
- Department of Computer Science, University of Maryland, College Park, USA
| | - Charlotte Soneson
- Friedrich Miescher Institute for Biomedical Research, Basel, Switzerland.,SIB Swiss Institute of Bioinformatics, Basel, Switzerland
| | - Michael I Love
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, USA.,Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, USA
| | - Carl Kingsford
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, USA
| | - Rob Patro
- Department of Computer Science, University of Maryland, College Park, USA.
| |
|
27
|
Yao J, Wu DC, Nottingham RM, Lambowitz AM. Identification of protein-protected mRNA fragments and structured excised intron RNAs in human plasma by TGIRT-seq peak calling. eLife 2020; 9:e60743. [PMID: 32876046 PMCID: PMC7518892 DOI: 10.7554/elife.60743] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 07/06/2020] [Accepted: 09/01/2020] [Indexed: 12/18/2022] Open
Abstract
Human plasma contains > 40,000 different coding and non-coding RNAs that are potential biomarkers for human diseases. Here, we used thermostable group II intron reverse transcriptase sequencing (TGIRT-seq) combined with peak calling to simultaneously profile all RNA biotypes in apheresis-prepared human plasma pooled from healthy individuals. Extending previous TGIRT-seq analysis, we found that human plasma contains largely fragmented mRNAs from > 19,000 protein-coding genes, abundant full-length, mature tRNAs and other structured small non-coding RNAs, and less abundant tRNA fragments and mature and pre-miRNAs. Many of the mRNA fragments identified by peak calling correspond to annotated protein-binding sites and/or have stable predicted secondary structures that could afford protection from plasma nucleases. Peak calling also identified novel repeat RNAs, miRNA-sized RNAs, and putatively structured intron RNAs of potential biological, evolutionary, and biomarker significance, including a family of full-length excised intron RNAs, subsets of which correspond to mirtron pre-miRNAs or agotrons.
Affiliation(s)
- Jun Yao
- Institute for Cellular and Molecular Biology and Departments of Molecular Biosciences and Oncology, University of Texas, Austin, United States
| | - Douglas C Wu
- Institute for Cellular and Molecular Biology and Departments of Molecular Biosciences and Oncology, University of Texas, Austin, United States
| | - Ryan M Nottingham
- Institute for Cellular and Molecular Biology and Departments of Molecular Biosciences and Oncology, University of Texas, Austin, United States
| | - Alan M Lambowitz
- Institute for Cellular and Molecular Biology and Departments of Molecular Biosciences and Oncology, University of Texas, Austin, United States
| |
|
28
|
Kahraman A, Karakulak T, Szklarczyk D, von Mering C. Pathogenic impact of transcript isoform switching in 1,209 cancer samples covering 27 cancer types using an isoform-specific interaction network. Sci Rep 2020; 10:14453. [PMID: 32879328 PMCID: PMC7468103 DOI: 10.1038/s41598-020-71221-5] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 03/27/2020] [Accepted: 07/17/2020] [Indexed: 01/01/2023] Open
Abstract
Under normal conditions, cells of almost all tissue types express the same predominant canonical transcript isoform at each gene locus. In cancer, however, splicing regulation is often disturbed, leading to cancer-specific switches in the most dominant transcripts (MDT). To address the pathogenic impact of these switches, we have analyzed isoform-specific protein-protein interaction disruptions in 1,209 cancer samples covering 27 different cancer types from the Pan-Cancer Analysis of Whole Genomes (PCAWG) project of the International Cancer Genomics Consortium (ICGC). Our study revealed large variations in the number of cancer-specific MDT (cMDT) with the highest frequency in cancers of female reproductive organs. Interestingly, in contrast to the mutational load, cancers arising from the same primary tissue had a similar number of cMDT. Some cMDT were found in 100% of all samples in a cancer type, making them candidates for diagnostic biomarkers. cMDT tend to be located at densely populated network regions where they disrupted protein interactions in the proximity of pathogenic cancer genes. A gene ontology enrichment analysis showed that these disruptions occurred mostly in protein translation and RNA splicing pathways. Interestingly, samples with mutations in the spliceosomal complex tend to have higher number of cMDT, while other transcript expressions correlated with mutations in non-coding splice-site and promoter regions of their genes. This work demonstrates for the first time the large extent of cancer-specific alterations in alternative splicing for 27 different cancer types. It highlights distinct and common patterns of cMDT and suggests novel pathogenic transcripts and markers that induce large network disruptions in cancers.
Affiliation(s)
- Abdullah Kahraman
- Institute of Molecular Life Sciences, University of Zurich, Zurich, Switzerland.,Department of Pathology and Molecular Pathology, University Hospital Zurich, Zurich, Switzerland.,Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Tülay Karakulak
- Institute of Molecular Life Sciences, University of Zurich, Zurich, Switzerland.,Department of Pathology and Molecular Pathology, University Hospital Zurich, Zurich, Switzerland.,Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Damian Szklarczyk
- Institute of Molecular Life Sciences, University of Zurich, Zurich, Switzerland.,Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Christian von Mering
- Institute of Molecular Life Sciences, University of Zurich, Zurich, Switzerland. .,Swiss Institute of Bioinformatics, Lausanne, Switzerland.
| |
Collapse
|
29
|
Olney KC, Brotman SM, Andrews JP, Valverde-Vesling VA, Wilson MA. Reference genome and transcriptome informed by the sex chromosome complement of the sample increase ability to detect sex differences in gene expression from RNA-Seq data. Biol Sex Differ 2020; 11:42. [PMID: 32693839 PMCID: PMC7374973 DOI: 10.1186/s13293-020-00312-9] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 02/05/2020] [Accepted: 06/17/2020] [Indexed: 02/07/2023] Open
Abstract
BACKGROUND Human X and Y chromosomes share an evolutionary origin and, as a consequence, sequence similarity. We investigated whether the sequence homology between the X and Y chromosomes affects the alignment of RNA-Seq reads and estimates of differential expression. We tested the effects of using reference genomes and reference transcriptomes informed by the sex chromosome complement of the sample's genome on the measurements of RNA-Seq abundance and sex differences in expression. RESULTS The default genome includes the entire human reference genome (GRCh38), including the entire sequence of the X and Y chromosomes. We created two sex chromosome complement informed reference genomes. One sex chromosome complement informed reference genome was used for samples that lacked a Y chromosome; for this reference genome version, we hard-masked the entire Y chromosome. For the other sex chromosome complement informed reference genome, to be used for samples with a Y chromosome, we hard-masked only the pseudoautosomal regions of the Y chromosome, because these regions are duplicated identically in the reference genome on the X chromosome. We analyzed the transcript abundance in the whole blood, brain cortex, breast, liver, and thyroid tissues from 20 genetic female (46, XX) and 20 genetic male (46, XY) samples. Each sample was aligned twice: once to the default reference genome and then independently aligned to a reference genome informed by the sex chromosome complement of the sample, repeated using two different read aligners, HISAT and STAR. We then quantified sex differences in gene expression using featureCounts to get the raw count estimates followed by Limma/Voom for normalization and differential expression. We additionally created sex chromosome complement informed transcriptome references for use in pseudo-alignment using Salmon. 
Transcript abundance was quantified twice for each sample: once to the default target transcripts and then independently to target transcripts informed by the sex chromosome complement of the sample. CONCLUSIONS We show that regardless of the choice of the read aligner, using an alignment protocol informed by the sex chromosome complement of the sample results in higher expression estimates on the pseudoautosomal regions of the X chromosome in both genetic male and genetic female samples, as well as an increased number of unique genes being called as differentially expressed between the sexes. We additionally show that using a pseudo-alignment approach informed by the sex chromosome complement of the sample eliminates Y-linked expression in female XX samples.
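The two hard-masking schemes described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the toy sequences, and the toy pseudoautosomal coordinates are all hypothetical, and real pseudoautosomal region (PAR) coordinates would have to come from the reference genome's annotation.

```python
from typing import Dict, Iterable, Tuple


def hard_mask(seq: str, regions: Iterable[Tuple[int, int]]) -> str:
    """Replace bases in 0-based, half-open [start, end) regions with 'N'."""
    masked = list(seq)
    for start, end in regions:
        start = max(0, start)
        end = min(len(masked), end)
        if start < end:
            masked[start:end] = "N" * (end - start)
    return "".join(masked)


def sex_informed_reference(
    genome: Dict[str, str],
    has_y: bool,
    y_name: str = "chrY",  # assumed contig name
    y_par_regions: Iterable[Tuple[int, int]] = (),  # assumed PAR coordinates on Y
) -> Dict[str, str]:
    """Return a copy of the genome masked for the sample's sex chromosome complement.

    Samples without a Y chromosome: the whole Y sequence is hard-masked.
    Samples with a Y chromosome: only the Y-linked PARs are hard-masked.
    """
    out = dict(genome)
    if y_name not in out:
        return out
    y_seq = out[y_name]
    if has_y:
        out[y_name] = hard_mask(y_seq, y_par_regions)
    else:
        out[y_name] = "N" * len(y_seq)
    return out


# Toy data only: these sequences and coordinates are illustrative, not real.
toy_genome = {"chrX": "ACGTACGTAC", "chrY": "TTGACCAGTA"}
toy_par = [(0, 3), (8, 10)]

xx_ref = sex_informed_reference(toy_genome, has_y=False)
xy_ref = sex_informed_reference(toy_genome, has_y=True, y_par_regions=toy_par)
print(xx_ref["chrY"])
print(xy_ref["chrY"])
```

In practice the masked sequences would be written back to FASTA and indexed for the chosen aligner. The sketch only shows which bases each scheme replaces.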
Affiliation(s)
- Kimberly C Olney
  - School of Life Sciences, Arizona State University, PO Box 874501, Tempe, AZ, 85287-4501, USA; Center for Evolution and Medicine, Arizona State University, Tempe, AZ, 85282, USA
- Sarah M Brotman
  - School of Life Sciences, Arizona State University, PO Box 874501, Tempe, AZ, 85287-4501, USA; Department of Genetics, University of North Carolina, Chapel Hill, NC, 27599, USA
- Jocelyn P Andrews
  - School of Life Sciences, Arizona State University, PO Box 874501, Tempe, AZ, 85287-4501, USA; College of Osteopathic Medicine of the Pacific, Western University of Health Sciences, Pomona, CA, 91766, USA
- Melissa A Wilson
  - School of Life Sciences, Arizona State University, PO Box 874501, Tempe, AZ, 85287-4501, USA; Center for Evolution and Medicine, Arizona State University, Tempe, AZ, 85282, USA; Center for Mechanisms of Evolution, The Biodesign Institute, Arizona State University, Tempe, AZ, 85282, USA
|
30
|
A universal and independent synthetic DNA ladder for the quantitative measurement of genomic features. Nat Commun 2020; 11:3609. [PMID: 32681090 PMCID: PMC7367866 DOI: 10.1038/s41467-020-17445-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 01/14/2020] [Accepted: 06/22/2020] [Indexed: 11/08/2022] Open
Abstract
Standard units of measurement are required for the quantitative description of nature; however, few standard units have been established for genomics to date. Here, we have developed a synthetic DNA ladder that defines a quantitative standard unit that can measure DNA sequence abundance within a next-generation sequencing library. The ladder can be spiked into a DNA sample, and act as an internal scale that measures quantitative genetics features. Unlike previous spike-ins, the ladder is encoded within a single molecule, and can be equivalently and independently synthesized by different laboratories. We show how the ladder can measure diverse quantitative features, including human genetic variation and microbial abundance, and also estimate uncertainty due to technical variation and improve normalization between libraries. This ladder provides an independent quantitative unit that can be used with any organism, application or technology, thereby providing a common metric by which genomes can be measured.
|
31
|
Hubbard A, Bomhoff M, Schmidt CJ. fRNAkenseq: a fully powered-by-CyVerse cloud integrated RNA-sequencing analysis tool. PeerJ 2020; 8:e8592. [PMID: 32461821 PMCID: PMC7231498 DOI: 10.7717/peerj.8592] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 04/11/2019] [Accepted: 01/18/2020] [Indexed: 11/20/2022] Open
Abstract
Background Decreasing costs make RNA sequencing technologies increasingly affordable for biologists. However, many researchers who can now afford sequencing lack access to resources necessary for downstream analysis. This means that even as algorithms to process RNA-Seq data improve, many biologists still struggle to manage the sheer volume of data produced by next generation sequencing (NGS) technologies. Scalable bioinformatics tools that exploit multiple platforms are needed to democratize bioinformatics resources in the sequencing era. This is essential for equipping many research groups in the life sciences with the tools to process the increasingly unwieldy datasets they produce. Methods One strategy to address this challenge is to develop a modern generation of sequence analysis tools capable of seamless data sharing and communication. Such tools will provide interoperability through offerings of interlinked resources. Systems of interlinked, scalable resources, which often incorporate cloud data storage, are broadly referred to as cyberinfrastructure. Cyberinfrastructure integrated tools will help researchers to robustly analyze large scale datasets by efficiently sharing data burdens across a distributed architecture. Additionally, interoperability will allow emerging tools to cross-adapt features of existing tools. It is important that these tools are designed to be easy to use for biologists. Results We introduce fRNAkenseq, a powered-by-CyVerse RNA sequencing analysis tool that exhibits interoperability with other resources and meets the needs of biologists for comprehensive, easy to use RNA sequencing analysis. fRNAkenseq leverages a complex set of Application Programming Interfaces (APIs) associated with the NSF-funded cyberinfrastructure project, CyVerse, to execute FASTQ-to-differential expression RNA-Seq analyses. 
Integrating across bioinformatics platforms, fRNAkenseq also exploits cloud integration and cross-talk with another CyVerse-associated tool, CoGe. fRNAkenseq offers novel features for the biologist, such as more robust and comprehensive pipelines for enrichment than those currently available by default in a single tool, whether cloud-based or locally installed. Importantly, cross-talk with CoGe allows fRNAkenseq users to execute RNA-Seq pipelines on an inventory of 47,000 archived genomes stored in CoGe or to upload their own draft genome.
Affiliation(s)
- Allen Hubbard
  - Donald Danforth Plant Science Center, Saint Louis, MO, USA
- Matthew Bomhoff
  - Department of Plant and Soil Sciences, University of Arizona, Tucson, AZ, USA
- Carl J Schmidt
  - Department of Animal and Food Sciences, University of Delaware, Newark, DE, USA
|
32
|
Lachmann A, Clarke DJB, Torre D, Xie Z, Ma'ayan A. Interoperable RNA-Seq analysis in the cloud. Biochimica et Biophysica Acta - Gene Regulatory Mechanisms 2020; 1863:194521. [PMID: 32156561 DOI: 10.1016/j.bbagrm.2020.194521] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 07/02/2019] [Revised: 03/01/2020] [Accepted: 03/01/2020] [Indexed: 12/25/2022]
Abstract
RNA-Sequencing (RNA-Seq) is currently the leading technology for genome-wide transcript quantification. Mapping the raw reads to transcript and gene level counts can be achieved by different aligners. Here we report an in-depth comparison of transcript quantification methods. Our goal is the specific use of cost-efficient RNA-Seq analysis for deployment in a cloud infrastructure composed of interacting microservices. The individual modules cover file transfer into the cloud and APIs to handle the cloud alignment jobs. We next demonstrate how newly generated RNA-Seq data can be placed in the context of thousands of previously published datasets in near real time. With in-depth benchmarks, we identify suitable gene count quantification methods to facilitate cost-effective, accurate, and cloud-based RNA-Seq analysis service. Pseudo-alignment algorithms such as kallisto and Salmon combine high read quality estimation with cost efficient runtime performance. HISAT2 is the fastest of the classical aligners with good alignment quality. This article is part of a Special Issue entitled: Transcriptional Profiles and Regulatory Gene Networks edited by Dr. Federico Manuel Giorgi and Dr. Shaun Mahony.
Affiliation(s)
- Alexander Lachmann
  - Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, Box 1603, New York, NY 10029, USA; Library of Integrated Network-based Cellular Signatures, Data Coordination and Integration Center (BD2K-LINCS DCIC), USA; Knowledge Management Center for Illuminating the Druggable Genome (KMC-IDG), USA
- Daniel J B Clarke
  - Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, Box 1603, New York, NY 10029, USA; Library of Integrated Network-based Cellular Signatures, Data Coordination and Integration Center (BD2K-LINCS DCIC), USA; Knowledge Management Center for Illuminating the Druggable Genome (KMC-IDG), USA
- Denis Torre
  - Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, Box 1603, New York, NY 10029, USA; Library of Integrated Network-based Cellular Signatures, Data Coordination and Integration Center (BD2K-LINCS DCIC), USA; Knowledge Management Center for Illuminating the Druggable Genome (KMC-IDG), USA
- Zhuorui Xie
  - Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, Box 1603, New York, NY 10029, USA; Library of Integrated Network-based Cellular Signatures, Data Coordination and Integration Center (BD2K-LINCS DCIC), USA
- Avi Ma'ayan
  - Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, Box 1603, New York, NY 10029, USA; Library of Integrated Network-based Cellular Signatures, Data Coordination and Integration Center (BD2K-LINCS DCIC), USA; Knowledge Management Center for Illuminating the Druggable Genome (KMC-IDG), USA
|
33
|
Natsidis P, Schiffer PH, Salvador-Martínez I, Telford MJ. Computational discovery of hidden breaks in 28S ribosomal RNAs across eukaryotes and consequences for RNA Integrity Numbers. Sci Rep 2019; 9:19477. [PMID: 31863008 PMCID: PMC6925239 DOI: 10.1038/s41598-019-55573-1] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 10/14/2019] [Accepted: 11/27/2019] [Indexed: 11/09/2022] Open
Abstract
In some eukaryotes, a 'hidden break' has been described in which the 28S ribosomal RNA molecule is cleaved into two subparts. The break is common in protostome animals (arthropods, molluscs, annelids etc.), but a break has also been reported in some vertebrates and non-metazoan eukaryotes. We present a new computational approach to determine the presence of the hidden break in 28S rRNAs using mapping of RNA-Seq data. We find a homologous break is present across protostomes although it has been lost in a small number of taxa. We show that rare breaks in vertebrate 28S rRNAs are not homologous to the protostome break. A break is found in just 4 out of 331 species of non-animal eukaryotes studied and, in three of these, the break is located in the same position as the protostome break suggesting a striking instance of convergent evolution. RNA Integrity Numbers (RIN) rely on intact 28S rRNA and will be consistently underestimated in the great majority of animal species with a break.
Affiliation(s)
- Paschalis Natsidis
  - Centre for Life's Origins and Evolution, Department of Genetics, Evolution and Environment, University College London, Gower Street, London, WC1E 6BT, UK
- Philipp H Schiffer
  - Centre for Life's Origins and Evolution, Department of Genetics, Evolution and Environment, University College London, Gower Street, London, WC1E 6BT, UK
- Irepan Salvador-Martínez
  - Centre for Life's Origins and Evolution, Department of Genetics, Evolution and Environment, University College London, Gower Street, London, WC1E 6BT, UK
- Maximilian J Telford
  - Centre for Life's Origins and Evolution, Department of Genetics, Evolution and Environment, University College London, Gower Street, London, WC1E 6BT, UK
|
34
|
Kovaka S, Zimin AV, Pertea GM, Razaghi R, Salzberg SL, Pertea M. Transcriptome assembly from long-read RNA-seq alignments with StringTie2. Genome Biol 2019; 20:278. [PMID: 31842956 PMCID: PMC6912988 DOI: 10.1186/s13059-019-1910-1] [Citation(s) in RCA: 685] [Impact Index Per Article: 137.0] [Received: 09/10/2019] [Accepted: 12/02/2019] [Indexed: 11/13/2022] Open
Abstract
RNA sequencing using the latest single-molecule sequencing instruments produces reads that are thousands of nucleotides long. The ability to assemble these long reads can greatly improve the sensitivity of long-read analyses. Here we present StringTie2, a reference-guided transcriptome assembler that works with both short and long reads. StringTie2 includes new methods to handle the high error rate of long reads and offers the ability to work with full-length super-reads assembled from short reads, which further improves the quality of short-read assemblies. StringTie2 is more accurate and faster and uses less memory than all comparable short-read and long-read analysis tools.
Affiliation(s)
- Sam Kovaka
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218 USA
  - Center for Computational Biology, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD 21205 USA
- Aleksey V. Zimin
  - Center for Computational Biology, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD 21205 USA
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218 USA
- Geo M. Pertea
  - Center for Computational Biology, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD 21205 USA
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218 USA
- Roham Razaghi
  - Center for Computational Biology, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD 21205 USA
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218 USA
- Steven L. Salzberg
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218 USA
  - Center for Computational Biology, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD 21205 USA
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218 USA
  - Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD 21205 USA
- Mihaela Pertea
  - Center for Computational Biology, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD 21205 USA
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218 USA
|
35
|
Auslander N, Lee JS, Ruppin E. Reply to: 'IMPRES does not reproducibly predict response to immune checkpoint blockade therapy in metastatic melanoma'. Nat Med 2019; 25:1836-1838. [PMID: 31806908 DOI: 10.1038/s41591-019-0646-5] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 05/03/2019] [Accepted: 10/08/2019] [Indexed: 11/09/2022]
Affiliation(s)
- Noam Auslander
  - National Center for Biotechnology Information, National Library of Medicine, US National Institutes of Health, Bethesda, MD, USA
- Joo Sang Lee
  - Cancer Data Science Lab (CDSL), National Cancer Institute, US National Institutes of Health, Bethesda, MD, USA; Samsung Medical Center, Sungkyunkwan University School of Medicine, Suwon, Republic of Korea
- Eytan Ruppin
  - Cancer Data Science Lab (CDSL), National Cancer Institute, US National Institutes of Health, Bethesda, MD, USA
|
36
|
Pizzinga M, Harvey RF, Garland GD, Mordue R, Dezi V, Ramakrishna M, Sfakianos A, Monti M, Mulroney TE, Poyry T, Willis AE. The cell stress response: extreme times call for post‐transcriptional measures. Wiley Interdisciplinary Reviews: RNA 2019; 11:e1578. [DOI: 10.1002/wrna.1578] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 07/31/2019] [Revised: 10/09/2019] [Accepted: 10/16/2019] [Indexed: 12/26/2022]
Affiliation(s)
- Ryan Mordue
  - MRC Toxicology Unit, University of Cambridge, Leicester, UK
- Veronica Dezi
  - MRC Toxicology Unit, University of Cambridge, Leicester, UK
- Mie Monti
  - MRC Toxicology Unit, University of Cambridge, Leicester, UK
- Tuija Poyry
  - MRC Toxicology Unit, University of Cambridge, Leicester, UK
- Anne E. Willis
  - MRC Toxicology Unit, University of Cambridge, Leicester, UK
|
37
|
Vieth B, Parekh S, Ziegenhain C, Enard W, Hellmann I. A systematic evaluation of single cell RNA-seq analysis pipelines. Nat Commun 2019; 10:4667. [PMID: 31604912 PMCID: PMC6789098 DOI: 10.1038/s41467-019-12266-7] [Citation(s) in RCA: 149] [Impact Index Per Article: 29.8] [Received: 03/27/2019] [Accepted: 08/28/2019] [Indexed: 01/27/2023] Open
Abstract
The recent rapid spread of single cell RNA sequencing (scRNA-seq) methods has created a large variety of experimental and computational pipelines for which best practices have not yet been established. Here, we use simulations based on five scRNA-seq library protocols in combination with nine realistic differential expression (DE) setups to systematically evaluate three mapping, four imputation, seven normalisation and four differential expression testing approaches resulting in ~3000 pipelines, allowing us to also assess interactions among pipeline steps. We find that choices of normalisation and library preparation protocols have the biggest impact on scRNA-seq analyses. Specifically, we find that library preparation determines the ability to detect symmetric expression differences, while normalisation dominates pipeline performance in asymmetric DE-setups. Finally, we illustrate the importance of informed choices by showing that a good scRNA-seq pipeline can have the same impact on detecting a biological signal as quadrupling the sample size.
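The combinatorics behind a pipeline count of this kind can be made concrete with a short Python sketch. The method names below are placeholders rather than the tools evaluated in the study. The abstract does not say how the five library protocols and nine DE setups enter the quoted figure of about 3000, so the final multiplication is only one possible reading, labeled as an assumption.

```python
from itertools import product

# Placeholder labels; the abstract gives only the number of options per step.
mapping = [f"mapping_{i}" for i in range(1, 4)]              # three mapping approaches
imputation = [f"imputation_{i}" for i in range(1, 5)]        # four imputation approaches
normalisation = [f"normalisation_{i}" for i in range(1, 8)]  # seven normalisation approaches
de_testing = [f"de_test_{i}" for i in range(1, 5)]           # four DE testing approaches

# Every combination of one option per analysis step.
analysis_combinations = list(product(mapping, imputation, normalisation, de_testing))
n_analysis = len(analysis_combinations)
print(n_analysis)  # 3 * 4 * 7 * 4 = 336

# Assumption, not stated in the abstract: crossing the step combinations with
# the nine DE setups gives a total close to the quoted ~3000.
n_de_setups = 9
print(n_analysis * n_de_setups)  # 3024
```

Enumerating the full product, rather than varying one step at a time, is what makes it possible to assess interactions among pipeline steps, as the abstract notes.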
Affiliation(s)
- Beate Vieth
  - Anthropology and Human Genomics, Department of Biology II, Ludwig-Maximilians University, Munich, Germany
- Swati Parekh
  - Max Planck Institute for Biology of Ageing, Cologne, Germany
- Christoph Ziegenhain
  - Department of Cell and Molecular Biology, Karolinska Institutet, SE-171 65, Stockholm, Sweden
- Wolfgang Enard
  - Anthropology and Human Genomics, Department of Biology II, Ludwig-Maximilians University, Munich, Germany
- Ines Hellmann
  - Anthropology and Human Genomics, Department of Biology II, Ludwig-Maximilians University, Munich, Germany
|
38
|
Stark R, Grzelak M, Hadfield J. RNA sequencing: the teenage years. Nat Rev Genet 2019; 20:631-656. [DOI: 10.1038/s41576-019-0150-2] [Citation(s) in RCA: 679] [Impact Index Per Article: 135.8] [Accepted: 06/18/2019] [Indexed: 12/12/2022]
|
39
|
Picó C, Serra F, Rodríguez AM, Keijer J, Palou A. Biomarkers of Nutrition and Health: New Tools for New Approaches. Nutrients 2019; 11:E1092. [PMID: 31100942 PMCID: PMC6567133 DOI: 10.3390/nu11051092] [Citation(s) in RCA: 107] [Impact Index Per Article: 21.4] [Received: 04/04/2019] [Revised: 05/07/2019] [Accepted: 05/08/2019] [Indexed: 12/18/2022] Open
Abstract
A main challenge in nutritional studies is the valid and reliable assessment of food intake, as well as its effects on the body. Generally, food intake measurement is based on self-reported dietary intake questionnaires, which have inherent limitations. These limitations can be overcome by the use of biomarkers capable of objectively assessing food consumption without the bias of self-reported dietary assessment. Another major goal is to determine the biological effects of foods and their impact on health. Systems analysis of dynamic responses may help to identify biomarkers indicative of intake and effects on the body at the same time, possibly in relation to individuals' health/disease states. Such biomarkers could be used to quantify intake and validate intake questionnaires, analyse physiological or pathological responses to certain food components or diets, identify persons with specific dietary deficiencies, provide information on inter-individual variations, or help to formulate personalized dietary recommendations to achieve optimal health for particular phenotypes, currently referred to as "precision nutrition." In this regard, holistic approaches using global analysis methods (omics approaches), capable of gathering large amounts of data, appear to be very useful for identifying new biomarkers and enhancing our understanding of the role of food in health and disease.
Affiliation(s)
- Catalina Picó
  - Laboratory of Molecular Biology, Nutrition and Biotechnology (Group of Nutrigenomics and Obesity), CIBER de Fisiopatología de la Obesidad y Nutrición (CIBERobn) and Instituto de Investigación Sanitaria Illes Balears (IdISBa), University of the Balearic Islands, ES-07122 Palma de Mallorca, Spain
- Francisca Serra
  - Laboratory of Molecular Biology, Nutrition and Biotechnology (Group of Nutrigenomics and Obesity), CIBER de Fisiopatología de la Obesidad y Nutrición (CIBERobn) and Instituto de Investigación Sanitaria Illes Balears (IdISBa), University of the Balearic Islands, ES-07122 Palma de Mallorca, Spain
- Ana María Rodríguez
  - Laboratory of Molecular Biology, Nutrition and Biotechnology (Group of Nutrigenomics and Obesity), CIBER de Fisiopatología de la Obesidad y Nutrición (CIBERobn) and Instituto de Investigación Sanitaria Illes Balears (IdISBa), University of the Balearic Islands, ES-07122 Palma de Mallorca, Spain
- Jaap Keijer
  - Human and Animal Physiology, Wageningen University, PO Box 338, 6700 AH Wageningen, The Netherlands
- Andreu Palou
  - Laboratory of Molecular Biology, Nutrition and Biotechnology (Group of Nutrigenomics and Obesity), CIBER de Fisiopatología de la Obesidad y Nutrición (CIBERobn) and Instituto de Investigación Sanitaria Illes Balears (IdISBa), University of the Balearic Islands, ES-07122 Palma de Mallorca, Spain
|
40
|
Neller KCM, Klenov A, Hudak KA. Prediction and Characterization of miRNA/Target Pairs in Non-Model Plants Using RNA-seq. Curr Protoc Plant Biol 2019; 4:e20090. [PMID: 31083771 PMCID: PMC9285518 DOI: 10.1002/cppb.20090] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/11/2022]
Abstract
Plant microRNAs (miRNAs) are ∼20- to 24-nucleotide small RNAs that post-transcriptionally regulate gene expression of mRNA targets. Here, we present a workflow to characterize the miRNA transcriptome of a non-model plant, focusing on miRNAs and targets that are differentially expressed under one experimental treatment. We cover RNA-seq experimental design to create paired small RNA and mRNA libraries and perform quality control of raw data, de novo mRNA transcriptome assembly and annotation, miRNA prediction, differential expression, target identification, and functional enrichment analysis. Additionally, we include validation of differential expression and miRNA-induced target cleavage using qRT-PCR and modified RNA ligase-mediated 5' rapid amplification of cDNA ends, respectively. Our procedure relies on freely available software and web resources. It is intended for users that lack programming skills but can navigate a command-line interface. To enable an understanding of formatting requirements and anticipated results, we provide sample RNA-seq data and key input/output files for each stage. © 2019 The Authors. This is an open access article under the terms of the Creative Commons Attribution-NonCommercial License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited and is not used for commercial purposes.
Affiliation(s)
- Kira C M Neller
  - Department of Biology, York University, Toronto, Ontario, Canada
- Alexander Klenov
  - Department of Biology, York University, Toronto, Ontario, Canada
- Katalin A Hudak
  - Department of Biology, York University, Toronto, Ontario, Canada
|
41
|
Babarinde IA, Li Y, Hutchins AP. Computational Methods for Mapping, Assembly and Quantification for Coding and Non-coding Transcripts. Comput Struct Biotechnol J 2019; 17:628-637. [PMID: 31193391 PMCID: PMC6526290 DOI: 10.1016/j.csbj.2019.04.012] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 02/25/2019] [Revised: 04/24/2019] [Accepted: 04/29/2019] [Indexed: 12/17/2022] Open
Abstract
The measurement of gene expression has long provided significant insight into biological functions. The development of high-throughput short-read sequencing technology has revealed transcriptional complexity at an unprecedented scale, and informed almost all areas of biology. However, as researchers have sought to gather more insights from the data, these new technologies have also increased the computational analysis burden. In this review, we describe typical computational pipelines for RNA-Seq analysis and discuss their strengths and weaknesses for the assembly, quantification and analysis of coding and non-coding RNAs. We also discuss the assembly of transposable elements into transcripts, and the difficulty these repetitive elements pose. In summary, RNA-Seq is a powerful technology that is likely to remain a key asset in the biologist's toolkit.
Affiliation(s)
- Isaac A Babarinde
  - Department of Biology, Southern University of Science and Technology, 1088 Xueyuan Lu, Shenzhen, China
- Yuhao Li
  - Department of Biology, Southern University of Science and Technology, 1088 Xueyuan Lu, Shenzhen, China
- Andrew P Hutchins
  - Department of Biology, Southern University of Science and Technology, 1088 Xueyuan Lu, Shenzhen, China
|
42
|
Owen N, Moosajee M. RNA-sequencing in ophthalmology research: considerations for experimental design and analysis. Ther Adv Ophthalmol 2019; 11:2515841419835460. [PMID: 30911735 PMCID: PMC6421592 DOI: 10.1177/2515841419835460] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 10/03/2018] [Accepted: 02/08/2019] [Indexed: 12/13/2022] Open
Abstract
High-throughput, massively parallel sequence analysis has revolutionized the way that researchers design and execute scientific investigations. Vast amounts of sequence data can be generated in short periods of time. Regarding ophthalmology and vision research, extensive interrogation of patient samples for underlying causative DNA mutations has resulted in the discovery of many new genes relevant to eye disease. However, such analysis remains functionally limited. RNA-sequencing accurately snapshots thousands of genes, capturing many subtypes of RNA molecules, and has become the gold standard for transcriptome gene expression quantification. RNA-sequencing has the potential to advance our understanding of eye development and disease; it can reveal new candidates to improve our molecular diagnosis rates and highlight therapeutic targets for intervention. However, given the wide range of applications, designing such experiments can be problematic: no single optimal pipeline exists, so several considerations must be weighed for optimal study design. We review the key steps involved in RNA-sequencing experimental design and the downstream bioinformatic pipelines used for differential gene expression. We provide guidance on the application of RNA-sequencing to ophthalmology and sources of open-access eye-related data sets.
Affiliation(s)
- Nicholas Owen
  - Development, Ageing and Disease Theme, UCL Institute of Ophthalmology, University College London, London, UK
|