1. Marini F, Ludt A, Linke J, Strauch K. GeneTonic: an R/Bioconductor package for streamlining the interpretation of RNA-seq data. BMC Bioinformatics 2021; 22:610. [PMID: 34949163 PMCID: PMC8697502 DOI: 10.1186/s12859-021-04461-5] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 05/25/2021] [Accepted: 10/26/2021] [Indexed: 01/22/2023] Open
Abstract
BACKGROUND The interpretation of results from transcriptome profiling experiments via RNA sequencing (RNA-seq) can be a complex task, where the essential information is distributed among different tabular and list formats: normalized expression values, results from differential expression analysis, and results from functional enrichment analyses. A number of tools and databases are widely used for the purpose of identification of relevant functional patterns, yet often their contextualization within the data and results at hand is not straightforward, especially if these analytic components are not combined together efficiently. RESULTS We developed the GeneTonic software package, which serves as a comprehensive toolkit for streamlining the interpretation of functional enrichment analyses, by fully leveraging the information of expression values in a differential expression context. GeneTonic is implemented in R and Shiny, leveraging packages that enable HTML-based interactive visualizations for executing drilldown tasks seamlessly, viewing the data at a level of increased detail. GeneTonic is integrated with the core classes of existing Bioconductor workflows, and can accept the output of many widely used tools for pathway analysis, making this approach applicable to a wide range of use cases. Users can effectively navigate interlinked components (otherwise available as flat text or spreadsheet tables), bookmark features of interest during the exploration sessions, and obtain at the end a tailored HTML report, thus combining the benefits of both interactivity and reproducibility. CONCLUSION GeneTonic is distributed as an R package in the Bioconductor project (https://bioconductor.org/packages/GeneTonic/) under the MIT license. Offering both bird's-eye views of the components of transcriptome data analysis and the detailed inspection of single genes, individual signatures, and their relationships, GeneTonic aims at simplifying the process of interpretation of complex and compelling RNA-seq datasets for many researchers with different expertise profiles.
Affiliation(s)
- Federico Marini
  - Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), University Medical Center of the Johannes Gutenberg University Mainz, Obere Zahlbacher Str. 69, 55131 Mainz, Germany
  - Center for Thrombosis and Hemostasis (CTH), University Medical Center of the Johannes Gutenberg University Mainz, Langenbeckstr. 1, 55131 Mainz, Germany
- Annekathrin Ludt
  - Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), University Medical Center of the Johannes Gutenberg University Mainz, Obere Zahlbacher Str. 69, 55131 Mainz, Germany
- Jan Linke
  - Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), University Medical Center of the Johannes Gutenberg University Mainz, Obere Zahlbacher Str. 69, 55131 Mainz, Germany
  - Center for Thrombosis and Hemostasis (CTH), University Medical Center of the Johannes Gutenberg University Mainz, Langenbeckstr. 1, 55131 Mainz, Germany
- Konstantin Strauch
  - Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), University Medical Center of the Johannes Gutenberg University Mainz, Obere Zahlbacher Str. 69, 55131 Mainz, Germany
2. Helmy M, Agrawal R, Ali J, Soudy M, Bui TT, Selvarajoo K. GeneCloudOmics: A Data Analytic Cloud Platform for High-Throughput Gene Expression Analysis. Frontiers in Bioinformatics 2021; 1:693836. [PMID: 36303746 PMCID: PMC9581002 DOI: 10.3389/fbinf.2021.693836] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 05/07/2021] [Accepted: 10/14/2021] [Indexed: 11/18/2022] Open
Abstract
Gene expression profiling techniques, such as DNA microarray and RNA-Sequencing, have had a significant impact on our understanding of biological systems. They contribute to almost all aspects of biomedical research, including studies of developmental biology, host-parasite relationships, disease progression and drug effects. However, the high-throughput data they generate are challenging for many wet-lab experimentalists to analyze and take full advantage of. Here we present GeneCloudOmics, an easy-to-use web server for high-throughput gene expression analysis that extends the functionality of our previous ABioTrans with several new tools, including protein dataset analysis, and a web interface. GeneCloudOmics supports both microarray and RNA-Seq data analysis with a comprehensive range of data analytics tools in one package, a combination that no other current standalone software or web-based tool offers. In total, GeneCloudOmics gives the user access to 23 different data analytical and bioinformatics tasks, including read normalization, scatter plots, linear/non-linear correlations, PCA, clustering (hierarchical, k-means, t-SNE, SOM), differential expression analyses, pathway enrichments, evolutionary analyses, pathological analyses, and protein-protein interaction (PPI) identifications. Furthermore, GeneCloudOmics allows the direct import of gene expression data from the NCBI Gene Expression Omnibus database. The user can perform all tasks rapidly through an intuitive graphical user interface that removes the hassle of coding, installing tools/packages/libraries and dealing with operating system compatibility and version issues, complications that make data analysis tasks challenging for biologists. Thus, GeneCloudOmics is a one-stop open-source tool for gene expression data analysis and visualization. It is freely available at http://combio-sifbi.org/GeneCloudOmics.
Affiliation(s)
- Mohamed Helmy
  - Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
  - Department of Computer Science, Lakehead University, Thunder Bay, ON, Canada
- Rahul Agrawal
  - Department of Geology and Geophysics, Indian Institute of Technology (IIT) Kharagpur, Kharagpur, India
- Javed Ali
  - Department of Geology and Geophysics, Indian Institute of Technology (IIT) Kharagpur, Kharagpur, India
- Mohamed Soudy
  - Proteomics and Metabolomics Unit, Children Cancer Hospital (CCHE-57357), Cairo, Egypt
- Thuy Tien Bui
  - Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- Kumar Selvarajoo
  - Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
  - Singapore Institute of Food and Biotechnology Innovation (SIFBI), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
  - Synthetic Biology for Clinical and Technological Innovation (SynCTI), National University of Singapore (NUS), Singapore, Singapore
  - *Correspondence: Kumar Selvarajoo
3. Zhao H, Tang X, Wu M, Li Q, Yi X, Liu S, Jiang J, Wang S, Sun X. Transcriptome Characterization of Short Distance Transport Stress in Beef Cattle Blood. Front Genet 2021; 12:616388. [PMID: 33643382 PMCID: PMC7902800 DOI: 10.3389/fgene.2021.616388] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 10/12/2020] [Accepted: 01/19/2021] [Indexed: 12/20/2022] Open
Abstract
Transportation is a crucial phase in the beef cattle industry, and the annual losses caused by beef cattle transport stress are substantial. Several studies have described the effects of long distance transportation stress on animal health, such as disorders of the nervous, endocrine, immune, and metabolic systems. However, the molecular mechanisms underlying short distance transportation stress are still poorly understood. The present study aims to investigate the effect of short distance transportation through hematological indices and transcriptomic analysis. A total of 10 Qinchuan cattle were used to compare the molecular characteristics of blood before and after transportation. We found that a stress-related marker, white blood cell count (WBC), increased significantly after transportation. The decrease in triglyceride (TG), cholestenone (CHO), high-density lipoprotein (HDL), and low-density lipoprotein (LDL) showed that energy expenditure was increased after transportation, but not enough to activate fat decomposition. Intriguingly, the decrease in malondialdehyde (MDA) showed that cattle were more resilient to oxidative stress. RNA-seq identified 1,092 differentially expressed genes (DEGs; 329 up-regulated and 763 down-regulated) between the before and after groups. GO and KEGG enrichment showed that metabolic pathways and B cell function related pathways were enriched. Furthermore, the top 5,000 genes ranked by median absolute deviation (MAD) were used to construct a co-expression network by weighted correlation network analysis (WGCNA), and 11 independent modules were identified. Combining protein-protein interaction (PPI) analysis, quantitative real-time PCR (qPCR) verification, and correlation with B cell function, structural maintenance of chromosomes 3 (SMC3), jun proto-oncogene (JUN), and C-X-C motif chemokine ligand 10 (CXCL10) were suggested as potential molecular markers for identifying short distance transportation stress. Collectively, the blood RNA-seq analysis and WGCNA indicated that disordered B cell differentiation, proliferation, survival, and apoptosis is the potential molecular mechanism of short distance transportation stress. In conclusion, our results provide novel insight into potential biomarkers for short distance transportation stress, which may serve for diagnosing and preventing this condition in the beef industry.
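The MAD-based gene selection mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the toy gene names, and the expression values are all hypothetical, and the study's preprocessing before ranking is not described here.

```python
from statistics import median

def mad(values):
    """Median absolute deviation: median of |x - median(x)|."""
    m = median(values)
    return median(abs(v - m) for v in values)

def top_mad_genes(expression, n):
    """Return the n gene names with the largest MAD across samples."""
    ranked = sorted(expression, key=lambda g: mad(expression[g]), reverse=True)
    return ranked[:n]

# Hypothetical toy matrix: gene name -> expression value per sample.
toy = {
    "geneA": [5.0, 5.1, 4.9, 5.0],   # nearly constant, low MAD
    "geneB": [1.0, 8.0, 2.0, 9.0],   # highly variable, high MAD
    "geneC": [3.0, 3.5, 2.5, 4.0],   # moderately variable
}
selected = top_mad_genes(toy, 2)
print(selected)
```

Ranking by MAD rather than variance makes the selection less sensitive to single outlying samples, which is one common reason for choosing it before network construction.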
Affiliation(s)
- Haidong Zhao
  - College of Animal Science and Technology, Northwest A&F University, Yangling, China
- Xiaoqin Tang
  - College of Animal Science and Technology, Northwest A&F University, Yangling, China
- Mingli Wu
  - College of Animal Science and Technology, Northwest A&F University, Yangling, China
- Qi Li
  - College of Animal Science and Technology, Northwest A&F University, Yangling, China
- Xiaohua Yi
  - College of Animal Science and Technology, Northwest A&F University, Yangling, China
- Shirong Liu
  - College of Animal Science and Technology, Northwest A&F University, Yangling, China
- Junyi Jiang
  - College of Animal Science and Technology, Northwest A&F University, Yangling, China
- Shuhui Wang
  - College of Animal Science and Technology, Northwest A&F University, Yangling, China
- Xiuzhu Sun
  - College of Animal Science and Technology, Northwest A&F University, Yangling, China
  - College of Grassland Agriculture, Northwest A&F University, Yangling, China
4. Marini F, Linke J, Binder H. ideal: an R/Bioconductor package for interactive differential expression analysis. BMC Bioinformatics 2020; 21:565. [PMID: 33297942 PMCID: PMC7724894 DOI: 10.1186/s12859-020-03819-5] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 01/09/2020] [Accepted: 10/15/2020] [Indexed: 02/06/2023] Open
Abstract
BACKGROUND RNA sequencing (RNA-seq) is an ever increasingly popular tool for transcriptome profiling. A key point to make the best use of the available data is to provide software tools that are easy to use but still provide flexibility and transparency in the adopted methods. Despite the availability of many packages focused on detecting differential expression, a method to streamline this type of bioinformatics analysis in a comprehensive, accessible, and reproducible way is lacking. RESULTS We developed the ideal software package, which serves as a web application for interactive and reproducible RNA-seq analysis, while producing a wealth of visualizations to facilitate data interpretation. ideal is implemented in R using the Shiny framework, and is fully integrated with the existing core structures of the Bioconductor project. Users can perform the essential steps of the differential expression analysis workflow in an assisted way, and generate a broad spectrum of publication-ready outputs, including diagnostic and summary visualizations in each module, all the way down to functional analysis. ideal also offers the possibility to seamlessly generate a full HTML report for storing and sharing results together with code for reproducibility. CONCLUSION ideal is distributed as an R package in the Bioconductor project ( http://bioconductor.org/packages/ideal/ ), and provides a solution for performing interactive and reproducible analyses of summarized RNA-seq expression data, empowering researchers with many different profiles (life scientists, clinicians, but also experienced bioinformaticians) to make the ideal use of the data at hand.
Affiliation(s)
- Federico Marini
  - Center for Thrombosis and Hemostasis (CTH), University Medical Center of the Johannes Gutenberg University Mainz, Langenbeckstr. 1, 55131 Mainz, Germany
  - Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), University Medical Center of the Johannes Gutenberg University Mainz, Obere Zahlbacher Str. 69, 55131 Mainz, Germany
- Jan Linke
  - Center for Thrombosis and Hemostasis (CTH), University Medical Center of the Johannes Gutenberg University Mainz, Langenbeckstr. 1, 55131 Mainz, Germany
  - Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), University Medical Center of the Johannes Gutenberg University Mainz, Obere Zahlbacher Str. 69, 55131 Mainz, Germany
- Harald Binder
  - Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center, University of Freiburg, Stefan-Meier-Str. 26, 79104 Freiburg, Germany
5. Prieto C, Barrios D. RaNA-Seq: Interactive RNA-Seq analysis from FASTQ files to functional analysis. Bioinformatics 2019; 36:btz854. [PMID: 31730197 DOI: 10.1093/bioinformatics/btz854] [Citation(s) in RCA: 64] [Impact Index Per Article: 10.7] [Received: 05/29/2019] [Revised: 10/04/2019] [Accepted: 11/11/2019] [Indexed: 11/14/2022] Open
Abstract
SUMMARY RaNA-Seq is a cloud platform for the rapid analysis and visualization of RNA-Seq data. It performs a full analysis in minutes by quantifying FASTQ files, calculating quality control metrics, running differential expression analyses and enabling the explanation of results with functional analyses. Our analysis pipeline applies generally accepted and reproducible protocols that can be applied with two simple steps in its web interface. Analysis results are presented as interactive graphics and reports, ready for their interpretation and publication. AVAILABILITY RaNA-Seq web service is freely available online at https://ranaseq.eu. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Carlos Prieto
  - Bioinformatics Service, Nucleus, University of Salamanca, Plaza Doctores de la Reina, Salamanca, Spain
- David Barrios
  - Bioinformatics Service, Nucleus, University of Salamanca, Plaza Doctores de la Reina, Salamanca, Spain
6. Zou Y, Bui TT, Selvarajoo K. ABioTrans: A Biostatistical Tool for Transcriptomics Analysis. Front Genet 2019; 10:499. [PMID: 31214245 PMCID: PMC6555198 DOI: 10.3389/fgene.2019.00499] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 08/17/2018] [Accepted: 05/07/2019] [Indexed: 11/13/2022] Open
Abstract
Here we report a bio-statistical/informatics tool, ABioTrans, developed in R for gene expression analysis. The tool allows the user to directly read RNA-Seq data files deposited in the Gene Expression Omnibus or GEO database. Operated using any web browser application, ABioTrans provides easy options for multiple statistical distribution fitting, Pearson and Spearman rank correlations, PCA, k-means and hierarchical clustering, differential expression (DE) analysis, Shannon entropy and noise (square of coefficient of variation) analyses, as well as Gene ontology classifications.
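Two of the quantities named in this abstract, Shannon entropy and noise (the squared coefficient of variation), can be sketched in a few lines of Python. This is an illustration under stated assumptions, not ABioTrans code (the tool itself is written in R): the fixed-width binning used for the entropy, the choice of sample variance for the noise, and all function names are assumptions made for this example.

```python
import math
from collections import Counter
from statistics import mean, variance

def shannon_entropy(values, bin_width=1.0):
    """Shannon entropy (in bits) of values discretised into fixed-width bins.

    The binning scheme is an assumption for this sketch.
    """
    counts = Counter(math.floor(v / bin_width) for v in values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def noise(values):
    """Squared coefficient of variation: sample variance / mean squared."""
    return variance(values) / mean(values) ** 2

# Hypothetical expression values for one sample and for one gene.
sample_values = [0.5, 1.5, 2.5, 3.5]   # four values, each in its own bin
gene_values = [2.0, 4.0, 6.0]          # mean 4, sample variance 4

print(shannon_entropy(sample_values))
print(noise(gene_values))
```

With these toy inputs the four values fall into four equally populated bins, so the entropy is at its maximum for four bins, while the noise is the ratio of the sample variance to the squared mean.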
Affiliation(s)
- Yutong Zou
  - Department of Statistics and Applied Probability, National University of Singapore, Singapore, Singapore
- Thuy Tien Bui
  - Biotransformation Innovation Platform (BioTrans), Agency for Science, Technology and Research (ASTAR), Singapore, Singapore
- Kumar Selvarajoo
  - Biotransformation Innovation Platform (BioTrans), Agency for Science, Technology and Research (ASTAR), Singapore, Singapore
7. Karim MR, Michel A, Zappa A, Baranov P, Sahay R, Rebholz-Schuhmann D. Improving data workflow systems with cloud services and use of open data for bioinformatics research. Brief Bioinform 2019; 19:1035-1050. [PMID: 28419324 PMCID: PMC6169675 DOI: 10.1093/bib/bbx039] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 10/21/2016] [Indexed: 11/22/2022] Open
Abstract
Data workflow systems (DWFSs) enable bioinformatics researchers to combine components for data access and data analytics, and to share the final data analytics approach with their collaborators. Increasingly, such systems have to cope with large-scale data, such as full genomes (about 200 GB each), public fact repositories (about 100 TB of data) and 3D imaging data at even larger scales. As moving the data becomes cumbersome, the DWFS needs to embed its processes into a cloud infrastructure, where the data are already hosted. As the standardized public data play an increasingly important role, the DWFS needs to comply with Semantic Web technologies. This advancement to DWFS would reduce overhead costs and accelerate the progress in bioinformatics research based on large-scale data and public resources, as researchers would require less specialized IT knowledge for the implementation. Furthermore, the high data growth rates in bioinformatics research drive the demand for parallel and distributed computing, which then imposes a need for scalability and high-throughput capabilities onto the DWFS. As a result, requirements for data sharing and access to public knowledge bases suggest that compliance of the DWFS with Semantic Web standards is necessary. In this article, we will analyze the existing DWFS with regard to their capabilities toward public open data use as well as large-scale computational and human interface requirements. We untangle the parameters for selecting a preferable solution for bioinformatics research with particular consideration to using cloud services and Semantic Web technologies. Our analysis leads to research guidelines and recommendations toward the development of future DWFS for the bioinformatics research community.
Affiliation(s)
- Md Rezaul Karim
  - Semantics in eHealth and Life Sciences (SeLS), Insight Centre for Data Analytics, National University of Ireland, Galway, Ireland
- Audrey Michel
  - School of Biochemistry and Cell Biology, University College Cork, Ireland
- Achille Zappa
  - Insight Centre for Data Analytics, National University of Ireland Galway, Dangan, Galway, Ireland
- Pavel Baranov
  - School of Biochemistry and Cell Biology, University College Cork, Ireland
- Ratnesh Sahay
  - Semantics in eHealth and Life Sciences (SeLS), Insight Centre for Data Analytics, National University of Ireland, Galway, Ireland
8. López-Fernández H, Blanco-Míguez A, Fdez-Riverola F, Sánchez B, Lourenço A. DEWE: A novel tool for executing differential expression RNA-Seq workflows in biomedical research. Comput Biol Med 2019; 107:197-205. [PMID: 30849608 DOI: 10.1016/j.compbiomed.2019.02.021] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 11/22/2018] [Revised: 02/21/2019] [Accepted: 02/21/2019] [Indexed: 01/31/2023]
Abstract
BACKGROUND Transcriptomics profiling aims to identify and quantify all transcripts present within a cell type or tissue at a particular state, and thus provide information on the genes expressed in specific experimental settings, differentiation or disease conditions. RNA-Seq technology is becoming the standard approach for such studies, but available analysis tools are often hard to install, configure and use by users without advanced bioinformatics skills. METHODS Within reason, DEWE aims to make RNA-Seq analysis as easy for non-proficient users as for experienced bioinformaticians. DEWE supports two well-established and widely used differential expression analysis workflows: using Bowtie2 or HISAT2 for sequence alignment; and, both applying StringTie for quantification, and Ballgown and edgeR for differential expression analysis. Also, it enables the tailored execution of individual tools as well as helps with the management and visualisation of differential expression results. RESULTS DEWE provides a user-friendly interface designed to reduce the learning curve of less knowledgeable users while enabling analysis customisation and software extension by advanced users. Docker technology helps overcome installation and configuration hurdles. In addition, DEWE produces high quality and publication-ready outputs in the form of tab-delimited files and figures, as well as helps researchers with further analyses, such as pathway enrichment analysis. CONCLUSIONS The abilities of DEWE are exemplified here by practical application to a comparative analysis of monocytes and monocyte-derived dendritic cells, a study of clinical relevance. DEWE installers and documentation are freely available at https://www.sing-group.org/dewe.
Affiliation(s)
- Hugo López-Fernández
  - ESEI: Escuela Superior de Ingeniería Informática, University of Vigo, Edificio Politécnico, Campus Universitario As Lagoas s/n, 32004, Ourense, Spain
  - CINBIO - Centro de Investigaciones Biomédicas, University of Vigo, Campus Universitario Lagoas-Marcosende, 36310, Vigo, Spain
  - SING Research Group, Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO, Hospital Álvaro Cunqueiro, 36312, Vigo, Spain
  - Universidade do Porto, Rua Alfredo Allen, 208, 4200-135, Porto, Portugal
  - Instituto de Biologia Molecular e Celular (IBMC), Rúa Alfredo Allen, 208, 4200-135, Porto, Portugal
- Aitor Blanco-Míguez
  - ESEI: Escuela Superior de Ingeniería Informática, University of Vigo, Edificio Politécnico, Campus Universitario As Lagoas s/n, 32004, Ourense, Spain
  - CINBIO - Centro de Investigaciones Biomédicas, University of Vigo, Campus Universitario Lagoas-Marcosende, 36310, Vigo, Spain
  - Department of Microbiology and Biochemistry of Dairy Products, Instituto de Productos Lácteos de Asturias (IPLA), Consejo Superior de Investigaciones Científicas (CSIC), Paseo Río Linares s/n, 33300, Villaviciosa, Asturias, Spain
- Florentino Fdez-Riverola
  - ESEI: Escuela Superior de Ingeniería Informática, University of Vigo, Edificio Politécnico, Campus Universitario As Lagoas s/n, 32004, Ourense, Spain
  - CINBIO - Centro de Investigaciones Biomédicas, University of Vigo, Campus Universitario Lagoas-Marcosende, 36310, Vigo, Spain
  - SING Research Group, Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO, Hospital Álvaro Cunqueiro, 36312, Vigo, Spain
- Borja Sánchez
  - Department of Microbiology and Biochemistry of Dairy Products, Instituto de Productos Lácteos de Asturias (IPLA), Consejo Superior de Investigaciones Científicas (CSIC), Paseo Río Linares s/n, 33300, Villaviciosa, Asturias, Spain
- Anália Lourenço
  - ESEI: Escuela Superior de Ingeniería Informática, University of Vigo, Edificio Politécnico, Campus Universitario As Lagoas s/n, 32004, Ourense, Spain
  - CINBIO - Centro de Investigaciones Biomédicas, University of Vigo, Campus Universitario Lagoas-Marcosende, 36310, Vigo, Spain
  - SING Research Group, Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO, Hospital Álvaro Cunqueiro, 36312, Vigo, Spain
  - CEB - Centre of Biological Engineering, University of Minho, Campus de Gualtar, 4710-057, Braga, Portugal
9. Nelson JW, Sklenar J, Barnes AP, Minnier J. The START App: a web-based RNAseq analysis and visualization resource. Bioinformatics 2018; 33:447-449. [PMID: 28171615 DOI: 10.1093/bioinformatics/btw624] [Citation(s) in RCA: 64] [Impact Index Per Article: 9.1] [Received: 04/04/2016] [Revised: 09/09/2016] [Accepted: 09/23/2016] [Indexed: 11/14/2022] Open
Abstract
Summary Transcriptional profiling using RNA sequencing (RNAseq) has emerged as a powerful methodology to quantify global gene expression patterns in various contexts from single cells to whole tissues. The tremendous amount of data generated by this profiling technology presents a daunting challenge in terms of effectively visualizing and interpreting results. Convenient and intuitive data interfaces are critical for researchers to easily upload, analyze and visualize their RNAseq data. We designed the START (Shiny Transcriptome Analysis Resource Tool) App with these requirements in mind. This application has the power and flexibility to be resident on a local computer or serve as a web-based environment, enabling easy sharing of data between researchers and collaborators. Availability and Implementation The source code for the START App is written entirely in R and is freely available for download at https://github.com/jminnier/STARTapp, with the code licensed under GPLv3. It can be launched on any system that has R installed. The START App is also hosted on https://kcvi.shinyapps.io/START for researchers to temporarily upload their data. Contact minnier@ohsu.edu
Affiliation(s)
- Jonathan W Nelson
  - The Knight Cardiovascular Institute, Oregon Health & Science University, Portland, OR, USA
- Jiri Sklenar
  - The Knight Cardiovascular Institute, Oregon Health & Science University, Portland, OR, USA
- Anthony P Barnes
  - Department of Pediatrics, Oregon Health & Science University, Portland, OR, USA
- Jessica Minnier
  - The Knight Cardiovascular Institute, Oregon Health & Science University, Portland, OR, USA
  - School of Public Health, Oregon Health & Science University, Portland, OR, USA
10. Lott SC, Wolfien M, Riege K, Bagnacani A, Wolkenhauer O, Hoffmann S, Hess WR. Customized workflow development and data modularization concepts for RNA-Sequencing and metatranscriptome experiments. J Biotechnol 2017; 261:85-96. [DOI: 10.1016/j.jbiotec.2017.06.1203] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 02/21/2017] [Revised: 06/22/2017] [Accepted: 06/26/2017] [Indexed: 12/14/2022]
11. Raddatz BB, Spitzbarth I, Matheis KA, Kalkuhl A, Deschl U, Baumgärtner W, Ulrich R. Microarray-Based Gene Expression Analysis for Veterinary Pathologists: A Review. Vet Pathol 2017. [PMID: 28641485 DOI: 10.1177/0300985817709887] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Indexed: 02/06/2023]
Abstract
High-throughput, genome-wide transcriptome analysis is now commonly used in all fields of life science research and is on the cusp of medical and veterinary diagnostic application. Transcriptomic methods such as microarrays and next-generation sequencing generate enormous amounts of data. The pathogenetic expertise acquired from understanding of general pathology provides veterinary pathologists with a profound background, which is essential in translating transcriptomic data into meaningful biological knowledge, thereby leading to a better understanding of underlying disease mechanisms. The scientific literature concerning high-throughput data-mining techniques usually addresses mathematicians or computer scientists as the target audience. In contrast, the present review provides the reader with a clear and systematic basis from a veterinary pathologist's perspective. Therefore, the aims are (1) to introduce the reader to the necessary methodological background; (2) to introduce the sequential steps commonly performed in a microarray analysis including quality control, annotation, normalization, selection of differentially expressed genes, clustering, gene ontology and pathway analysis, analysis of manually selected genes, and biomarker discovery; and (3) to provide references to publicly available and user-friendly software suites. In summary, the data analysis methods presented within this review will enable veterinary pathologists to analyze high-throughput transcriptome data obtained from their own experiments, supplemental data that accompany scientific publications, or public repositories in order to obtain a more in-depth insight into underlying disease mechanisms.
Affiliation(s)
- Barbara B Raddatz
  - Department of Pathology, University of Veterinary Medicine Hannover, Hannover, Germany
  - Center of Systems Neuroscience, Hannover, Germany
- Ingo Spitzbarth
  - Department of Pathology, University of Veterinary Medicine Hannover, Hannover, Germany
  - Center of Systems Neuroscience, Hannover, Germany
- Katja A Matheis
  - Department of Nonclinical Drug Safety, Boehringer Ingelheim Pharma GmbH & Co KG, Biberach (Riß), Germany
- Arno Kalkuhl
  - Department of Nonclinical Drug Safety, Boehringer Ingelheim Pharma GmbH & Co KG, Biberach (Riß), Germany
- Ulrich Deschl
  - Department of Nonclinical Drug Safety, Boehringer Ingelheim Pharma GmbH & Co KG, Biberach (Riß), Germany
- Wolfgang Baumgärtner
  - Department of Pathology, University of Veterinary Medicine Hannover, Hannover, Germany
  - Center of Systems Neuroscience, Hannover, Germany
- Reiner Ulrich
  - Department of Pathology, University of Veterinary Medicine Hannover, Hannover, Germany
  - Center of Systems Neuroscience, Hannover, Germany
  - Department of Experimental Animal Facilities and Biorisk Management, Friedrich-Loeffler-Institute, Greifswald, Germany
12. Williams CR, Baccarella A, Parrish JZ, Kim CC. Empirical assessment of analysis workflows for differential expression analysis of human samples using RNA-Seq. BMC Bioinformatics 2017; 18:38. [PMID: 28095772 PMCID: PMC5240434 DOI: 10.1186/s12859-016-1457-z] [Citation(s) in RCA: 51] [Impact Index Per Article: 6.4] [Received: 09/26/2016] [Accepted: 12/31/2016] [Indexed: 02/07/2023] Open
Abstract
BACKGROUND RNA-Seq has supplanted microarrays as the preferred method of transcriptome-wide identification of differentially expressed genes. However, RNA-Seq analysis is still rapidly evolving, with a large number of tools available for each of the three major processing steps: read alignment, expression modeling, and identification of differentially expressed genes. Although some studies have benchmarked these tools against gold standard gene expression sets, few have evaluated their performance in concert with one another. Additionally, there is a general lack of testing of such tools on real-world, physiologically relevant datasets, which often possess qualities not reflected in tightly controlled reference RNA samples or synthetic datasets. RESULTS Here, we evaluate 219 combinatorial implementations of the most commonly used analysis tools for their impact on differential gene expression analysis by RNA-Seq. A test dataset was generated using highly purified human classical and nonclassical monocyte subsets from a clinical cohort, allowing us to evaluate the performance of 495 unique workflows, when accounting for differences in expression units and gene- versus transcript-level estimation. We find that the choice of methodologies leads to wide variation in the number of genes called significant, as well as in performance as gauged by precision and recall, calculated by comparing our RNA-Seq results to those from four previously published microarray and BeadChip analyses of the same cell populations. The method of differential gene expression identification exhibited the strongest impact on performance, with smaller impacts from the choice of read aligner and expression modeler. Many workflows were found to exhibit similar overall performance, but with differences in their calibration, with some biased toward higher precision and others toward higher recall. 
CONCLUSIONS There is significant heterogeneity in the performance of RNA-Seq workflows to identify differentially expressed genes. Among the higher performing workflows, different workflows exhibit a precision/recall tradeoff, and the ultimate choice of workflow should take into consideration how the results will be used in subsequent applications. Our analyses highlight the performance characteristics of these workflows, and the data generated in this study could also serve as a useful resource for future development of software for RNA-Seq analysis.
Affiliation(s)
- Claire R Williams
  - Department of Biology, University of Washington, Seattle, WA, 98195, USA
- Alyssa Baccarella
  - Division of Experimental Medicine, Department of Medicine, University of California, San Francisco, CA, 94143, USA
- Jay Z Parrish
  - Department of Biology, University of Washington, Seattle, WA, 98195, USA
- Charles C Kim
  - Division of Experimental Medicine, Department of Medicine, University of California, San Francisco, CA, 94143, USA. Present address: Verily, South San Francisco, CA, 94080, USA.
|
13
|
Bianchi V, Ceol A, Ogier AGE, de Pretis S, Galeota E, Kishore K, Bora P, Croci O, Campaner S, Amati B, Morelli MJ, Pelizzola M. Integrated Systems for NGS Data Management and Analysis: Open Issues and Available Solutions. Front Genet 2016; 7:75. [PMID: 27200084 PMCID: PMC4858535 DOI: 10.3389/fgene.2016.00075] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.6] [Received: 02/08/2016] [Accepted: 04/18/2016] [Indexed: 02/06/2023] Open
Abstract
Next-generation sequencing (NGS) technologies have deeply changed our understanding of cellular processes by delivering an astonishing amount of data at affordable prices; nowadays, many biology laboratories have already accumulated a large number of sequenced samples. However, managing and analyzing these data poses new challenges, which may easily be underestimated by research groups devoid of IT and quantitative skills. In this perspective, we identify five issues that should be carefully addressed by research groups approaching NGS technologies. In particular, the five key issues to be considered concern: (1) adopting a laboratory information management system (LIMS) and safeguarding the resulting raw data structure in downstream analyses; (2) monitoring the flow of the data and standardizing input and output directories and file names, even when multiple analysis protocols are used on the same data; (3) ensuring complete traceability of the analysis performed; (4) enabling non-experienced users to run analyses through a graphical user interface (GUI) acting as a front-end for the pipelines; (5) relying on standard metadata to annotate the datasets, and when possible using controlled vocabularies, ideally derived from biomedical ontologies. Finally, we discuss the currently available tools in the light of these issues, and we introduce HTS-flow, a new workflow management system conceived to address the concerns we raised. HTS-flow is able to retrieve information from a LIMS database, manages data analyses through a simple GUI, outputs data in standard locations and allows the complete traceability of datasets, accompanying metadata and analysis scripts.
Affiliation(s)
- Valerio Bianchi
  - Center for Genomic Science of IIT@SEMM, Fondazione Istituto Italiano di Tecnologia Milano, Italy
- Arnaud Ceol
  - Center for Genomic Science of IIT@SEMM, Fondazione Istituto Italiano di Tecnologia Milano, Italy
- Alessandro G E Ogier
  - Department of Experimental Oncology, European Institute of Oncology Milano, Italy
- Stefano de Pretis
  - Center for Genomic Science of IIT@SEMM, Fondazione Istituto Italiano di Tecnologia Milano, Italy
- Eugenia Galeota
  - Center for Genomic Science of IIT@SEMM, Fondazione Istituto Italiano di Tecnologia Milano, Italy
- Kamal Kishore
  - Center for Genomic Science of IIT@SEMM, Fondazione Istituto Italiano di Tecnologia Milano, Italy
- Pranami Bora
  - Center for Genomic Science of IIT@SEMM, Fondazione Istituto Italiano di Tecnologia Milano, Italy
- Ottavio Croci
  - Center for Genomic Science of IIT@SEMM, Fondazione Istituto Italiano di Tecnologia Milano, Italy
- Stefano Campaner
  - Center for Genomic Science of IIT@SEMM, Fondazione Istituto Italiano di Tecnologia Milano, Italy
- Bruno Amati
  - Center for Genomic Science of IIT@SEMM, Fondazione Istituto Italiano di Tecnologia Milano, Italy; Department of Experimental Oncology, European Institute of Oncology Milano, Italy
- Marco J Morelli
  - Center for Genomic Science of IIT@SEMM, Fondazione Istituto Italiano di Tecnologia Milano, Italy
- Mattia Pelizzola
  - Center for Genomic Science of IIT@SEMM, Fondazione Istituto Italiano di Tecnologia Milano, Italy
|
14
|
Russo F, Righelli D, Angelini C. Advancements in RNASeqGUI towards a Reproducible Analysis of RNA-Seq Experiments. BioMed Research International 2016; 2016:7972351. [PMID: 26977414 PMCID: PMC4764726 DOI: 10.1155/2016/7972351] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 07/02/2015] [Revised: 12/11/2015] [Accepted: 01/03/2016] [Indexed: 11/17/2022]
Abstract
We present the advancements and novelties recently introduced in RNASeqGUI, a graphical user interface that helps biologists to handle and analyse large data collected in RNA-Seq experiments. This work focuses on the concept of reproducible research and shows how it has been incorporated in RNASeqGUI to provide reproducible (computational) results. The novel version of RNASeqGUI combines graphical interfaces with tools for reproducible research, such as literate statistical programming, human readable report, parallel executions, caching, and interactive and web-explorable tables of results. These features allow the user to analyse big datasets in a fast, efficient, and reproducible way. Moreover, this paper represents a proof of concept, showing a simple way to develop computational tools for Life Science in the spirit of reproducible research.
Affiliation(s)
- Francesco Russo
  - Istituto per le Applicazioni del Calcolo, CNR, 80131 Napoli, Italy
- Dario Righelli
  - Istituto per le Applicazioni del Calcolo, CNR, 80131 Napoli, Italy
- Claudia Angelini
  - Istituto per le Applicazioni del Calcolo, CNR, 80131 Napoli, Italy
|