Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Hung LH, Kristiyanto D, Lee SB, Yeung KY. GUIdock: Using Docker Containers with a Common Graphics User Interface to Address the Reproducibility of Research. PLoS One 2016;11:e0152686. [PMID: 27045593 PMCID: PMC4821530 DOI: 10.1371/journal.pone.0152686] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.3] [Received: 09/22/2015] [Accepted: 03/17/2016] [Indexed: 12/03/2022] Open

Cited by Other Article(s)

Bernard M, Poli M, Karadayi J, Dupoux E. Shennong: A Python toolbox for audio speech features extraction. Behav Res Methods 2023;55:4489-4501. [PMID: 36750521 DOI: 10.3758/s13428-022-02029-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 11/17/2022] [Indexed: 02/09/2023]

Hung LH, Straw E, Reddy S, Schmitz R, Colburn Z, Yeung KY. Cloud-enabled Biodepot workflow builder integrates image processing using Fiji with reproducible data analysis using Jupyter notebooks. Sci Rep 2022;12:14920. [PMID: 36056115 PMCID: PMC9440253 DOI: 10.1038/s41598-022-19173-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2022] [Accepted: 08/25/2022] [Indexed: 11/16/2022] Open

Abstract

Modern biomedical image analysis workflows contain multiple computational processing tasks giving rise to problems in reproducibility. In addition, image datasets can span both spatial and temporal dimensions, with additional channels for fluorescence and other data, resulting in datasets that are too large to be processed locally on a laptop. For omics analyses, software containers have been shown to enhance reproducibility, facilitate installation and provide access to scalable computational resources on the cloud. However, most image analyses contain steps that are graphical and interactive, features that are not supported by most omics execution engines. We present the containerized and cloud-enabled Biodepot-workflow-builder platform that supports graphics from software containers and has been extended for image analyses. We demonstrate the potential of our modular approach with multi-step workflows that incorporate the popular and open-source Fiji suite for image processing. One of our examples integrates fully interactive ImageJ macros with Jupyter notebooks. Our second example illustrates how the complicated cloud setup of a computationally intensive process such as stitching 3D digital pathology datasets using BigStitcher can be automated and simplified. In both examples, users can leverage a form-based graphical interface to execute multi-step workflows with a single click, using the provided sample data and preset input parameters. Alternatively, users can interactively modify the image processing steps in the workflow, apply the workflows to their own data, and change the input parameters and macros. By providing interactive graphics support to software containers, our modular platform supports reproducible image analysis workflows, simplified access to cloud resources for analysis of large datasets, and integration across different applications such as Jupyter.

Yu M, Dolios G, Petrick L. Reproducible untargeted metabolomics workflow for exhaustive MS2 data acquisition of MS1 features. J Cheminform 2022;14:6. [PMID: 35172886 PMCID: PMC8848943 DOI: 10.1186/s13321-022-00586-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/23/2021] [Accepted: 02/03/2022] [Indexed: 01/16/2023] Open

Abstract

Unknown features in untargeted metabolomics and non-targeted analysis (NTA) are identified using fragment ions from MS/MS spectra to predict the structures of the unknown compounds. Precursor ion selection for fragmentation is commonly performed using data dependent acquisition (DDA) strategies or following statistical analysis using targeted MS/MS approaches. However, the selected precursor ions from DDA only cover a biased subset of the peaks or features found in full scan data. In addition, different statistical analyses can select different precursor ions for MS/MS analysis, which makes the post-hoc validation of ions selected following a secondary analysis impossible for precursor ions selected by the original statistical method. Here we propose an automated, exhaustive, statistical model-free workflow: paired mass distance-dependent analysis (PMDDA), for reproducible untargeted mass spectrometry MS2 fragment ion collection of unknown compounds found in MS1 full scan. Our workflow first removes redundant peaks from MS1 data and then exports a list of precursor ions for pseudo-targeted MS/MS analysis on independent peaks. This workflow provides comprehensive coverage of MS2 collection on unknown compounds found in full scan analysis using a “one peak for one compound” workflow without a priori redundant peak information. We compared pseudo-spectra formation and the number of MS2 spectra linked to MS1 data using the PMDDA workflow to that obtained using CAMERA and RAMClustR algorithms. More annotated compounds, molecular networks, and unique MS/MS spectra were found using PMDDA compared with CAMERA and RAMClustR. In addition, PMDDA can generate a preferred ion list for iterative DDA to enhance coverage of compounds when instruments support such functions. Finally, compounds with signals in both positive and negative modes can be identified by the PMDDA workflow, to further reduce redundancies.
The whole workflow is fully reproducible as a docker image xcmsrocker with both the original data and the data processing template.

Krampis K. Democratizing bioinformatics through easily accessible software platforms for non-experts in the field. Biotechniques 2022;72:36-38. [PMID: 35060754 PMCID: PMC8988881 DOI: 10.2144/btn-2021-0060] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022] Open

Du X, Aristizabal-Henao JJ, Garrett TJ, Brochhausen M, Hogan WR, Lemas DJ. A Checklist for Reproducible Computational Analysis in Clinical Metabolomics Research. Metabolites 2022;12:87. [PMID: 35050209 PMCID: PMC8779534 DOI: 10.3390/metabo12010087] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 12/10/2021] [Revised: 12/25/2021] [Accepted: 01/10/2022] [Indexed: 12/15/2022] Open

Plonski NM, Johnson E, Frederick M, Mercer H, Fraizer G, Meindl R, Casadesus G, Piontkivska H. Automated Isoform Diversity Detector (AIDD): a pipeline for investigating transcriptome diversity of RNA-seq data. BMC Bioinformatics 2020;21:578. [PMID: 33375933 PMCID: PMC7772930 DOI: 10.1186/s12859-020-03888-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2020] [Accepted: 11/18/2020] [Indexed: 11/16/2022] Open

Abstract

Background

As the number of RNA-seq datasets that become available to explore transcriptome diversity increases, so does the need for easy-to-use comprehensive computational workflows. Many available tools facilitate analyses of one of the two major mechanisms of transcriptome diversity, namely, differential expression of isoforms due to alternative splicing, while the second major mechanism, RNA editing due to post-transcriptional changes of individual nucleotides, remains under-appreciated. Both these mechanisms play an essential role in physiological and disease processes, including cancer and neurological disorders. However, elucidation of RNA editing events at transcriptome-wide level requires increasingly complex computational tools, in turn resulting in a steep entrance barrier for labs that are interested in high-throughput variant calling applications on a large scale but lack the manpower and/or computational expertise.

Results

Here we present an easy-to-use, fully automated, computational pipeline (Automated Isoform Diversity Detector, AIDD) that contains open source tools for various tasks needed to map transcriptome diversity, including RNA editing events. To facilitate reproducibility and avoid system dependencies, the pipeline is contained within a pre-configured VirtualBox environment. The analytical tasks and format conversions are accomplished via a set of automated scripts that enable the user to go from a set of raw data, such as fastq files, to publication-ready results and figures in one step. A publicly available dataset of Zika virus-infected neural progenitor cells is used to illustrate AIDD’s capabilities.

Conclusions

The AIDD pipeline offers a user-friendly interface for comprehensive and reproducible RNA-seq analyses. Among the unique features of AIDD are its ability to infer RNA editing patterns, including ADAR editing, and its inclusion of Guttman scale patterns for time series analysis of such editing landscapes. AIDD-based results show the importance of the diversity of ADAR isoforms, key RNA editing enzymes linked with the innate immune system and viral infections. These findings offer insights into the potential role of ADAR editing dysregulation in disease mechanisms, including those of congenital Zika syndrome. Because of its automated all-inclusive features, the AIDD pipeline enables even a novice user to easily explore common mechanisms of transcriptome diversity, including RNA editing landscapes.

Wittman JT, Aukema BH. A Guide and Toolbox to Replicability and Open Science in Entomology. JOURNAL OF INSECT SCIENCE (ONLINE) 2020;20:6. [PMID: 32441307 PMCID: PMC7423018 DOI: 10.1093/jisesa/ieaa036] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/10/2020] [Indexed: 05/04/2023]

Lachmann A, Clarke DJB, Torre D, Xie Z, Ma'ayan A. Interoperable RNA-Seq analysis in the cloud. BIOCHIMICA ET BIOPHYSICA ACTA-GENE REGULATORY MECHANISMS 2020;1863:194521. [PMID: 32156561 DOI: 10.1016/j.bbagrm.2020.194521] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 07/02/2019] [Revised: 03/01/2020] [Accepted: 03/01/2020] [Indexed: 12/25/2022]

Building Containerized Workflows Using the BioDepot-Workflow-Builder. Cell Syst 2019;9:508-514.e3. [PMID: 31521606 DOI: 10.1016/j.cels.2019.08.007] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 12/29/2018] [Revised: 05/21/2019] [Accepted: 08/16/2019] [Indexed: 11/22/2022]

Aciole Barbosa D, Menegidio FB, Alencar VC, Gonçalves RS, Silva JDFS, Vilas Boas RO, Faustino de Maria YNL, Jabes DL, Costa de Oliveira R, Nunes LR. ParaDB: A manually curated database containing genomic annotation for the human pathogenic fungi Paracoccidioides spp. PLoS Negl Trop Dis 2019;13:e0007576. [PMID: 31306428 PMCID: PMC6658007 DOI: 10.1371/journal.pntd.0007576] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 01/17/2019] [Revised: 07/25/2019] [Accepted: 06/24/2019] [Indexed: 11/18/2022] Open

Abstract

BACKGROUND

The genus Paracoccidioides consists of thermodimorphic fungi responsible for Paracoccidioidomycosis (PCM), a systemic mycosis that has been registered to affect ~10 million people in Latin America. Biogeographical data subdivided the genus Paracoccidioides into five divergent subgroups, which have been recently classified as different species. Genomic sequencing of five Paracoccidioides isolates, representing each of these subgroups/species, provided an important framework for the development of post-genomic studies with these fungi. However, functional annotations of these genomes have not been submitted to manual curation and, as a result, ~60-90% of the Paracoccidioides protein-coding genes (depending on isolate/annotation) are currently described as responsible for hypothetical proteins, without any further functional/structural description.

PRINCIPAL FINDINGS

The present work reviews the functional assignment of Paracoccidioides genes, reducing the number of hypothetical proteins to ~25-28%. These results were compiled in a relational database called ParaDB, dedicated to the main representatives of Paracoccidioides spp. ParaDB can be accessed through a friendly graphical interface, which offers search tools based on keywords or protein/DNA sequences. All data contained in ParaDB can be partially or completely downloaded through spreadsheet, multi-fasta and GFF3-formatted files, which can be subsequently used in a variety of downstream functional analyses. Moreover, the entire ParaDB environment has been configured in a Docker service, which has been submitted to the GitHub repository, ensuring long-term data availability to researchers. This service can be downloaded and used to perform fully functional local installations of the database in alternative computing ecosystems, allowing users to conduct their data mining and analyses in a personal and stable working environment.

CONCLUSIONS

These new annotations greatly reduce the number of genes identified solely as hypothetical proteins and are integrated into a dedicated database, providing resources to assist researchers in this field to conduct post-genomic studies with this group of human pathogenic fungi.

WordSeg: Standardizing unsupervised word form segmentation from text. Behav Res Methods 2019;52:264-278. [PMID: 30937845 DOI: 10.3758/s13428-019-01223-3] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 11/08/2022]

Menegidio FB, Aciole Barbosa D, Gonçalves RDS, Nishime MM, Jabes DL, Costa de Oliveira R, Nunes LR. Bioportainer Workbench: a versatile and user-friendly system that integrates implementation, management, and use of bioinformatics resources in Docker environments. Gigascience 2019;8:5479503. [PMID: 31222200 PMCID: PMC6482343 DOI: 10.1093/gigascience/giz041] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 06/20/2018] [Revised: 12/18/2018] [Accepted: 03/21/2019] [Indexed: 11/14/2022] Open

Vázquez N, López-Fernández H, Vieira CP, Fdez-Riverola F, Vieira J, Reboiro-Jato M. BDBM 1.0: A Desktop Application for Efficient Retrieval and Processing of High-Quality Sequence Data and Application to the Identification of the Putative Coffea S-Locus. Interdiscip Sci 2019;11:57-67. [PMID: 30712176 DOI: 10.1007/s12539-019-00320-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 08/24/2018] [Revised: 01/22/2019] [Accepted: 01/24/2019] [Indexed: 11/25/2022]

Abstract

Nowadays, bioinformatics is one of the most important areas in modern biology and the creation of high-quality scientific software supporting this recent research area is one of the core activities of many researchers. In this context, high-quality sequence datasets are needed to perform inferences on the evolution of species, genes, and gene families, or to get evidence for adaptive amino acid evolution, among others. Nevertheless, sequence data are very often spread over several databases, many useful genomes and transcriptomes are non-annotated, the available annotation is not for the desired coding sequence isoform, and/or is unlikely to be accurate. Moreover, although the FASTA text-based format is quite simple and usable by most software applications, there are a number of issues that may be critical depending on the software used to analyse such files. Therefore, researchers without training in informatics often use a fraction of all available data. The above issues can be addressed using already available software applications, but there is no easy-to-use single piece of software that allows performing all these tasks within the same graphical interface, such as the one here presented, named BDBM (Blast DataBase Manager). BDBM can be used to efficiently get gene sequences from annotated and non-annotated genomes and transcriptomes. Moreover, it can be used to look for alternatives to existing annotations and to easily create reliable custom databases. Such databases are essential to prepare high-quality datasets. The analyses that we have performed on the Coffea canephora genome using BDBM aimed at the identification of the S-locus region (that harbours the genes involved in gametophytic self-incompatibility) led to the conclusion that there are two likely regions, one on chromosome 2 (around region 6600000-6650000), and another on chromosome 5 (around 15830000-15930000). 
Such findings are discussed in the context of the Rubiaceae gametophytic self-incompatibility evolution.

Affiliation(s)

Noé Vázquez: ESEI-Escuela Superior de Ingeniería Informática, Universidade de Vigo, Edificio Politécnico, Campus Universitario As Lagoas s/n, 32004, Ourense, Spain; CINBIO-Centro de Investigaciones Biomédicas, University of Vigo, Campus Universitario Lagoas-Marcosende, 36310, Vigo, Spain
Hugo López-Fernández: ESEI-Escuela Superior de Ingeniería Informática, Universidade de Vigo, Edificio Politécnico, Campus Universitario As Lagoas s/n, 32004, Ourense, Spain; CINBIO-Centro de Investigaciones Biomédicas, University of Vigo, Campus Universitario Lagoas-Marcosende, 36310, Vigo, Spain; SING Research Group, Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO, Vigo, Spain; Instituto de Biologia Molecular e Celular (IBMC), Rua Alfredo Allen, 208, 4200-135, Porto, Portugal; Instituto de Investigação e Inovação em Saúde (I3S), Universidade do Porto, Rua Alfredo Allen, 208, 4200-135, Porto, Portugal
Cristina P Vieira: Instituto de Biologia Molecular e Celular (IBMC), Rua Alfredo Allen, 208, 4200-135, Porto, Portugal; Instituto de Investigação e Inovação em Saúde (I3S), Universidade do Porto, Rua Alfredo Allen, 208, 4200-135, Porto, Portugal
Florentino Fdez-Riverola: ESEI-Escuela Superior de Ingeniería Informática, Universidade de Vigo, Edificio Politécnico, Campus Universitario As Lagoas s/n, 32004, Ourense, Spain; CINBIO-Centro de Investigaciones Biomédicas, University of Vigo, Campus Universitario Lagoas-Marcosende, 36310, Vigo, Spain; SING Research Group, Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO, Vigo, Spain
Jorge Vieira: Instituto de Biologia Molecular e Celular (IBMC), Rua Alfredo Allen, 208, 4200-135, Porto, Portugal; Instituto de Investigação e Inovação em Saúde (I3S), Universidade do Porto, Rua Alfredo Allen, 208, 4200-135, Porto, Portugal
Miguel Reboiro-Jato: ESEI-Escuela Superior de Ingeniería Informática, Universidade de Vigo, Edificio Politécnico, Campus Universitario As Lagoas s/n, 32004, Ourense, Spain; CINBIO-Centro de Investigaciones Biomédicas, University of Vigo, Campus Universitario Lagoas-Marcosende, 36310, Vigo, Spain; SING Research Group, Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO, Vigo, Spain

Vaidyam A, Halamka J, Torous J. Actionable digital phenotyping: a framework for the delivery of just-in-time and longitudinal interventions in clinical healthcare. Mhealth 2019;5:25. [PMID: 31559270 PMCID: PMC6737424 DOI: 10.21037/mhealth.2019.07.04] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 01/09/2019] [Accepted: 07/15/2019] [Indexed: 11/06/2022] Open

Liu DM, Salganik MJ. Successes and Struggles with Computational Reproducibility: Lessons from the Fragile Families Challenge. SOCIUS : SOCIOLOGICAL RESEARCH FOR A DYNAMIC WORLD 2019;5:10.1177/2378023119849803. [PMID: 37309413 PMCID: PMC10260256 DOI: 10.1177/2378023119849803] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 06/14/2023]

Wagholikar KB, Dessai P, Sanz J, Mendis ME, Bell DS, Murphy SN. Implementation of informatics for integrating biology and the bedside (i2b2) platform as Docker containers. BMC Med Inform Decis Mak 2018;18:66. [PMID: 30012140 PMCID: PMC6048900 DOI: 10.1186/s12911-018-0646-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 10/01/2017] [Accepted: 06/27/2018] [Indexed: 12/03/2022] Open

Mittal V, Hung LH, Keswani J, Kristiyanto D, Lee SB, Yeung KY. GUIdock-VNC: using a graphical desktop sharing system to provide a browser-based interface for containerized software. Gigascience 2018;6:1-6. [PMID: 28327936 PMCID: PMC5530313 DOI: 10.1093/gigascience/giw013] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 09/09/2016] [Accepted: 12/16/2016] [Indexed: 11/30/2022] Open

Almugbel R, Hung LH, Hu J, Almutairy A, Ortogero N, Tamta Y, Yeung KY. Reproducible Bioconductor workflows using browser-based interactive notebooks and containers. J Am Med Inform Assoc 2018;25:4-12. [PMID: 29092073 PMCID: PMC6381817 DOI: 10.1093/jamia/ocx120] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 05/30/2017] [Revised: 08/31/2017] [Accepted: 09/28/2017] [Indexed: 11/14/2022] Open

Abstract

Objective

Bioinformatics publications typically include complex software workflows that are difficult to describe in a manuscript. We describe and demonstrate the use of interactive software notebooks to document and distribute bioinformatics research. We provide a user-friendly tool, BiocImageBuilder, that allows users to easily distribute their bioinformatics protocols through interactive notebooks uploaded to either a GitHub repository or a private server.

Materials and methods

We present four different interactive Jupyter notebooks using R and Bioconductor workflows to infer differential gene expression, analyze cross-platform datasets, process RNA-seq data and KinomeScan data. These interactive notebooks are available on GitHub. The analytical results can be viewed in a browser. Most importantly, the software contents can be executed and modified. This is accomplished using Binder, which runs the notebook inside software containers, thus avoiding the need to install any software and ensuring reproducibility. All the notebooks were produced using custom files generated by BiocImageBuilder.

Results

BiocImageBuilder facilitates the publication of workflows with a point-and-click user interface. We demonstrate that interactive notebooks can be used to disseminate a wide range of bioinformatics analyses. The use of software containers to mirror the original software environment ensures reproducibility of results. Parameters and code can be dynamically modified, allowing for robust verification of published results and encouraging rapid adoption of new methods.

Conclusion

Given the increasing complexity of bioinformatics workflows, we anticipate that these interactive software notebooks will become as necessary for documenting software methods as traditional laboratory notebooks have been for documenting bench protocols, and as ubiquitous.

Kim B, Ali T, Lijeron C, Afgan E, Krampis K. Bio-Docklets: virtualization containers for single-step execution of NGS pipelines. Gigascience 2017;6:1-7. [PMID: 28854616 PMCID: PMC5569920 DOI: 10.1093/gigascience/gix048] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 04/24/2017] [Revised: 06/13/2017] [Accepted: 06/14/2017] [Indexed: 11/12/2022] Open

Costa RL, Gadelha L, Ribeiro-Alves M, Porto F. GeNNet: an integrated platform for unifying scientific workflows and graph databases for transcriptome data analysis. PeerJ 2017;5:e3509. [PMID: 28695067 PMCID: PMC5501156 DOI: 10.7717/peerj.3509] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 03/26/2017] [Accepted: 06/06/2017] [Indexed: 12/28/2022] Open

Abstract

There are many steps in analyzing transcriptome data, from the acquisition of raw data to the selection of a subset of representative genes that explain a scientific hypothesis. The data produced can be represented as networks of interactions among genes and these may additionally be integrated with other biological databases, such as Protein-Protein Interactions, transcription factors and gene annotation. However, the results of these analyses remain fragmented, imposing difficulties, either for posterior inspection of results, or for meta-analysis by the incorporation of new related data. Integrating databases and tools into scientific workflows, orchestrating their execution, and managing the resulting data and its respective metadata are challenging tasks. Additionally, a great amount of effort is equally required to run in-silico experiments to structure and compose the information as needed for analysis. Different programs may need to be applied and different files are produced during the experiment cycle. In this context, the availability of a platform supporting experiment execution is paramount. We present GeNNet, an integrated transcriptome analysis platform that unifies scientific workflows with graph databases for selecting relevant genes according to the evaluated biological systems. It includes GeNNet-Wf, a scientific workflow that pre-loads biological data, pre-processes raw microarray data and conducts a series of analyses including normalization, differential expression inference, clusterization and gene set enrichment analysis. A user-friendly web interface, GeNNet-Web, allows for setting parameters, executing, and visualizing the results of GeNNet-Wf executions. To demonstrate the features of GeNNet, we performed case studies with data retrieved from GEO, particularly using a single-factor experiment in different analysis scenarios. As a result, we obtained differentially expressed genes for which biological functions were analyzed. 
The results are integrated into GeNNet-DB, a database about genes, clusters, experiments and their properties and relationships. The resulting graph database is explored with queries that demonstrate the expressiveness of this data model for reasoning about gene interaction networks. GeNNet is the first platform to integrate the analytical process of transcriptome data with graph databases. It provides a comprehensive set of tools that would otherwise be challenging for non-expert users to install and use. Developers can add new functionality to components of GeNNet. The derived data allows for testing previous hypotheses about an experiment and exploring new ones through the interactive graph database environment. It enables the analysis of different data on humans, rhesus, mice and rats coming from Affymetrix platforms. GeNNet is available as an open source platform at https://github.com/raquele/GeNNet and can be retrieved as a software container with the command docker pull quelopes/gennet.

Brohi RD, Wang L, Hassine NB, Cao J, Talpur HS, Wu D, Huang CJ, Rehman ZU, Bhattarai D, Huo LJ. Expression, Localization of SUMO-1, and Analyses of Potential SUMOylated Proteins in Bubalus bubalis Spermatozoa. Front Physiol 2017;8:354. [PMID: 28659810 PMCID: PMC5468435 DOI: 10.3389/fphys.2017.00354] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 02/16/2017] [Accepted: 05/15/2017] [Indexed: 11/19/2022] Open

Affiliation(s)

Rahim Dad Brohi: Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction, Education Ministry of China, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, China; Department of Hubei Province's Engineering Research Center in Buffalo Breeding and Products, Wuhan, China
Li Wang: Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction, Education Ministry of China, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, China; Department of Hubei Province's Engineering Research Center in Buffalo Breeding and Products, Wuhan, China
Najla Ben Hassine: Department of Biology, University of Evry Val D'essonne, Evry, France
Jing Cao: Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction, Education Ministry of China, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, China; Department of Hubei Province's Engineering Research Center in Buffalo Breeding and Products, Wuhan, China
Hira Sajjad Talpur: Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction, Education Ministry of China, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, China; Department of Hubei Province's Engineering Research Center in Buffalo Breeding and Products, Wuhan, China
Di Wu: Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction, Education Ministry of China, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, China; Department of Hubei Province's Engineering Research Center in Buffalo Breeding and Products, Wuhan, China
Chun-Jie Huang: Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction, Education Ministry of China, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, China; Department of Hubei Province's Engineering Research Center in Buffalo Breeding and Products, Wuhan, China
Zia-Ur Rehman: Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction, Education Ministry of China, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, China; Department of Hubei Province's Engineering Research Center in Buffalo Breeding and Products, Wuhan, China
Dinesh Bhattarai: Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction, Education Ministry of China, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, China; Department of Hubei Province's Engineering Research Center in Buffalo Breeding and Products, Wuhan, China
Li-Jun Huo: Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction, Education Ministry of China, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, China; Department of Hubei Province's Engineering Research Center in Buffalo Breeding and Products, Wuhan, China

List M. Using Docker Compose for the Simple Deployment of an Integrated Drug Target Screening Platform. J Integr Bioinform 2017;14:/j/jib.ahead-of-print/jib-2017-0016/jib-2017-0016.xml. [PMID: 28600904 PMCID: PMC6042832 DOI: 10.1515/jib-2017-0016] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 03/21/2017] [Accepted: 04/18/2017] [Indexed: 12/28/2022] Open

Schulz WL, Durant TJS, Siddon AJ, Torres R. Use of application containers and workflows for genomic data analysis. J Pathol Inform 2016;7:53. [PMID: 28163975 PMCID: PMC5248400 DOI: 10.4103/2153-3539.197197] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Received: 09/12/2016] [Accepted: 11/27/2016] [Indexed: 11/29/2022] Open