Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Karczewski KJ, Fernald GH, Martin AR, Snyder M, Tatonetti NP, Dudley JT. STORMSeq: an open-source, user-friendly pipeline for processing personal genomics data in the cloud. PLoS One 2014;9:e84860. [PMID: 24454756 PMCID: PMC3893165 DOI: 10.1371/journal.pone.0084860] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/27/2013] [Accepted: 11/27/2013] [Indexed: 12/30/2022] Open

For:	Karczewski KJ, Fernald GH, Martin AR, Snyder M, Tatonetti NP, Dudley JT. STORMSeq: an open-source, user-friendly pipeline for processing personal genomics data in the cloud. PLoS One 2014;9:e84860. [PMID: 24454756 PMCID: PMC3893165 DOI: 10.1371/journal.pone.0084860] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/27/2013] [Accepted: 11/27/2013] [Indexed: 12/30/2022] Open

Number

Cited by Other Article(s)

Cullen JN, Friedenberg SG. Whole Animal Genome Sequencing: user-friendly, rapid, containerized pipelines for processing, variant discovery, and annotation of short-read whole genome sequencing data. G3 (BETHESDA, MD.) 2023;13:jkad117. [PMID: 37243692 PMCID: PMC10411559 DOI: 10.1093/g3journal/jkad117] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/24/2023] [Revised: 02/24/2023] [Accepted: 05/20/2023] [Indexed: 05/29/2023]

Ahmed Z, Renart EG, Mishra D, Zeeshan S. JWES: a new pipeline for whole genome/exome sequence data processing, management, and gene-variant discovery, annotation, prediction, and genotyping. FEBS Open Bio 2021;11:2441-2452. [PMID: 34370400 PMCID: PMC8409305 DOI: 10.1002/2211-5463.13261] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/02/2021] [Revised: 07/18/2021] [Accepted: 08/02/2021] [Indexed: 01/07/2023] Open

Abstract

Whole genome and exome sequencing (WGS/WES) are the most popular next‐generation sequencing (NGS) methodologies and are at present often used to detect rare and common genetic variants of clinical significance. We emphasize that automated sequence data processing, management, and visualization should be an indispensable component of modern WGS and WES data analysis for sequence assembly, variant detection (SNPs, SVs), imputation, and resolution of haplotypes. In this manuscript, we present a newly developed findable, accessible, interoperable, and reusable (FAIR) bioinformatics‐genomics pipeline Java based Whole Genome/Exome Sequence Data Processing Pipeline (JWES) for efficient variant discovery and interpretation, and big data modeling and visualization. JWES is a cross‐platform, user‐friendly, product line application, that entails three modules: (a) data processing, (b) storage, and (c) visualization. The data processing module performs a series of different tasks for variant calling, the data storage module efficiently manages high‐volume gene‐variant data, and the data visualization module supports variant data interpretation with Circos graphs. The performance of JWES was tested and validated in‐house with different experiments, using Microsoft Windows, macOS Big Sur, and UNIX operating systems. JWES is an open‐source and freely available pipeline, allowing scientists to take full advantage of all the computing resources available, without requiring much computer science knowledge. We have successfully applied JWES for processing, management, and gene‐variant discovery, annotation, prediction, and genotyping of WGS and WES data to analyze variable complex disorders. In summary, we report the performance of JWES with some reproducible case studies, using open access and in‐house generated, high‐quality datasets.

Collapse

Ahmed Z, Renart EG, Zeeshan S. Genomics pipelines to investigate susceptibility in whole genome and exome sequenced data for variant discovery, annotation, prediction and genotyping. PeerJ 2021;9:e11724. [PMID: 34395068 PMCID: PMC8320519 DOI: 10.7717/peerj.11724] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/29/2021] [Accepted: 06/14/2021] [Indexed: 12/12/2022] Open

Musacchia F, Ciolfi A, Mutarelli M, Bruselles A, Castello R, Pinelli M, Basu S, Banfi S, Casari G, Tartaglia M, Nigro V. VarGenius executes cohort-level DNA-seq variant calling and annotation and allows to manage the resulting data through a PostgreSQL database. BMC Bioinformatics 2018;19:477. [PMID: 30541431 PMCID: PMC6291943 DOI: 10.1186/s12859-018-2532-4] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/12/2018] [Accepted: 11/21/2018] [Indexed: 02/06/2023] Open

Abstract

BACKGROUND

Targeted resequencing has become the most used and cost-effective approach for identifying causative mutations of Mendelian diseases both for diagnostics and research purposes. Due to very rapid technological progress, NGS laboratories are expanding their capabilities to address the increasing number of analyses. Several open source tools are available to build a generic variant calling pipeline, but a tool able to simultaneously execute multiple analyses, organize, and categorize the samples is still missing.

RESULTS

Here we describe VarGenius, a Linux based command line software able to execute customizable pipelines for the analysis of multiple targeted resequencing data using parallel computing. VarGenius provides a database to store the output of the analysis (calling quality statistics, variant annotations, internal allelic variant frequencies) and sample information (personal data, genotypes, phenotypes). VarGenius can also perform the "joint analysis" of hundreds of samples with a single command, drastically reducing the time for the configuration and execution of the analysis. VarGenius executes the standard pipeline of the Genome Analysis Tool-Kit (GATK) best practices (GBP) for germinal variant calling, annotates the variants using Annovar, and generates a user-friendly output displaying the results through a web page. VarGenius has been tested on a parallel computing cluster with 52 machines with 120GB of RAM each. Under this configuration, a 50 M whole exome sequencing (WES) analysis for a family was executed in about 7 h (trio or quartet); a joint analysis of 30 WES in about 24 h and the parallel analysis of 34 single samples from a 1 M panel in about 2 h.

CONCLUSIONS

We developed VarGenius, a "master" tool that faces the increasing demand of heterogeneous NGS analyses and allows maximum flexibility for downstream analyses. It paves the way to a different kind of analysis, centered on cohorts rather than on singleton. Patient and variant information are stored into the database and any output file can be accessed programmatically. VarGenius can be used for routine analyses by biomedical researchers with basic Linux skills providing additional flexibility for computational biologists to develop their own algorithms for the comparison and analysis of data. The software is freely available at: https://github.com/frankMusacchia/VarGenius.

Collapse

Pan C, McInnes G, Deflaux N, Snyder M, Bingham J, Datta S, Tsao PS. Cloud-based interactive analytics for terabytes of genomic variants data. Bioinformatics 2017;33:3709-3715. [PMID: 28961771 PMCID: PMC5860318 DOI: 10.1093/bioinformatics/btx468] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/02/2017] [Revised: 06/30/2017] [Accepted: 07/25/2017] [Indexed: 12/30/2022] Open

MC-GenomeKey: a multicloud system for the detection and annotation of genomic variants. BMC Bioinformatics 2017;18:49. [PMID: 28107819 PMCID: PMC5248509 DOI: 10.1186/s12859-016-1454-2] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/20/2016] [Accepted: 12/24/2016] [Indexed: 12/28/2022] Open

Abstract

Background

Next Generation Genome sequencing techniques became affordable for massive sequencing efforts devoted to clinical characterization of human diseases. However, the cost of providing cloud-based data analysis of the mounting datasets remains a concerning bottleneck for providing cost-effective clinical services. To address this computational problem, it is important to optimize the variant analysis workflow and the used analysis tools to reduce the overall computational processing time, and concomitantly reduce the processing cost. Furthermore, it is important to capitalize on the use of the recent development in the cloud computing market, which have witnessed more providers competing in terms of products and prices.

Results

In this paper, we present a new package called MC-GenomeKey (Multi-Cloud GenomeKey) that efficiently executes the variant analysis workflow for detecting and annotating mutations using cloud resources from different commercial cloud providers. Our package supports Amazon, Google, and Azure clouds, as well as, any other cloud platform based on OpenStack. Our package allows different scenarios of execution with different levels of sophistication, up to the one where a workflow can be executed using a cluster whose nodes come from different clouds. MC-GenomeKey also supports scenarios to exploit the spot instance model of Amazon in combination with the use of other cloud platforms to provide significant cost reduction. To the best of our knowledge, this is the first solution that optimizes the execution of the workflow using computational resources from different cloud providers.

Conclusions

MC-GenomeKey provides an efficient multicloud based solution to detect and annotate mutations. The package can run in different commercial cloud platforms, which enables the user to seize the best offers. The package also provides a reliable means to make use of the low-cost spot instance model of Amazon, as it provides an efficient solution to the sudden termination of spot machines as a result of a sudden price increase. The package has a web-interface and it is available for free for academic use.

Collapse

Hintzsche J, Kim J, Yadav V, Amato C, Robinson SE, Seelenfreund E, Shellman Y, Wisell J, Applegate A, McCarter M, Box N, Tentler J, De S, Robinson WA, Tan AC. IMPACT: a whole-exome sequencing analysis pipeline for integrating molecular profiles with actionable therapeutics in clinical samples. J Am Med Inform Assoc 2016;23:721-30. [PMID: 27026619 DOI: 10.1093/jamia/ocw022] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/31/2015] [Accepted: 02/01/2016] [Indexed: 11/14/2022] Open

Abstract

OBJECTIVE

Currently, there is a disconnect between finding a patient's relevant molecular profile and predicting actionable therapeutics. Here we develop and implement the Integrating Molecular Profiles with Actionable Therapeutics (IMPACT) analysis pipeline, linking variants detected from whole-exome sequencing (WES) to actionable therapeutics.

METHODS AND MATERIALS

The IMPACT pipeline contains 4 analytical modules: detecting somatic variants, calling copy number alterations, predicting drugs against deleterious variants, and analyzing tumor heterogeneity. We tested the IMPACT pipeline on whole-exome sequencing data in The Cancer Genome Atlas (TCGA) lung adenocarcinoma samples with known EGFR mutations. We also used IMPACT to analyze melanoma patient tumor samples before treatment, after BRAF-inhibitor treatment, and after BRAF- and MEK-inhibitor treatment.

RESULTS

IMPACT Food and Drug Administration (FDA) correctly identified known EGFR mutations in the TCGA lung adenocarcinoma samples. IMPACT linked these EGFR mutations to the appropriate FDA-approved EGFR inhibitors. For the melanoma patient samples, we identified NRAS p.Q61K as an acquired resistance mutation to BRAF-inhibitor treatment. We also identified CDKN2A deletion as a novel acquired resistance mutation to BRAFi/MEKi inhibition. The IMPACT analysis pipeline predicts these somatic variants to actionable therapeutics. We observed the clonal dynamic in the tumor samples after various treatments. We showed that IMPACT not only helped in successful prioritization of clinically relevant variants but also linked these variations to possible targeted therapies.

CONCLUSION

IMPACT provides a new bioinformatics strategy to delineate candidate somatic variants and actionable therapies. This approach can be applied to other patient tumor samples to discover effective drug targets for personalized medicine.IMPACT is publicly available at http://tanlab.ucdenver.edu/IMPACT.

Collapse

Affiliation(s)

Jennifer Hintzsche Division of Medical Oncology, Department of Medicine, School of Medicine
Jihye Kim Division of Medical Oncology, Department of Medicine, School of Medicine University of Colorado Cancer Center All: University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
Vinod Yadav Division of Biomedical Informatics and Personalized Medicine, Department of Medicine, School of Medicine
Carol Amato Division of Medical Oncology, Department of Medicine, School of Medicine
Steven E Robinson Division of Medical Oncology, Department of Medicine, School of Medicine
Eric Seelenfreund Division of Medical Oncology, Department of Medicine, School of Medicine
Yiqun Shellman Department of Dermatology, School of Medicine University of Colorado Cancer Center All: University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
Joshua Wisell Department of Pathology, School of Medicine University of Colorado Cancer Center All: University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
Allison Applegate Division of Medical Oncology, Department of Medicine, School of Medicine
Martin McCarter Department of Surgery, School of Medicine University of Colorado Cancer Center All: University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
Neil Box Department of Dermatology, School of Medicine University of Colorado Cancer Center All: University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
John Tentler Division of Medical Oncology, Department of Medicine, School of Medicine University of Colorado Cancer Center All: University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
Subhajyoti De Division of Biomedical Informatics and Personalized Medicine, Department of Medicine, School of Medicine Department of Biostatistics and Informatics, Colorado School of Public Health University of Colorado Cancer Center All: University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
William A Robinson Division of Medical Oncology, Department of Medicine, School of Medicine University of Colorado Cancer Center All: University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
Aik Choon Tan Division of Medical Oncology, Department of Medicine, School of Medicine Department of Biostatistics and Informatics, Colorado School of Public Health University of Colorado Cancer Center All: University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA

Collapse

Roy S, LaFramboise WA, Nikiforov YE, Nikiforova MN, Routbort MJ, Pfeifer J, Nagarajan R, Carter AB, Pantanowitz L. Next-Generation Sequencing Informatics: Challenges and Strategies for Implementation in a Clinical Environment. Arch Pathol Lab Med 2016;140:958-75. [PMID: 26901284 DOI: 10.5858/arpa.2015-0507-ra] [Citation(s) in RCA: 54] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]

Vijay P, McIntyre AB, Mason CE, Greenfield JP, Li S. Clinical Genomics: Challenges and Opportunities. Crit Rev Eukaryot Gene Expr 2016;26:97-113. [PMID: 27480773 PMCID: PMC5470591 DOI: 10.1615/critreveukaryotgeneexpr.2016015724] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/27/2023]

Calabrese B, Cannataro M. Bioinformatics and Microarray Data Analysis on the Cloud. Methods Mol Biol 2016;1375:25-39. [PMID: 25863787 DOI: 10.1007/7651_2015_236] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/04/2023]

Shameer K, Tripathi LP, Kalari KR, Dudley JT, Sowdhamini R. Interpreting functional effects of coding variants: challenges in proteome-scale prediction, annotation and assessment. Brief Bioinform 2015;17:841-62. [PMID: 26494363 DOI: 10.1093/bib/bbv084] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/14/2015] [Indexed: 12/20/2022] Open

Abstract

Accurate assessment of genetic variation in human DNA sequencing studies remains a nontrivial challenge in clinical genomics and genome informatics. Ascribing functional roles and/or clinical significances to single nucleotide variants identified from a next-generation sequencing study is an important step in genome interpretation. Experimental characterization of all the observed functional variants is yet impractical; thus, the prediction of functional and/or regulatory impacts of the various mutations using in silico approaches is an important step toward the identification of functionally significant or clinically actionable variants. The relationships between genotypes and the expressed phenotypes are multilayered and biologically complex; such relationships present numerous challenges and at the same time offer various opportunities for the design of in silico variant assessment strategies. Over the past decade, many bioinformatics algorithms have been developed to predict functional consequences of single nucleotide variants in the protein coding regions. In this review, we provide an overview of the bioinformatics resources for the prediction, annotation and visualization of coding single nucleotide variants. We discuss the currently available approaches and major challenges from the perspective of protein sequence, structure, function and interactions that require consideration when interpreting the impact of putatively functional variants. We also discuss the relevance of incorporating integrated workflows for predicting the biomedical impact of the functionally important variations encoded in a genome, exome or transcriptome. Finally, we propose a framework to classify variant assessment approaches and strategies for incorporation of variant assessment within electronic health records.

Collapse

Souilmi Y, Lancaster AK, Jung JY, Rizzo E, Hawkins JB, Powles R, Amzazi S, Ghazal H, Tonellato PJ, Wall DP. Scalable and cost-effective NGS genotyping in the cloud. BMC Med Genomics 2015;8:64. [PMID: 26470712 PMCID: PMC4608296 DOI: 10.1186/s12920-015-0134-9] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/16/2015] [Accepted: 09/11/2015] [Indexed: 12/20/2022] Open

Boulund F, Sjögren A, Kristiansson E. Tentacle: distributed quantification of genes in metagenomes. Gigascience 2015;4:40. [PMID: 26351566 PMCID: PMC4562114 DOI: 10.1186/s13742-015-0078-1] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/04/2015] [Accepted: 08/05/2015] [Indexed: 01/28/2023] Open

Shringarpure SS, Carroll A, De La Vega FM, Bustamante CD. Inexpensive and Highly Reproducible Cloud-Based Variant Calling of 2,535 Human Genomes. PLoS One 2015;10:e0129277. [PMID: 26110529 PMCID: PMC4482534 DOI: 10.1371/journal.pone.0129277] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/15/2015] [Accepted: 05/06/2015] [Indexed: 01/22/2023] Open

Gao X, Xu J, Starmer J. Fastq2vcf: a concise and transparent pipeline for whole-exome sequencing data analyses. BMC Res Notes 2015;8:72. [PMID: 25889517 PMCID: PMC4376134 DOI: 10.1186/s13104-015-1027-x] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/09/2014] [Accepted: 02/23/2015] [Indexed: 12/26/2022] Open

Maji RK, Sarkar A, Khatua S, Dasgupta S, Ghosh Z. PVT: an efficient computational procedure to speed up next-generation sequence analysis. BMC Bioinformatics 2014;15:167. [PMID: 24894600 PMCID: PMC4063226 DOI: 10.1186/1471-2105-15-167] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/02/2014] [Accepted: 05/07/2014] [Indexed: 12/05/2022] Open

Abstract

Background

High-throughput Next-Generation Sequencing (NGS) techniques are advancing genomics and molecular biology research. This technology generates substantially large data which puts up a major challenge to the scientists for an efficient, cost and time effective solution to analyse such data. Further, for the different types of NGS data, there are certain common challenging steps involved in analysing those data. Spliced alignment is one such fundamental step in NGS data analysis which is extremely computational intensive as well as time consuming. There exists serious problem even with the most widely used spliced alignment tools. TopHat is one such widely used spliced alignment tools which although supports multithreading, does not efficiently utilize computational resources in terms of CPU utilization and memory. Here we have introduced PVT (Pipelined Version of TopHat) where we take up a modular approach by breaking TopHat’s serial execution into a pipeline of multiple stages, thereby increasing the degree of parallelization and computational resource utilization. Thus we address the discrepancies in TopHat so as to analyze large NGS data efficiently.

Results

We analysed the SRA dataset (SRX026839 and SRX026838) consisting of single end reads and SRA data SRR1027730 consisting of paired-end reads. We used TopHat v2.0.8 to analyse these datasets and noted the CPU usage, memory footprint and execution time during spliced alignment. With this basic information, we designed PVT, a pipelined version of TopHat that removes the redundant computational steps during ‘spliced alignment’ and breaks the job into a pipeline of multiple stages (each comprising of different step(s)) to improve its resource utilization, thus reducing the execution time.

Conclusions

PVT provides an improvement over TopHat for spliced alignment of NGS data analysis. PVT thus resulted in the reduction of the execution time to ~23% for the single end read dataset. Further, PVT designed for paired end reads showed an improved performance of ~41% over TopHat (for the chosen data) with respect to execution time. Moreover we propose PVT-Cloud which implements PVT pipeline in cloud computing system.

Collapse