Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Evani US, Challis D, Yu J, Jackson AR, Paithankar S, Bainbridge MN, Jakkamsetti A, Pham P, Coarfa C, Milosavljevic A, Yu F. Atlas2 Cloud: a framework for personal genome analysis in the cloud. BMC Genomics 2012;13 Suppl 6:S19. [PMID: 23134663 PMCID: PMC3481437 DOI: 10.1186/1471-2164-13-s6-s19] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.0] [Indexed: 11/17/2022] Open

Cited by Other Article(s)

Copeland I, Wonkam-Tingang E, Gupta-Malhotra M, Hashmi SS, Han Y, Jajoo A, Hall NJ, Hernandez PP, Lie N, Liu D, Xu J, Rosenfeld J, Haldipur A, Desire Z, Coban-Akdemir ZH, Scott DA, Li Q, Chao HT, Zaske AM, Lupski JR, Milewicz DM, Shete S, Posey JE, Hanchard NA. Exome sequencing implicates ancestry-related Mendelian variation at SYNE1 in childhood-onset essential hypertension. JCI Insight 2024;9:e172152. [PMID: 38716726 PMCID: PMC11141928 DOI: 10.1172/jci.insight.172152] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2023] [Accepted: 03/19/2024] [Indexed: 05/12/2024] Open

Affiliation(s)

Ian Copeland: Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
Edmond Wonkam-Tingang: Childhood Complex Disease Genomics Section, National Human Genome Research Institute, NIH, Bethesda, USA
Monesha Gupta-Malhotra: Department of Cardiology, Baylor College of Medicine, San Antonio, Texas, USA
S. Shahrukh Hashmi: Department of Pediatrics, McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, Texas, USA
Yixing Han: Childhood Complex Disease Genomics Section, National Human Genome Research Institute, NIH, Bethesda, USA
Aarti Jajoo: Childhood Complex Disease Genomics Section, National Human Genome Research Institute, NIH, Bethesda, USA
Nancy J. Hall: Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA; US Department of Agriculture Agricultural Research Service Children’s Nutrition Research Center, Baylor College of Medicine, Houston, Texas, USA
Paula P. Hernandez: Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA; US Department of Agriculture Agricultural Research Service Children’s Nutrition Research Center, Baylor College of Medicine, Houston, Texas, USA
Natasha Lie: Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA; Childhood Complex Disease Genomics Section, National Human Genome Research Institute, NIH, Bethesda, USA; US Department of Agriculture Agricultural Research Service Children’s Nutrition Research Center, Baylor College of Medicine, Houston, Texas, USA
Dan Liu: Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
Jun Xu: Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
Jill Rosenfeld: Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA; Baylor Genetics, Houston, Texas, USA
Aparna Haldipur: Childhood Complex Disease Genomics Section, National Human Genome Research Institute, NIH, Bethesda, USA
Zelene Desire: Childhood Complex Disease Genomics Section, National Human Genome Research Institute, NIH, Bethesda, USA
Zeynep H. Coban-Akdemir: Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA; Human Genetics Center, The University of Texas Health Science Center at Houston, Houston, Texas, USA
Daryl A. Scott: Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA; Texas Children’s Hospital, Houston, Texas, USA; Department of Molecular Physiology and Biophysics
Qing Li: Childhood Complex Disease Genomics Section, National Human Genome Research Institute, NIH, Bethesda, USA
Hsiao-Tuan Chao: Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA; Division of Neurology and Developmental Neuroscience, Department of Pediatrics; and Department of Neuroscience, Baylor College of Medicine, Houston, Texas, USA; Cain Pediatric Neurology Research Foundation Laboratories, Jan and Dan Duncan Neurological Research Institute, Texas Children’s Hospital and Baylor College of Medicine, Houston, Texas, USA; McNair Medical Institute, The Robert and Janice McNair Foundation, Houston, Texas, USA
Ana M. Zaske: Department of Pediatrics, McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, Texas, USA
James R. Lupski: Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA; Texas Children’s Hospital, Houston, Texas, USA; Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, USA
Dianna M. Milewicz: Department of Internal Medicine, McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, Texas, USA
Sanjay Shete: The University of Texas MD Anderson Cancer Center, Houston, Texas, USA
Jennifer E. Posey: Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA; McNair Medical Institute, The Robert and Janice McNair Foundation, Houston, Texas, USA
Neil A. Hanchard: Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA; Childhood Complex Disease Genomics Section, National Human Genome Research Institute, NIH, Bethesda, USA

Ahmed Z, Renart EG, Mishra D, Zeeshan S. JWES: a new pipeline for whole genome/exome sequence data processing, management, and gene-variant discovery, annotation, prediction, and genotyping. FEBS Open Bio 2021;11:2441-2452. [PMID: 34370400 PMCID: PMC8409305 DOI: 10.1002/2211-5463.13261] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 06/02/2021] [Revised: 07/18/2021] [Accepted: 08/02/2021] [Indexed: 01/07/2023] Open

Abstract

Whole genome and exome sequencing (WGS/WES) are the most popular next‐generation sequencing (NGS) methodologies and are at present often used to detect rare and common genetic variants of clinical significance. We emphasize that automated sequence data processing, management, and visualization should be an indispensable component of modern WGS and WES data analysis for sequence assembly, variant detection (SNPs, SVs), imputation, and resolution of haplotypes. In this manuscript, we present a newly developed findable, accessible, interoperable, and reusable (FAIR) bioinformatics‐genomics pipeline, the Java‐based Whole Genome/Exome Sequence Data Processing Pipeline (JWES), for efficient variant discovery and interpretation, and big data modeling and visualization. JWES is a cross‐platform, user‐friendly, product line application that comprises three modules: (a) data processing, (b) storage, and (c) visualization. The data processing module performs a series of different tasks for variant calling, the data storage module efficiently manages high‐volume gene‐variant data, and the data visualization module supports variant data interpretation with Circos graphs. The performance of JWES was tested and validated in‐house with different experiments, using Microsoft Windows, macOS Big Sur, and UNIX operating systems. JWES is an open‐source and freely available pipeline, allowing scientists to take full advantage of all the computing resources available, without requiring much computer science knowledge. We have successfully applied JWES for processing, management, and gene‐variant discovery, annotation, prediction, and genotyping of WGS and WES data to analyze variable complex disorders. In summary, we report the performance of JWES with some reproducible case studies, using open access and in‐house generated, high‐quality datasets.

Ahmed Z, Renart EG, Zeeshan S. Genomics pipelines to investigate susceptibility in whole genome and exome sequenced data for variant discovery, annotation, prediction and genotyping. PeerJ 2021;9:e11724. [PMID: 34395068 PMCID: PMC8320519 DOI: 10.7717/peerj.11724] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 03/29/2021] [Accepted: 06/14/2021] [Indexed: 12/12/2022] Open

Bhardwaj A, Bag SK. PLANET-SNP pipeline: PLants based ANnotation and Establishment of True SNP pipeline. Genomics 2019;111:1066-1077. [PMID: 31533899 DOI: 10.1016/j.ygeno.2018.07.001] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 10/02/2017] [Revised: 06/10/2018] [Accepted: 07/02/2018] [Indexed: 12/30/2022]

Rasnic R, Brandes N, Zuk O, Linial M. Substantial batch effects in TCGA exome sequences undermine pan-cancer analysis of germline variants. BMC Cancer 2019;19:783. [PMID: 31391007 PMCID: PMC6686424 DOI: 10.1186/s12885-019-5994-5] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 11/18/2018] [Accepted: 07/30/2019] [Indexed: 02/07/2023] Open

Abstract

BACKGROUND

In recent years, research on cancer predisposition germline variants has emerged as a prominent field. The identity of somatic mutations is based on a reliable mapping of the patient germline variants. In addition, the statistics of germline variant frequencies in healthy individuals and cancer patients are the basis for seeking candidates for cancer predisposition genes. The Cancer Genome Atlas (TCGA) is one of the main sources of such data, providing a diverse collection of molecular data including deep sequencing for more than 30 types of cancer from > 10,000 patients.

METHODS

Our hypothesis in this study is that whole exome sequences from blood samples of cancer patients are not expected to show systematic differences among cancer types. To test this hypothesis, we analyzed common and rare germline variants across six cancer types, covering 2241 samples from TCGA. In our analysis we accounted for inherent variables in the data including the different variant calling protocols, sequencing platforms, and ethnicity.

RESULTS

We report on substantial batch effects in germline variants associated with cancer types. We attribute the effect to the specific sequencing centers that produced the data. Specifically, we measured 30% variability in the number of reported germline variants per sample across sequencing centers. The batch effect is further expressed in nucleotide composition and variant frequencies. Importantly, the batch effect causes substantial differences in germline variant distribution patterns across numerous genes, including prominent cancer predisposition genes such as BRCA1, RET, MAX, and KRAS. For most known cancer predisposition genes, we found a distinct batch-dependent difference in germline variants.

CONCLUSION

TCGA germline data is exposed to strong batch effects with substantial variabilities among TCGA sequencing centers. We claim that those batch effects are consequential for numerous TCGA pan-cancer studies. In particular, these effects may compromise the reliability and the potency to detect new cancer predisposition genes. Furthermore, interpretation of pan-cancer analyses should be revisited in view of the source of the genomic data after accounting for the reported batch effects.

Wang Y, Li G, Ma M, He F, Song Z, Zhang W, Wu C. GT-WGS: an efficient and economic tool for large-scale WGS analyses based on the AWS cloud service. BMC Genomics 2018;19:959. [PMID: 29363427 PMCID: PMC5780748 DOI: 10.1186/s12864-017-4334-x] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 12/21/2022] Open

Abstract

BACKGROUND

Whole-genome sequencing (WGS) plays an increasingly important role in clinical practice and public health. Due to the big data size, WGS data analysis is usually compute-intensive and IO-intensive. Currently it usually takes 30 to 40 h to finish a 50× WGS analysis task, which is far from the ideal speed required by the industry. Furthermore, the high-end infrastructure required by WGS computing is costly in terms of time and money. In this paper, we aim to improve the time efficiency of WGS analysis and minimize the cost by elastic cloud computing.

RESULTS

We developed a distributed system, GT-WGS, for large-scale WGS analyses utilizing Amazon Web Services (AWS). Our system won first prize in the Wind and Cloud challenge held by the Genomics and Cloud Technology Alliance conference (GCTA) committee. The system makes full use of the dynamic pricing mechanism of AWS. We evaluated the performance of GT-WGS with a 55× WGS dataset (400GB fastq) provided by the GCTA 2017 competition. In the best case, it took only 18.4 min to finish the analysis, and the AWS cost of the whole process was only 16.5 US dollars. The results of GT-WGS are 99.9% consistent with those of the Genome Analysis Toolkit (GATK) best practice. We also evaluated the performance of GT-WGS on a real-world dataset provided by the XiangYa hospital, which consists of a 5× whole-genome dataset with 500 samples; on average, GT-WGS finished one 5× WGS analysis task in 2.4 min at a cost of $3.6.

CONCLUSIONS

WGS is already playing an important role in guiding therapeutic intervention. However, its application is limited by the time cost and computing cost. GT-WGS excelled as an efficient and affordable WGS analysis tool to address this problem. The demo video and supplementary materials of GT-WGS can be accessed at https://github.com/Genetalks/wgs_analysis_demo.
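The figures quoted in this abstract can be checked with simple arithmetic. The Python sketch below is not taken from the GT-WGS paper; it only reworks the numbers reported above, and the function names `speedup` and `cohort_totals` are hypothetical. Comparing the 18.4 min run (55× coverage) against the 30 to 40 h baseline (quoted for 50× coverage) is an assumption made for illustration, since the two figures refer to slightly different coverage depths.

```python
def speedup(baseline_hours: float, observed_minutes: float) -> float:
    """Ratio of a baseline runtime (hours) to an observed runtime (minutes)."""
    return baseline_hours * 60.0 / observed_minutes


def cohort_totals(n_samples: int, minutes_per_sample: float,
                  dollars_per_sample: float) -> tuple[float, float]:
    """Total minutes and total cost if every sample matches the reported average."""
    return n_samples * minutes_per_sample, n_samples * dollars_per_sample


if __name__ == "__main__":
    # Baseline range from the abstract: 30 to 40 h; best case reported: 18.4 min.
    low, high = speedup(30, 18.4), speedup(40, 18.4)
    print(f"speedup range: {low:.1f}x to {high:.1f}x")

    # 500 samples at the reported per-sample average of 2.4 min and $3.6.
    total_min, total_usd = cohort_totals(500, 2.4, 3.6)
    print(f"cohort: {total_min:.0f} min of per-sample time, ${total_usd:.0f}")
```

The cohort total is a sum of per-sample averages, so it says nothing about wall-clock time if samples were processed in parallel.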

Whole-exome sequencing and microRNA profiling reveal PI3K/AKT pathway’s involvement in juvenile myelomonocytic leukemia. Quantitative Biology 2018. [DOI: 10.1007/s40484-017-0125-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 02/07/2023]

Mashl RJ, Scott AD, Huang KL, Wyczalkowski MA, Yoon CJ, Niu B, DeNardo E, Yellapantula VD, Handsaker RE, Chen K, Koboldt DC, Ye K, Fenyö D, Raphael BJ, Wendl MC, Ding L. GenomeVIP: a cloud platform for genomic variant discovery and interpretation. Genome Res 2017;27:1450-1459. [PMID: 28522612 PMCID: PMC5538560 DOI: 10.1101/gr.211656.116] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 06/21/2016] [Accepted: 05/03/2017] [Indexed: 12/12/2022]

Affiliation(s)

R Jay Mashl: McDonnell Genome Institute, Washington University, St. Louis, Missouri 63108, USA; Division of Oncology, Department of Medicine, Washington University, St. Louis, Missouri 63108, USA
Adam D Scott: McDonnell Genome Institute, Washington University, St. Louis, Missouri 63108, USA; Division of Oncology, Department of Medicine, Washington University, St. Louis, Missouri 63108, USA
Kuan-Lin Huang: McDonnell Genome Institute, Washington University, St. Louis, Missouri 63108, USA; Division of Oncology, Department of Medicine, Washington University, St. Louis, Missouri 63108, USA
Matthew A Wyczalkowski: McDonnell Genome Institute, Washington University, St. Louis, Missouri 63108, USA
Christopher J Yoon: McDonnell Genome Institute, Washington University, St. Louis, Missouri 63108, USA; Division of Oncology, Department of Medicine, Washington University, St. Louis, Missouri 63108, USA
Beifang Niu: McDonnell Genome Institute, Washington University, St. Louis, Missouri 63108, USA
Erin DeNardo: McDonnell Genome Institute, Washington University, St. Louis, Missouri 63108, USA
Venkata D Yellapantula: McDonnell Genome Institute, Washington University, St. Louis, Missouri 63108, USA; Division of Oncology, Department of Medicine, Washington University, St. Louis, Missouri 63108, USA
Robert E Handsaker: Stanley Center for Psychiatric Research, Broad Institute, Cambridge, Massachusetts 02142, USA; Department of Genetics, Harvard Medical School, Boston, Massachusetts 02115, USA
Ken Chen: Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, Texas 77030, USA
Daniel C Koboldt: McDonnell Genome Institute, Washington University, St. Louis, Missouri 63108, USA
Kai Ye: McDonnell Genome Institute, Washington University, St. Louis, Missouri 63108, USA; Division of Oncology, Department of Medicine, Washington University, St. Louis, Missouri 63108, USA
David Fenyö: Langone Medical Center, New York University, New York, New York 10016, USA
Benjamin J Raphael: Department of Computer Science and Center for Computational Molecular Biology, Brown University, Providence, Rhode Island 02912, USA
Michael C Wendl: McDonnell Genome Institute, Washington University, St. Louis, Missouri 63108, USA; Department of Genetics, Washington University, St. Louis, Missouri 63108, USA; Department of Mathematics, Washington University, St. Louis, Missouri 63108, USA
Li Ding: McDonnell Genome Institute, Washington University, St. Louis, Missouri 63108, USA; Division of Oncology, Department of Medicine, Washington University, St. Louis, Missouri 63108, USA; Department of Genetics, Washington University, St. Louis, Missouri 63108, USA; Siteman Cancer Center, Washington University, St. Louis, Missouri 63108, USA

He KY, Ge D, He MM. Big Data Analytics for Genomic Medicine. Int J Mol Sci 2017;18:ijms18020412. [PMID: 28212287 PMCID: PMC5343946 DOI: 10.3390/ijms18020412] [Citation(s) in RCA: 104] [Impact Index Per Article: 14.9] [Received: 10/24/2016] [Revised: 02/08/2017] [Accepted: 02/09/2017] [Indexed: 12/25/2022] Open

MC-GenomeKey: a multicloud system for the detection and annotation of genomic variants. BMC Bioinformatics 2017;18:49. [PMID: 28107819 PMCID: PMC5248509 DOI: 10.1186/s12859-016-1454-2] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 05/20/2016] [Accepted: 12/24/2016] [Indexed: 12/28/2022] Open

Abstract

Background

Next Generation Genome sequencing techniques have become affordable for massive sequencing efforts devoted to clinical characterization of human diseases. However, the cost of providing cloud-based data analysis of the mounting datasets remains a concerning bottleneck for providing cost-effective clinical services. To address this computational problem, it is important to optimize the variant analysis workflow and the analysis tools used to reduce the overall computational processing time, and concomitantly reduce the processing cost. Furthermore, it is important to capitalize on recent developments in the cloud computing market, which has witnessed more providers competing in terms of products and prices.

Results

In this paper, we present a new package called MC-GenomeKey (Multi-Cloud GenomeKey) that efficiently executes the variant analysis workflow for detecting and annotating mutations using cloud resources from different commercial cloud providers. Our package supports Amazon, Google, and Azure clouds, as well as any other cloud platform based on OpenStack. Our package allows different scenarios of execution with different levels of sophistication, up to the one where a workflow can be executed using a cluster whose nodes come from different clouds. MC-GenomeKey also supports scenarios to exploit the spot instance model of Amazon in combination with the use of other cloud platforms to provide significant cost reduction. To the best of our knowledge, this is the first solution that optimizes the execution of the workflow using computational resources from different cloud providers.

Conclusions

MC-GenomeKey provides an efficient multicloud based solution to detect and annotate mutations. The package can run in different commercial cloud platforms, which enables the user to seize the best offers. The package also provides a reliable means to make use of the low-cost spot instance model of Amazon, as it provides an efficient solution to the sudden termination of spot machines as a result of a sudden price increase. The package has a web-interface and it is available for free for academic use.

Roy-Chowdhuri S, Roy S, Monaco SE, Routbort MJ, Pantanowitz L. Big data from small samples: Informatics of next-generation sequencing in cytopathology. Cancer Cytopathol 2016;125:236-244. [PMID: 27918649 DOI: 10.1002/cncy.21805] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 09/02/2016] [Revised: 10/13/2016] [Accepted: 10/17/2016] [Indexed: 12/12/2022]

Tebani A, Afonso C, Marret S, Bekri S. Omics-Based Strategies in Precision Medicine: Toward a Paradigm Shift in Inborn Errors of Metabolism Investigations. Int J Mol Sci 2016;17:ijms17091555. [PMID: 27649151 PMCID: PMC5037827 DOI: 10.3390/ijms17091555] [Citation(s) in RCA: 105] [Impact Index Per Article: 13.1] [Received: 07/27/2016] [Revised: 09/06/2016] [Accepted: 09/07/2016] [Indexed: 12/20/2022] Open

Huang Z, Rustagi N, Veeraraghavan N, Carroll A, Gibbs R, Boerwinkle E, Venkata MG, Yu F. A hybrid computational strategy to address WGS variant analysis in >5000 samples. BMC Bioinformatics 2016;17:361. [PMID: 27612449 PMCID: PMC5018196 DOI: 10.1186/s12859-016-1211-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 02/24/2016] [Accepted: 08/25/2016] [Indexed: 11/22/2022] Open

Abstract

Background

The decreasing costs of sequencing are driving the need for cost effective and real time variant calling of whole genome sequencing data. The scale of these projects is far beyond the capacity of typical computing resources available to most research labs. Other infrastructures like the cloud AWS environment and supercomputers also have limitations due to which large scale joint variant calling becomes infeasible, and infrastructure specific variant calling strategies either fail to scale up to large datasets or abandon joint calling strategies.

Results

We present a high throughput framework including multiple variant callers for single nucleotide variant (SNV) calling, which leverages hybrid computing infrastructure consisting of cloud AWS, supercomputers and local high performance computing infrastructures. We present a novel binning approach for large scale joint variant calling and imputation which can scale up to over 10,000 samples while producing SNV callsets with high sensitivity and specificity. As a proof of principle, we present results of analysis on the Cohorts for Heart And Aging Research in Genomic Epidemiology (CHARGE) WGS freeze 3 dataset, in which joint calling, imputation and phasing of over 5300 whole genome samples were completed in under 6 weeks using four state-of-the-art callers. The callers used were SNPTools, GATK-HaplotypeCaller, GATK-UnifiedGenotyper and GotCloud. We used Amazon AWS, a 4000-core in-house cluster at Baylor College of Medicine, IBM power PC Blue BioU at Rice and Rhea at Oak Ridge National Laboratory (ORNL) for the computation. AWS was used for joint calling of 180 TB of BAM files, and ORNL and Rice supercomputers were used for the imputation and phasing step. All other steps were carried out on the local compute cluster. The entire operation used 5.2 million core hours and only transferred a total of 6 TB of data across the platforms.

Conclusions

Even with increasing sizes of whole genome datasets, ensemble joint calling of SNVs for low coverage data can be accomplished in a scalable, cost effective and fast manner by using heterogeneous computing platforms without compromising on the quality of variants.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-016-1211-6) contains supplementary material, which is available to authorized users.

Leelananda SP, Kloczkowski A, Jernigan RL. Fold-specific sequence scoring improves protein sequence matching. BMC Bioinformatics 2016;17:328. [PMID: 27578239 PMCID: PMC5006591 DOI: 10.1186/s12859-016-1198-z] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 03/21/2016] [Accepted: 08/24/2016] [Indexed: 11/10/2022] Open

Abstract

Background

Sequence matching is extremely important for applications throughout biology, particularly for discovering information such as functional and evolutionary relationships, and also for discriminating between unimportant and disease mutants. At present the functions of a large fraction of genes are unknown; improvements in sequence matching will improve gene annotations. Universal amino acid substitution matrices such as Blosum62 are used to measure sequence similarities and to identify distant homologues, regardless of the structure class. However, such single matrices do not take into account important structural information evident within the different topologies of proteins and treat substitutions within all protein folds identically. Others have suggested that the use of structural information can lead to significant improvements in sequence matching but this has not yet been very effective. Here we develop novel substitution matrices that include not only general sequence information but also have a topology specific component that is unique for each CATH topology. This novel feature of using a combination of sequence and structure information for each protein topology significantly improves the sequence matching scores for the sequence pairs tested. We have used a novel multi-structure alignment method for each homology level of CATH in order to extract topological information.

Results

We obtain statistically significant improvements in sequence matching scores for 73 % of the alpha helical test cases. On average, 61 % of the test cases showed improvements in homology detection when structure information was incorporated into the substitution matrices. On average z-scores for homology detection are improved by more than 54 % for all cases, and some individual cases have z-scores more than twice those obtained using generic matrices. Our topology specific similarity matrices also outperform other traditional similarity matrices and single matrix based structure methods. When the default amino acid substitution matrix in the Psi-blast algorithm is replaced by our structure-based matrices, the structure matching is significantly improved over conventional Psi-blast. It also outperforms results obtained for the corresponding HMM profiles generated for each topology.

Conclusions

We show that by incorporating topology-specific structure information in addition to sequence information into specific amino acid substitution matrices, the sequence matching scores and homology detection are significantly improved. Our topology specific similarity matrices outperform other traditional similarity matrices and single matrix based structure methods, and also show improvement over conventional Psi-blast and HMM profile based methods in sequence matching. The results support the discriminatory ability of the new amino acid similarity matrices to distinguish between distant homologs and structurally dissimilar pairs.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-016-1198-z) contains supplementary material, which is available to authorized users.

Menon R, Patel NV, Mohapatra A, Joshi CG. VDAP-GUI: a user-friendly pipeline for variant discovery and annotation of raw next-generation sequencing data. 3 Biotech 2016;6:68. [PMID: 28330138 PMCID: PMC4754298 DOI: 10.1007/s13205-016-0382-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 06/19/2015] [Accepted: 10/15/2015] [Indexed: 12/03/2022] Open

Roy S, LaFramboise WA, Nikiforov YE, Nikiforova MN, Routbort MJ, Pfeifer J, Nagarajan R, Carter AB, Pantanowitz L. Next-Generation Sequencing Informatics: Challenges and Strategies for Implementation in a Clinical Environment. Arch Pathol Lab Med 2016;140:958-75. [PMID: 26901284 DOI: 10.5858/arpa.2015-0507-ra] [Citation(s) in RCA: 54] [Impact Index Per Article: 6.8] [Indexed: 11/06/2022]

Reisman S, Hatzopoulos T, Läufer K, Thiruvathukal GK, Putonti C. A Polyglot Approach to Bioinformatics Data Integration: A Phylogenetic Analysis of HIV-1. Evol Bioinform Online 2016;12:23-7. [PMID: 26819543 PMCID: PMC4718148 DOI: 10.4137/ebo.s32757] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 08/09/2015] [Revised: 10/18/2015] [Accepted: 10/25/2015] [Indexed: 02/04/2023] Open

A Comprehensive Review of Emerging Computational Methods for Gene Identification. Journal of Information Processing Systems 2016. [DOI: 10.3745/jips.04.0023] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/03/2022]

Kovatch P, Costa A, Giles Z, Fluder E, Cho HM, Mazurkova S. Big Omics Data Experience. SC ... CONFERENCE PROCEEDINGS. SC (CONFERENCE : SUPERCOMPUTING) 2015;2015. [PMID: 30788464 DOI: 10.1145/2807591.2807595] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 12/14/2022]

Shringarpure SS, Carroll A, De La Vega FM, Bustamante CD. Inexpensive and Highly Reproducible Cloud-Based Variant Calling of 2,535 Human Genomes. PLoS One 2015;10:e0129277. [PMID: 26110529 PMCID: PMC4482534 DOI: 10.1371/journal.pone.0129277] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 01/15/2015] [Accepted: 05/06/2015] [Indexed: 01/22/2023] Open

Ma J, Purcell H, Showalter L, Aagaard KM. Mitochondrial DNA sequence variation is largely conserved at birth with rare de novo mutations in neonates. Am J Obstet Gynecol 2015;212:530.e1-8. [PMID: 25687567 DOI: 10.1016/j.ajog.2015.02.009] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 11/13/2014] [Revised: 01/29/2015] [Accepted: 02/09/2015] [Indexed: 12/21/2022]

Abstract

OBJECTIVE

Mitochondrial DNA (mtDNA) encodes the proteins of the electron transfer chain to produce adenosine triphosphate through oxidative phosphorylation, and is essential to sustain life. mtDNA differs from the nuclear genome in that it is solely maternally inherited (non-mendelian patterning) and shows a relatively high rate of mutation due to the absence of error-checking capacity. While it is generally assumed that most new mutations accumulate through the process of heteroplasmy, it is unknown whether mutations initiated in the mother are inherited, occur in utero, or occur and accumulate early in life. The purpose of this study is to examine the maternally heritable and de novo mutation rate in the fetal mtDNA through high-fidelity sequencing from a large population-based cohort.

STUDY DESIGN

Samples were obtained from 90 matched maternal (blood) and fetal (placental) pairs. In addition, a smaller cohort (n = 5) of maternal (blood), fetal (placental), and neonatal (cord blood) trios were subjected to DNA extraction and shotgun sequencing. The whole genome was sequenced on the Illumina HiSeq platform (Illumina Inc., San Diego, CA), and haplogroups and mtDNA variants were identified through mapping to reference mitochondrial genomes (NC_012920).

RESULTS

We observed 665 single nucleotide polymorphisms and 82 insertion-deletion variants in the cohort at large. We sequenced the mtDNA to an average depth of 65X (range, 20-171X). The proportions of haplogroups identified in the cohort are consistent with the patients' self-identified ethnicity (>90% Hispanic), and all maternal-fetal pairs mapped to the identical haplogroup. Only variants from samples with average depth >20X and allele frequency >1% were included for further analysis. While the majority of the maternal-fetal pairs (>90%) demonstrated identical variants at the single nucleotide level, we observed rare mitochondrial single nucleotide polymorphism discordance between maternal and fetal mitochondrial genomes.

CONCLUSION

In this first in-depth sequencing analysis of mtDNA from maternal-fetal pairs at the time of birth, a low rate of de novo mutations appears in the fetal mitochondrial genome. This implies that these mutations likely arise from the maternal heteroplasmic pool (eg, in the oocyte), and accumulate later in the offspring's life. These findings have key implications for both the occurrence and screening for mitochondrial disorders.

Kelly BJ, Fitch JR, Hu Y, Corsmeier DJ, Zhong H, Wetzel AN, Nordquist RD, Newsom DL, White P. Churchill: an ultra-fast, deterministic, highly scalable and balanced parallelization strategy for the discovery of human genetic variation in clinical and population-scale genomics. Genome Biol 2015;16:6. [PMID: 25600152 PMCID: PMC4333267 DOI: 10.1186/s13059-014-0577-x] [Citation(s) in RCA: 80] [Impact Index Per Article: 8.9] [Received: 10/21/2014] [Accepted: 12/23/2014] [Indexed: 12/18/2022] Open

Kumar P, Al-Shafai M, Al Muftah WA, Chalhoub N, Elsaid MF, Aleem AA, Suhre K. Evaluation of SNP calling using single and multiple-sample calling algorithms by validation against array base genotyping and Mendelian inheritance. BMC Res Notes 2014;7:747. [PMID: 25339461 PMCID: PMC4216909 DOI: 10.1186/1756-0500-7-747] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 05/06/2014] [Accepted: 10/03/2014] [Indexed: 12/30/2022] Open

Madduri RK, Sulakhe D, Lacinski L, Liu B, Rodriguez A, Chard K, Dave UJ, Foster IT. Experiences Building Globus Genomics: A Next-Generation Sequencing Analysis Service using Galaxy, Globus, and Amazon Web Services. CONCURRENCY AND COMPUTATION : PRACTICE & EXPERIENCE 2014;26:2266-2279. [PMID: 25342933 PMCID: PMC4203657 DOI: 10.1002/cpe.3274] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Indexed: 05/10/2023]

Shyr C, Kushniruk A, Wasserman WW. Usability study of clinical exome analysis software: top lessons learned and recommendations. J Biomed Inform 2014;51:129-36. [PMID: 24860971 DOI: 10.1016/j.jbi.2014.05.004] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 10/29/2013] [Revised: 04/30/2014] [Accepted: 05/06/2014] [Indexed: 10/25/2022]

Abstract

OBJECTIVES

New DNA sequencing technologies have revolutionized the search for genetic disruptions. Targeted sequencing of all protein coding regions of the genome, called exome analysis, is actively used in research-oriented genetics clinics, with the transition to exomes as a standard procedure underway. This transition is challenging; identification of potentially causal mutation(s) amongst ∼10^6 variants requires specialized computation in combination with expert assessment. This study analyzes the usability of user interfaces for clinical exome analysis software. There are two study objectives: (1) To ascertain the key features of successful user interfaces for clinical exome analysis software based on the perspective of expert clinical geneticists, (2) To assess user-system interactions in order to reveal strengths and weaknesses of existing software, inform future design, and accelerate the clinical uptake of exome analysis.

METHODS

Surveys, interviews, and cognitive task analysis were performed for the assessment of two next-generation exome sequence analysis software packages. The subjects included ten clinical geneticists who interacted with the software packages using the "think aloud" method. Subjects' interactions with the software were recorded in their clinical office within an urban research and teaching hospital. All major user interface events (from the user interactions with the packages) were time-stamped and annotated with coding categories to identify usability issues in order to characterize desired features and deficiencies in the user experience.

RESULTS

We detected 193 usability issues, the majority of which concern interface layout and navigation, and the resolution of reports. Our study highlights gaps in specific software features typical within exome analysis. The clinicians perform best when the flow of the system is structured into well-defined yet customizable layers for incorporation within the clinical workflow. The results highlight opportunities to dramatically accelerate clinician analysis and interpretation of patient genomic data.

CONCLUSION

We present the first application of usability methods to evaluate software interfaces in the context of exome analysis. Our results highlight how the study of user responses can lead to identification of usability issues and challenges and reveal software reengineering opportunities for improving clinical next-generation sequencing analysis. While the evaluation focused on two distinctive software tools, the results are general and should inform active and future software development for genome analysis software. As large-scale genome analysis becomes increasingly common in healthcare, it is critical that efficient and effective software interfaces are provided to accelerate clinical adoption of the technology. Implications for improved design of such applications are discussed.

Stormbow: A Cloud-Based Tool for Reads Mapping and Expression Quantification in Large-Scale RNA-Seq Studies. ISRN BIOINFORMATICS 2013;2013:481545. [PMID: 25937948 PMCID: PMC4393068 DOI: 10.1155/2013/481545] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Received: 07/08/2013] [Accepted: 08/07/2013] [Indexed: 01/31/2023]

Lin CF, Valladares O, Childress DM, Klevak E, Geller ET, Hwang YC, Tsai EA, Schellenberg GD, Wang LS. DRAW+SneakPeek: analysis workflow and quality metric management for DNA-seq experiments. Bioinformatics 2013;29:2498-500. [PMID: 23943636 PMCID: PMC3777113 DOI: 10.1093/bioinformatics/btt422] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/17/2022] Open