Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Clement NL, Snell Q, Clement MJ, Hollenhorst PC, Purwar J, Graves BJ, Cairns BR, Johnson WE. The GNUMAP algorithm: unbiased probabilistic mapping of oligonucleotides from next-generation sequencing. ACTA ACUST UNITED AC 2009;26:38-45. [PMID: 19861355 DOI: 10.1093/bioinformatics/btp614] [Citation(s) in RCA: 61] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022]

For:	Clement NL, Snell Q, Clement MJ, Hollenhorst PC, Purwar J, Graves BJ, Cairns BR, Johnson WE. The GNUMAP algorithm: unbiased probabilistic mapping of oligonucleotides from next-generation sequencing. ACTA ACUST UNITED AC 2009;26:38-45. [PMID: 19861355 DOI: 10.1093/bioinformatics/btp614] [Citation(s) in RCA: 61] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022]

Number

Cited by Other Article(s)

Becker D, Champredon D, Chato C, Gugan G, Poon A. SUP: a probabilistic framework to propagate genome sequence uncertainty, with applications. NAR Genom Bioinform 2023;5:lqad038. [PMID: 37101658 PMCID: PMC10124968 DOI: 10.1093/nargab/lqad038] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/04/2022] [Revised: 02/15/2023] [Accepted: 04/06/2023] [Indexed: 04/28/2023] Open

Purushothaman S, Meola M, Egli A. Combination of Whole Genome Sequencing and Metagenomics for Microbiological Diagnostics. Int J Mol Sci 2022;23:9834. [PMID: 36077231 PMCID: PMC9456280 DOI: 10.3390/ijms23179834] [Citation(s) in RCA: 40] [Impact Index Per Article: 13.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/04/2022] [Revised: 08/24/2022] [Accepted: 08/26/2022] [Indexed: 12/21/2022] Open

Hidden Markov modeling for maximum probability neuron reconstruction. Commun Biol 2022;5:388. [PMID: 35468989 PMCID: PMC9038756 DOI: 10.1038/s42003-022-03320-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/27/2021] [Accepted: 03/24/2022] [Indexed: 11/08/2022] Open

Müller R, Nebel M. On the use of sequence-quality information in OTU clustering. PeerJ 2021;9:e11717. [PMID: 34458017 PMCID: PMC8375510 DOI: 10.7717/peerj.11717] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/16/2020] [Accepted: 06/11/2021] [Indexed: 11/20/2022] Open

Abstract

Background

High-throughput sequencing has become an essential technology in life science research. Despite continuous improvements in technology, the produced sequences are still not entirely accurate. Consequently, the sequences are usually equipped with error probabilities. The quality information is already employed to find better solutions to a number of bioinformatics problems (e.g. read mapping). Data processing pipelines benefit in particular (especially when incorporating the quality information early), since enhanced outcomes of one step can improve all subsequent ones. Preprocessing steps, thus, quite regularly consider the sequence quality to fix errors or discard low-quality data. Other steps, however, like clustering sequences into operational taxonomic units (OTUs), a common task in the analysis of microbial communities, are typically performed without making use of the available quality information.

Results

In this paper, we present quality-aware clustering methods inspired by quality-weighted alignments and model-based denoising, and explore their applicability to OTU clustering. We implemented the quality-aware methods in a revised version of our de novo clustering tool GeFaST and evaluated their clustering quality and performance on mock-community data sets. Quality-weighted alignments were able to improve the clustering quality of GeFaST by up to 10%. The examination of the model-supported methods provided a more diverse picture, hinting at a narrower applicability, but they were able to attain similar improvements. Considering the quality information enlarged both runtime and memory consumption, even though the increase of the former depended heavily on the applied method and clustering threshold.

Conclusions

The quality-aware methods expand the iterative, de novo clustering approach by new clustering and cluster refinement methods. Our results indicate that OTU clustering constitutes yet another analysis step benefiting from the integration of quality information. Beyond the shown potential, the quality-aware methods offer a range of opportunities for fine-tuning and further extensions.

Collapse

Alser M, Rotman J, Deshpande D, Taraszka K, Shi H, Baykal PI, Yang HT, Xue V, Knyazev S, Singer BD, Balliu B, Koslicki D, Skums P, Zelikovsky A, Alkan C, Mutlu O, Mangul S. Technology dictates algorithms: recent developments in read alignment. Genome Biol 2021;22:249. [PMID: 34446078 PMCID: PMC8390189 DOI: 10.1186/s13059-021-02443-7] [Citation(s) in RCA: 45] [Impact Index Per Article: 11.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/15/2020] [Accepted: 07/28/2021] [Indexed: 01/08/2023] Open

Affiliation(s)

Mohammed Alser Computer Science Department, ETH Zürich, 8092, Zürich, Switzerland Computer Engineering Department, Bilkent University, 06800 Bilkent, Ankara, Turkey Information Technology and Electrical Engineering Department, ETH Zürich, Zürich, 8092, Switzerland
Jeremy Rotman Department of Computer Science, University of California Los Angeles, Los Angeles, CA, 90095, USA
Dhrithi Deshpande Department of Clinical Pharmacy, School of Pharmacy, University of Southern California, Los Angeles, CA, 90089, USA
Kodi Taraszka Department of Computer Science, University of California Los Angeles, Los Angeles, CA, 90095, USA
Huwenbo Shi Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
Pelin Icer Baykal Department of Computer Science, Georgia State University, Atlanta, GA, 30302, USA
Harry Taegyun Yang Department of Computer Science, University of California Los Angeles, Los Angeles, CA, 90095, USA Bioinformatics Interdepartmental Ph.D. Program, University of California Los Angeles, Los Angeles, CA, 90095, USA
Victor Xue Department of Computer Science, University of California Los Angeles, Los Angeles, CA, 90095, USA
Sergey Knyazev Department of Computer Science, Georgia State University, Atlanta, GA, 30302, USA
Benjamin D Singer Division of Pulmonary and Critical Care Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL, 60611, USA Department of Biochemistry & Molecular Genetics, Northwestern University Feinberg School of Medicine, Chicago, USA Simpson Querrey Institute for Epigenetics, Northwestern University Feinberg School of Medicine, Chicago, IL, 60611, USA
Brunilda Balliu Department of Computational Medicine, University of California Los Angeles, Los Angeles, CA, 90095, USA
David Koslicki Computer Science and Engineering, Pennsylvania State University, University Park, PA, 16801, USA Biology Department, Pennsylvania State University, University Park, PA, 16801, USA The Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA, 16801, USA
Pavel Skums Department of Computer Science, Georgia State University, Atlanta, GA, 30302, USA
Alex Zelikovsky Department of Computer Science, Georgia State University, Atlanta, GA, 30302, USA The Laboratory of Bioinformatics, I.M. Sechenov First Moscow State Medical University, Moscow, 119991, Russia
Can Alkan Computer Engineering Department, Bilkent University, 06800 Bilkent, Ankara, Turkey Bilkent-Hacettepe Health Sciences and Technologies Program, Ankara, Turkey
Onur Mutlu Computer Science Department, ETH Zürich, 8092, Zürich, Switzerland Computer Engineering Department, Bilkent University, 06800 Bilkent, Ankara, Turkey Information Technology and Electrical Engineering Department, ETH Zürich, Zürich, 8092, Switzerland
Serghei Mangul Department of Clinical Pharmacy, School of Pharmacy, University of Southern California, Los Angeles, CA, 90089, USA.

Collapse

Practical Workflow from High-Throughput Genotyping to Genomic Estimated Breeding Values (GEBVs). Methods Mol Biol 2021;2264:119-135. [PMID: 33263907 DOI: 10.1007/978-1-0716-1201-9_9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/23/2022]

Anyansi C, Straub TJ, Manson AL, Earl AM, Abeel T. Computational Methods for Strain-Level Microbial Detection in Colony and Metagenome Sequencing Data. Front Microbiol 2020;11:1925. [PMID: 33013732 PMCID: PMC7507117 DOI: 10.3389/fmicb.2020.01925] [Citation(s) in RCA: 59] [Impact Index Per Article: 11.8] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/27/2020] [Accepted: 07/22/2020] [Indexed: 01/17/2023] Open

Abstract

Metagenomic sequencing is a powerful tool for examining the diversity and complexity of microbial communities. Most widely used tools for taxonomic profiling of metagenomic sequence data allow for a species-level overview of the composition. However, individual strains within a species can differ greatly in key genotypic and phenotypic characteristics, such as drug resistance, virulence and growth rate. Therefore, the ability to resolve microbial communities down to the level of individual strains within a species is critical to interpreting metagenomic data for clinical and environmental applications, where identifying a particular strain, or tracking a particular strain across a set of samples, can help aid in clinical diagnosis and treatment, or in characterizing yet unstudied strains across novel environmental locations. Recently published approaches have begun to tackle the problem of resolving strains within a particular species in metagenomic samples. In this review, we present an overview of these new algorithms and their uses, including methods based on assembly reconstruction and methods operating with or without a reference database. While existing metagenomic analysis methods show reasonable performance at the species and higher taxonomic levels, identifying closely related strains within a species presents a bigger challenge, due to the diversity of databases, genetic relatedness, and goals when conducting these analyses. Selection of which metagenomic tool to employ for a specific application should be performed on a case-by case basis as these tools have strengths and weaknesses that affect their performance on specific tasks. A comprehensive benchmark across different use case scenarios is vital to validate performance of these tools on microbial samples. Because strain-level metagenomic analysis is still in its infancy, development of more fine-grained, high-resolution algorithms will continue to be in demand for the future.

Collapse

Sun S, Sha Z, Wang Y. The complete mitochondrial genomes of two vent squat lobsters, Munidopsis lauensis and M. verrilli: Novel gene arrangements and phylogenetic implications. Ecol Evol 2019;9:12390-12407. [PMID: 31788185 PMCID: PMC6875667 DOI: 10.1002/ece3.5542] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/31/2018] [Revised: 01/31/2019] [Accepted: 07/19/2019] [Indexed: 12/14/2022] Open

Cho Y, Lee S, Hong JH, Kim BJ, Hong WY, Jung J, Lee HB, Sung J, Kim HN, Kim HL, Jung J. Development of the variant calling algorithm, ADIScan, and its use to estimate discordant sequences between monozygotic twins. Nucleic Acids Res 2019;46:e92. [PMID: 29873758 PMCID: PMC6125643 DOI: 10.1093/nar/gky445] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/16/2017] [Accepted: 05/15/2018] [Indexed: 12/30/2022] Open

Prousalis K, Konofaos N. Α Quantum Pattern Recognition Method for Improving Pairwise Sequence Alignment. Sci Rep 2019;9:7226. [PMID: 31076611 PMCID: PMC6510764 DOI: 10.1038/s41598-019-43697-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/16/2018] [Accepted: 04/29/2019] [Indexed: 12/22/2022] Open

Reich DP, Bass BL. Mapping the dsRNA World. Cold Spring Harb Perspect Biol 2019;11:11/3/a035352. [PMID: 30824577 DOI: 10.1101/cshperspect.a035352] [Citation(s) in RCA: 43] [Impact Index Per Article: 7.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/31/2022]

Evaluation of whole exome sequencing by targeted gene sequencing and Sanger sequencing. Clin Chim Acta 2017. [PMID: 28624499 DOI: 10.1016/j.cca.2017.06.015] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/30/2022]

Brumme CJ, Poon AFY. Promises and pitfalls of Illumina sequencing for HIV resistance genotyping. Virus Res 2016;239:97-105. [PMID: 27993623 DOI: 10.1016/j.virusres.2016.12.008] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/20/2016] [Revised: 12/15/2016] [Accepted: 12/15/2016] [Indexed: 12/13/2022]

Han Y, He X. Integrating Epigenomics into the Understanding of Biomedical Insight. Bioinform Biol Insights 2016;10:267-289. [PMID: 27980397 PMCID: PMC5138066 DOI: 10.4137/bbi.s38427] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/17/2016] [Revised: 11/01/2016] [Accepted: 11/06/2016] [Indexed: 12/13/2022] Open

Ziemann M, Kaspi A, El-Osta A. Evaluation of microRNA alignment techniques. RNA (NEW YORK, N.Y.) 2016;22:1120-38. [PMID: 27284164 PMCID: PMC4931105 DOI: 10.1261/rna.055509.115] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/01/2015] [Accepted: 05/04/2016] [Indexed: 05/26/2023]

Kagale S, Koh C, Clarke WE, Bollina V, Parkin IAP, Sharpe AG. Analysis of Genotyping-by-Sequencing (GBS) Data. Methods Mol Biol 2016;1374:269-284. [PMID: 26519412 DOI: 10.1007/978-1-4939-3167-5_15] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]

Ye H, Meehan J, Tong W, Hong H. Alignment of Short Reads: A Crucial Step for Application of Next-Generation Sequencing Data in Precision Medicine. Pharmaceutics 2015;7:523-41. [PMID: 26610555 PMCID: PMC4695832 DOI: 10.3390/pharmaceutics7040523] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/21/2015] [Revised: 11/14/2015] [Accepted: 11/17/2015] [Indexed: 02/06/2023] Open

Cui Y, Liao X, Zhu X, Wang B, Peng S. B-MIC: An Ultrafast Three-Level Parallel Sequence Aligner Using MIC. Interdiscip Sci 2015;8:28-34. [PMID: 26358141 DOI: 10.1007/s12539-015-0278-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/04/2014] [Revised: 05/19/2014] [Accepted: 05/19/2014] [Indexed: 12/29/2022]

Whipple JM, Youssef OA, Aruscavage PJ, Nix DA, Hong C, Johnson WE, Bass BL. Genome-wide profiling of the C. elegans dsRNAome. RNA (NEW YORK, N.Y.) 2015;21:786-800. [PMID: 25805852 PMCID: PMC4408787 DOI: 10.1261/rna.048801.114] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/01/2014] [Accepted: 12/23/2014] [Indexed: 06/01/2023]

Heterozygous genome assembly via binary classification of homologous sequence. BMC Bioinformatics 2015;16 Suppl 7:S5. [PMID: 25952609 PMCID: PMC4423727 DOI: 10.1186/1471-2105-16-s7-s5] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open

Abstract

Background

Genome assemblers to date have predominantly targeted haploid reference reconstruction from homozygous data. When applied to diploid genome assembly, these assemblers perform poorly, owing to the violation of assumptions during both the contigging and scaffolding phases. Effective tools to overcome these problems are in growing demand. Increasing parameter stringency during contigging is an effective solution to obtaining haplotype-specific contigs; however, effective algorithms for scaffolding such contigs are lacking.

Methods

We present a stand-alone scaffolding algorithm, ScaffoldScaffolder, designed specifically for scaffolding diploid genomes. The algorithm identifies homologous sequences as found in "bubble" structures in scaffold graphs. Machine learning classification is used to then classify sequences in partial bubbles as homologous or non-homologous sequences prior to reconstructing haplotype-specific scaffolds. We define four new metrics for assessing diploid scaffolding accuracy: contig sequencing depth, contig homogeneity, phase group homogeneity, and heterogeneity between phase groups.

Results

We demonstrate the viability of using bubbles to identify heterozygous homologous contigs, which we term homolotigs. We show that machine learning classification trained on these homolotig pairs can be used effectively for identifying homologous sequences elsewhere in the data with high precision (assuming error-free reads).

Conclusion

More work is required to comparatively analyze this approach on real data with various parameters and classifiers against other diploid genome assembly methods. However, the initial results of ScaffoldScaffolder supply validity to the idea of employing machine learning in the difficult task of diploid genome assembly. Software is available at http://bioresearch.byu.edu/scaffoldscaffolder.

Collapse

Zhang W, Soika V, Meehan J, Su Z, Ge W, Ng HW, Perkins R, Simonyan V, Tong W, Hong H. Quality control metrics improve repeatability and reproducibility of single-nucleotide variants derived from whole-genome sequencing. THE PHARMACOGENOMICS JOURNAL 2014;15:298-309. [PMID: 25384574 DOI: 10.1038/tpj.2014.70] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/31/2014] [Revised: 07/16/2014] [Accepted: 09/19/2014] [Indexed: 12/18/2022]

Clement NL, Thompson LP, Miranker DP. ADaM: augmenting existing approximate fast matching algorithms with efficient and exact range queries. BMC Bioinformatics 2014;15 Suppl 7:S1. [PMID: 25079667 PMCID: PMC4110726 DOI: 10.1186/1471-2105-15-s7-s1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open

Validation and assessment of variant calling pipelines for next-generation sequencing. Hum Genomics 2014;8:14. [PMID: 25078893 PMCID: PMC4129436 DOI: 10.1186/1479-7364-8-14] [Citation(s) in RCA: 83] [Impact Index Per Article: 7.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/03/2014] [Accepted: 07/15/2014] [Indexed: 02/07/2023] Open

Abstract

Background

The processing and analysis of the large scale data generated by next-generation sequencing (NGS) experiments is challenging and is a burgeoning area of new methods development. Several new bioinformatics tools have been developed for calling sequence variants from NGS data. Here, we validate the variant calling of these tools and compare their relative accuracy to determine which data processing pipeline is optimal.

Results

We developed a unified pipeline for processing NGS data that encompasses four modules: mapping, filtering, realignment and recalibration, and variant calling. We processed 130 subjects from an ongoing whole exome sequencing study through this pipeline. To evaluate the accuracy of each module, we conducted a series of comparisons between the single nucleotide variant (SNV) calls from the NGS data and either gold-standard Sanger sequencing on a total of 700 variants or array genotyping data on a total of 9,935 single-nucleotide polymorphisms. A head to head comparison showed that Genome Analysis Toolkit (GATK) provided more accurate calls than SAMtools (positive predictive value of 92.55% vs. 80.35%, respectively). Realignment of mapped reads and recalibration of base quality scores before SNV calling proved to be crucial to accurate variant calling. GATK HaplotypeCaller algorithm for variant calling outperformed the UnifiedGenotype algorithm. We also showed a relationship between mapping quality, read depth and allele balance, and SNV call accuracy. However, if best practices are used in data processing, then additional filtering based on these metrics provides little gains and accuracies of >99% are achievable.

Conclusions

Our findings will help to determine the best approach for processing NGS data to confidently call variants for downstream analyses. To enable others to implement and replicate our results, all of our codes are freely available at http://metamoodics.org/wes.

Collapse

Evaluation and comparison of multiple aligners for next-generation sequencing data analysis. BIOMED RESEARCH INTERNATIONAL 2014;2014:309650. [PMID: 24779008 PMCID: PMC3980841 DOI: 10.1155/2014/309650] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 12/17/2013] [Accepted: 02/04/2014] [Indexed: 12/23/2022]

Qian X, Ba Y, Zhuang Q, Zhong G. RNA-Seq technology and its application in fish transcriptomics. OMICS-A JOURNAL OF INTEGRATIVE BIOLOGY 2013;18:98-110. [PMID: 24380445 DOI: 10.1089/omi.2013.0110] [Citation(s) in RCA: 196] [Impact Index Per Article: 16.3] [Reference Citation Analysis] [Abstract] [Subscribe] [Scholar Register] [Indexed: 12/19/2022]

Hong C, Clement NL, Clement S, Hammoud SS, Carrell DT, Cairns BR, Snell Q, Clement MJ, Johnson WE. Probabilistic alignment leads to improved accuracy and read coverage for bisulfite sequencing data. BMC Bioinformatics 2013;14:337. [PMID: 24261665 PMCID: PMC3924334 DOI: 10.1186/1471-2105-14-337] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/07/2013] [Accepted: 11/19/2013] [Indexed: 11/10/2022] Open

Multiplatform single-sample estimates of transcriptional activation. Proc Natl Acad Sci U S A 2013;110:17778-83. [PMID: 24128763 DOI: 10.1073/pnas.1305823110] [Citation(s) in RCA: 70] [Impact Index Per Article: 5.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open

Francis OE, Bendall M, Manimaran S, Hong C, Clement NL, Castro-Nallar E, Snell Q, Schaalje GB, Clement MJ, Crandall KA, Johnson WE. Pathoscope: species identification and strain attribution with unassembled sequencing data. Genome Res 2013;23:1721-9. [PMID: 23843222 PMCID: PMC3787268 DOI: 10.1101/gr.150151.112] [Citation(s) in RCA: 100] [Impact Index Per Article: 8.3] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]

O'Rawe J, Jiang T, Sun G, Wu Y, Wang W, Hu J, Bodily P, Tian L, Hakonarson H, Johnson WE, Wei Z, Wang K, Lyon GJ. Low concordance of multiple variant-calling pipelines: practical implications for exome and genome sequencing. Genome Med 2013;5:28. [PMID: 23537139 PMCID: PMC3706896 DOI: 10.1186/gm432] [Citation(s) in RCA: 307] [Impact Index Per Article: 25.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/04/2012] [Revised: 03/23/2013] [Accepted: 03/27/2013] [Indexed: 12/18/2022] Open

Abstract

Background

To facilitate the clinical implementation of genomic medicine by next-generation sequencing, it will be critically important to obtain accurate and consistent variant calls on personal genomes. Multiple software tools for variant calling are available, but it is unclear how comparable these tools are or what their relative merits in real-world scenarios might be.

Methods

We sequenced 15 exomes from four families using commercial kits (Illumina HiSeq 2000 platform and Agilent SureSelect version 2 capture kit), with approximately 120X mean coverage. We analyzed the raw data using near-default parameters with five different alignment and variant-calling pipelines (SOAP, BWA-GATK, BWA-SNVer, GNUMAP, and BWA-SAMtools). We additionally sequenced a single whole genome using the sequencing and analysis pipeline from Complete Genomics (CG), with 95% of the exome region being covered by 20 or more reads per base. Finally, we validated 919 single-nucleotide variations (SNVs) and 841 insertions and deletions (indels), including similar fractions of GATK-only, SOAP-only, and shared calls, on the MiSeq platform by amplicon sequencing with approximately 5000X mean coverage.

Results

SNV concordance between five Illumina pipelines across all 15 exomes was 57.4%, while 0.5 to 5.1% of variants were called as unique to each pipeline. Indel concordance was only 26.8% between three indel-calling pipelines, even after left-normalizing and intervalizing genomic coordinates by 20 base pairs. There were 11% of CG variants falling within targeted regions in exome sequencing that were not called by any of the Illumina-based exome analysis pipelines. Based on targeted amplicon sequencing on the MiSeq platform, 97.1%, 60.2%, and 99.1% of the GATK-only, SOAP-only and shared SNVs could be validated, but only 54.0%, 44.6%, and 78.1% of the GATK-only, SOAP-only and shared indels could be validated. Additionally, our analysis of two families (one with four individuals and the other with seven), demonstrated additional accuracy gained in variant discovery by having access to genetic data from a multi-generational family.

Conclusions

Our results suggest that more caution should be exercised in genomic medicine settings when analyzing individual genomes, including interpreting positive and negative findings with scrutiny, especially for indels. We advocate for renewed collection and sequencing of multi-generational families to increase the overall accuracy of whole genomes.

Collapse

Hong H, Zhang W, Shen J, Su Z, Ning B, Han T, Perkins R, Shi L, Tong W. Critical role of bioinformatics in translating huge amounts of next-generation sequencing data into personalized medicine. SCIENCE CHINA-LIFE SCIENCES 2013;56:110-8. [PMID: 23393026 DOI: 10.1007/s11427-013-4439-7] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/08/2012] [Accepted: 11/29/2012] [Indexed: 01/12/2023]

Pérez-Losada M, Cabezas P, Castro-Nallar E, Crandall KA. Pathogen typing in the genomics era: MLST and the future of molecular epidemiology. INFECTION GENETICS AND EVOLUTION 2013;16:38-53. [PMID: 23357583 DOI: 10.1016/j.meegid.2013.01.009] [Citation(s) in RCA: 128] [Impact Index Per Article: 10.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/25/2012] [Revised: 01/11/2013] [Accepted: 01/15/2013] [Indexed: 10/27/2022]

Lindner R, Friedel CC. A comprehensive evaluation of alignment algorithms in the context of RNA-seq. PLoS One 2012;7:e52403. [PMID: 23300661 PMCID: PMC3530550 DOI: 10.1371/journal.pone.0052403] [Citation(s) in RCA: 57] [Impact Index Per Article: 4.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/30/2012] [Accepted: 11/16/2012] [Indexed: 11/25/2022] Open

Abstract

Transcriptome sequencing (RNA-Seq) overcomes limitations of previously used RNA quantification methods and provides one experimental framework for both high-throughput characterization and quantification of transcripts at the nucleotide level. The first step and a major challenge in the analysis of such experiments is the mapping of sequencing reads to a transcriptomic origin including the identification of splicing events. In recent years, a large number of such mapping algorithms have been developed, all of which have in common that they require algorithms for aligning a vast number of reads to genomic or transcriptomic sequences. Although the FM-index based aligner Bowtie has become a de facto standard within mapping pipelines, a much larger number of possible alignment algorithms have been developed also including other variants of FM-index based aligners. Accordingly, developers and users of RNA-seq mapping pipelines have the choice among a large number of available alignment algorithms. To provide guidance in the choice of alignment algorithms for these purposes, we evaluated the performance of 14 widely used alignment programs from three different algorithmic classes: algorithms using either hashing of the reference transcriptome, hashing of reads, or a compressed FM-index representation of the genome. Here, special emphasis was placed on both precision and recall and the performance for different read lengths and numbers of mismatches and indels in a read. Our results clearly showed the significant reduction in memory footprint and runtime provided by FM-index based aligners at a precision and recall comparable to the best hash table based aligners. Furthermore, the recently developed Bowtie 2 alignment algorithm shows a remarkable tolerance to both sequencing errors and indels, thus, essentially making hash-based aligners obsolete.

Collapse

Fonseca NA, Rung J, Brazma A, Marioni JC. Tools for mapping high-throughput sequencing data. Bioinformatics 2012;28:3169-77. [DOI: 10.1093/bioinformatics/bts605] [Citation(s) in RCA: 207] [Impact Index Per Article: 15.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/18/2022] Open

Crook MB, Lindsay DP, Biggs MB, Bentley JS, Price JC, Clement SC, Clement MJ, Long SR, Griffitts JS. Rhizobial plasmids that cause impaired symbiotic nitrogen fixation and enhanced host invasion. MOLECULAR PLANT-MICROBE INTERACTIONS : MPMI 2012;25:1026-33. [PMID: 22746823 PMCID: PMC4406224 DOI: 10.1094/mpmi-02-12-0052-r] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/06/2023]

Warf MB, Shepherd BA, Johnson WE, Bass BL. Effects of ADARs on small RNA processing pathways in C. elegans. Genome Res 2012;22:1488-98. [PMID: 22673872 PMCID: PMC3409262 DOI: 10.1101/gr.134841.111] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/14/2011] [Accepted: 04/02/2012] [Indexed: 11/24/2022]

Iorizzo M, Senalik D, Szklarczyk M, Grzebelus D, Spooner D, Simon P. De novo assembly of the carrot mitochondrial genome using next generation sequencing of whole genomic DNA provides first evidence of DNA transfer into an angiosperm plastid genome. BMC PLANT BIOLOGY 2012;12:61. [PMID: 22548759 PMCID: PMC3413510 DOI: 10.1186/1471-2229-12-61] [Citation(s) in RCA: 95] [Impact Index Per Article: 7.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/23/2011] [Accepted: 05/01/2012] [Indexed: 05/02/2023]

Abstract

BACKGROUND

Sequence analysis of organelle genomes has revealed important aspects of plant cell evolution. The scope of this study was to develop an approach for de novo assembly of the carrot mitochondrial genome using next generation sequence data from total genomic DNA.

RESULTS

Sequencing data from a carrot 454 whole genome library were used to develop a de novo assembly of the mitochondrial genome. Development of a new bioinformatic tool allowed visualizing contig connections and elucidation of the de novo assembly. Southern hybridization demonstrated recombination across two large repeats. Genome annotation allowed identification of 44 protein coding genes, three rRNA and 17 tRNA. Identification of the plastid genome sequence allowed organelle genome comparison. Mitochondrial intergenic sequence analysis allowed detection of a fragment of DNA specific to the carrot plastid genome. PCR amplification and sequence analysis across different Apiaceae species revealed consistent conservation of this fragment in the mitochondrial genomes and an insertion in Daucus plastid genomes, giving evidence of a mitochondrial to plastid transfer of DNA. Sequence similarity with a retrotransposon element suggests a possibility that a transposon-like event transferred this sequence into the plastid genome.

CONCLUSIONS

This study confirmed that whole genome sequencing is a practical approach for de novo assembly of higher plant mitochondrial genomes. In addition, a new aspect of intercompartmental genome interaction was reported providing the first evidence for DNA transfer into an angiosperm plastid genome. The approach used here could be used more broadly to sequence and assemble mitochondrial genomes of diverse species. This information will allow us to better understand intercompartmental interactions and cell evolution.

Collapse

Pinto AC, Melo-Barbosa HP, Miyoshi A, Silva A, Azevedo V. Application of RNA-seq to reveal the transcript profile in bacteria. GENETICS AND MOLECULAR RESEARCH 2012;10:1707-18. [PMID: 21863565 DOI: 10.4238/vol10-3gmr1554] [Citation(s) in RCA: 53] [Impact Index Per Article: 4.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/13/2023]

Pellegrini M, Ferrari R. Epigenetic analysis: ChIP-chip and ChIP-seq. Methods Mol Biol 2012;802:377-387. [PMID: 22130894 DOI: 10.1007/978-1-61779-400-1_25] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/31/2023]

Pelleymounter LL, Moon I, Johnson JA, Laederach A, Halvorsen M, Eckloff B, Abo R, Rossetti S. A novel application of pattern recognition for accurate SNP and indel discovery from high-throughput data: targeted resequencing of the glucocorticoid receptor co-chaperone FKBP5 in a Caucasian population. Mol Genet Metab 2011;104:457-69. [PMID: 21917492 PMCID: PMC3224211 DOI: 10.1016/j.ymgme.2011.08.019] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 06/24/2011] [Revised: 08/18/2011] [Accepted: 08/18/2011] [Indexed: 11/28/2022]

Abstract

The detection of single nucleotide polymorphisms (SNPs) and insertion/deletions (indels) with precision from high-throughput data remains a significant bioinformatics challenge. Accurate detection is necessary before next-generation sequencing can routinely be used in the clinic. In research, scientific advances are inhibited by gaps in data, exemplified by the underrepresented discovery of rare variants, variants in non-coding regions and indels. The continued presence of false positives and false negatives prevents full automation and requires additional manual verification steps. Our methodology presents applications of both pattern recognition and sensitivity analysis to eliminate false positives and aid in the detection of SNP/indel loci and genotypes from high-throughput data. We chose FK506-binding protein 51(FKBP5) (6p21.31) for our clinical target because of its role in modulating pharmacological responses to physiological and synthetic glucocorticoids and because of the complexity of the genomic region. We detected genetic variation across a 160 kb region encompassing FKBP5. 613 SNPs and 57 indels, including a 3.3 kb deletion were discovered. We validated our method using three independent data sets and, with Sanger sequencing and Affymetrix and Illumina microarrays, achieved 99% concordance. Furthermore we were able to detect 267 novel rare variants and assess linkage disequilibrium. Our results showed both a sensitivity and specificity of 98%, indicating near perfect classification between true and false variants. The process is scalable and amenable to automation, with the downstream filters taking only 1.5h to analyze 96 individuals simultaneously. We provide examples of how our level of precision uncovered the interactions of multiple loci, their predicted influences on mRNA stability, perturbations of the hsp90 binding site, and individual variation in FKBP5 expression. Finally we show how our discovery of rare variants may change current conceptions of evolution at this locus.

Collapse

Bybee SM, Bracken-Grissom HD, Hermansen RA, Clement MJ, Crandall KA, Felder DL. Directed next generation sequencing for phylogenetics: An example using Decapoda (Crustacea). ZOOL ANZ 2011. [DOI: 10.1016/j.jcz.2011.05.010] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]

Using VAAST to identify an X-linked disorder resulting in lethality in male infants due to N-terminal acetyltransferase deficiency. Am J Hum Genet 2011;89:28-43. [PMID: 21700266 DOI: 10.1016/j.ajhg.2011.05.017] [Citation(s) in RCA: 202] [Impact Index Per Article: 14.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/30/2011] [Revised: 05/18/2011] [Accepted: 05/19/2011] [Indexed: 01/26/2023] Open

Lyon GJ, Jiang T, Van Wijk R, Wang W, Bodily PM, Xing J, Tian L, Robison RJ, Clement M, Lin Y, Zhang P, Liu Y, Moore B, Glessner JT, Elia J, Reimherr F, van Solinge WW, Yandell M, Hakonarson H, Wang J, Johnson WE, Wei Z, Wang K. Exome sequencing and unrelated findings in the context of complex disease research: ethical and clinical implications. DISCOVERY MEDICINE 2011;12:41-55. [PMID: 21794208 PMCID: PMC3544941] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Subscribe] [Scholar Register] [Indexed: 05/31/2023]

Bao S, Jiang R, Kwan W, Wang B, Ma X, Song YQ. WITHDRAWN: Evaluation of next-generation sequencing software in mapping and assembly. J Hum Genet 2011:jhg201162. [PMID: 21677664 DOI: 10.1038/jhg.2011.62] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]

Evaluation of next-generation sequencing software in mapping and assembly. J Hum Genet 2011;56:406-14. [PMID: 21525877 DOI: 10.1038/jhg.2011.43] [Citation(s) in RCA: 101] [Impact Index Per Article: 7.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/29/2022]

Evan Johnson W, Welker NC, Bass BL. Dynamic linear model for the identification of miRNAs in next-generation sequencing data. Biometrics 2011;67:1206-14. [PMID: 21385162 DOI: 10.1111/j.1541-0420.2010.01570.x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/16/2022]

Clement NL, Clement MJ, Snell Q, Johnson WE. Parallel Mapping Approaches for GNUMAP. PROCEEDINGS. IPDPS (CONFERENCE) 2011;2011:435-443. [PMID: 23396612 DOI: 10.1109/ipdps.2011.184] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]

Oshlack A, Robinson MD, Young MD. From RNA-seq reads to differential expression results. Genome Biol 2010;11:220. [PMID: 21176179 PMCID: PMC3046478 DOI: 10.1186/gb-2010-11-12-220] [Citation(s) in RCA: 475] [Impact Index Per Article: 31.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/11/2022] Open

Li H, Homer N. A survey of sequence alignment algorithms for next-generation sequencing. Brief Bioinform 2010;11:473-83. [PMID: 20460430 DOI: 10.1093/bib/bbq015] [Citation(s) in RCA: 418] [Impact Index Per Article: 27.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/18/2022] Open