Reference Citation Analysis

For: Liu Y, Schröder J, Schmidt B. Musket: a multistage k-mer spectrum-based error corrector for Illumina sequence data. Bioinformatics 2012. [PMID: 23202746 DOI: 10.1093/bioinformatics/bts690] [Citation(s) in RCA: 180] [Impact Index Per Article: 13.8] [Indexed: 01/30/2023]

Number

Cited by Other Article(s)

151

Horn F, Linde J, Mattern DJ, Walther G, Guthke R, Brakhage AA, Valiante V. Draft Genome Sequence of the Fungus Penicillium brasilianum MG11. Genome Announcements 2015;3:e00724-15. [PMID: 26337871 PMCID: PMC4559720 DOI: 10.1128/genomea.00724-15] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 05/26/2015] [Accepted: 07/24/2015] [Indexed: 02/02/2023]

152

Allam A, Kalnis P, Solovyev V. Karect: accurate correction of substitution, insertion and deletion errors for next-generation sequencing data. Bioinformatics 2015;31:3421-8. [DOI: 10.1093/bioinformatics/btv415] [Citation(s) in RCA: 59] [Impact Index Per Article: 5.9] [Received: 11/11/2014] [Accepted: 07/08/2015] [Indexed: 11/12/2022]

153

Song L, Florea L, Langmead B. Lighter: fast and memory-efficient sequencing error correction without counting. Genome Biol 2015;15:509. [PMID: 25398208 PMCID: PMC4248469 DOI: 10.1186/s13059-014-0509-9] [Citation(s) in RCA: 150] [Impact Index Per Article: 15.0] [Received: 05/27/2014] [Indexed: 02/02/2023]

154

Marçais G, Yorke JA, Zimin A. QuorUM: An Error Corrector for Illumina Reads. PLoS One 2015;10:e0130821. [PMID: 26083032 PMCID: PMC4471408 DOI: 10.1371/journal.pone.0130821] [Citation(s) in RCA: 54] [Impact Index Per Article: 5.4] [Received: 09/19/2014] [Accepted: 05/26/2015] [Indexed: 11/18/2022]

Abstract

Motivation

Illumina sequencing data can provide high coverage of a genome by relatively short (most often 100 bp to 150 bp) reads at a low cost. Even with a low (advertised 1%) error rate, 100× coverage Illumina data on average has an error in some read at every base in the genome. These errors make handling the data more complicated because they result in a large number of low-count erroneous k-mers in the reads. However, there is enough information in the reads to correct most of the sequencing errors, thus making subsequent use of the data (e.g. for mapping or assembly) easier. Here we use the term “error correction” to denote the reduction in errors due to both changes in individual bases and trimming of unusable sequence. We developed error correction software called QuorUM. QuorUM is mainly aimed at error correcting Illumina reads for subsequent assembly. It is designed around the novel idea of minimizing the number of distinct erroneous k-mers in the output reads while preserving the most true k-mers, and we introduce a composite statistic π that measures how successful we are at achieving this dual goal. We evaluate the performance of QuorUM by correcting actual Illumina reads from genomes for which a reference assembly is available.

Results

We produce trimmed and error-corrected reads that result in assemblies with longer contigs and fewer errors. We compared QuorUM against several published error correctors and found that it is the best performer in most metrics we use. QuorUM is efficiently implemented, making use of current multi-core computing architectures, and it is suitable for large data sets (1 billion bases checked and corrected per day per core). We also demonstrate that a third-party assembler (SOAPdenovo) benefits significantly from using QuorUM error-corrected reads. QuorUM error-corrected reads result in a factor of 1.1 to 4 improvement in N50 contig size compared to using the original reads with SOAPdenovo for the data sets investigated.

Availability

QuorUM is distributed as an independent software package and as a module of the MaSuRCA assembly software. Both are available under the GPL open source license at http://www.genome.umd.edu.

Contact

gmarcais@umd.edu.
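
The arithmetic and the k-mer reasoning in the abstract above can be illustrated with a minimal Python sketch. This is not QuorUM's implementation: it assumes a naive in-memory counter and a hypothetical count threshold, and the function names and toy reads are invented for illustration. At a 1% per-base error rate and 100× coverage, the expected number of erroneous base calls covering a given genome position is 100 × 0.01 = 1, and a single substitution can create up to k k-mers that occur only rarely in the data.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer (forward strand only) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def low_count_kmers(counts, threshold):
    """Return k-mers seen fewer than `threshold` times (hypothetical cutoff)."""
    return {kmer for kmer, c in counts.items() if c < threshold}


def expected_erroneous_calls(coverage, error_rate):
    """Expected number of erroneous base calls covering one genome position."""
    return coverage * error_rate


# Toy data: five identical reads plus one read with a single substitution.
reads = ["ACGTACGT"] * 5 + ["ACGTTCGT"]
counts = count_kmers(reads, 4)
suspect = low_count_kmers(counts, 2)

print(sorted(suspect))
print(expected_erroneous_calls(100, 0.01))
```

In this toy input the single substituted base produces four 4-mers that each occur once, while the 4-mers from the agreeing reads occur at least five times. Real correctors choose the threshold from the observed k-mer count distribution rather than fixing it by hand.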

155

Sahl JW, Schupp JM, Rasko DA, Colman RE, Foster JT, Keim P. Phylogenetically typing bacterial strains from partial SNP genotypes observed from direct sequencing of clinical specimen metagenomic data. Genome Med 2015;7:52. [PMID: 26136847 PMCID: PMC4487561 DOI: 10.1186/s13073-015-0176-9] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.8] [Received: 02/19/2015] [Accepted: 05/15/2015] [Indexed: 12/30/2022]

156

Sheikhizadeh S, de Ridder D. ACE: accurate correction of errors using K-mer tries. Bioinformatics 2015;31:3216-8. [DOI: 10.1093/bioinformatics/btv332] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 11/23/2014] [Accepted: 05/22/2015] [Indexed: 11/13/2022]

157

Li H. BFC: correcting Illumina sequencing errors. Bioinformatics 2015;31:2885-7. [PMID: 25953801 DOI: 10.1093/bioinformatics/btv290] [Citation(s) in RCA: 120] [Impact Index Per Article: 12.0] [Received: 02/12/2015] [Accepted: 05/02/2015] [Indexed: 11/12/2022]

158

Insights from the metagenome of an acid salt lake: the role of biology in an extreme depositional environment. PLoS One 2015;10:e0122869. [PMID: 25923206 PMCID: PMC4414474 DOI: 10.1371/journal.pone.0122869] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.9] [Received: 09/12/2014] [Accepted: 02/24/2015] [Indexed: 12/31/2022]

159

Horn F, Üzüm Z, Möbius N, Guthke R, Linde J, Hertweck C. Draft Genome Sequences of Symbiotic and Nonsymbiotic Rhizopus microsporus Strains CBS 344.29 and ATCC 62417. Genome Announcements 2015;3:e01370-14. [PMID: 25614557 PMCID: PMC4319578 DOI: 10.1128/genomea.01370-14] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 11/18/2014] [Accepted: 12/09/2014] [Indexed: 12/13/2022]

160

Horn F, Habel A, Scharf DH, Dworschak J, Brakhage AA, Guthke R, Hertweck C, Linde J. Draft Genome Sequence and Gene Annotation of the Entomopathogenic Fungus Verticillium hemipterigenum. Genome Announcements 2015;3:e01439-14. [PMID: 25614560 PMCID: PMC4319583 DOI: 10.1128/genomea.01439-14] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/02/2014] [Accepted: 12/09/2014] [Indexed: 11/20/2022]

161

Marinier E, Brown DG, McConkey BJ. Pollux: platform independent error correction of single and mixed genomes. BMC Bioinformatics 2015;16:10. [PMID: 25592313 PMCID: PMC4307147 DOI: 10.1186/s12859-014-0435-6] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.3] [Received: 08/21/2014] [Accepted: 12/17/2014] [Indexed: 12/13/2022]

162

Addisalem AB, Esselink GD, Bongers F, Smulders MJM. Genomic sequencing and microsatellite marker development for Boswellia papyrifera, an economically important but threatened tree native to dry tropical forests. AoB PLANTS 2015;7:plu086. [PMID: 25573702 PMCID: PMC4433549 DOI: 10.1093/aobpla/plu086] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 07/11/2014] [Accepted: 12/08/2014] [Indexed: 06/01/2023]

163

Computational and Statistical Analyses of Insertional Polymorphic Endogenous Retroviruses in a Non-Model Organism. Computation 2014. [DOI: 10.3390/computation2040221] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 12/29/2022]

164

Melsted P, Halldórsson BV. KmerStream: streaming algorithms for k-mer abundance estimation. Bioinformatics 2014;30:3541-7. [PMID: 25355787 DOI: 10.1093/bioinformatics/btu713] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.2] [Indexed: 12/30/2022]

165

Molnar M, Ilie L. Correcting Illumina data. Brief Bioinform 2014;16:588-99. [PMID: 25183248 DOI: 10.1093/bib/bbu029] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Received: 06/23/2014] [Accepted: 08/02/2014] [Indexed: 11/12/2022]

166

Lim EC, Müller J, Hagmann J, Henz SR, Kim ST, Weigel D. Trowel: a fast and accurate error correction module for Illumina sequencing reads. Bioinformatics 2014;30:3264-5. [PMID: 25075116 DOI: 10.1093/bioinformatics/btu513] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Indexed: 11/13/2022]

167

Drezen E, Rizk G, Chikhi R, Deltel C, Lemaitre C, Peterlongo P, Lavenier D. GATB: Genome Assembly & Analysis Tool Box. Bioinformatics 2014;30:2959-61. [PMID: 24990603 PMCID: PMC4184257 DOI: 10.1093/bioinformatics/btu406] [Citation(s) in RCA: 55] [Impact Index Per Article: 5.0] [Indexed: 11/15/2022]

168

Janin L, Schulz-Trieglaff O, Cox AJ. BEETL-fastq: a searchable compressed archive for DNA reads. Bioinformatics 2014;30:2796-801. [PMID: 24950811 DOI: 10.1093/bioinformatics/btu387] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.8] [Indexed: 01/01/2023]

169

Knief C. Analysis of plant microbe interactions in the era of next generation sequencing technologies. Frontiers in Plant Science 2014;5:216. [PMID: 24904612 PMCID: PMC4033234 DOI: 10.3389/fpls.2014.00216] [Citation(s) in RCA: 120] [Impact Index Per Article: 10.9] [Received: 02/15/2014] [Accepted: 04/30/2014] [Indexed: 05/18/2023]

170

Wirawan A, Harris RS, Liu Y, Schmidt B, Schröder J. HECTOR: a parallel multistage homopolymer spectrum based error corrector for 454 sequencing data. BMC Bioinformatics 2014;15:131. [PMID: 24885381 PMCID: PMC4023493 DOI: 10.1186/1471-2105-15-131] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Received: 06/27/2013] [Accepted: 04/24/2014] [Indexed: 01/29/2023]

171

Yu YW, Yorukoglu D, Berger B. Traversing the k-mer Landscape of NGS Read Datasets for Quality Score Sparsification. RESEARCH IN COMPUTATIONAL MOLECULAR BIOLOGY : ... ANNUAL INTERNATIONAL CONFERENCE, RECOMB ... : PROCEEDINGS. RECOMB (CONFERENCE : 2005- ) 2014;8394:385-399. [PMID: 28825060 DOI: 10.1007/978-3-319-05269-4_31] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Indexed: 12/18/2022]

172

Zhou X, Rokas A. Prevention, diagnosis and treatment of high-throughput sequencing data pathologies. Mol Ecol 2014;23:1679-700. [DOI: 10.1111/mec.12680] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Received: 11/04/2013] [Revised: 01/17/2014] [Accepted: 01/22/2014] [Indexed: 12/17/2022]

173

Genomic sequence and experimental tractability of a new decapod shrimp model, Neocaridina denticulata. Mar Drugs 2014;12:1419-37. [PMID: 24619275 PMCID: PMC3967219 DOI: 10.3390/md12031419] [Citation(s) in RCA: 60] [Impact Index Per Article: 5.5] [Received: 01/16/2014] [Revised: 02/23/2014] [Accepted: 02/28/2014] [Indexed: 12/14/2022]

174

Roy RS, Bhattacharya D, Schliep A. Turtle: Identifying frequent k-mers with cache-efficient algorithms. Bioinformatics 2014;30:1950-7. [DOI: 10.1093/bioinformatics/btu132] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.5] [Indexed: 11/12/2022]

175

Heo Y, Wu XL, Chen D, Ma J, Hwu WM. BLESS: bloom filter-based error correction solution for high-throughput sequencing reads. Bioinformatics 2014;30:1354-62. [PMID: 24451628 DOI: 10.1093/bioinformatics/btu030] [Citation(s) in RCA: 66] [Impact Index Per Article: 6.0] [Indexed: 11/12/2022]

176

Li W, Freudenberg J, Miramontes P. Diminishing return for increased Mappability with longer sequencing reads: implications of the k-mer distributions in the human genome. BMC Bioinformatics 2014;15:2. [PMID: 24386976 PMCID: PMC3927684 DOI: 10.1186/1471-2105-15-2] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Received: 08/25/2013] [Accepted: 12/17/2013] [Indexed: 11/10/2022]

Abstract

Background

The amount of non-unique sequence (non-singletons) in a genome directly affects the difficulty of read alignment to a reference assembly for high-throughput sequencing data. Although a longer read is more likely to be uniquely mapped to the reference genome, a quantitative analysis of the influence of read lengths on mappability has been lacking. To address this question, we evaluate the k-mer distribution of the human reference genome. The k-mer frequency is determined for k ranging from 20 bp to 1000 bp.

Results

We observe that the proportion of non-singleton k-mers decreases slowly with increasing k, and can be fitted by piecewise power-law functions with different exponents at different ranges of k. A slower decay at greater values of k indicates more limited gains in mappability for read lengths between 200 bp and 1000 bp. The frequency distributions of k-mers exhibit long tails with a power-law-like trend, and rank frequency plots exhibit a concave Zipf’s curve. The most frequent 1000-mers comprise 172 regions, which include four large stretches on chromosomes 1 and X, containing genes of biomedical relevance. Comparison with other databases indicates that the 172 regions can be broadly classified into two types: those containing LINE transposable elements and those containing segmental duplications.

Conclusion

Read mappability, as measured by the proportion of singletons, increases steadily up to a length scale of around 200 bp. When read length increases above 200 bp, smaller gains in mappability are expected. Moreover, the proportion of non-singletons decreases with read length much more slowly than linearly. Even a read length of 1000 bp would not allow the unique alignment of reads for many coding regions of human genes. A mix of techniques will be needed for efficiently producing high-quality data that cover the complete human genome.
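
The statistic discussed in this abstract, the proportion of non-singleton k-mers as a function of k, can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' pipeline: it assumes the proportion is taken over k-mer positions on the forward strand of a single sequence, and the function name and toy sequence are hypothetical.

```python
from collections import Counter


def non_singleton_fraction(sequence, k):
    """Fraction of k-mer positions whose k-mer occurs more than once.

    Assumption: forward strand only, exact matches, one input sequence.
    """
    n_positions = len(sequence) - k + 1
    if n_positions <= 0:
        raise ValueError("k must not exceed the sequence length")
    counts = Counter(sequence[i:i + k] for i in range(n_positions))
    # Sum the occurrences of every k-mer that is seen at least twice.
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / n_positions


# Toy sequence with a short repeat; longer k-mers are more often unique.
toy = "ACGTACGA"
for k in (2, 3, 4):
    print(k, non_singleton_fraction(toy, k))
```

On a real genome the same loop would be run over a range of k values (the abstract uses 20 bp to 1000 bp), which requires a memory-efficient k-mer counter rather than an in-memory `Counter`.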

177

Anvar SY, Khachatryan L, Vermaat M, van Galen M, Pulyakhina I, Ariyurek Y, Kraaijeveld K, den Dunnen JT, de Knijff P, ’t Hoen PAC, Laros JFJ. Determining the quality and complexity of next-generation sequencing data without a reference genome. Genome Biol 2014;15:555. [PMID: 25514851 PMCID: PMC4298064 DOI: 10.1186/s13059-014-0555-3] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Received: 10/06/2014] [Accepted: 11/27/2014] [Indexed: 01/22/2023]

178

El-Metwally S, Ouda OM, Helmy M. Approaches and Challenges of Next-Generation Sequence Assembly Stages. Next Generation Sequencing Technologies and Challenges in Sequence Assembly 2014. [DOI: 10.1007/978-1-4939-0715-1_9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/12/2023]

179

Rødland EA. Compact representation of k-mer de Bruijn graphs for genome read assembly. BMC Bioinformatics 2013;14:313. [PMID: 24152242 PMCID: PMC4015147 DOI: 10.1186/1471-2105-14-313] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Received: 05/02/2013] [Accepted: 10/14/2013] [Indexed: 11/10/2022]

180

Guo Y, Ye F, Sheng Q, Clark T, Samuels DC. Three-stage quality control strategies for DNA re-sequencing data. Brief Bioinform 2013;15:879-89. [PMID: 24067931 DOI: 10.1093/bib/bbt069] [Citation(s) in RCA: 118] [Impact Index Per Article: 9.8] [Indexed: 01/15/2023]