Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Nagarajan N, Cook C, Di Bonaventura M, Ge H, Richards A, Bishop-Lilly KA, DeSalle R, Read TD, Pop M. Finishing genomes with limited resources: lessons from an ensemble of microbial genomes. BMC Genomics 2010;11:242. [PMID: 20398345 PMCID: PMC2864248 DOI: 10.1186/1471-2164-11-242] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/25/2009] [Accepted: 04/16/2010] [Indexed: 12/03/2022] Open

Cited by Other Article(s)

Manual Annotation Studio (MAS): a collaborative platform for manual functional annotation of viral and microbial genomes. BMC Genomics 2021;22:733. [PMID: 34627149 PMCID: PMC8501643 DOI: 10.1186/s12864-021-08029-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/23/2021] [Accepted: 09/22/2021] [Indexed: 11/10/2022] Open

Shieh YK, Liu SC, Lung Lu C. Scaffolding Contigs Using Multiple Reference Genomes. Comput Biol Chem 2020. [DOI: 10.5772/intechopen.93456] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]

Khachatryan L, de Leeuw RH, Kraakman MEM, Pappas N, Te Raa M, Mei H, de Knijff P, Laros JFJ. Taxonomic classification and abundance estimation using 16S and WGS-A comparison using controlled reference samples. Forensic Sci Int Genet 2020;46:102257. [PMID: 32058299 DOI: 10.1016/j.fsigen.2020.102257] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/09/2019] [Revised: 12/30/2019] [Accepted: 01/27/2020] [Indexed: 12/30/2022]

Multiplexed Non-barcoded Long-Read Sequencing and Assembling Genomes of Bacillus Strains in Error-Free Simulations. Curr Microbiol 2019;77:79-84. [PMID: 31722044 DOI: 10.1007/s00284-019-01808-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/23/2019] [Accepted: 11/02/2019] [Indexed: 10/25/2022]

Waters NR, Abram F, Brennan F, Holmes A, Pritchard L. riboSeed: leveraging prokaryotic genomic architecture to assemble across ribosomal regions. Nucleic Acids Res 2019;46:e68. [PMID: 29608703 PMCID: PMC6009695 DOI: 10.1093/nar/gky212] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/24/2017] [Accepted: 03/12/2018] [Indexed: 11/12/2022] Open

Veras AADO, Merlin B, de Sá PHCG. ImproveAssembly - Tool for identifying new gene products and improving genome assembly. PLoS One 2018;13:e0206000. [PMID: 30365512 PMCID: PMC6203371 DOI: 10.1371/journal.pone.0206000] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/15/2018] [Accepted: 10/04/2018] [Indexed: 11/18/2022] Open

Acuña-Amador L, Primot A, Cadieu E, Roulet A, Barloy-Hubler F. Genomic repeats, misassembly and reannotation: a case study with long-read resequencing of Porphyromonas gingivalis reference strains. BMC Genomics 2018;19:54. [PMID: 29338683 PMCID: PMC5771137 DOI: 10.1186/s12864-017-4429-4] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/26/2017] [Accepted: 12/29/2017] [Indexed: 12/15/2022] Open

Abstract

BACKGROUND

Without knowledge of their genomic sequences, it is impossible to make functional models of the bacteria that make up human and animal microbiota. Unfortunately, the vast majority of publicly available genomes are only working drafts, an incompleteness that causes numerous problems and constitutes a major obstacle to genotypic and phenotypic interpretation. In this work, we began with an example from the class Bacteroidia in the phylum Bacteroidetes, which is preponderant among human orodigestive microbiota. We successfully identify the genetic loci responsible for assembly breaks and misassemblies and demonstrate the importance and usefulness of long-read sequencing and curated reannotation.

RESULTS

We showed that the fragmentation in Bacteroidia draft genomes assembled from massively parallel sequencing linearly correlates with genomic repeats of the same or greater size than the reads. We also demonstrated that some of these repeats, especially the long ones, correspond to misassembled loci in three reference Porphyromonas gingivalis genomes marked as circularized (thus complete or finished). We prove that even at modest coverage (30X), long-read resequencing together with PCR contiguity verification (rrn operons and an integrative and conjugative element or ICE) can be used to identify and correct the wrongly combined or assembled regions. Finally, although time-consuming and labor-intensive, consistent manual biocuration of three P. gingivalis strains allowed us to compare and correct the existing genomic annotations, resulting in a more accurate interpretation of the genomic differences among these strains.

CONCLUSIONS

In this study, we demonstrate the usefulness and importance of long-read sequencing in verifying published genomes (even when complete) and generating assemblies for new bacterial strains/species with high genomic plasticity. We also show that when combined with biological validation processes and diligent biocurated annotation, this strategy helps reduce the propagation of errors in shared databases, thus limiting false conclusions based on incomplete or misleading information.

Zhang Y, Kitajima M, Whittle AJ, Liu WT. Benefits of Genomic Insights and CRISPR-Cas Signatures to Monitor Potential Pathogens across Drinking Water Production and Distribution Systems. Front Microbiol 2017;8:2036. [PMID: 29097994 PMCID: PMC5654357 DOI: 10.3389/fmicb.2017.02036] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/27/2017] [Accepted: 10/05/2017] [Indexed: 11/22/2022] Open

Kremer FS, McBride AJA, Pinto LDS. Approaches for in silico finishing of microbial genome sequences. Genet Mol Biol 2017;40:553-576. [PMID: 28898352 PMCID: PMC5596377 DOI: 10.1590/1678-4685-gmb-2016-0230] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/25/2016] [Accepted: 03/13/2017] [Indexed: 12/15/2022] Open

Chen KT, Chen CJ, Shen HT, Liu CL, Huang SH, Lu CL. Multi-CAR: a tool of contig scaffolding using multiple references. BMC Bioinformatics 2016;17:469. [PMID: 28155633 PMCID: PMC5260120 DOI: 10.1186/s12859-016-1328-7] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

A draft genome assembled by current next-generation sequencing techniques from short reads is just a collection of contigs, whose relative positions and orientations along the genome being sequenced are unknown. To further obtain its complete sequence, a contig scaffolding process is usually applied to order and orient the contigs in the draft genome. Although several single reference-based scaffolding tools have been proposed, they may produce erroneous scaffolds if there are rearrangements between the target and reference genomes or their phylogenetic relationship is distant. This may suggest that a single reference genome may not be sufficient to produce correct scaffolds of a draft genome.

RESULTS

In this study, we design a simple heuristic method to further revise our single reference-based scaffolding tool CAR into a new one called Multi-CAR such that it can utilize multiple complete genomes of related organisms as references to more accurately order and orient the contigs of a draft genome. In practical usage, our Multi-CAR does not require prior knowledge concerning phylogenetic relationships among the draft and reference genomes and libraries of paired-end reads. To validate Multi-CAR, we have tested it on a real dataset composed of several prokaryotic genomes and also compared its accuracy performance with other multiple reference-based scaffolding tools Ragout and MeDuSa. Our experimental results have finally shown that Multi-CAR indeed outperforms Ragout and MeDuSa in terms of sensitivity, precision, genome coverage, scaffold number and scaffold N50 size.

CONCLUSIONS

Multi-CAR serves as an efficient tool that can more accurately order and orient the contigs of a draft genome based on multiple reference genomes. The web server of Multi-CAR is freely available at http://genome.cs.nthu.edu.tw/Multi-CAR/ .

Choi SC. On the study of microbial transcriptomes using second- and third-generation sequencing technologies. J Microbiol 2016;54:527-36. [PMID: 27480632 DOI: 10.1007/s12275-016-6233-2] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/20/2016] [Revised: 06/16/2016] [Accepted: 06/16/2016] [Indexed: 12/19/2022]

Smits SL, Bodewes R, Ruiz-González A, Baumgärtner W, Koopmans MP, Osterhaus ADME, Schürch AC. Recovering full-length viral genomes from metagenomes. Front Microbiol 2015;6:1069. [PMID: 26483782 PMCID: PMC4589665 DOI: 10.3389/fmicb.2015.01069] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/19/2015] [Accepted: 09/17/2015] [Indexed: 12/17/2022] Open

Farrant GK, Hoebeke M, Partensky F, Andres G, Corre E, Garczarek L. WiseScaffolder: an algorithm for the semi-automatic scaffolding of Next Generation Sequencing data. BMC Bioinformatics 2015;16:281. [PMID: 26335184 PMCID: PMC4559175 DOI: 10.1186/s12859-015-0705-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/06/2015] [Accepted: 08/17/2015] [Indexed: 01/12/2023] Open

Abstract

Background

The sequencing depth provided by high-throughput sequencing technologies has allowed a rise in the number of de novo sequenced genomes that could potentially be closed without further sequencing. However, genome scaffolding and closure require costly human supervision that often results in genomes being published as drafts. A number of automatic scaffolders were recently released, which improved the global quality of genomes published in the last few years. Yet, none of them reach the efficiency of manual scaffolding.

Results

Here, we present an innovative semi-automatic scaffolder that additionally helps with chimerae resolution and generates valuable contig maps and outputs for manual improvement of the automatic scaffolding. This software was tested on the newly sequenced marine cyanobacterium Synechococcus sp. WH8103 as well as two reference datasets used in previous studies, Rhodobacter sphaeroides and Homo sapiens chromosome 14 (http://gage.cbcb.umd.edu/). The quality of resulting scaffolds was compared to that of three other stand-alone scaffolders: SSPACE, SOPRA and SCARPA. For all three model organisms, WiseScaffolder produced better results than other scaffolders in terms of contiguity statistics (number of genome fragments, N50, LG50, etc.) and, in the case of WH8103, the reliability of the scaffolds was confirmed by whole genome alignment against a closely related reference genome. We also propose an efficient computer-assisted strategy for manual improvement of the scaffolding, using outputs generated by WiseScaffolder, as well as for genome finishing that in our hands led to the circularization of the WH8103 genome.

Conclusion

Altogether, WiseScaffolder proved more efficient than three other scaffolders for both prokaryotic and eukaryotic genomes and is thus likely applicable to most genome projects. The scaffolding pipeline described here should be of particular interest to biologists wishing to take advantage of the high added value of complete genomes.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-015-0705-y) contains supplementary material, which is available to authorized users.
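
The contiguity statistics named in the abstract above (N50, LG50) can be illustrated with a minimal Python sketch. It assumes the common definitions: N50 is the length of the shortest sequence in the smallest set of longest sequences that covers half of the total assembly length, and supplying an estimated genome size instead of the assembly length gives the NG50/LG50 variants. The function name `n50_l50` and the `genome_size` parameter are illustrative and do not come from WiseScaffolder, which may define or report these values differently.

```python
def n50_l50(lengths, genome_size=None):
    """Return (N50, L50) for a list of contig or scaffold lengths.

    If genome_size is given, the half-way target is based on it instead of
    the total assembly length, which yields (NG50, LG50) under the usual
    definitions. Returns (None, None) if the target is never reached.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = genome_size if genome_size is not None else sum(lengths)
    target = total / 2
    running = 0
    # Walk from the longest sequence down until half the target is covered.
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if running >= target:
            return length, count
    return None, None


# Hypothetical scaffold lengths, for illustration only.
example = [80, 70, 50, 40, 30, 20]
n50, l50 = n50_l50(example)                       # (70, 2)
ng50, lg50 = n50_l50(example, genome_size=400)    # (50, 3)
```

Because N50 depends only on sequence lengths, a misassembly that wrongly joins two contigs raises it; this is one reason the abstract also reports validation by whole genome alignment.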

Eastman AW, Yuan ZC. Development and validation of an rDNA operon based primer walking strategy applicable to de novo bacterial genome finishing. Front Microbiol 2015;5:769. [PMID: 25653642 PMCID: PMC4301005 DOI: 10.3389/fmicb.2014.00769] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/07/2014] [Accepted: 12/16/2014] [Indexed: 01/10/2023] Open

Abstract

Advances in sequencing technology have drastically increased the depth and feasibility of bacterial genome sequencing. However, little information is available that details the specific techniques and procedures employed during genome sequencing despite the large numbers of published genomes. Shotgun approaches employed by second-generation sequencing platforms have necessitated the development of robust bioinformatics tools for in silico assembly, and complete assembly is limited by the presence of repetitive DNA sequences and multi-copy operons. Typically, re-sequencing with multiple platforms and laborious, targeted Sanger sequencing are employed to finish a draft bacterial genome. Here we describe a novel strategy based on the identification and targeted sequencing of repetitive rDNA operons to expedite bacterial genome assembly and finishing. Our strategy was validated by finishing the genome of Paenibacillus polymyxa strain CR1, a bacterium with potential in sustainable agriculture and bio-based processes. An analysis of the 38 contigs contained in the P. polymyxa strain CR1 draft genome revealed 12 repetitive rDNA operons with varied intragenic and flanking regions of variable length, all located at contig boundaries and within contig gaps. These highly similar but not identical rDNA operons were experimentally verified and sequenced simultaneously with multiple, specially designed primer sets. This approach also identified and corrected significant sequence rearrangement generated during the initial in silico assembly of sequencing reads. Our approach reduces the required effort associated with blind primer walking for contig assembly, increasing both the speed and feasibility of genome finishing. Our study further reinforces the notion that repetitive DNA elements are major limiting factors for genome finishing. Moreover, we provide a step-by-step workflow for genome finishing, which may guide future bacterial genome finishing projects.

Forde BM, Ben Zakour NL, Stanton-Cook M, Phan MD, Totsika M, Peters KM, Chan KG, Schembri MA, Upton M, Beatson SA. The complete genome sequence of Escherichia coli EC958: a high quality reference sequence for the globally disseminated multidrug resistant E. coli O25b:H4-ST131 clone. PLoS One 2014;9:e104400. [PMID: 25126841 PMCID: PMC4134206 DOI: 10.1371/journal.pone.0104400] [Citation(s) in RCA: 84] [Impact Index Per Article: 8.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/16/2014] [Accepted: 07/11/2014] [Indexed: 11/18/2022] Open

Utturkar SM, Klingeman DM, Land ML, Schadt CW, Doktycz MJ, Pelletier DA, Brown SD. Evaluation and validation of de novo and hybrid assembly techniques to derive high-quality genome sequences. Bioinformatics 2014;30:2709-16. [PMID: 24930142 PMCID: PMC4173024 DOI: 10.1093/bioinformatics/btu391] [Citation(s) in RCA: 86] [Impact Index Per Article: 8.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022]

Affiliation(s)

Sagar M Utturkar Graduate School of Genome Science and Technology, University of Tennessee, Knoxville, TN 37919, USA and Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
Dawn M Klingeman Graduate School of Genome Science and Technology, University of Tennessee, Knoxville, TN 37919, USA and Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
Miriam L Land Graduate School of Genome Science and Technology, University of Tennessee, Knoxville, TN 37919, USA and Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
Christopher W Schadt Graduate School of Genome Science and Technology, University of Tennessee, Knoxville, TN 37919, USA and Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
Mitchel J Doktycz Graduate School of Genome Science and Technology, University of Tennessee, Knoxville, TN 37919, USA and Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
Dale A Pelletier Graduate School of Genome Science and Technology, University of Tennessee, Knoxville, TN 37919, USA and Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
Steven D Brown Graduate School of Genome Science and Technology, University of Tennessee, Knoxville, TN 37919, USA and Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA

Lery LMS, Frangeul L, Tomas A, Passet V, Almeida AS, Bialek-Davenet S, Barbe V, Bengoechea JA, Sansonetti P, Brisse S, Tournebize R. Comparative analysis of Klebsiella pneumoniae genomes identifies a phospholipase D family protein as a novel virulence factor. BMC Biol 2014;12:41. [PMID: 24885329 PMCID: PMC4068068 DOI: 10.1186/1741-7007-12-41] [Citation(s) in RCA: 109] [Impact Index Per Article: 10.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/14/2014] [Accepted: 05/15/2014] [Indexed: 12/17/2022] Open

Genome sequencing of Listeria monocytogenes. Methods Mol Biol 2014;1157:223-32. [PMID: 24792562 DOI: 10.1007/978-1-4939-0703-8_19] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/21/2023]

Revolutionizing Prokaryotic Systematics Through Next-Generation Sequencing. J Microbiol Methods 2014. [DOI: 10.1016/bs.mim.2014.07.001] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/27/2022]

OMACC: an Optical-Map-Assisted Contig Connector for improving de novo genome assembly. BMC SYSTEMS BIOLOGY 2013;7 Suppl 6:S7. [PMID: 24564959 PMCID: PMC4029551 DOI: 10.1186/1752-0509-7-s6-s7] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 11/19/2022]

BAIT: Organizing genomes and mapping rearrangements in single cells. Genome Med 2013;5:82. [PMID: 24028793 PMCID: PMC3971352 DOI: 10.1186/gm486] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/29/2013] [Accepted: 09/09/2013] [Indexed: 12/30/2022] Open

Kirkup BC, Mahlen S, Kallstrom G. Future-Generation Sequencing and Clinical Microbiology. Clin Lab Med 2013;33:685-704. [DOI: 10.1016/j.cll.2013.03.011] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/21/2022]

Ghodsi M, Hill CM, Astrovskaya I, Lin H, Sommer DD, Koren S, Pop M. De novo likelihood-based measures for comparing genome assemblies. BMC Res Notes 2013;6:334. [PMID: 23965294 PMCID: PMC3765854 DOI: 10.1186/1756-0500-6-334] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/13/2013] [Accepted: 08/13/2013] [Indexed: 12/12/2022] Open

Abstract

Background

The current revolution in genomics has been made possible by software tools called genome assemblers, which stitch together DNA fragments “read” by sequencing machines into complete or nearly complete genome sequences. Despite decades of research in this field and the development of dozens of genome assemblers, assessing and comparing the quality of assembled genome sequences still relies on the availability of independently determined standards, such as manually curated genome sequences, or independently produced mapping data. These “gold standards” can be expensive to produce and may only cover a small fraction of the genome, which limits their applicability to newly generated genome sequences. Here we introduce a de novo probabilistic measure of assembly quality which allows for an objective comparison of multiple assemblies generated from the same set of reads. We define the quality of a sequence produced by an assembler as the conditional probability of observing the sequenced reads from the assembled sequence. A key property of our metric is that the true genome sequence maximizes the score, unlike other commonly used metrics.

Results

We demonstrate that our de novo score can be computed quickly and accurately in a practical setting even for large datasets, by estimating the score from a relatively small sample of the reads. To demonstrate the benefits of our score, we measure the quality of the assemblies generated in the GAGE and Assemblathon 1 assembly “bake-offs” with our metric. Even without knowledge of the true reference sequence, our de novo metric closely matches the reference-based evaluation metrics used in the studies and outperforms other de novo metrics traditionally used to measure assembly quality (such as N50). Finally, we highlight the application of our score to optimize assembly parameters used in genome assemblers, which enables better assemblies to be produced, even without prior knowledge of the genome being assembled.

Conclusion

Likelihood-based measures, such as ours proposed here, will become the new standard for de novo assembly evaluation.
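
The definition in the abstract above (assembly quality as the conditional probability of observing the reads given the assembled sequence) can be made concrete with a toy Python sketch. This is not the authors' implementation. It assumes a deliberately simplified model that is not taken from the paper: each read starts at a uniformly random position on the forward strand, errors are independent substitutions at a fixed rate `err`, and indels, reverse complements and mate pairs are ignored. The names `read_prob` and `log_likelihood` are hypothetical.

```python
import math


def read_prob(read, assembly, err=0.01):
    """P(read | assembly) under a uniform-start, substitution-only toy model."""
    n_starts = len(assembly) - len(read) + 1
    if n_starts <= 0:
        raise ValueError("read is longer than the assembly")
    total = 0.0
    for start in range(n_starts):
        p = 1.0
        # Each base either matches (1 - err) or is one of 3 wrong bases (err / 3).
        for r, a in zip(read, assembly[start:start + len(read)]):
            p *= (1.0 - err) if r == a else err / 3.0
        total += p
    return total / n_starts


def log_likelihood(reads, assembly, err=0.01):
    """Sum of log P(read | assembly); higher means the reads fit better."""
    return sum(math.log(read_prob(r, assembly, err)) for r in reads)


# Toy data: every 5-mer of a made-up "true" sequence serves as a read.
truth = "ACGTTGCAAGGCTA"
wrong = "ACGTTGCATGGCTA"   # same sequence with one substituted base
reads = [truth[i:i + 5] for i in range(len(truth) - 4)]
better = log_likelihood(reads, truth) > log_likelihood(reads, wrong)
```

In this toy case the sequence the reads were drawn from scores higher than the altered one, in line with the property stated in the abstract that the true genome maximizes the score. Real data would need a fuller error model and, as the Results section notes, the score can be estimated from a sample of the reads.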

Beyond sequencing: optical mapping of DNA in the age of nanotechnology and nanoscopy. Curr Opin Biotechnol 2013;24:690-8. [DOI: 10.1016/j.copbio.2013.01.009] [Citation(s) in RCA: 90] [Impact Index Per Article: 8.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/05/2012] [Revised: 01/20/2013] [Accepted: 01/22/2013] [Indexed: 12/25/2022]

Genome Sequencing of Four Strains of Rickettsia prowazekii, the Causative Agent of Epidemic Typhus, Including One Flying Squirrel Isolate. GENOME ANNOUNCEMENTS 2013;1:1/3/e00399-13. [PMID: 23814035 PMCID: PMC3695431 DOI: 10.1128/genomea.00399-13] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Subscribe] [Scholar Register] [Indexed: 11/20/2022]

Tuteja R, Saxena RK, Davila J, Shah T, Chen W, Xiao YL, Fan G, Saxena KB, Alverson AJ, Spillane C, Town C, Varshney RK. Cytoplasmic male sterility-associated chimeric open reading frames identified by mitochondrial genome sequencing of four Cajanus genotypes. DNA Res 2013;20:485-95. [PMID: 23792890 PMCID: PMC3789559 DOI: 10.1093/dnares/dst025] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/02/2022] Open

Enhanced de novo assembly of high throughput pyrosequencing data using whole genome mapping. PLoS One 2013;8:e61762. [PMID: 23613926 PMCID: PMC3629165 DOI: 10.1371/journal.pone.0061762] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/28/2012] [Accepted: 03/11/2013] [Indexed: 01/20/2023] Open

Edwards DJ, Holt KE. Beginner’s guide to comparative bacterial genome analysis using next-generation sequence data. MICROBIAL INFORMATICS AND EXPERIMENTATION 2013;3:2. [PMID: 23575213 PMCID: PMC3630013 DOI: 10.1186/2042-5783-3-2] [Citation(s) in RCA: 95] [Impact Index Per Article: 8.6] [Reference Citation Analysis] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 02/12/2013] [Accepted: 03/31/2013] [Indexed: 12/25/2022]

Kisand V, Lettieri T. Genome sequencing of bacteria: sequencing, de novo assembly and rapid analysis using open source tools. BMC Genomics 2013;14:211. [PMID: 23547799 PMCID: PMC3618134 DOI: 10.1186/1471-2164-14-211] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/20/2012] [Accepted: 03/22/2013] [Indexed: 11/18/2022] Open

Abstract

Background

De novo genome sequencing of previously uncharacterized microorganisms has the potential to open up new frontiers in microbial genomics by providing insight into both functional capabilities and biodiversity. Until recently, Roche 454 pyrosequencing was the NGS method of choice for de novo assembly because it generates hundreds of thousands of long reads (<450 bps), which are presumed to aid in the analysis of uncharacterized genomes. The array of tools for processing NGS data are increasingly free and open source and are often adopted for both their high quality and role in promoting academic freedom.

Results

The error rate of pyrosequencing the Alcanivorax borkumensis genome was such that thousands of insertions and deletions were artificially introduced into the finished genome. Despite a high coverage (~30 fold), it did not allow the reference genome to be fully mapped. Reads from regions with errors had low quality, low coverage, or were missing. The main defect of the reference mapping was the introduction of artificial indels into contigs through lower than 100% consensus and distracting gene calling due to artificial stop codons. No assembler was able to perform de novo assembly comparable to reference mapping. Automated annotation tools performed similarly on reference mapped and de novo draft genomes, and annotated most CDSs in the de novo assembled draft genomes.

Conclusions

Free and open source software (FOSS) tools for assembly and annotation of NGS data are being developed rapidly to provide accurate results with less computational effort. Usability is not high priority and these tools currently do not allow the data to be processed without manual intervention. Despite this, genome assemblers now readily assemble medium short reads into long contigs (>97-98% genome coverage). A notable gap in pyrosequencing technology is the quality of base pair calling and conflicting base pairs between single reads at the same nucleotide position. Regardless, using draft whole genomes that are not finished and remain fragmented into tens of contigs allows one to characterize unknown bacteria with modest effort.

Morgan XC, Huttenhower C. Chapter 12: Human microbiome analysis. PLoS Comput Biol 2012;8:e1002808. [PMID: 23300406 PMCID: PMC3531975 DOI: 10.1371/journal.pcbi.1002808] [Citation(s) in RCA: 310] [Impact Index Per Article: 25.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/14/2022] Open

Sahlin K, Street N, Lundeberg J, Arvestad L. Improved gap size estimation for scaffolding algorithms. Bioinformatics 2012;28:2215-22. [DOI: 10.1093/bioinformatics/bts441] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/03/2023] Open

Hurt RA, Brown SD, Podar M, Palumbo AV, Elias DA. Sequencing intractable DNA to close microbial genomes. PLoS One 2012;7:e41295. [PMID: 22859974 PMCID: PMC3409199 DOI: 10.1371/journal.pone.0041295] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/09/2012] [Accepted: 06/19/2012] [Indexed: 11/18/2022] Open

Ricker N, Qian H, Fulthorpe RR. The limitations of draft assemblies for understanding prokaryotic adaptation and evolution. Genomics 2012;100:167-75. [PMID: 22750556 DOI: 10.1016/j.ygeno.2012.06.009] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/09/2011] [Revised: 05/31/2012] [Accepted: 06/20/2012] [Indexed: 12/26/2022]

Barton MD, Barton HA. Scaffolder - software for manual genome scaffolding. SOURCE CODE FOR BIOLOGY AND MEDICINE 2012;7:4. [PMID: 22640820 PMCID: PMC3464138 DOI: 10.1186/1751-0473-7-4] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 05/02/2012] [Accepted: 05/03/2012] [Indexed: 11/21/2022]

Gao S, Bertrand D, Nagarajan N. FinIS: Improved in silico Finishing Using an Exact Quadratic Programming Formulation. Lecture Notes in Computer Science 2012. [DOI: 10.1007/978-3-642-33122-0_25] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Indexed: 01/23/2023]

Bardaji L, Pérez-Martínez I, Rodríguez-Moreno L, Rodríguez-Palenzuela P, Sundin GW, Ramos C, Murillo J. Sequence and role in virulence of the three plasmid complement of the model tumor-inducing bacterium Pseudomonas savastanoi pv. savastanoi NCPPB 3335. PLoS One 2011;6:e25705. [PMID: 22022435 PMCID: PMC3191145 DOI: 10.1371/journal.pone.0025705] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.3] [Received: 07/19/2011] [Accepted: 09/08/2011] [Indexed: 12/18/2022] Open

Abstract

Pseudomonas savastanoi pv. savastanoi NCPPB 3335 is a model for the study of the molecular basis of disease production and tumor formation in woody hosts, and its draft genome sequence has been recently obtained. Here we closed the sequence of the plasmid complement of this strain, composed of three circular molecules of 78,357 nt (pPsv48A), 45,220 nt (pPsv48B), and 42,103 nt (pPsv48C), all belonging to the pPT23A-like family of plasmids widely distributed in the P. syringae complex. A total of 152 coding sequences were predicted in the plasmid complement, of which 38 are hypothetical proteins and seven correspond to putative virulence genes. Plasmid pPsv48A contains an incomplete Type IVB secretion system, the type III secretion system (T3SS) effector gene hopAF1, gene ptz, involved in cytokinin biosynthesis, and three copies of a gene highly conserved in plant-associated proteobacteria, which is preceded by a hrp box motif. A complete Type IVA secretion system, a well conserved origin of transfer (oriT), and a homolog of the T3SS effector gene hopAO1 are present in pPsv48B, while pPsv48C contains a gene with significant homology to isopentenyl-diphosphate delta-isomerase, type 1. Several potential mobile elements were found on the three plasmids, including three types of MITE, a derivative of IS801, and a new transposon effector, ISPsy30. Although the replication regions of these three plasmids are phylogenetically closely related, their structure is diverse, suggesting that the plasmid architecture results from an active exchange of sequences. Artificial inoculations of olive plants with mutants cured of plasmids pPsv48A and pPsv48B showed that pPsv48A is necessary for full virulence and for the development of mature xylem vessels within the knots; we were unable to obtain mutants cured of pPsv48C, which contains five putative toxin-antitoxin genes.

Hypervirulent Chlamydia trachomatis clinical strain is a recombinant between lymphogranuloma venereum (L(2)) and D lineages. mBio 2011;2:e00045-11. [PMID: 21540364 PMCID: PMC3088116 DOI: 10.1128/mbio.00045-11] [Citation(s) in RCA: 81] [Impact Index Per Article: 6.2] [Indexed: 02/04/2023] Open

Abstract

Chlamydia trachomatis is an obligate intracellular bacterium that causes a diversity of severe and debilitating diseases worldwide. Sporadic and ongoing outbreaks of lymphogranuloma venereum (LGV) strains among men who have sex with men (MSM) support the need for research on virulence factors associated with these organisms. Previous analyses have been limited to single genes or genomes of laboratory-adapted reference strain L₂/434 and outbreak strain L₂b/UCH-1/proctitis. We characterized an unusual LGV strain, termed L₂c, isolated from an MSM with severe hemorrhagic proctitis. L₂c developed nonfusing, grape-like inclusions and a cytotoxic phenotype in culture, unlike the LGV strains described to date. Deep genome sequencing revealed that L₂c was a recombinant of L₂ and D strains with conserved clustered regions of genetic exchange, including a 78-kb region and a partial, yet functional, toxin gene that was lost with prolonged culture. Indels (insertions/deletions) were discovered in an ftsK gene promoter and in the tarp and hctB genes, which encode key proteins involved in replication, inclusion formation, and histone H1-like protein activity, respectively. Analyses suggest that these indels affect gene and/or protein function, supporting the in vitro and disease phenotypes. While recombination has been known to occur for C. trachomatis based on gene sequence analyses, we provide the first whole-genome evidence for recombination between a virulent, invasive LGV strain and a noninvasive common urogenital strain. Given the lack of a genetic system for producing stable C. trachomatis mutants, identifying naturally occurring recombinants can clarify gene function and provide opportunities for discovering avenues for genomic manipulation.

Lymphogranuloma venereum (LGV) is a prevalent and debilitating sexually transmitted disease in developing countries, although there are significant ongoing outbreaks in Australia, Europe, and the United States among men who have sex with men (MSM). Relatively little is known about LGV virulence factors, and only two LGV genomes have been sequenced to date. We isolated an LGV strain from an MSM with severe hemorrhagic proctitis that was morphologically unique in tissue culture compared with other LGV strains. Bioinformatic and statistical analyses identified the strain as a recombinant of L₂ and D strains with highly conserved clustered regions of genetic exchange. The unique culture morphology and, more importantly, disease phenotype could be traced to the genes involved in recombination. The findings have implications for bacterial species evolution and, in the case of ongoing LGV outbreaks, suggest that recombination is a mechanism for strain emergence that results in significant disease pathology.
