401
Gu D, Dong N, Zheng Z, Lin D, Huang M, Wang L, Chan EWC, Shu L, Yu J, Zhang R, Chen S. A fatal outbreak of ST11 carbapenem-resistant hypervirulent Klebsiella pneumoniae in a Chinese hospital: a molecular epidemiological study. THE LANCET. INFECTIOUS DISEASES 2017; 18:37-46. [PMID: 28864030 DOI: 10.1016/s1473-3099(17)30489-9] [Citation(s) in RCA: 692] [Impact Index Per Article: 86.5] [Received: 05/05/2017] [Revised: 07/09/2017] [Accepted: 07/18/2017] [Indexed: 01/02/2023]
Abstract
BACKGROUND Hypervirulent Klebsiella pneumoniae strains often cause life-threatening community-acquired infections in young and healthy hosts, but are usually sensitive to antibiotics. In this study, we investigated a fatal outbreak of ventilator-associated pneumonia caused by a new emerging hypervirulent K pneumoniae strain. METHODS The outbreak occurred in the integrated intensive care unit of a new branch of the Second Affiliated Hospital of Zhejiang University (Hangzhou, China). We collected 21 carbapenem-resistant K pneumoniae strains from five patients and characterised these strains for their antimicrobial susceptibility, multilocus sequence types, and genetic relatedness using VITEK-2 compact system, multilocus sequence typing, and whole genome sequencing. We selected one representative isolate from each patient to establish the virulence potential using a human neutrophil assay and Galleria mellonella model and to establish the genetic basis of their hypervirulence phenotype. FINDINGS All five patients had undergone surgery for multiple trauma and subsequently received mechanical ventilation. The patients were aged 53-73 years and were admitted to the intensive care unit between late February and April, 2016. They all had severe pneumonia, carbapenem-resistant K pneumoniae infections, and poor responses to antibiotic treatment and died due to severe lung infection, multiorgan failure, or septic shock. All five representative carbapenem-resistant K pneumoniae strains belonged to the ST11 type, which is the most prevalent carbapenem-resistant K pneumoniae type in China, and originated from the same clone. The strains were positive on the string test, had survival of about 80% after 1 h incubation in human neutrophils, and killed 100% of wax moth larvae (G mellonella) inoculated with 1 × 10⁶ colony-forming units of the specimens within 24 h, suggesting that they were hypervirulent K pneumoniae. Genomic analyses showed that the emergence of these ST11 carbapenem-resistant hypervirulent K pneumoniae strains was due to the acquisition of a roughly 170 kbp pLVPK-like virulence plasmid by classic ST11 carbapenem-resistant K pneumoniae strains. We also detected these strains in specimens collected in other regions of China. INTERPRETATION The ST11 carbapenem-resistant hypervirulent K pneumoniae strains pose a substantial threat to human health because they are simultaneously hypervirulent, multidrug resistant, and highly transmissible. Control measures should be implemented to prevent further dissemination of such organisms in the hospital setting and the community. FUNDING Chinese National Key Basic Research and Development Program and Collaborative Research Fund of Hong Kong Research Grant Council.
Affiliation(s)
- Danxia Gu
- Department of Clinical Laboratory Medicine, Second Affiliated Hospital of Zhejiang University, Hangzhou, China; Center for Cancer Biology and Innovative Therapeutics, Key Laboratory of Tumor Molecular Diagnosis and Individualized Medicine of Zhejiang Province, Clinical Research Institute, Zhejiang Provincial People's Hospital, Hangzhou, China
- Ning Dong
- Shenzhen Key Lab for Food Biological Safety Control, Food Safety and Technology Research Center, Hong Kong PolyU Shen Zhen Research Institute, Shenzhen, China; State Key Lab of Chirosciences, Department of Applied Biology and Chemical Technology, The Hong Kong Polytechnic University, Kowloon, Hong Kong Special Administrative Region, China
- Zhiwei Zheng
- Shenzhen Key Lab for Food Biological Safety Control, Food Safety and Technology Research Center, Hong Kong PolyU Shen Zhen Research Institute, Shenzhen, China; State Key Lab of Chirosciences, Department of Applied Biology and Chemical Technology, The Hong Kong Polytechnic University, Kowloon, Hong Kong Special Administrative Region, China
- Di Lin
- Department of Clinical Laboratory Medicine, Second Affiliated Hospital of Zhejiang University, Hangzhou, China
- Man Huang
- General Intensive Care Unit, Second Affiliated Hospital of Zhejiang University, Hangzhou, China
- Lihua Wang
- Department of Radiology, Second Affiliated Hospital of Zhejiang University, Hangzhou, China
- Edward Wai-Chi Chan
- Shenzhen Key Lab for Food Biological Safety Control, Food Safety and Technology Research Center, Hong Kong PolyU Shen Zhen Research Institute, Shenzhen, China; State Key Lab of Chirosciences, Department of Applied Biology and Chemical Technology, The Hong Kong Polytechnic University, Kowloon, Hong Kong Special Administrative Region, China
- Lingbin Shu
- Department of Clinical Laboratory Medicine, Second Affiliated Hospital of Zhejiang University, Hangzhou, China
- Jiang Yu
- Department of Clinical Laboratory Medicine, Second Affiliated Hospital of Zhejiang University, Hangzhou, China
- Rong Zhang
- Department of Clinical Laboratory Medicine, Second Affiliated Hospital of Zhejiang University, Hangzhou, China.
- Sheng Chen
- Shenzhen Key Lab for Food Biological Safety Control, Food Safety and Technology Research Center, Hong Kong PolyU Shen Zhen Research Institute, Shenzhen, China; State Key Lab of Chirosciences, Department of Applied Biology and Chemical Technology, The Hong Kong Polytechnic University, Kowloon, Hong Kong Special Administrative Region, China.
402
Parker J, Helmstetter AJ, Devey D, Wilkinson T, Papadopulos AST. Field-based species identification of closely-related plants using real-time nanopore sequencing. Sci Rep 2017; 7:8345. [PMID: 28827531 PMCID: PMC5566789 DOI: 10.1038/s41598-017-08461-5] [Citation(s) in RCA: 45] [Impact Index Per Article: 5.6] [Received: 07/03/2017] [Accepted: 07/12/2017] [Indexed: 01/04/2023]
Abstract
Advances in DNA sequencing and informatics have revolutionised biology over the past four decades, but technological limitations have left many applications unexplored. Recently, portable, real-time, nanopore sequencing (RTnS) has become available. This offers opportunities to rapidly collect and analyse genomic data anywhere. However, generation of datasets from large, complex genomes has been constrained to laboratories. The portability and long DNA sequences of RTnS offer great potential for field-based species identification, but the feasibility and accuracy of these technologies for this purpose have not been assessed. Here, we show that a field-based RTnS analysis of closely-related plant species (Arabidopsis spp.) has many advantages over laboratory-based high-throughput sequencing (HTS) methods for species level identification and phylogenomics. Samples were collected and sequenced in a single day by RTnS using a portable, “al fresco” laboratory. Our analyses demonstrate that correctly identifying unknown reads from matches to a reference database with RTnS reads enables rapid and confident species identification. Individually annotated RTnS reads can be used to infer the evolutionary relationships of A. thaliana. Furthermore, hybrid genome assembly with RTnS and HTS reads substantially improved upon a genome assembled from HTS reads alone. Field-based RTnS makes real-time, rapid specimen identification and genome wide analyses possible.
Affiliation(s)
- Joe Parker
- Jodrell Laboratory, Royal Botanic Gardens, Kew, Richmond, Surrey, UK, TW9 3AB.
- Dion Devey
- Jodrell Laboratory, Royal Botanic Gardens, Kew, Richmond, Surrey, UK, TW9 3AB
- Tim Wilkinson
- Jodrell Laboratory, Royal Botanic Gardens, Kew, Richmond, Surrey, UK, TW9 3AB
- Alexander S T Papadopulos
- Jodrell Laboratory, Royal Botanic Gardens, Kew, Richmond, Surrey, UK, TW9 3AB; Molecular Ecology and Fisheries Genetics Laboratory, Environment Centre Wales, School of Biological Sciences, Bangor University, Bangor, UK, LL57 2UW.
403
Miller JR, Zhou P, Mudge J, Gurtowski J, Lee H, Ramaraj T, Walenz BP, Liu J, Stupar RM, Denny R, Song L, Singh N, Maron LG, McCouch SR, McCombie WR, Schatz MC, Tiffin P, Young ND, Silverstein KAT. Hybrid assembly with long and short reads improves discovery of gene family expansions. BMC Genomics 2017; 18:541. [PMID: 28724409 PMCID: PMC5518131 DOI: 10.1186/s12864-017-3927-8] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.9] [Received: 11/09/2016] [Accepted: 07/06/2017] [Indexed: 11/25/2022]
Abstract
Background Long-read and short-read sequencing technologies offer competing advantages for eukaryotic genome sequencing projects. Combinations of both may be appropriate for surveys of within-species genomic variation. Methods We developed a hybrid assembly pipeline called “Alpaca” that can operate on 20X long-read coverage plus about 50X short-insert and 50X long-insert short-read coverage. To preclude collapse of tandem repeats, Alpaca relies on base-call-corrected long reads for contig formation. Results Compared to two other assembly protocols, Alpaca demonstrated the most reference agreement and repeat capture on the rice genome. On three accessions of the model legume Medicago truncatula, Alpaca generated the most agreement to a conspecific reference and predicted tandemly repeated genes absent from the other assemblies. Conclusion Our results suggest Alpaca is a useful tool for investigating structural and copy number variation within de novo assemblies of sampled populations. Electronic supplementary material The online version of this article (doi:10.1186/s12864-017-3927-8) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Jason R Miller
- J. Craig Venter Institute, 9714 Medical Center Drive, Rockville, MD, 20850, USA.
- Peng Zhou
- Department of Plant Biology, University of Minnesota, Saint Paul, MN, USA
- Joann Mudge
- National Center for Genome Resources, Santa Fe, NM, USA
- Hayan Lee
- Stanford School of Medicine, Stanford, CA, USA
- Brian P Walenz
- National Human Genome Research Institute, Bethesda, MD, USA
- Junqi Liu
- Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN, USA
- Robert M Stupar
- Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN, USA
- Roxanne Denny
- Department of Plant Pathology, University of Minnesota, St. Paul, MN, USA
- Li Song
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Namrata Singh
- School of Integrative Plant Sciences, Plant Breeding and Genetics section, Cornell University, Ithaca, NY, 14850, USA
- Lyza G Maron
- School of Integrative Plant Sciences, Plant Breeding and Genetics section, Cornell University, Ithaca, NY, 14850, USA
- Susan R McCouch
- School of Integrative Plant Sciences, Plant Breeding and Genetics section, Cornell University, Ithaca, NY, 14850, USA
- Michael C Schatz
- Departments of Computer Science and Biology, Johns Hopkins University, Baltimore, MD, USA
- Peter Tiffin
- Department of Plant Biology, University of Minnesota, Saint Paul, MN, USA
- Nevin D Young
- Department of Plant Biology, University of Minnesota, Saint Paul, MN, USA
404
Abstract
Nanopore technology provides a novel approach to DNA sequencing that yields long, label-free reads of constant quality. The first commercial implementation of this approach, the MinION, has shown promise in various sequencing applications. This review gives an up-to-date overview of the MinION's utility as a de novo sequencing device. It is argued that the MinION may allow for portable and affordable de novo sequencing of even complex genomes in the near future, despite the currently error-prone nature of its reads. Through continuous updates to the MinION hardware and the development of new assembly pipelines, both sequencing accuracy and assembly quality have already risen rapidly. However, this fast pace of development has also led to a lack of overview of the expanding landscape of analysis tools, as performance evaluations are outdated quickly. As the MinION is approaching a state of maturity, its user community would benefit from a thorough comparative benchmarking effort of de novo assembly pipelines in the near future. An earlier version of this article can be found on bioRxiv.
Affiliation(s)
- Carlos de Lannoy
- Plant Sciences, Wageningen University & Research, Wageningen, 6700AP, Netherlands; Faculty of Bioscience Engineering, KU Leuven, Leuven, 3001, Belgium
- Dick de Ridder
- Plant Sciences, Wageningen University & Research, Wageningen, 6700AP, Netherlands
- Judith Risse
- Plant Sciences, Wageningen University & Research, Wageningen, 6700AP, Netherlands
405
Complete Genome Sequence of the Olive-Infecting Strain Xylella fastidiosa subsp. pauca De Donno. GENOME ANNOUNCEMENTS 2017; 5:5/27/e00569-17. [PMID: 28684573 PMCID: PMC5502854 DOI: 10.1128/genomea.00569-17] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.8] [Indexed: 12/21/2022]
Abstract
We report here the complete and annotated genome sequence of the plant-pathogenic bacterium Xylella fastidiosa subsp. pauca strain De Donno. This strain was recovered from an olive tree severely affected by olive quick decline syndrome (OQDS), a devastating olive disease associated with X. fastidiosa infections in susceptible olive cultivars.
406
Abstract
Long-read sequencing technologies such as Pacific Biosciences and Oxford Nanopore MinION are capable of producing long sequencing reads with average fragment lengths of over 10,000 base pairs and maximum lengths reaching 100,000 base pairs. Compared with short reads, the assemblies obtained from long-read sequencing platforms have much higher contig continuity and genome completeness as long fragments are able to extend paths into problematic or repetitive regions. Many successful assembly applications of the Pacific Biosciences technology have been reported ranging from small bacterial genomes to large plant and animal genomes. Recently, genome assemblies using Oxford Nanopore MinION data have attracted much attention due to the portability and low cost of this novel sequencing instrument. In this paper, we re-sequenced a well characterized genome, the Saccharomyces cerevisiae S288C strain using three different platforms: MinION, PacBio and MiSeq. We present a comprehensive metric comparison of assemblies generated by various pipelines and discuss how the platform associated data characteristics affect the assembly quality. With a given read depth of 31X, the assemblies from both Pacific Biosciences and Oxford Nanopore MinION show excellent continuity and completeness for the 16 nuclear chromosomes, but not for the mitochondrial genome, whose reconstruction still represents a significant challenge.
407
Galea CA, Han M, Zhu Y, Roberts K, Wang J, Thompson PE, L J, Velkov T. Characterization of the Polymyxin D Synthetase Biosynthetic Cluster and Product Profile of Paenibacillus polymyxa ATCC 10401. JOURNAL OF NATURAL PRODUCTS 2017; 80:1264-1274. [PMID: 28463513 DOI: 10.1021/acs.jnatprod.6b00807] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Indexed: 06/07/2023]
Abstract
The increasing prevalence of polymyxin-resistant bacteria has stimulated the search for improved polymyxin lipopeptides. Here we describe the sequence and product profile for polymyxin D nonribosomal peptide synthetase from Paenibacillus polymyxa ATCC 10401. The polymyxin D synthase gene cluster comprised five genes that encoded ABC transporters (pmxC and pmxD) and enzymes responsible for the biosynthesis of polymyxin D (pmxA, pmxB, and pmxE). Unlike polymyxins B and E, polymyxin D contains d-Ser at position 3 as opposed to l-α,γ-diaminobutyric acid and has an l-Thr at position 7 rather than l-Leu. Module 3 of pmxE harbored an auxiliary epimerization domain that catalyzes the conversion of l-Ser to the d-form. Structural modeling suggested that the adenylation domains of module 3 in PmxE and modules 6 and 7 in PmxA could bind amino acids with larger side chains than their preferred substrate. Feeding individual amino acids into the culture media not only affected production of polymyxins D1 and D2 but also led to the incorporation of different amino acids at positions 3, 6, and 7 of polymyxin D. Interestingly, the unnatural polymyxin analogues did not show antibiotic activity against a panel of Gram-negative clinical isolates, while the natural polymyxins D1 and D2 exhibited excellent in vitro antibacterial activity and were efficacious against Klebsiella pneumoniae and Acinetobacter baumannii in a mouse blood infection model. The results demonstrate the excellent antibacterial activity of these unusual d-Ser3 polymyxins and underscore the possibility of incorporating alternate amino acids at positions 3, 6, and 7 of polymyxin D via manipulation of the polymyxin nonribosomal biosynthetic machinery.
Affiliation(s)
- Meiling Han
- Monash Biomedicine Discovery Institute, Department of Microbiology, Monash University, Clayton, Victoria 3800, Australia
- Yan Zhu
- Monash Biomedicine Discovery Institute, Department of Microbiology, Monash University, Clayton, Victoria 3800, Australia
- Jiping Wang
- Monash Biomedicine Discovery Institute, Department of Microbiology, Monash University, Clayton, Victoria 3800, Australia
- Jian L
- Monash Biomedicine Discovery Institute, Department of Microbiology, Monash University, Clayton, Victoria 3800, Australia
408
Shen P, Fan J, Guo L, Li J, Li A, Zhang J, Ying C, Ji J, Xu H, Zheng B, Xiao Y. Genome sequence of Shigella flexneri strain SP1, a diarrheal isolate that encodes an extended-spectrum β-lactamase (ESBL). Ann Clin Microbiol Antimicrob 2017; 16:37. [PMID: 28499446 PMCID: PMC5429569 DOI: 10.1186/s12941-017-0212-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 02/09/2017] [Accepted: 05/04/2017] [Indexed: 12/20/2022]
Abstract
BACKGROUND Shigellosis is the most common cause of gastrointestinal infections in developing countries. In China, the species most frequently responsible for shigellosis is Shigella flexneri. S. flexneri remains largely unexplored from a genomic standpoint and is still described using a vocabulary based on biochemical and serological properties. Moreover, increasing numbers of ESBL-producing Shigella strains have been isolated from clinical samples. Despite this, only a few cases of ESBL-producing Shigella have been described in China. Therefore, a better understanding of ESBL-producing Shigella from a genomic standpoint is required. In this study, an S. flexneri type 1a isolate SP1 harboring blaCTX-M-14, which was recovered from a patient with diarrhea, was subjected to whole genome sequencing. RESULTS The draft genome assembly of S. flexneri strain SP1 consisted of 4,592,345 bp with a G+C content of 50.46%. RAST analysis revealed the genome contained 4798 coding sequences (CDSs) and 100 RNA-encoding genes. We detected one incomplete prophage and six candidate CRISPR loci in the genome. In vitro antimicrobial susceptibility testing demonstrated that strain SP1 is resistant to ampicillin, amoxicillin/clavulanic acid, cefazolin, ceftriaxone and trimethoprim. In silico analysis detected genes mediating resistance to aminoglycosides, β-lactams, phenicol, tetracycline, sulphonamides, and trimethoprim. The blaCTX-M-14 gene was located on an IncFII2 plasmid. A series of virulence factors were identified in the genome. CONCLUSIONS In this study, we report the whole genome sequence of a blaCTX-M-14-encoding S. flexneri strain SP1. Dozens of resistance determinants were detected in the genome and may be responsible for the multidrug-resistance of this strain, although further confirmation studies are warranted. Numerous virulence factors identified in the strain suggest that isolate SP1 is potentially pathogenic. The availability of the genome sequence and comparative analysis with other S. flexneri strains provides the basis to further address the evolution of drug resistance mechanisms and pathogenicity in S. flexneri.
Affiliation(s)
- Ping Shen
- Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, 310003, China
- Jianzhong Fan
- Department of Clinical Laboratory, Hangzhou First People's Hospital, Hangzhou, 310006, China
- Lihua Guo
- Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, 310003, China
- Jiahua Li
- Department of Hospital Infection Control, Zhucheng People's Hospital, Zhucheng, 252300, China
- Ang Li
- Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, 310003, China
- Jing Zhang
- Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, 310003, China
- Chaoqun Ying
- Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, 310003, China
- Jinru Ji
- Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, 310003, China
- Hao Xu
- Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, 310003, China
- Beiwen Zheng
- Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, 310003, China.
- Yonghong Xiao
- Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, 310003, China
409
Nurk S, Meleshko D, Korobeynikov A, Pevzner PA. metaSPAdes: a new versatile metagenomic assembler. Genome Res 2017; 27:824-834. [PMID: 28298430 PMCID: PMC5411777 DOI: 10.1101/gr.213959.116] [Citation(s) in RCA: 2484] [Impact Index Per Article: 310.5] [Received: 08/01/2016] [Accepted: 03/13/2017] [Indexed: 01/25/2023]
Abstract
While metagenomics has emerged as a technology of choice for analyzing bacterial populations, the assembly of metagenomic data remains challenging, thus stifling biological discoveries. Moreover, recent studies revealed that complex bacterial populations may be composed from dozens of related strains, thus further amplifying the challenge of metagenomic assembly. metaSPAdes addresses various challenges of metagenomic assembly by capitalizing on computational ideas that proved to be useful in assemblies of single cells and highly polymorphic diploid genomes. We benchmark metaSPAdes against other state-of-the-art metagenome assemblers and demonstrate that it results in high-quality assemblies across diverse data sets.
Affiliation(s)
- Sergey Nurk
- Center for Algorithmic Biotechnology, Institute for Translational Biomedicine, St. Petersburg State University, St. Petersburg, Russia 199004
- Dmitry Meleshko
- Center for Algorithmic Biotechnology, Institute for Translational Biomedicine, St. Petersburg State University, St. Petersburg, Russia 199004
- Anton Korobeynikov
- Center for Algorithmic Biotechnology, Institute for Translational Biomedicine, St. Petersburg State University, St. Petersburg, Russia 199004; Department of Statistical Modelling, St. Petersburg State University, St. Petersburg, Russia 198515
- Pavel A Pevzner
- Center for Algorithmic Biotechnology, Institute for Translational Biomedicine, St. Petersburg State University, St. Petersburg, Russia 199004; Department of Computer Science and Engineering, University of California, San Diego, California 92093-0404, USA
410
Koren S, Walenz BP, Berlin K, Miller JR, Bergman NH, Phillippy AM. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res 2017; 27:722-736. [PMID: 28298431 PMCID: PMC5411767 DOI: 10.1101/gr.215087.116] [Citation(s) in RCA: 4777] [Impact Index Per Article: 597.1] [Received: 08/23/2016] [Accepted: 03/03/2017] [Indexed: 12/11/2022]
Abstract
Long-read single-molecule sequencing has revolutionized de novo genome assembly and enabled the automated reconstruction of reference-quality genomes. However, given the relatively high error rates of such technologies, efficient and accurate assembly of large repeats and closely related haplotypes remains challenging. We address these issues with Canu, a successor of Celera Assembler that is specifically designed for noisy single-molecule sequences. Canu introduces support for nanopore sequencing, halves depth-of-coverage requirements, and improves assembly continuity while simultaneously reducing runtime by an order of magnitude on large genomes versus Celera Assembler 8.2. These advances result from new overlapping and assembly algorithms, including an adaptive overlapping strategy based on tf-idf weighted MinHash and a sparse assembly graph construction that avoids collapsing diverged repeats and haplotypes. We demonstrate that Canu can reliably assemble complete microbial genomes and near-complete eukaryotic chromosomes using either Pacific Biosciences (PacBio) or Oxford Nanopore technologies and achieves a contig NG50 of >21 Mbp on both human and Drosophila melanogaster PacBio data sets. For assembly structures that cannot be linearly represented, Canu provides graph-based assembly outputs in graphical fragment assembly (GFA) format for analysis or integration with complementary phasing and scaffolding techniques. The combination of such highly resolved assembly graphs with long-range scaffolding information promises the complete and automated assembly of complex genomes.
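Note on the contig NG50 figure quoted in this abstract: NG50 is commonly defined as the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the estimated genome size (N50 uses the total assembly length instead). The following is a minimal sketch of that common definition, not code from Canu; the function name `ng50` and the toy contig lengths are illustrative assumptions.

```python
def ng50(contig_lengths, genome_size):
    """Return NG50 under the common definition (hypothetical helper).

    Walk the contigs from longest to shortest and return the length of the
    contig at which the running total first reaches half of genome_size.
    Returns 0 if the assembly covers less than half of the genome.
    """
    half = genome_size / 2
    running_total = 0
    for length in sorted(contig_lengths, reverse=True):
        running_total += length
        if running_total >= half:
            return length
    return 0


# Toy example (made-up numbers): five contigs, estimated genome size 200.
contigs = [50, 40, 30, 20, 10]
print(ng50(contigs, 200))           # NG50 against the genome size estimate
print(ng50(contigs, sum(contigs)))  # N50: same walk against assembly length
```

Passing the summed contig lengths as `genome_size` turns the same routine into N50, which is why the two statistics can differ for an incomplete assembly.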
Affiliation(s)
- Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
- Brian P Walenz
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
- Jason R Miller
- J. Craig Venter Institute, Rockville, Maryland 20850, USA
- Nicholas H Bergman
- National Biodefense Analysis and Countermeasures Center, Frederick, Maryland 21702, USA
- Adam M Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
411
The hidden perils of read mapping as a quality assessment tool in genome sequencing. Sci Rep 2017; 7:43149. [PMID: 28225089 PMCID: PMC5320493 DOI: 10.1038/srep43149] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 08/25/2016] [Accepted: 01/20/2017] [Indexed: 11/16/2022]
Abstract
This article provides a comparative analysis of the various methods of genome sequencing focusing on verification of the assembly quality. The results of a comparative assessment of various de novo assembly tools, as well as sequencing technologies, are presented using a recently completed sequence of the genome of Lactobacillus fermentum 3872. In particular, quality of assemblies is assessed by using CLC Genomics Workbench read mapping and Optical mapping developed by OpGen. Over-extension of contigs without prior knowledge of contig location can lead to misassembled contigs, even when commonly used quality indicators such as read mapping suggest that a contig is well assembled. Precautions must also be undertaken when using long read sequencing technology, which may also lead to misassembled contigs.
412
Abstract
The recent breakthroughs in assembling long error-prone reads were based on the overlap-layout-consensus (OLC) approach and did not utilize the strengths of the alternative de Bruijn graph approach to genome assembly. Moreover, these studies often assume that applications of the de Bruijn graph approach are limited to short and accurate reads and that the OLC approach is the only practical paradigm for assembling long error-prone reads. We show how to generalize de Bruijn graphs for assembling long error-prone reads and describe the ABruijn assembler, which combines the de Bruijn graph and the OLC approaches and results in accurate genome reconstructions.
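For readers unfamiliar with the de Bruijn graph approach this abstract contrasts with OLC, here is a minimal sketch of the classic construction for short, accurate reads: each k-mer contributes an edge from its (k-1)-prefix to its (k-1)-suffix. This is the textbook graph only, not the generalized ABruijn construction the paper describes; the function name and the toy read are illustrative assumptions.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Build a classic de Bruijn graph as an adjacency list (hypothetical helper).

    Nodes are (k-1)-mers; every k-mer found in a read adds one edge from
    its prefix (first k-1 bases) to its suffix (last k-1 bases).
    Reads shorter than k contribute no edges.
    """
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


# Toy example: one read, k = 3, giving the 3-mers ACG and CGT.
print(de_bruijn_graph(["ACGT"], 3))
```

With error-prone long reads, most k-mers in a read contain at least one error, which is the limitation the abstract says a generalized construction is meant to address.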
413
Bankevich A, Pevzner PA. TruSPAdes: barcode assembly of TruSeq synthetic long reads. Nat Methods 2016; 13:248-50. [PMID: 26828418 DOI: 10.1038/nmeth.3737] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.0] [Received: 08/06/2015] [Accepted: 12/08/2015] [Indexed: 01/12/2023]
Abstract
The recently introduced TruSeq synthetic long read (TSLR) technology generates long and accurate virtual reads from an assembly of barcoded pools of short reads. The TSLR method provides an attractive alternative to existing sequencing platforms that generate long but inaccurate reads. We describe the truSPAdes algorithm (http://bioinf.spbau.ru/spades) for TSLR assembly and show that it results in a dramatic improvement in the quality of metagenomics assemblies.
Affiliation(s)
- Anton Bankevich
- Center for Algorithmic Biotechnology, Institute for Translational Biomedicine, Saint Petersburg State University, Saint Petersburg, Russia
- Pavel A Pevzner
- Center for Algorithmic Biotechnology, Institute for Translational Biomedicine, Saint Petersburg State University, Saint Petersburg, Russia; Department of Computer Science and Engineering, University of California at San Diego, La Jolla, California, USA