1
Manual Annotation Studio (MAS): a collaborative platform for manual functional annotation of viral and microbial genomes. BMC Genomics 2021; 22:733. [PMID: 34627149 PMCID: PMC8501643 DOI: 10.1186/s12864-021-08029-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/23/2021] [Accepted: 09/22/2021] [Indexed: 11/10/2022] Open
Abstract
Background Functional genome annotation is the process of labelling functional genomic regions with descriptive information. Manual curation can produce higher quality genome annotations than fully automated methods. Manual annotation efforts are time-consuming and complex; however, software can help reduce these drawbacks. Results We created Manual Annotation Studio (MAS) to improve the efficiency of manual functional annotation of prokaryotic and viral genomes. MAS allows users to upload unannotated genomes, provides an interface to edit and upload annotations, tracks annotation history and progress, and saves data to a relational database. MAS provides users with pertinent information through a simple point-and-click interface to execute and visualize results for multiple homology search tools (blastp, rpsblast, and HHsearch) against multiple databases (Swiss-Prot, nr, CDD, PDB, and an internally generated database). MAS was designed to accept connections over the local area network (LAN) of a lab or organization so that multiple users can access it simultaneously. MAS can take advantage of high-performance computing (HPC) clusters by interfacing with SGE or SLURM, and data can be exported from MAS in a variety of formats (FASTA, GenBank, GFF, and Excel). Conclusions MAS streamlines and provides structure to manual functional annotation projects. MAS enhances the ability of users to generate, interpret, and compare results from multiple tools. The structure that MAS provides can improve project organization and reduce annotation errors. MAS is ideal for team-based annotation projects because it facilitates collaboration. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-021-08029-8.
2
Shieh YK, Liu SC, Lung Lu C. Scaffolding Contigs Using Multiple Reference Genomes. Comput Biol Chem 2020. [DOI: 10.5772/intechopen.93456] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2022]
Abstract
Scaffolding is an important step in genome assembly; its function is to order and orient the contigs of a draft genome assembly into larger scaffolds. Several single reference-based scaffolders have been proposed. However, a single reference genome may not be sufficient for a scaffolder to correctly scaffold a target draft genome, especially when the target and reference genomes have a distant evolutionary relationship or differ by rearrangements. This has motivated researchers to develop multiple reference-based scaffolders, which can use multiple reference genomes that may provide different but complementary types of scaffolding information to scaffold the target draft genome. In this chapter, we review some of the state-of-the-art multiple reference-based scaffolders, such as Ragout, MeDuSa and Multi-CAR, and give a complete introduction to Multi-CSAR, an improved extension of Multi-CAR.
3
Khachatryan L, de Leeuw RH, Kraakman MEM, Pappas N, Te Raa M, Mei H, de Knijff P, Laros JFJ. Taxonomic classification and abundance estimation using 16S and WGS-A comparison using controlled reference samples. Forensic Sci Int Genet 2020; 46:102257. [PMID: 32058299 DOI: 10.1016/j.fsigen.2020.102257] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 06/09/2019] [Revised: 12/30/2019] [Accepted: 01/27/2020] [Indexed: 12/30/2022]
Abstract
The assessment of microbiome biodiversity is the most common application of metagenomics. While 16S sequencing remains the standard procedure for taxonomic profiling of metagenomic data, a growing number of studies have clearly demonstrated biases associated with this method. Whole Genome Shotgun sequencing (WGS) metagenomics alleviates most of the known restrictions associated with 16S data. However, because of its computationally intensive data analyses and higher sequencing costs, WGS-based metagenomics remains a less popular option. Selecting the experiment type that provides a comprehensive, yet manageable amount of information is a challenge encountered in many metagenomics studies. In this work, we created a series of artificial bacterial mixes, each with a different distribution of skin-associated microbial species. These mixes were used to estimate the resolution of two different metagenomic experiments - 16S and WGS - and to evaluate several different bioinformatics approaches for taxonomic read classification. In all test cases, WGS approaches provide much more accurate results, in terms of taxa prediction and abundance estimation, than 16S approaches. Furthermore, we demonstrate that a 16S dataset, analysed using different state-of-the-art techniques and reference databases, can produce widely different results. Because most forensic metagenomic analyses are still performed using 16S data, our results are especially important.
Affiliation(s)
- Lusine Khachatryan
  - Department of Human Genetics, Leiden University Medical Center, Leiden, the Netherlands
- Rick H de Leeuw
  - Department of Human Genetics, Leiden University Medical Center, Leiden, the Netherlands
- Margriet E M Kraakman
  - Department of Medical Microbiology, Leiden University Medical Center, Leiden, the Netherlands
- Nikos Pappas
  - Sequencing Analysis Support Core, Leiden University Medical Center, Leiden, the Netherlands
- Marije Te Raa
  - Department of Human Genetics, Leiden University Medical Center, Leiden, the Netherlands
- Hailiang Mei
  - Sequencing Analysis Support Core, Leiden University Medical Center, Leiden, the Netherlands
- Peter de Knijff
  - Department of Human Genetics, Leiden University Medical Center, Leiden, the Netherlands
- Jeroen F J Laros
  - Department of Human Genetics, Leiden University Medical Center, Leiden, the Netherlands; Department of Clinical Genetics, Leiden University Medical Center, Leiden, the Netherlands
4
Multiplexed Non-barcoded Long-Read Sequencing and Assembling Genomes of Bacillus Strains in Error-Free Simulations. Curr Microbiol 2019; 77:79-84. [PMID: 31722044 DOI: 10.1007/s00284-019-01808-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2019] [Accepted: 11/02/2019] [Indexed: 10/25/2022]
Abstract
The generation of genomic data from microorganisms has revolutionized our ability to understand their biology, but it is still challenging to obtain complete genome sequences of microbes in an automated, high-throughput and cost-effective manner. While the advent of second-generation sequencing technologies provided significantly higher throughput, their shorter read lengths and more pronounced sequence-context bias led to a shift towards resequencing applications. Recently, single molecule real-time (SMRT) DNA sequencing has been used to generate reads that are much longer than those of other sequencing platforms, facilitating de novo genome assembly and genome finishing. Here we introduce a novel multiplex strategy to make full use of the capacity and characteristics of SMRT sequencing in microbial genome assembly. We used error-free simulations to evaluate the practicability of assembling SMRT genomic sequencing data from multiple microbes into finished genomes at the same time. We then compared the influence of two key factors, sequencing coverage and read length, on multiplex assembly. Our results showed that long-read genomic sequencing, owing to its read length, inherently provides the ability to assemble genomic sequencing data from multiple microbes into finished genomes. This approach might be helpful for various microbial genome projects or metagenomics research.
5
Waters NR, Abram F, Brennan F, Holmes A, Pritchard L. riboSeed: leveraging prokaryotic genomic architecture to assemble across ribosomal regions. Nucleic Acids Res 2019; 46:e68. [PMID: 29608703 PMCID: PMC6009695 DOI: 10.1093/nar/gky212] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 11/24/2017] [Accepted: 03/12/2018] [Indexed: 11/12/2022] Open
Abstract
The vast majority of bacterial genome sequencing has been performed using Illumina short reads. Because of the inherent difficulty of resolving repeated regions with short reads alone, only ∼10% of sequencing projects have resulted in a closed genome. The most common repeated regions are those coding for ribosomal operons (rDNAs), which occur in a bacterial genome between 1 and 15 times, and are typically used as sequence markers to classify and identify bacteria. Here, we exploit the genomic context in which rDNAs occur across taxa to improve assembly of these regions relative to de novo sequencing by using the conserved nature of rDNAs across taxa and the uniqueness of their flanking regions within a genome. We describe a method to construct targeted pseudocontigs generated by iteratively assembling reads that map to a reference genome’s rDNAs. These pseudocontigs are then used to more accurately assemble the newly sequenced chromosome. We show that this method, implemented as riboSeed, correctly bridges across adjacent contigs in bacterial genome assembly and, when used in conjunction with other genome polishing tools, can assist in closure of a genome.
Affiliation(s)
- Nicholas R Waters
  - Microbiology, School of Natural Sciences, National University of Ireland, Galway, H91 TK33, Ireland; Information and Computational Sciences, James Hutton Institute, Invergowrie, Dundee DD2 5DA, Scotland
- Florence Abram
  - Microbiology, School of Natural Sciences, National University of Ireland, Galway, H91 TK33, Ireland
- Fiona Brennan
  - Microbiology, School of Natural Sciences, National University of Ireland, Galway, H91 TK33, Ireland; Soil and Environmental Microbiology, Environmental Research Centre, Teagasc, Johnstown Castle, Wexford, Y35 TC97, Ireland
- Ashleigh Holmes
  - Cell and Molecular Sciences, James Hutton Institute, Invergowrie, Dundee DD2 5DA, Scotland
- Leighton Pritchard
  - Information and Computational Sciences, James Hutton Institute, Invergowrie, Dundee DD2 5DA, Scotland
6
Veras AADO, Merlin B, de Sá PHCG. ImproveAssembly - Tool for identifying new gene products and improving genome assembly. PLoS One 2018; 13:e0206000. [PMID: 30365512 PMCID: PMC6203371 DOI: 10.1371/journal.pone.0206000] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2018] [Accepted: 10/04/2018] [Indexed: 11/18/2022] Open
Abstract
The availability of biological information in public databases has increased exponentially. To ensure the accuracy of this information, researchers have adopted several methods and refinements to avoid the dissemination of incorrect information; for example, several automated tools are available for annotation processes. However, manual curation ensures and enriches biological information. Additionally, the genomic finishing process is complex, resulting in increased deposition of draft genomes. This introduces bias in other omics analyses because incomplete genomic content is used. The same problem is observed for complete genomes: for example, genomes generated by reference assembly may not include new products present in the new sequence, and errors or bias can occur during the assembly process. Thus, we developed ImproveAssembly, a tool capable of identifying new products missing from genomic sequences, which can be used for complete and draft genomes. The identified products can improve the annotation of complete genomes and drafts while significantly reducing the bias when the information is used in other omics analyses.
Affiliation(s)
- Bruno Merlin
  - Faculty of Computer Engineering, Federal University of Pará campus Tucuruí (CAMTUC-UFPA), Pará, Brazil
7
Acuña-Amador L, Primot A, Cadieu E, Roulet A, Barloy-Hubler F. Genomic repeats, misassembly and reannotation: a case study with long-read resequencing of Porphyromonas gingivalis reference strains. BMC Genomics 2018; 19:54. [PMID: 29338683 PMCID: PMC5771137 DOI: 10.1186/s12864-017-4429-4] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 09/26/2017] [Accepted: 12/29/2017] [Indexed: 12/15/2022] Open
Abstract
BACKGROUND Without knowledge of their genomic sequences, it is impossible to make functional models of the bacteria that make up human and animal microbiota. Unfortunately, the vast majority of publicly available genomes are only working drafts, an incompleteness that causes numerous problems and constitutes a major obstacle to genotypic and phenotypic interpretation. In this work, we began with an example from the class Bacteroidia in the phylum Bacteroidetes, which is preponderant among human orodigestive microbiota. We successfully identify the genetic loci responsible for assembly breaks and misassemblies and demonstrate the importance and usefulness of long-read sequencing and curated reannotation. RESULTS We showed that the fragmentation in Bacteroidia draft genomes assembled from massively parallel sequencing linearly correlates with genomic repeats of the same or greater size than the reads. We also demonstrated that some of these repeats, especially the long ones, correspond to misassembled loci in three reference Porphyromonas gingivalis genomes marked as circularized (thus complete or finished). We prove that even at modest coverage (30X), long-read resequencing together with PCR contiguity verification (rrn operons and an integrative and conjugative element or ICE) can be used to identify and correct the wrongly combined or assembled regions. Finally, although time-consuming and labor-intensive, consistent manual biocuration of three P. gingivalis strains allowed us to compare and correct the existing genomic annotations, resulting in a more accurate interpretation of the genomic differences among these strains. CONCLUSIONS In this study, we demonstrate the usefulness and importance of long-read sequencing in verifying published genomes (even when complete) and generating assemblies for new bacterial strains/species with high genomic plasticity. 
We also show that when combined with biological validation processes and diligent biocurated annotation, this strategy helps reduce the propagation of errors in shared databases, thus limiting false conclusions based on incomplete or misleading information.
Affiliation(s)
- Luis Acuña-Amador
  - Institut de Génétique et Développement de Rennes, CNRS, UMR6290, Université de Rennes 1, Rennes, France; Laboratorio de Investigación en Bacteriología Anaerobia, Centro de Investigación en Enfermedades Tropicales, Facultad de Microbiología, Universidad de Costa Rica, San José, Costa Rica
- Aline Primot
  - Institut de Génétique et Développement de Rennes, CNRS, UMR6290, Université de Rennes 1, Rennes, France
- Edouard Cadieu
  - Institut de Génétique et Développement de Rennes, CNRS, UMR6290, Université de Rennes 1, Rennes, France
- Alain Roulet
  - GenoToul Genome & Transcriptome (GeT-PlaGe), INRA, US1426, Castanet-Tolosan, France
- Frédérique Barloy-Hubler
  - Institut de Génétique et Développement de Rennes, CNRS, UMR6290, Université de Rennes 1, Rennes, France
8
Zhang Y, Kitajima M, Whittle AJ, Liu WT. Benefits of Genomic Insights and CRISPR-Cas Signatures to Monitor Potential Pathogens across Drinking Water Production and Distribution Systems. Front Microbiol 2017; 8:2036. [PMID: 29097994 PMCID: PMC5654357 DOI: 10.3389/fmicb.2017.02036] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 06/27/2017] [Accepted: 10/05/2017] [Indexed: 11/22/2022] Open
Abstract
The occurrence of pathogenic bacteria in drinking water distribution systems (DWDSs) is a major health concern, and our current understanding is mostly related to pathogenic species such as Legionella pneumophila and Mycobacterium avium but not to bacterial species closely related to them. In this study, genomic-based approaches were used to characterize pathogen-related species in relation to their abundance, diversity, potential pathogenicity, genetic exchange, and distribution across an urban drinking water system. Nine draft genomes recovered from 10 metagenomes were identified as Legionella (4 draft genomes), Mycobacterium (3 draft genomes), Parachlamydia (1 draft genome), and Leptospira (1 draft genome). The pathogenicity potential of these genomes was examined by the presence/absence of virulence machinery, including genes belonging to Type III, IV, and VII secretion systems and their effectors. Several virulence factors known from pathogenic species were detected in the retrieved draft genomes, except in the Leptospira-related genome. Identical clustered regularly interspaced short palindromic repeats-CRISPR-associated proteins (CRISPR-Cas) genetic signatures were observed in two draft genomes recovered at different stages of the studied system, suggesting that the spacers in CRISPR-Cas could potentially be used as a biomarker in the monitoring of Legionella-related strains at an evolutionary scale of several years across different drinking water production and distribution systems. Overall, the metagenomic approach was an effective tool, complementary to culturing techniques, for gaining insights into the pathogenic characteristics and the CRISPR-Cas signatures of pathogen-related species in DWDSs.
Affiliation(s)
- Ya Zhang
  - Department of Civil and Environmental Engineering, University of Illinois at Urbana-Champaign, Urbana, IL, United States
- Masaaki Kitajima
  - Division of Environmental Engineering, Faculty of Engineering, Hokkaido University, Sapporo, Japan
- Andrew J Whittle
  - Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, MA, United States
- Wen-Tso Liu
  - Department of Civil and Environmental Engineering, University of Illinois at Urbana-Champaign, Urbana, IL, United States
9
Kremer FS, McBride AJA, Pinto LDS. Approaches for in silico finishing of microbial genome sequences. Genet Mol Biol 2017; 40:553-576. [PMID: 28898352 PMCID: PMC5596377 DOI: 10.1590/1678-4685-gmb-2016-0230] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 09/25/2016] [Accepted: 03/13/2017] [Indexed: 12/15/2022] Open
Abstract
The introduction of next-generation sequencing (NGS) had a significant effect on the availability of genomic information, leading to an increase in the number of sequenced genomes from a large spectrum of organisms. Unfortunately, due to the limitations imposed by short-read sequencing platforms, most of these newly sequenced genomes remain "drafts", incomplete representations of the whole genetic content. Previous genome sequencing studies indicated that finishing a genome sequenced by NGS, even a bacterial one, may require additional sequencing to fill the gaps, making the entire process very expensive. As such, several in silico approaches have been developed to optimize genome assemblies and facilitate the finishing process. The present review aims to explore some free (open source, in many cases) tools that are available to facilitate genome finishing.
Affiliation(s)
- Frederico Schmitt Kremer
  - Programa de Pós-Graduação em Biotecnologia (PPGB), Centro de Desenvolvimento Tecnológico, Universidade Federal de Pelotas, Pelotas, Brazil
- Alan John Alexander McBride
  - Programa de Pós-Graduação em Biotecnologia (PPGB), Centro de Desenvolvimento Tecnológico, Universidade Federal de Pelotas, Pelotas, Brazil
- Luciano da Silva Pinto
  - Programa de Pós-Graduação em Biotecnologia (PPGB), Centro de Desenvolvimento Tecnológico, Universidade Federal de Pelotas, Pelotas, Brazil
10
Chen KT, Chen CJ, Shen HT, Liu CL, Huang SH, Lu CL. Multi-CAR: a tool of contig scaffolding using multiple references. BMC Bioinformatics 2016; 17:469. [PMID: 28155633 PMCID: PMC5260120 DOI: 10.1186/s12859-016-1328-7] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND A draft genome assembled by current next-generation sequencing techniques from short reads is just a collection of contigs, whose relative positions and orientations along the genome being sequenced are unknown. To further obtain its complete sequence, a contig scaffolding process is usually applied to order and orient the contigs in the draft genome. Although several single reference-based scaffolding tools have been proposed, they may produce erroneous scaffolds if there are rearrangements between the target and reference genomes or their phylogenetic relationship is distant. This may suggest that a single reference genome may not be sufficient to produce correct scaffolds of a draft genome. RESULTS In this study, we design a simple heuristic method to further revise our single reference-based scaffolding tool CAR into a new one called Multi-CAR such that it can utilize multiple complete genomes of related organisms as references to more accurately order and orient the contigs of a draft genome. In practical usage, our Multi-CAR does not require prior knowledge concerning phylogenetic relationships among the draft and reference genomes and libraries of paired-end reads. To validate Multi-CAR, we have tested it on a real dataset composed of several prokaryotic genomes and also compared its accuracy performance with other multiple reference-based scaffolding tools Ragout and MeDuSa. Our experimental results have finally shown that Multi-CAR indeed outperforms Ragout and MeDuSa in terms of sensitivity, precision, genome coverage, scaffold number and scaffold N50 size. CONCLUSIONS Multi-CAR serves as an efficient tool that can more accurately order and orient the contigs of a draft genome based on multiple reference genomes. The web server of Multi-CAR is freely available at http://genome.cs.nthu.edu.tw/Multi-CAR/ .
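The scaffold N50 size named among the comparison metrics above has a standard definition: the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the total assembly length. The following is a minimal sketch of that calculation, not code from Multi-CAR; the function name `n50` and the example lengths are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("at least one length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical scaffold lengths, for illustration only.
scaffold_lengths = [80, 70, 50, 40, 30, 20]
print(n50(scaffold_lengths))
```

A larger N50 for the same total length indicates a more contiguous assembly, which is why it is usually reported alongside the number of scaffolds.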
Affiliation(s)
- Kun-Tze Chen
  - Department of Computer Science, National Tsing Hua University, Hsinchu, 30013, Taiwan
- Cheih-Jung Chen
  - Department of Computer Science, National Tsing Hua University, Hsinchu, 30013, Taiwan
- Hsin-Ting Shen
  - Department of Computer Science, National Tsing Hua University, Hsinchu, 30013, Taiwan
- Chia-Liang Liu
  - Department of Computer Science, National Tsing Hua University, Hsinchu, 30013, Taiwan
- Shang-Hao Huang
  - Department of Computer Science, National Tsing Hua University, Hsinchu, 30013, Taiwan
- Chin Lung Lu
  - Department of Computer Science, National Tsing Hua University, Hsinchu, 30013, Taiwan
11
Choi SC. On the study of microbial transcriptomes using second- and third-generation sequencing technologies. J Microbiol 2016; 54:527-36. [PMID: 27480632 DOI: 10.1007/s12275-016-6233-2] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 05/20/2016] [Revised: 06/16/2016] [Accepted: 06/16/2016] [Indexed: 12/19/2022]
Abstract
Second-generation sequencing technologies transformed the study of microbial transcriptomes. They helped reveal the transcription start sites and antisense transcripts of microbial species, improving microbial genome annotation. Quantification of genome-wide gene expression levels enabled functional studies in microbial research. Ever-evolving sequencing technologies are reshaping approaches to studying microbial transcriptomes. Recently, Oxford Nanopore Technologies delivered a sequencing platform called MinION, a third-generation sequencing technology, to the research community. We expect it to be the next sequencing technology that enables breakthroughs in life science fields. The studies of microbial transcriptomes will be no exception. In this paper, we review microbial transcriptomics studies using second-generation sequencing technology. We also discuss the prospect of microbial transcriptomics studies with third-generation sequencing.
Affiliation(s)
- Sang Chul Choi
  - Department of Biology, Sungshin Women's University, Seoul, 01133, Republic of Korea
12
Smits SL, Bodewes R, Ruiz-González A, Baumgärtner W, Koopmans MP, Osterhaus ADME, Schürch AC. Recovering full-length viral genomes from metagenomes. Front Microbiol 2015; 6:1069. [PMID: 26483782 PMCID: PMC4589665 DOI: 10.3389/fmicb.2015.01069] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Received: 08/19/2015] [Accepted: 09/17/2015] [Indexed: 12/17/2022] Open
Abstract
Infectious disease metagenomics is driven by the question “what is causing the disease?”, in contrast to classical metagenome studies, which are guided by “what is out there?” In the case of a novel virus, a first step to eventually establishing etiology can be to recover a full-length viral genome from a metagenomic sample. However, retrieval of a full-length genome of a divergent virus is technically challenging and can be time-consuming and costly. Here we discuss different assembly and fragment linkage strategies such as iterative assembly, motif searches, k-mer frequency profiling, coverage profile binning, and other strategies used to recover genomes of potential viral pathogens in a timely and cost-effective manner.
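Of the fragment linkage strategies listed above, k-mer frequency profiling is the simplest to illustrate: contigs from the same genome tend to share oligonucleotide composition, so contigs with similar k-mer profiles are candidates for linkage. The sketch below is not taken from the reviewed paper. It assumes tetranucleotides (k = 4) and Euclidean distance, neither of which the abstract specifies, and the function names and sequences are hypothetical.

```python
from collections import Counter
from itertools import product


def kmer_profile(sequence, k=4):
    """Return normalized k-mer frequencies in a fixed ACGT k-mer order."""
    sequence = sequence.upper()
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    # Only k-mers made of unambiguous bases are counted; others (e.g. with N) are ignored.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[kmer] for kmer in kmers)
    return [counts[kmer] / total if total else 0.0 for kmer in kmers]


def profile_distance(a, b):
    """Euclidean distance between two profiles of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


# Hypothetical contigs, for illustration only.
contig_a = "ATATATATATGCATATATAT"
contig_b = "ATATATTATATATGCATATA"
contig_c = "GGGCCCGGGCCCGCGCGGGC"

dist_ab = profile_distance(kmer_profile(contig_a), kmer_profile(contig_b))
dist_ac = profile_distance(kmer_profile(contig_a), kmer_profile(contig_c))
print(dist_ab < dist_ac)
```

In practice such profiles are computed on much longer contigs and are often combined with coverage information, as the abstract's mention of coverage profile binning suggests.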
Affiliation(s)
- Saskia L Smits
  - Department of Viroscience, Erasmus Medical Center Rotterdam, Netherlands
- Rogier Bodewes
  - Department of Viroscience, Erasmus Medical Center Rotterdam, Netherlands
- Aritz Ruiz-González
  - Department of Zoology and Animal Cell Biology, University of the Basque Country (UPV/EHU) Vitoria-Gasteiz, Spain; Systematics, Biogeography and Population Dynamics Research Group, Lascaray Research Center, University of the Basque Country (UPV/EHU) Vitoria-Gasteiz, Spain; Conservation Genetics Laboratory, National Institute for Environmental Protection and Research Bologna, Italy
- Wolfgang Baumgärtner
  - Department of Pathology, University of Veterinary Medicine Hannover Hannover, Germany
- Marion P Koopmans
  - Department of Viroscience, Erasmus Medical Center Rotterdam, Netherlands; Centre for Infectious Diseases Research, Diagnostics and Screening, National Institute for Public Health and the Environment Bilthoven, Netherlands
- Albert D M E Osterhaus
  - Department of Viroscience, Erasmus Medical Center Rotterdam, Netherlands; Center for Infection Medicine and Zoonoses Research Hannover, Germany
- Anita C Schürch
  - Department of Viroscience, Erasmus Medical Center Rotterdam, Netherlands
13
Farrant GK, Hoebeke M, Partensky F, Andres G, Corre E, Garczarek L. WiseScaffolder: an algorithm for the semi-automatic scaffolding of Next Generation Sequencing data. BMC Bioinformatics 2015; 16:281. [PMID: 26335184 PMCID: PMC4559175 DOI: 10.1186/s12859-015-0705-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 02/06/2015] [Accepted: 08/17/2015] [Indexed: 01/12/2023] Open
Abstract
Background The sequencing depth provided by high-throughput sequencing technologies has allowed a rise in the number of de novo sequenced genomes that could potentially be closed without further sequencing. However, genome scaffolding and closure require costly human supervision that often results in genomes being published as drafts. A number of automatic scaffolders were recently released, which improved the global quality of genomes published in the last few years. Yet, none of them reach the efficiency of manual scaffolding. Results Here, we present an innovative semi-automatic scaffolder that additionally helps with chimerae resolution and generates valuable contig maps and outputs for manual improvement of the automatic scaffolding. This software was tested on the newly sequenced marine cyanobacterium Synechococcus sp. WH8103 as well as two reference datasets used in previous studies, Rhodobacter sphaeroides and Homo sapiens chromosome 14 (http://gage.cbcb.umd.edu/). The quality of resulting scaffolds was compared to that of three other stand-alone scaffolders: SSPACE, SOPRA and SCARPA. For all three model organisms, WiseScaffolder produced better results than other scaffolders in terms of contiguity statistics (number of genome fragments, N50, LG50, etc.) and, in the case of WH8103, the reliability of the scaffolds was confirmed by whole genome alignment against a closely related reference genome. We also propose an efficient computer-assisted strategy for manual improvement of the scaffolding, using outputs generated by WiseScaffolder, as well as for genome finishing that in our hands led to the circularization of the WH8103 genome. Conclusion Altogether, WiseScaffolder proved more efficient than three other scaffolders for both prokaryotic and eukaryotic genomes and is thus likely applicable to most genome projects. 
The scaffolding pipeline described here should be of particular interest to biologists wishing to take advantage of the high added value of complete genomes. Electronic supplementary material The online version of this article (doi:10.1186/s12859-015-0705-y) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Gregory K Farrant
  - Sorbonne Universités, UPMC Univ. Paris 06, UMR 7144, Station Biologique, CS 90074, 29688, Roscoff cedex, France; CNRS, UMR 7144 Adaptation and Diversity in the Marine Environment, Oceanic Plankton Group, Marine Phototrophic Prokaryotes team, Place Georges Teissier, CS 90074, 29688, Roscoff cedex, France
- Mark Hoebeke
  - CNRS, FR 2424, ABiMS Platform, Station Biologique, Place Georges Teissier, CS 90074, 29688, Roscoff cedex, France
- Frédéric Partensky
  - Sorbonne Universités, UPMC Univ. Paris 06, UMR 7144, Station Biologique, CS 90074, 29688, Roscoff cedex, France; CNRS, UMR 7144 Adaptation and Diversity in the Marine Environment, Oceanic Plankton Group, Marine Phototrophic Prokaryotes team, Place Georges Teissier, CS 90074, 29688, Roscoff cedex, France
- Gwendoline Andres
  - CNRS, FR 2424, ABiMS Platform, Station Biologique, Place Georges Teissier, CS 90074, 29688, Roscoff cedex, France
- Erwan Corre
  - CNRS, FR 2424, ABiMS Platform, Station Biologique, Place Georges Teissier, CS 90074, 29688, Roscoff cedex, France
- Laurence Garczarek
  - Sorbonne Universités, UPMC Univ. Paris 06, UMR 7144, Station Biologique, CS 90074, 29688, Roscoff cedex, France; CNRS, UMR 7144 Adaptation and Diversity in the Marine Environment, Oceanic Plankton Group, Marine Phototrophic Prokaryotes team, Place Georges Teissier, CS 90074, 29688, Roscoff cedex, France
14
|
Eastman AW, Yuan ZC. Development and validation of an rDNA operon based primer walking strategy applicable to de novo bacterial genome finishing. Front Microbiol 2015; 5:769. [PMID: 25653642 PMCID: PMC4301005 DOI: 10.3389/fmicb.2014.00769] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 11/07/2014] [Accepted: 12/16/2014] [Indexed: 01/10/2023] Open
Abstract
Advances in sequencing technology have drastically increased the depth and feasibility of bacterial genome sequencing. However, little information is available that details the specific techniques and procedures employed during genome sequencing despite the large numbers of published genomes. Shotgun approaches employed by second-generation sequencing platforms have necessitated the development of robust bioinformatics tools for in silico assembly, and complete assembly is limited by the presence of repetitive DNA sequences and multi-copy operons. Typically, re-sequencing with multiple platforms and laborious, targeted Sanger sequencing are employed to finish a draft bacterial genome. Here we describe a novel strategy based on the identification and targeted sequencing of repetitive rDNA operons to expedite bacterial genome assembly and finishing. Our strategy was validated by finishing the genome of Paenibacillus polymyxa strain CR1, a bacterium with potential in sustainable agriculture and bio-based processes. An analysis of the 38 contigs contained in the P. polymyxa strain CR1 draft genome revealed 12 repetitive rDNA operons with varied intragenic and flanking regions of variable length, unanimously located at contig boundaries and within contig gaps. These highly similar but not identical rDNA operons were experimentally verified and sequenced simultaneously with multiple, specially designed primer sets. This approach also identified and corrected significant sequence rearrangement generated during the initial in silico assembly of sequencing reads. Our approach reduces the required effort associated with blind primer walking for contig assembly, increasing both the speed and feasibility of genome finishing. Our study further reinforces the notion that repetitive DNA elements are major limiting factors for genome finishing. Moreover, we provide a step-by-step workflow for genome finishing, which may guide future bacterial genome finishing projects.
Affiliation(s)
- Alexander W Eastman
- Southern Crop Protection and Food Research Centre, Agriculture and Agri-Food Canada, Government of Canada, London, ON, Canada; Department of Microbiology and Immunology, Schulich School of Medicine and Dentistry, University of Western Ontario, London, ON, Canada
- Ze-Chun Yuan
- Southern Crop Protection and Food Research Centre, Agriculture and Agri-Food Canada, Government of Canada, London, ON, Canada; Department of Microbiology and Immunology, Schulich School of Medicine and Dentistry, University of Western Ontario, London, ON, Canada
|
15
|
Forde BM, Ben Zakour NL, Stanton-Cook M, Phan MD, Totsika M, Peters KM, Chan KG, Schembri MA, Upton M, Beatson SA. The complete genome sequence of Escherichia coli EC958: a high quality reference sequence for the globally disseminated multidrug resistant E. coli O25b:H4-ST131 clone. PLoS One 2014; 9:e104400. [PMID: 25126841 PMCID: PMC4134206 DOI: 10.1371/journal.pone.0104400] [Citation(s) in RCA: 84] [Impact Index Per Article: 8.4] [Received: 01/16/2014] [Accepted: 07/11/2014] [Indexed: 11/18/2022] Open
Abstract
Escherichia coli ST131 is now recognised as a leading contributor to urinary tract and bloodstream infections in both community and clinical settings. Here we present the complete, annotated genome of E. coli EC958, which was isolated from the urine of a patient presenting with a urinary tract infection in the Northwest region of England and represents the most well characterised ST131 strain. Sequencing was carried out using the Pacific Biosciences platform, which provided sufficient depth and read-length to produce a complete genome without the need for other technologies. The discovery of spurious contigs within the assembly that correspond to site-specific inversions in the tail fibre regions of prophages demonstrates the potential for this technology to reveal dynamic evolutionary mechanisms. E. coli EC958 belongs to the major subgroup of ST131 strains that produce the CTX-M-15 extended spectrum β-lactamase, are fluoroquinolone resistant and encode the fimH30 type 1 fimbrial adhesin. This subgroup includes the Indian strain NA114 and the North American strain JJ1886. A comparison of the genomes of EC958, JJ1886 and NA114 revealed that differences in the arrangement of genomic islands, prophages and other repetitive elements in the NA114 genome are not biologically relevant and are due to misassembly. The availability of a high quality uropathogenic E. coli ST131 genome provides a reference for understanding this multidrug resistant pathogen and will facilitate novel functional, comparative and clinical studies of the E. coli ST131 clonal lineage.
Affiliation(s)
- Brian M. Forde
- Australian Infectious Diseases Research Centre, School of Chemistry & Molecular Biosciences, The University of Queensland, Queensland, Australia
- Nouri L. Ben Zakour
- Australian Infectious Diseases Research Centre, School of Chemistry & Molecular Biosciences, The University of Queensland, Queensland, Australia
- Mitchell Stanton-Cook
- Australian Infectious Diseases Research Centre, School of Chemistry & Molecular Biosciences, The University of Queensland, Queensland, Australia
- Minh-Duy Phan
- Australian Infectious Diseases Research Centre, School of Chemistry & Molecular Biosciences, The University of Queensland, Queensland, Australia
- Makrina Totsika
- Australian Infectious Diseases Research Centre, School of Chemistry & Molecular Biosciences, The University of Queensland, Queensland, Australia
- Kate M. Peters
- Australian Infectious Diseases Research Centre, School of Chemistry & Molecular Biosciences, The University of Queensland, Queensland, Australia
- Kok Gan Chan
- Division of Genetics and Molecular Biology, Institute of Biological Sciences, Faculty of Science, University of Malaya, Kuala Lumpur, Malaysia
- Mark A. Schembri
- Australian Infectious Diseases Research Centre, School of Chemistry & Molecular Biosciences, The University of Queensland, Queensland, Australia
- Mathew Upton
- Plymouth University Peninsula Schools of Medicine and Dentistry, Plymouth, United Kingdom
- Scott A. Beatson
- Australian Infectious Diseases Research Centre, School of Chemistry & Molecular Biosciences, The University of Queensland, Queensland, Australia
|
16
|
Utturkar SM, Klingeman DM, Land ML, Schadt CW, Doktycz MJ, Pelletier DA, Brown SD. Evaluation and validation of de novo and hybrid assembly techniques to derive high-quality genome sequences. Bioinformatics 2014; 30:2709-16. [PMID: 24930142 PMCID: PMC4173024 DOI: 10.1093/bioinformatics/btu391] [Citation(s) in RCA: 86] [Impact Index Per Article: 8.6] [Indexed: 11/14/2022]
Abstract
MOTIVATION To assess the potential of different types of sequence data combined with de novo and hybrid assembly approaches to improve existing draft genome sequences. RESULTS Illumina, 454 and PacBio sequencing technologies were used to generate de novo and hybrid genome assemblies for four different bacteria, which were assessed for quality using summary statistics (e.g. number of contigs, N50) and in silico evaluation tools. Differences in predictions of multiple copies of rDNA operons for each respective bacterium were evaluated by PCR and Sanger sequencing, and then the validated results were applied as an additional criterion to rank assemblies. In general, assemblies using longer PacBio reads were better able to resolve repetitive regions. In this study, the combination of Illumina and PacBio sequence data assembled through the ALLPATHS-LG algorithm gave the best summary statistics and most accurate rDNA operon number predictions. This study will aid others looking to improve existing draft genome assemblies. AVAILABILITY AND IMPLEMENTATION All assembly tools except CLC Genomics Workbench are freely available under GNU General Public License. CONTACT brownsd@ornl.gov SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Sagar M Utturkar
- Graduate School of Genome Science and Technology, University of Tennessee, Knoxville, TN 37919, USA and Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
- Dawn M Klingeman
- Graduate School of Genome Science and Technology, University of Tennessee, Knoxville, TN 37919, USA and Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
- Miriam L Land
- Graduate School of Genome Science and Technology, University of Tennessee, Knoxville, TN 37919, USA and Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
- Christopher W Schadt
- Graduate School of Genome Science and Technology, University of Tennessee, Knoxville, TN 37919, USA and Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
- Mitchel J Doktycz
- Graduate School of Genome Science and Technology, University of Tennessee, Knoxville, TN 37919, USA and Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
- Dale A Pelletier
- Graduate School of Genome Science and Technology, University of Tennessee, Knoxville, TN 37919, USA and Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
- Steven D Brown
- Graduate School of Genome Science and Technology, University of Tennessee, Knoxville, TN 37919, USA and Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
|
17
|
Lery LMS, Frangeul L, Tomas A, Passet V, Almeida AS, Bialek-Davenet S, Barbe V, Bengoechea JA, Sansonetti P, Brisse S, Tournebize R. Comparative analysis of Klebsiella pneumoniae genomes identifies a phospholipase D family protein as a novel virulence factor. BMC Biol 2014; 12:41. [PMID: 24885329 PMCID: PMC4068068 DOI: 10.1186/1741-7007-12-41] [Citation(s) in RCA: 109] [Impact Index Per Article: 10.9] [Received: 05/14/2014] [Accepted: 05/15/2014] [Indexed: 12/17/2022] Open
Abstract
Background Klebsiella pneumoniae strains are pathogenic to animals and humans, in which they are both a frequent cause of nosocomial infections and a re-emerging cause of severe community-acquired infections. K. pneumoniae isolates of the capsular serotype K2 are among the most virulent. In order to identify novel putative virulence factors that may account for the severity of K2 infections, the genome sequence of the K2 reference strain Kp52.145 was determined and compared to two K1 and K2 strains of low virulence and to the reference strains MGH 78578 and NTUH-K2044. Results In addition to diverse functions related to host colonization and virulence encoded in genomic regions common to the four strains, four genomic islands specific for Kp52.145 were identified. These regions encoded genes for the synthesis of colibactin toxin, a putative cytotoxin outer membrane protein, secretion systems, nucleases and eukaryotic-like proteins. In addition, an insertion within a type VI secretion system locus included sel1 domain containing proteins and a phospholipase D family protein (PLD1). The pld1 mutant was avirulent in a pneumonia model in mouse. The pld1 mRNA was expressed in vivo and the pld1 gene was associated with K. pneumoniae isolates from severe infections. Analysis of lipid composition of a defective E. coli strain complemented with pld1 suggests an involvement of PLD1 in cardiolipin metabolism. Conclusions Determination of the complete genome of the K2 reference strain identified several genomic islands comprising putative elements of pathogenicity. The role of PLD1 in pathogenesis was demonstrated for the first time and suggests that lipid metabolism is a novel virulence mechanism of K. pneumoniae.
Affiliation(s)
- Letícia M S Lery
- Institut Pasteur - Pathogénie Microbienne Moléculaire, Paris, France.
|
18
|
Genome sequencing of Listeria monocytogenes. Methods Mol Biol 2014; 1157:223-32. [PMID: 24792562 DOI: 10.1007/978-1-4939-0703-8_19] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/21/2023]
Abstract
Genome sequencing is a key technology in microbiology. A genome sequence is the prerequisite for understanding the molecular basis of a given phenotype; this is of particular importance for pathogens. Particularly for the foodborne pathogen Listeria monocytogenes, which is an important model organism in infection biology, genome sequencing has proven to be invaluable in advancing our understanding of its virulence mechanisms and epidemiology. In this chapter, current technologies and software tools for genome sequencing and genome analysis of L. monocytogenes are described.
|
19
|
|
20
|
OMACC: an Optical-Map-Assisted Contig Connector for improving de novo genome assembly. BMC Systems Biology 2013; 7 Suppl 6:S7. [PMID: 24564959 PMCID: PMC4029551 DOI: 10.1186/1752-0509-7-s6-s7] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/19/2022]
Abstract
Background Genome sequencing and assembly are essential for revealing the secrets of life hidden in genomes. Because of repeats in most genomes, current programs collate sequencing data into a set of assembled sequences, called contigs, instead of a complete genome. Toward completing a genome, optical mapping is powerful in rendering the relative order of contigs on the genome, which is called scaffolding. However, connecting the neighboring contigs with nucleotide sequences requires further effort. Nagarajan et al. have recently proposed a software module, FINISH, to close the gaps between contigs with other contig sequences after scaffolding contigs using an optical map. The results, however, are not yet satisfactory. Results To increase the accuracy of contig connections, we develop OMACC, which carefully takes into account length information in optical maps. Specifically, it rescales the optical map and applies a length constraint for selecting the correct contig sequences for gap closure. In addition, it uses an advanced graph search algorithm to facilitate estimating the number of repeat copies within gaps between contigs. On both simulated and real datasets, OMACC achieves a <10% false gap-closing rate, three times lower than the ~27% false rate by FINISH, while maintaining a similar sensitivity. Conclusion As optical mapping is becoming popular and repeats are the bottleneck of assembly, OMACC should benefit various downstream biological studies via accurately connecting contigs into a more complete genome. Availability http://140.116.235.124/~tliu/omacc
|
21
|
BAIT: Organizing genomes and mapping rearrangements in single cells. Genome Med 2013; 5:82. [PMID: 24028793 PMCID: PMC3971352 DOI: 10.1186/gm486] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.4] [Received: 05/29/2013] [Accepted: 09/09/2013] [Indexed: 12/30/2022] Open
Abstract
Strand-seq is a single-cell sequencing technique to finely map sister chromatid exchanges (SCEs) and other rearrangements. To analyze these data, we introduce BAIT, software which assigns templates and identifies and localizes SCEs. We demonstrate BAIT can refine completed reference assemblies, identifying approximately 21 Mb of incorrectly oriented fragments and placing over half (2.6 Mb) of the orphan fragments in mm10/GRCm38. BAIT also stratifies scaffold-stage assemblies, potentially accelerating the assembling and finishing of reference genomes. BAIT is available at http://sourceforge.net/projects/bait/.
|
22
|
Kirkup BC, Mahlen S, Kallstrom G. Future-Generation Sequencing and Clinical Microbiology. Clin Lab Med 2013; 33:685-704. [DOI: 10.1016/j.cll.2013.03.011] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 12/21/2022]
|
23
|
Ghodsi M, Hill CM, Astrovskaya I, Lin H, Sommer DD, Koren S, Pop M. De novo likelihood-based measures for comparing genome assemblies. BMC Res Notes 2013; 6:334. [PMID: 23965294 PMCID: PMC3765854 DOI: 10.1186/1756-0500-6-334] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.2] [Received: 05/13/2013] [Accepted: 08/13/2013] [Indexed: 12/12/2022] Open
Abstract
Background The current revolution in genomics has been made possible by software tools called genome assemblers, which stitch together DNA fragments “read” by sequencing machines into complete or nearly complete genome sequences. Despite decades of research in this field and the development of dozens of genome assemblers, assessing and comparing the quality of assembled genome sequences still relies on the availability of independently determined standards, such as manually curated genome sequences, or independently produced mapping data. These “gold standards” can be expensive to produce and may only cover a small fraction of the genome, which limits their applicability to newly generated genome sequences. Here we introduce a de novo probabilistic measure of assembly quality which allows for an objective comparison of multiple assemblies generated from the same set of reads. We define the quality of a sequence produced by an assembler as the conditional probability of observing the sequenced reads from the assembled sequence. A key property of our metric is that the true genome sequence maximizes the score, unlike other commonly used metrics. Results We demonstrate that our de novo score can be computed quickly and accurately in a practical setting even for large datasets, by estimating the score from a relatively small sample of the reads. To demonstrate the benefits of our score, we measure the quality of the assemblies generated in the GAGE and Assemblathon 1 assembly “bake-offs” with our metric. Even without knowledge of the true reference sequence, our de novo metric closely matches the reference-based evaluation metrics used in the studies and outperforms other de novo metrics traditionally used to measure assembly quality (such as N50). Finally, we highlight the application of our score to optimize assembly parameters used in genome assemblers, which enables better assemblies to be produced, even without prior knowledge of the genome being assembled. 
Conclusion Likelihood-based measures, such as the one proposed here, will become the new standard for de novo assembly evaluation.
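The idea of scoring an assembly by the conditional probability of the reads given the assembled sequence can be sketched with a deliberately simplified model. This is not the authors' implementation: it assumes error-free reads sampled uniformly from a single-stranded assembly, ignores reverse complements and paired-end information, and uses a hypothetical `floor` probability for reads that do not occur at all. The function names are illustrative.

```python
import math
from typing import Iterable


def count_occurrences(read: str, assembly: str) -> int:
    """Count (possibly overlapping) exact occurrences of read in assembly."""
    count = 0
    start = assembly.find(read)
    while start != -1:
        count += 1
        start = assembly.find(read, start + 1)
    return count


def log_likelihood(reads: Iterable[str], assembly: str, floor: float = 1e-9) -> float:
    """Sum of log P(read | assembly) under a toy uniform-sampling model.

    P(read | assembly) is the fraction of start positions at which the read
    matches exactly; unmatched reads get the hypothetical `floor` value so
    the logarithm stays finite.
    """
    total = 0.0
    for read in reads:
        positions = len(assembly) - len(read) + 1
        prob = count_occurrences(read, assembly) / positions if positions > 0 else 0.0
        total += math.log(max(prob, floor))
    return total


if __name__ == "__main__":
    reads = ["ACGT", "GTTA"]
    print(log_likelihood(reads, "ACGTTAGC"))  # assembly that explains both reads
    print(log_likelihood(reads, "ACGTAAGC"))  # assembly that explains only one read
```

Under this toy model, an assembly that explains more of the reads receives a higher score without any reference sequence, which is the property the abstract emphasizes. A realistic score would also need a sequencing error model.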
Affiliation(s)
- Mohammadreza Ghodsi
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland, USA.
|
24
|
Beyond sequencing: optical mapping of DNA in the age of nanotechnology and nanoscopy. Curr Opin Biotechnol 2013; 24:690-8. [DOI: 10.1016/j.copbio.2013.01.009] [Citation(s) in RCA: 90] [Impact Index Per Article: 8.2] [Received: 12/05/2012] [Revised: 01/20/2013] [Accepted: 01/22/2013] [Indexed: 12/25/2022]
|
25
|
Genome Sequencing of Four Strains of Rickettsia prowazekii, the Causative Agent of Epidemic Typhus, Including One Flying Squirrel Isolate. Genome Announcements 2013; 1:1/3/e00399-13. [PMID: 23814035 PMCID: PMC3695431 DOI: 10.1128/genomea.00399-13] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Indexed: 11/20/2022]
Abstract
Rickettsia prowazekii is a notable intracellular pathogen, the agent of epidemic typhus, and a potential biothreat agent. We present here whole-genome sequence data for four strains of R. prowazekii, including one from a flying squirrel.
|
26
|
Tuteja R, Saxena RK, Davila J, Shah T, Chen W, Xiao YL, Fan G, Saxena KB, Alverson AJ, Spillane C, Town C, Varshney RK. Cytoplasmic male sterility-associated chimeric open reading frames identified by mitochondrial genome sequencing of four Cajanus genotypes. DNA Res 2013; 20:485-95. [PMID: 23792890 PMCID: PMC3789559 DOI: 10.1093/dnares/dst025] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.0] [Indexed: 12/02/2022] Open
Abstract
The hybrid pigeonpea (Cajanus cajan) breeding technology based on cytoplasmic male sterility (CMS) is currently unique among legumes and displays major potential for yield increase. CMS is defined as a condition in which a plant is unable to produce functional pollen grains. The novel chimeric open reading frames (ORFs) produced as a result of mitochondrial genome rearrangements are considered to be the main cause of CMS. To identify these CMS-related ORFs in pigeonpea, we sequenced the mitochondrial genomes of three C. cajan lines (the male-sterile line ICPA 2039, the maintainer line ICPB 2039, and the hybrid line ICPH 2433) and of the wild relative (Cajanus cajanifolius ICPW 29). A single, circular-mapping molecule of length 545.7 kb was assembled and annotated for the ICPA 2039 line. Sequence annotation predicted 51 genes, including 34 protein-coding and 17 RNA genes. Comparison of the mitochondrial genomes from different Cajanus genotypes identified 31 ORFs, which differ between lines within which CMS is present or absent. Among these chimeric ORFs, 13 were identified by comparison of the related male-sterile and maintainer lines. These ORFs display features that are known to trigger CMS in other plant species and represent the most promising candidates for CMS-related mitochondrial rearrangements in pigeonpea.
Affiliation(s)
- Reetu Tuteja
- International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Patancheru, India
- Plant and AgriBiosciences Centre (PABC), School of Natural Sciences, National University of Ireland Galway, Galway, Ireland
- Rachit K. Saxena
- International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Patancheru, India
- Jaime Davila
- Center for Plant Science Innovation, University of Nebraska, Lincoln, USA
- Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, USA
- Trushar Shah
- International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Patancheru, India
- Wenbin Chen
- Beijing Genomics Institute (BGI)-Shenzhen, Shenzhen, China
- Yong-Li Xiao
- J. Craig Venter Institute (JCVI), Rockville, USA
- Guangyi Fan
- Beijing Genomics Institute (BGI)-Shenzhen, Shenzhen, China
- K. B. Saxena
- International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Patancheru, India
- Andrew J. Alverson
- Department of Biological Sciences, University of Arkansas, Arkansas, USA
- Charles Spillane
- Plant and AgriBiosciences Centre (PABC), School of Natural Sciences, National University of Ireland Galway, Galway, Ireland
- Rajeev K. Varshney
- International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Patancheru, India
- To whom correspondence should be addressed. Tel. +914030713305. Fax. +914030713071. E-mail:
|
27
|
Enhanced de novo assembly of high throughput pyrosequencing data using whole genome mapping. PLoS One 2013; 8:e61762. [PMID: 23613926 PMCID: PMC3629165 DOI: 10.1371/journal.pone.0061762] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.7] [Received: 09/28/2012] [Accepted: 03/11/2013] [Indexed: 01/20/2023] Open
Abstract
Despite major advances in next-generation sequencing, assembly of sequencing data, especially data from novel microorganisms or re-emerging pathogens, remains constrained by the lack of suitable reference sequences. De novo assembly is the best approach to achieve an accurate finished sequence, but multiple sequencing platforms or paired-end libraries are often required to achieve full genome coverage. In this study, we demonstrated a method to assemble complete bacterial genome sequences by integrating shotgun Roche 454 pyrosequencing with optical whole genome mapping (WGM). The whole genome restriction map (WGRM) was used as the reference to scaffold de novo assembled sequence contigs through a stepwise process. Large de novo contigs were placed in the correct order and orientation through alignment to the WGRM. De novo contigs that were not aligned to the WGRM were merged into scaffolds using contig branching structure information. These extended scaffolds were then aligned to the WGRM to identify the overlaps to be eliminated and the gaps and mismatches to be resolved with unused contigs. The process was repeated until a sequence with full coverage and alignment with the whole genome map was achieved. Using this method we were able to achieve 100% WGRM coverage without a paired-end library. We assembled complete sequences for three distinct genetic components of a clinical isolate of Providencia stuartii: a bacterial chromosome, a novel bla NDM-1 plasmid, and a novel bacteriophage, without separately purifying them to homogeneity.
|
28
|
Edwards DJ, Holt KE. Beginner’s guide to comparative bacterial genome analysis using next-generation sequence data. Microbial Informatics and Experimentation 2013; 3:2. [PMID: 23575213 PMCID: PMC3630013 DOI: 10.1186/2042-5783-3-2] [Citation(s) in RCA: 95] [Impact Index Per Article: 8.6] [Received: 02/12/2013] [Accepted: 03/31/2013] [Indexed: 12/25/2022]
|
29
|
Kisand V, Lettieri T. Genome sequencing of bacteria: sequencing, de novo assembly and rapid analysis using open source tools. BMC Genomics 2013; 14:211. [PMID: 23547799 PMCID: PMC3618134 DOI: 10.1186/1471-2164-14-211] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.6] [Received: 08/20/2012] [Accepted: 03/22/2013] [Indexed: 11/18/2022] Open
Abstract
Background De novo genome sequencing of previously uncharacterized microorganisms has the potential to open up new frontiers in microbial genomics by providing insight into both functional capabilities and biodiversity. Until recently, Roche 454 pyrosequencing was the NGS method of choice for de novo assembly because it generates hundreds of thousands of long reads (<450 bps), which are presumed to aid in the analysis of uncharacterized genomes. The tools for processing NGS data are increasingly free and open source and are often adopted for both their high quality and role in promoting academic freedom. Results The error rate of pyrosequencing the Alcanivorax borkumensis genome was such that thousands of insertions and deletions were artificially introduced into the finished genome. Despite a high coverage (~30 fold), it did not allow the reference genome to be fully mapped. Reads from regions with errors had low quality, low coverage, or were missing. The main defect of the reference mapping was the introduction of artificial indels into contigs through lower than 100% consensus and distracting gene calling due to artificial stop codons. No assembler was able to perform de novo assembly comparable to reference mapping. Automated annotation tools performed similarly on reference mapped and de novo draft genomes, and annotated most CDSs in the de novo assembled draft genomes. Conclusions Free and open source software (FOSS) tools for assembly and annotation of NGS data are being developed rapidly to provide accurate results with less computational effort. Usability is not a high priority and these tools currently do not allow the data to be processed without manual intervention. Despite this, genome assemblers now readily assemble medium short reads into long contigs (>97-98% genome coverage). A notable gap in pyrosequencing technology is the quality of base pair calling and conflicting base pairs between single reads at the same nucleotide position. Regardless, using draft whole genomes that are not finished and remain fragmented into tens of contigs allows one to characterize unknown bacteria with modest effort.
Affiliation(s)
- Veljo Kisand
- Institute of Technology, Tartu University, Nooruse 1, Tartu 50411, Estonia.
|
30
|
Abstract
Humans are essentially sterile during gestation, but during and after birth, every body surface, including the skin, mouth, and gut, becomes host to an enormous variety of microbes, bacterial, archaeal, fungal, and viral. Under normal circumstances, these microbes help us to digest our food and to maintain our immune systems, but dysfunction of the human microbiota has been linked to conditions ranging from inflammatory bowel disease to antibiotic-resistant infections. Modern high-throughput sequencing and bioinformatic tools provide a powerful means of understanding the contribution of the human microbiome to health and its potential as a target for therapeutic interventions. This chapter will first discuss the historical origins of microbiome studies and methods for determining the ecological diversity of a microbial community. Next, it will introduce shotgun sequencing technologies such as metagenomics and metatranscriptomics, the computational challenges and methods associated with these data, and how they enable microbiome analysis. Finally, it will conclude with examples of the functional genomics of the human microbiome and its influences upon health and disease.
Affiliation(s)
- Xochitl C. Morgan
- Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts, United States of America
- Curtis Huttenhower
- Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts, United States of America
- The Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
|
31
|
Sahlin K, Street N, Lundeberg J, Arvestad L. Improved gap size estimation for scaffolding algorithms. Bioinformatics 2012; 28:2215-22. [DOI: 10.1093/bioinformatics/bts441] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Indexed: 02/03/2023] Open
|
32
|
Hurt RA, Brown SD, Podar M, Palumbo AV, Elias DA. Sequencing intractable DNA to close microbial genomes. PLoS One 2012; 7:e41295. [PMID: 22859974 PMCID: PMC3409199 DOI: 10.1371/journal.pone.0041295] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 04/09/2012] [Accepted: 06/19/2012] [Indexed: 11/18/2022] Open
Abstract
Advancement in high throughput DNA sequencing technologies has supported a rapid proliferation of microbial genome sequencing projects, providing the genetic blueprint for in-depth studies. Oftentimes, difficult to sequence regions in microbial genomes are ruled “intractable” resulting in a growing number of genomes with sequence gaps deposited in databases. A procedure was developed to sequence such problematic regions in the “non-contiguous finished” Desulfovibrio desulfuricans ND132 genome (6 intractable gaps) and the Desulfovibrio africanus genome (1 intractable gap). The polynucleotides surrounding each gap formed GC rich secondary structures making the regions refractory to amplification and sequencing. Strand-displacing DNA polymerases used in concert with a novel ramped PCR extension cycle supported amplification and closure of all gap regions in both genomes. The developed procedures support accurate gene annotation, and provide a step-wise method that reduces the effort required for genome finishing.
Affiliation(s)
- Richard A. Hurt
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee, United States of America
- Steven D. Brown
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee, United States of America
- Mircea Podar
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee, United States of America
- Anthony V. Palumbo
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee, United States of America
- Dwayne A. Elias
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee, United States of America
|
33
|
Ricker N, Qian H, Fulthorpe RR. The limitations of draft assemblies for understanding prokaryotic adaptation and evolution. Genomics 2012; 100:167-75. [PMID: 22750556 DOI: 10.1016/j.ygeno.2012.06.009] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.8] [Received: 12/09/2011] [Revised: 05/31/2012] [Accepted: 06/20/2012] [Indexed: 12/26/2022]
Abstract
The de novo assembly of next generation sequencing data is a daunting task made more difficult by the presence of genomic repeats or transposable elements, resulting in an increasing number of genomes designated as completed draft assemblies. We created and assembled idealized sequence data sets for Cupriavidus metallidurans CH34, Caulobacter sp. K31, Gramella forsetii KT0803, Rhodobacter sphaeroides 2.4.1 and Bordetella bronchiseptica RB50. In addition to confirming the role of transposable elements in interrupting the assemblies, an association was found between the most fragmented regions and known or predicted genomic islands in these strains. Assembly quality was more strongly related to putative genomic island content than to any other factor examined. We believe this association indicates that draft assemblies are limiting our ability to understand the genomic context of important bacterial adaptations and that the increased effort required for finishing genomes can provide a wealth of information for future studies.
Affiliation(s)
- N Ricker
- Department of Physical and Environmental Sciences, University of Toronto Scarborough, Canada
|
34
|
Barton MD, Barton HA. Scaffolder - software for manual genome scaffolding. Source Code for Biology and Medicine 2012; 7:4. [PMID: 22640820 PMCID: PMC3464138 DOI: 10.1186/1751-0473-7-4] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Received: 05/02/2012] [Accepted: 05/03/2012] [Indexed: 11/21/2022]
Abstract
Background The assembly of next-generation short-read sequencing data can result in a fragmented non-contiguous set of genomic sequences. Therefore a common step in a genome project is to join neighbouring sequence regions together and fill gaps. This scaffolding step is non-trivial and requires manually editing large blocks of nucleotide sequence. Joining these sequences together also hides the source of each region in the final genome sequence. Taken together these considerations may make reproducing or editing an existing genome scaffold difficult. Methods The software outlined here, “Scaffolder,” is implemented in the Ruby programming language and can be installed via the RubyGems software management system. Genome scaffolds are defined using YAML - a data format which is both human and machine-readable. Command line binaries and extensive documentation are available. Results This software allows a genome build to be defined in terms of the constituent sequences using a relatively simple syntax. This syntax further allows unknown regions to be specified and additional sequence to be used to fill known gaps in the scaffold. Defining the genome construction in a file makes the scaffolding process reproducible and easier to edit compared with large FASTA nucleotide sequences. Conclusions Scaffolder is easy-to-use genome scaffolding software which promotes reproducibility and continuous development in a genome project. Scaffolder can be found at http://next.gs.
Affiliation(s)
- Michael D Barton
- Biology Department, The University of Akron, Akron, OH, 44325-3908, USA.
|
35
|
Gao S, Bertrand D, Nagarajan N. FinIS: Improved in silico Finishing Using an Exact Quadratic Programming Formulation. Lecture Notes in Computer Science 2012. [DOI: 10.1007/978-3-642-33122-0_25] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Indexed: 01/23/2023]
|
36
|
Bardaji L, Pérez-Martínez I, Rodríguez-Moreno L, Rodríguez-Palenzuela P, Sundin GW, Ramos C, Murillo J. Sequence and role in virulence of the three plasmid complement of the model tumor-inducing bacterium Pseudomonas savastanoi pv. savastanoi NCPPB 3335. PLoS One 2011; 6:e25705. [PMID: 22022435 PMCID: PMC3191145 DOI: 10.1371/journal.pone.0025705] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.3] [Received: 07/19/2011] [Accepted: 09/08/2011] [Indexed: 12/18/2022] Open
Abstract
Pseudomonas savastanoi pv. savastanoi NCPPB 3335 is a model for the study of the molecular basis of disease production and tumor formation in woody hosts, and its draft genome sequence has been recently obtained. Here we closed the sequence of the plasmid complement of this strain, composed of three circular molecules of 78,357 nt (pPsv48A), 45,220 nt (pPsv48B), and 42,103 nt (pPsv48C), all belonging to the pPT23A-like family of plasmids widely distributed in the P. syringae complex. A total of 152 coding sequences were predicted in the plasmid complement, of which 38 are hypothetical proteins and seven correspond to putative virulence genes. Plasmid pPsv48A contains an incomplete Type IVB secretion system, the type III secretion system (T3SS) effector gene hopAF1, gene ptz, involved in cytokinin biosynthesis, and three copies of a gene highly conserved in plant-associated proteobacteria, which is preceded by a hrp box motif. A complete Type IVA secretion system, a well conserved origin of transfer (oriT), and a homolog of the T3SS effector gene hopAO1 are present in pPsv48B, while pPsv48C contains a gene with significant homology to isopentenyl-diphosphate delta-isomerase, type 1. Several potential mobile elements were found on the three plasmids, including three types of MITE, a derivative of IS801, and a new transposon effector, ISPsy30. Although the replication regions of these three plasmids are phylogenetically closely related, their structure is diverse, suggesting that the plasmid architecture results from an active exchange of sequences. Artificial inoculations of olive plants with mutants cured of plasmids pPsv48A and pPsv48B showed that pPsv48A is necessary for full virulence and for the development of mature xylem vessels within the knots; we were unable to obtain mutants cured of pPsv48C, which contains five putative toxin-antitoxin genes.
Affiliation(s)
- Leire Bardaji
- Departamento de Producción Agraria, Escuela Técnica Superior de Ingenieros Agrónomos, Universidad Pública de Navarra, Pamplona, Spain
- Isabel Pérez-Martínez
- Área de Genética, Facultad de Ciencias, Universidad de Málaga, Instituto de Hortofruticultura Subtropical y Mediterránea “La Mayora,” Málaga, Spain
- Luis Rodríguez-Moreno
- Área de Genética, Facultad de Ciencias, Universidad de Málaga, Instituto de Hortofruticultura Subtropical y Mediterránea “La Mayora,” Málaga, Spain
- Pablo Rodríguez-Palenzuela
- Centro de Biotecnología y Genómica de Plantas, Escuela Técnica Superior de Ingenieros Agrónomos, Universidad Politécnica de Madrid, Campus de Montegancedo, Pozuelo de Alarcón, Madrid, Spain
- George W. Sundin
- Department of Plant Pathology and Center for Microbial Ecology, Michigan State University, East Lansing, Michigan, United States of America
- Cayo Ramos
- Área de Genética, Facultad de Ciencias, Universidad de Málaga, Instituto de Hortofruticultura Subtropical y Mediterránea “La Mayora,” Málaga, Spain
- Jesús Murillo
- Departamento de Producción Agraria, Escuela Técnica Superior de Ingenieros Agrónomos, Universidad Pública de Navarra, Pamplona, Spain
|
37
|
Hypervirulent Chlamydia trachomatis clinical strain is a recombinant between lymphogranuloma venereum (L(2)) and D lineages. mBio 2011; 2:e00045-11. [PMID: 21540364 PMCID: PMC3088116 DOI: 10.1128/mbio.00045-11] [Citation(s) in RCA: 81] [Impact Index Per Article: 6.2] [Indexed: 02/04/2023] Open
Abstract
Chlamydia trachomatis is an obligate intracellular bacterium that causes a diversity of severe and debilitating diseases worldwide. Sporadic and ongoing outbreaks of lymphogranuloma venereum (LGV) strains among men who have sex with men (MSM) support the need for research on virulence factors associated with these organisms. Previous analyses have been limited to single genes or genomes of laboratory-adapted reference strain L2/434 and outbreak strain L2b/UCH-1/proctitis. We characterized an unusual LGV strain, termed L2c, isolated from an MSM with severe hemorrhagic proctitis. L2c developed nonfusing, grape-like inclusions and a cytotoxic phenotype in culture, unlike the LGV strains described to date. Deep genome sequencing revealed that L2c was a recombinant of L2 and D strains with conserved clustered regions of genetic exchange, including a 78-kb region and a partial, yet functional, toxin gene that was lost with prolonged culture. Indels (insertions/deletions) were discovered in an ftsK gene promoter and in the tarp and hctB genes, which encode key proteins involved in replication, inclusion formation, and histone H1-like protein activity, respectively. Analyses suggest that these indels affect gene and/or protein function, supporting the in vitro and disease phenotypes. While recombination has been known to occur for C. trachomatis based on gene sequence analyses, we provide the first whole-genome evidence for recombination between a virulent, invasive LGV strain and a noninvasive common urogenital strain. Given the lack of a genetic system for producing stable C. trachomatis mutants, identifying naturally occurring recombinants can clarify gene function and provide opportunities for discovering avenues for genomic manipulation. 
Lymphogranuloma venereum (LGV) is a prevalent and debilitating sexually transmitted disease in developing countries, although there are significant ongoing outbreaks in Australia, Europe, and the United States among men who have sex with men (MSM). Relatively little is known about LGV virulence factors, and only two LGV genomes have been sequenced to date. We isolated an LGV strain from an MSM with severe hemorrhagic proctitis that was morphologically unique in tissue culture compared with other LGV strains. Bioinformatic and statistical analyses identified the strain as a recombinant of L2 and D strains with highly conserved clustered regions of genetic exchange. The unique culture morphology and, more importantly, disease phenotype could be traced to the genes involved in recombination. The findings have implications for bacterial species evolution and, in the case of ongoing LGV outbreaks, suggest that recombination is a mechanism for strain emergence that results in significant disease pathology.
|