451
Rider SD, Morgan MS, Arlian LG. Draft genome of the scabies mite. Parasit Vectors 2015; 8:585. [PMID: 26555130 PMCID: PMC4641413 DOI: 10.1186/s13071-015-1198-2] [Citation(s) in RCA: 76] [Impact Index Per Article: 7.6] [Received: 08/20/2015] [Accepted: 11/05/2015] [Indexed: 12/11/2022]
Abstract
Background The disease scabies, caused by the ectoparasitic mite Sarcoptes scabiei, causes significant morbidity in humans and other mammals worldwide. However, limited data are available regarding the molecular basis of host specificity and host-parasite interactions. Therefore, we sought to produce a draft genome for S. scabiei and use it to identify molecular markers that will be useful for phylogenetic population studies and to identify candidate protein-coding genes that are critical to the unique biology of the parasite. Methods S. scabiei var. canis DNA was isolated from living mites and sequenced to ultra-deep coverage using paired-end technology. Sequence reads were assembled into gapped contigs using de Bruijn graph based algorithms. The assembled genome was examined for repetitive elements, and gene annotation was performed using ab initio and homology-based methods. Results The draft genome assembly was about 56.2 Mb and included a mitochondrial genome contig. The predicted proteome contained 10,644 proteins, ~67 % of which appear to have clear orthologs in other species. The genome also contained more than 140,000 simple sequence repeat loci that may be useful for population-level studies. The mitochondrial genome contained 13 protein-coding loci and 20 transfer RNAs. Hundreds of candidate salivary gland protein genes were identified by comparing the scabies mite predicted proteome with sialoproteins and transcripts identified in ticks and other hematophagous arthropods. These include serpins, ferritins, reprolysins, apyrases and new members of the macrophage migration inhibitory factor (MIF) gene family. Numerous other genes coding for salivary proteins, metabolic enzymes, structural proteins, potentially immunomodulatory proteins, and vaccine candidates were identified. The genes encoding cysteine and serine protease paralogs, as well as mu-type glutathione S-transferases, are represented by gene clusters. S. scabiei possessed homologs for most of the 33 dust mite allergens. Conclusion The draft genome is useful for advancing our understanding of the host-parasite interaction, the biology of the mite and its phylogenetic relationship to other Acari. The identification of antigen-producing genes, candidate immune modulating proteins and pathways, and genes responsible for acaricide resistance offers opportunities for developing new methods for diagnosing, treating and preventing this disease. Electronic supplementary material The online version of this article (doi:10.1186/s13071-015-1198-2) contains supplementary material, which is available to authorized users.
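The abstract reports more than 140,000 simple sequence repeat (SSR) loci. As a rough illustration of what an SSR scan involves, the following Python sketch finds perfect tandem repeats with a regular expression backreference. The motif lengths (1 to 6 bp), the minimum copy number (`min_repeats=4`) and the function name `find_ssrs` are hypothetical choices for this example, not the criteria used in the study.

```python
import re
from typing import List, Tuple


def find_ssrs(seq: str, min_motif: int = 1, max_motif: int = 6,
              min_repeats: int = 4) -> List[Tuple[int, str, int]]:
    """Return (start, motif, copies) for perfect tandem repeats.

    Hypothetical thresholds: motifs of min_motif..max_motif bp repeated at
    least min_repeats times. A region already explained by a shorter motif
    is not reported again for a longer one.
    """
    seq = seq.upper()
    hits = []
    covered = [False] * len(seq)
    for k in range(min_motif, max_motif + 1):
        # Group 2 captures a k-bp motif; \2{n,} demands n or more further copies.
        pattern = re.compile(r"(([ACGT]{%d})\2{%d,})" % (k, min_repeats - 1))
        for m in pattern.finditer(seq):
            start, end = m.start(1), m.end(1)
            if any(covered[start:end]):
                continue  # overlaps a repeat found with a shorter motif
            hits.append((start, m.group(2), (end - start) // k))
            for i in range(start, end):
                covered[i] = True
    return sorted(hits)


print(find_ssrs("GGCACACACACTTT"))  # [(2, 'CA', 4)]
```

Dedicated SSR finders also report imperfect (interrupted) and compound repeats; this sketch only reports perfect repeats and credits overlapping calls to the shortest motif.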
Affiliation(s)
- S Dean Rider
- Department of Biological Sciences, Wright State University, 3640 Colonel Glenn Hwy, Dayton, OH, 45435, USA.
- Marjorie S Morgan
- Department of Biological Sciences, Wright State University, 3640 Colonel Glenn Hwy, Dayton, OH, 45435, USA.
- Larry G Arlian
- Department of Biological Sciences, Wright State University, 3640 Colonel Glenn Hwy, Dayton, OH, 45435, USA.
452
Henson MW, Santo Domingo JW, Kourtev PS, Jensen RV, Dunn JA, Learman DR. Metabolic and genomic analysis elucidates strain-level variation in Microbacterium spp. isolated from chromate contaminated sediment. PeerJ 2015; 3:e1395. [PMID: 26587353 PMCID: PMC4647564 DOI: 10.7717/peerj.1395] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 07/22/2015] [Accepted: 10/19/2015] [Indexed: 01/04/2023]
Abstract
Hexavalent chromium [Cr(VI)] is a soluble carcinogen that has caused widespread contamination of soil and water in many industrial nations. Bacteria have the potential to aid remediation, as certain strains can catalyze the reduction of Cr(VI) to insoluble and less toxic Cr(III). Here, we examine Cr(VI)-reducing Microbacterium spp. (Cr-K1W, Cr-K20, Cr-K29, and Cr-K32) isolated from contaminated sediment (Seymour, Indiana) and show varying chromate responses despite the isolates' phylogenetic similarity (i.e., identical 16S rRNA gene sequences). Detailed analysis identified differences in genomic metabolic potential, growth and general metabolic capabilities, and capacity to resist and reduce Cr(VI). Taken together, the discrepancies between the isolates demonstrate how much complexity inter-strain variation can introduce into microbial physiology and related biogeochemical processes.
Affiliation(s)
- Michael W Henson
- Institute for Great Lakes Research and Department of Biology, Central Michigan University, Mount Pleasant, MI, United States
- Jorge W Santo Domingo
- National Risk Management Research Laboratory, Environmental Protection Agency, Cincinnati, OH, USA
- Peter S Kourtev
- Department of Biology, Central Michigan University, Mount Pleasant, MI, United States
- Roderick V Jensen
- Department of Biological Sciences, Virginia Polytechnic Institute and State University (Virginia Tech), Blacksburg, VA, United States
- James A Dunn
- Institute for Great Lakes Research and Department of Biology, Central Michigan University, Mount Pleasant, MI, United States
- Deric R Learman
- Institute for Great Lakes Research and Department of Biology, Central Michigan University, Mount Pleasant, MI, United States
453
Greshake B, Zehr S, Dal Grande F, Meiser A, Schmitt I, Ebersberger I. Potential and pitfalls of eukaryotic metagenome skimming: a test case for lichens. Mol Ecol Resour 2015; 16:511-23. [PMID: 26345272 DOI: 10.1111/1755-0998.12463] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Received: 04/17/2015] [Revised: 07/28/2015] [Accepted: 08/22/2015] [Indexed: 11/30/2022]
Abstract
Whole-genome shotgun sequencing of multispecies communities using only a single library layout is commonly used to assess the taxonomic and functional diversity of microbial assemblages. Here, we investigate to what extent such metagenome skimming approaches are applicable for in-depth genomic characterizations of eukaryotic communities, for example lichens. We address how best to assemble a particular eukaryotic metagenome skimming data set, what pitfalls can occur, and what genome quality can be expected from these data. To facilitate project-specific benchmarking, we introduce the concept of twin sets, simulated data resembling the outcome of a particular metagenome sequencing study. We show that the quality of genome reconstructions depends essentially on assembler choice. Individual tools, including the metagenome assemblers Omega and MetaVelvet, are surprisingly sensitive to low and uneven coverages. Combined with the routine practice of choosing assembly parameters to optimize the assembly N50 size, these tools can preclude an entire genome from the assembly. In contrast, MIRA, an all-purpose overlap assembler, and SPAdes, a multisized de Bruijn graph assembler, provide a comprehensive view of the individual genomes across a wide range of coverage ratios. Testing assemblers on a real-world metagenome skimming data set from the lichen Lasallia pustulata demonstrates the applicability of twin sets for guiding method selection. Furthermore, it reveals that the assembly outcome for the photobiont Trebouxia sp. falls short of the a priori expectation from the simulations. Although the underlying reasons remain unclear, this highlights that further studies on this organism require special attention during sequence data generation and downstream analysis.
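N50, the statistic the abstract warns against optimizing blindly, is the contig length L such that contigs of length L or longer hold at least half of all assembled bases. The sketch below computes it; the function name `n50` and the toy contig lengths are illustrative only, not data from the study.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Shortest contig length among the longest contigs that together
    contain at least half of all assembled bases (0 for an empty assembly)."""
    lengths = sorted((n for n in contig_lengths if n > 0), reverse=True)
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        if 2 * running >= total:
            return length
    return 0


full = [50, 40, 30, 30, 30, 30]  # hypothetical: includes a low-coverage genome
pruned = [50, 40]                # same assembly with that genome missing
print(n50(full), n50(pruned))    # 30 50
```

In this toy comparison, losing the four 30 bp contigs raises N50 from 30 to 50 even though 120 bp of sequence has vanished, which is the failure mode the abstract describes.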
Affiliation(s)
- Bastian Greshake
- Institute of Cell Biology and Neuroscience, Goethe University Frankfurt, Max-von-Laue Str. 13, D-60438, Frankfurt, Germany
- Simonida Zehr
- Institute of Cell Biology and Neuroscience, Goethe University Frankfurt, Max-von-Laue Str. 13, D-60438, Frankfurt, Germany
- Francesco Dal Grande
- Senckenberg Biodiversity and Climate Research Centre (BiK-F), Senckenberg Anlage 25, D-60325, Frankfurt, Germany
- Anjuli Meiser
- Institute of Ecology, Evolution and Diversity, Goethe University Frankfurt, Max-von-Laue Str. 13, D-60438, Frankfurt, Germany
- Imke Schmitt
- Senckenberg Biodiversity and Climate Research Centre (BiK-F), Senckenberg Anlage 25, D-60325, Frankfurt, Germany
- Institute of Ecology, Evolution and Diversity, Goethe University Frankfurt, Max-von-Laue Str. 13, D-60438, Frankfurt, Germany
- Ingo Ebersberger
- Institute of Cell Biology and Neuroscience, Goethe University Frankfurt, Max-von-Laue Str. 13, D-60438, Frankfurt, Germany
454
Wang D, Xu J, Yu J. KGCAK: a K-mer based database for genome-wide phylogeny and complexity evaluation. Biol Direct 2015; 10:53. [PMID: 26376976 PMCID: PMC4573299 DOI: 10.1186/s13062-015-0083-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 06/05/2015] [Accepted: 09/11/2015] [Indexed: 11/28/2022]
Abstract
Background The K-mer approach, which treats genomic sequences as simple character strings and counts the relative abundance of each substring of a fixed length K, has been extensively applied to phylogeny inference, genome assembly, annotation, and comparison. Results To meet increasing demands for comparing large genome sequences and to promote the use of the K-mer approach, we developed a versatile database, KGCAK (http://kgcak.big.ac.cn/KGCAK/), containing ~8,000 genomes that include genome sequences of diverse life forms (viruses, prokaryotes, protists, animals, and plants) and cellular organelles of eukaryotic lineages. It builds phylogenies based on genomic elements in an alignment-free fashion and provides in-depth data processing that enables users to compare the complexity of genome sequences based on K-mer distribution. Conclusion We hope that KGCAK becomes a powerful tool for exploring relationships within and among groups of species in a tree of life based on genomic data. Reviewers This article was reviewed by Prof Mark Ragan and Dr Yuri Wolf.
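To make the K-mer approach concrete, the sketch below computes the relative abundance of every K-mer over the ACGT alphabet and compares genomes by the Euclidean distance between these profiles. Euclidean distance is only one common alignment-free choice, assumed here for illustration; it is not necessarily the measure KGCAK uses, and the function names and toy sequences are hypothetical.

```python
from collections import Counter
from itertools import product
from math import sqrt
from typing import Dict, List, Tuple


def kmer_profile(seq: str, k: int) -> List[float]:
    """Relative abundance of each of the 4**k DNA words, in a fixed ACGT order."""
    seq = seq.upper()
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[w] for w in words)  # k-mers containing N etc. are ignored
    if total == 0:
        return [0.0] * len(words)
    return [counts[w] / total for w in words]


def euclidean(p: List[float], q: List[float]) -> float:
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def distance_matrix(genomes: Dict[str, str], k: int) -> Tuple[List[str], List[List[float]]]:
    """Pairwise profile distances; names are returned in sorted order."""
    names = sorted(genomes)
    profiles = [kmer_profile(genomes[n], k) for n in names]
    return names, [[euclidean(p, q) for q in profiles] for p in profiles]


names, d = distance_matrix({"a": "ACGTACGT", "b": "ACGTACGA", "c": "TTTTTTTT"}, k=2)
```

A matrix like `d` could then be passed to a standard distance-based tree method such as neighbor joining or UPGMA.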
Affiliation(s)
- Dapeng Wang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, 100101, PR China.
- Stem Cell Laboratory, UCL Cancer Institute, University College London, London, WC1E 6BT, UK.
- Jiayue Xu
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, 100101, PR China.
- University of Chinese Academy of Sciences, Beijing, 100049, China.
- Jun Yu
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, 100101, PR China.
455
Benoit G, Lemaitre C, Lavenier D, Drezen E, Dayris T, Uricaru R, Rizk G. Reference-free compression of high throughput sequencing data with a probabilistic de Bruijn graph. BMC Bioinformatics 2015; 16:288. [PMID: 26370285 PMCID: PMC4570262 DOI: 10.1186/s12859-015-0709-7] [Citation(s) in RCA: 45] [Impact Index Per Article: 4.5] [Received: 04/16/2015] [Accepted: 08/17/2015] [Indexed: 01/09/2023]
Abstract
BACKGROUND The volume of data generated by next-generation sequencing (NGS) technologies is now a major concern for both data storage and transmission. This has created a need for methods more efficient than general-purpose compression tools such as the widely used gzip. RESULTS We present a novel reference-free method for compressing data produced by high-throughput sequencing technologies. Our approach, implemented in the software LEON, employs techniques derived from existing assembly principles. The method is based on a reference probabilistic de Bruijn graph, built de novo from the set of reads and stored in a Bloom filter. Each read is encoded as a path in this graph by storing an anchoring k-mer and a list of bifurcations. The same probabilistic de Bruijn graph is used to perform a lossy transformation of the quality scores, which yields higher compression ratios without losing information pertinent to downstream analyses. CONCLUSIONS LEON was run on various real sequencing datasets (whole genome, exome, RNA-seq or metagenomics). In all cases, LEON showed higher overall compression ratios than state-of-the-art compression software. On a C. elegans whole-genome sequencing dataset, LEON reduced the original file size by a factor of more than 20. LEON is open-source software, distributed under the GNU Affero GPL license, and available for download at http://gatb.inria.fr/software/leon/.
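The encoding idea in the abstract (a read stored as an anchor k-mer plus the branch taken at each bifurcation of a Bloom-filter de Bruijn graph) can be sketched in a few dozen lines of Python. This is a toy under stated assumptions, not LEON's implementation: the hand-rolled `BloomFilter` with SHA-256-derived hash positions, the use of the first k-mer as anchor, the `exceptions` map for bases whose k-mer is missing from the graph, and all function names are hypothetical, and reverse complements, quality scores and entropy coding are ignored.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple


class BloomFilter:
    """Minimal Bloom filter; hash positions come from SHA-256 (an assumption)."""

    def __init__(self, n_bits: int = 1 << 16, n_hashes: int = 3):
        self.n_bits, self.n_hashes = n_bits, n_hashes
        self.bits = bytearray((n_bits + 7) // 8)

    def _positions(self, item: str):
        for i in range(self.n_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "little") % self.n_bits

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: str) -> bool:
        return all((self.bits[p // 8] >> (p % 8)) & 1 for p in self._positions(item))


def build_graph(reads: Iterable[str], k: int) -> BloomFilter:
    """Insert every k-mer of every read; graph edges are implicit (k-1 overlaps)."""
    bf = BloomFilter()
    for r in reads:
        for i in range(len(r) - k + 1):
            bf.add(r[i:i + k])
    return bf


def successors(graph: BloomFilter, kmer: str) -> List[str]:
    return [b for b in "ACGT" if kmer[1:] + b in graph]


Encoding = Tuple[str, int, List[str], Dict[int, str]]


def encode(read: str, k: int, graph: BloomFilter) -> Encoding:
    anchor, choices, exceptions = read[:k], [], {}
    for i in range(k, len(read)):
        succ = successors(graph, read[i - k:i])
        if read[i] not in succ:
            exceptions[i] = read[i]  # next k-mer absent: store the base explicitly
        elif len(succ) > 1:
            choices.append(read[i])  # bifurcation: record which branch was taken
        # exactly one successor and it matches: nothing needs storing
    return anchor, len(read), choices, exceptions


def decode(anchor: str, length: int, choices: List[str],
           exceptions: Dict[int, str], k: int, graph: BloomFilter) -> str:
    seq, branch = list(anchor), iter(choices)
    for i in range(k, length):
        succ = successors(graph, "".join(seq[i - k:i]))
        if i in exceptions:
            seq.append(exceptions[i])
        elif len(succ) == 1:
            seq.append(succ[0])
        else:
            seq.append(next(branch))
    return "".join(seq)
```

Because the decoder queries the same Bloom filter as the encoder, a false-positive k-mer only adds an extra recorded branch; it does not corrupt the decoded read.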
Affiliation(s)
- Gaëtan Benoit
- INRIA/IRISA/GenScale, Campus de Beaulieu, Rennes, 35042, France.
- Claire Lemaitre
- INRIA/IRISA/GenScale, Campus de Beaulieu, Rennes, 35042, France.
- Erwan Drezen
- INRIA/IRISA/GenScale, Campus de Beaulieu, Rennes, 35042, France.
- Thibault Dayris
- University of Bordeaux, CNRS/LaBRI, Talence, F-33405, France.
- Raluca Uricaru
- University of Bordeaux, CNRS/LaBRI, Talence, F-33405, France.
- University of Bordeaux, CBiB, Bordeaux, F-33000, France.
- Guillaume Rizk
- INRIA/IRISA/GenScale, Campus de Beaulieu, Rennes, 35042, France.
456
Song L, Florea L, Langmead B. Lighter: fast and memory-efficient sequencing error correction without counting. Genome Biol 2014; 15:509. [PMID: 25398208 PMCID: PMC4248469 DOI: 10.1186/s13059-014-0509-9] [Citation(s) in RCA: 150] [Impact Index Per Article: 15.0] [Received: 05/27/2014] [Indexed: 02/02/2023]
Abstract
Lighter is a fast, memory-efficient tool for correcting sequencing errors. Lighter avoids counting k-mers. Instead, it uses a pair of Bloom filters, one holding a sample of the input k-mers and the other holding k-mers likely to be correct. As long as the sampling fraction is adjusted in inverse proportion to the depth of sequencing, Bloom filter size can be held constant while maintaining near-constant accuracy. Lighter is parallelized, uses no secondary storage, and is both faster and more memory-efficient than competing approaches while achieving comparable accuracy.
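Why a sampling fraction inversely proportional to depth lets the Bloom filter size stay fixed can be checked with a short calculation. Assuming each k-mer occurrence is sampled independently with probability α, a k-mer seen m times is sampled at least once with probability 1 - (1 - α)^m. With α = c/D at depth D, a correct k-mer (treated here, as a simplification, as occurring about D times) is sampled with probability close to 1 - e^(-c) whatever D is, while a one-off error k-mer is sampled with probability c/D, which shrinks as depth grows. The constant `C = 3.0` is a hypothetical value, not Lighter's default, and Lighter's trusted-k-mer test is not reproduced.

```python
import math

C = 3.0  # hypothetical constant: sampling fraction alpha = C / depth


def p_sampled(multiplicity: int, alpha: float) -> float:
    """P(a k-mer occurring `multiplicity` times is sampled at least once),
    assuming each occurrence is kept independently with probability alpha."""
    return 1.0 - (1.0 - alpha) ** multiplicity


LIMIT = 1.0 - math.exp(-C)  # value approached by correct k-mers as depth grows

for depth in (20, 50, 100, 200):
    alpha = C / depth
    true_kmer = p_sampled(depth, alpha)  # correct k-mer, seen ~depth times
    error_kmer = p_sampled(1, alpha)     # error k-mer, seen once
    print(f"depth={depth:3d} alpha={alpha:.3f} "
          f"P(correct sampled)={true_kmer:.3f} P(error sampled)={error_kmer:.3f}")
```

Across the depths shown, the probability for correct k-mers stays near 1 - e^(-3), about 0.95, while the probability for error k-mers falls tenfold from depth 20 to depth 200.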
457
Olson ND, Lund SP, Colman RE, Foster JT, Sahl JW, Schupp JM, Keim P, Morrow JB, Salit ML, Zook JM. Best practices for evaluating single nucleotide variant calling methods for microbial genomics. Front Genet 2015. [PMID: 26217378 PMCID: PMC4493402 DOI: 10.3389/fgene.2015.00235] [Citation(s) in RCA: 114] [Impact Index Per Article: 11.4] [Indexed: 12/30/2022]
Abstract
Innovations in sequencing technologies have allowed biologists to make incredible advances in understanding biological systems. As experience grows, researchers increasingly recognize that analyzing the wealth of data provided by these new sequencing platforms requires careful attention to detail for robust results. Thus far, much of the scientific community's attention in bacterial genomics has focused on evaluating genome assembly algorithms and rigorously validating assembly program performance. Missing, however, is a critical evaluation of variant callers for these genomes. Variant calling is essential for comparative genomics, as it yields insights into nucleotide-level organismal differences. Variant calling is a multistep process with a host of potential error sources that may lead to incorrect variant calls. Identifying and resolving these incorrect calls is critical for bacterial genomics to advance. The goal of this review is to provide guidance on validating algorithms and pipelines used in variant calling for bacterial genomics. First, we provide an overview of variant calling procedures and the potential sources of error associated with these methods. We then identify appropriate datasets for use in evaluating algorithms and describe statistical methods for evaluating algorithm performance. As variant calling moves from basic research to applied settings, standardized methods for performance evaluation and reporting are required; it is our hope that this review provides the groundwork for the development of these standards.
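Evaluating a variant caller usually starts from comparing its call set against a truth set and summarizing the result as precision, recall and F1. A minimal sketch, assuming variants are matched exactly on (contig, position, ref, alt); real benchmarks must first normalize variant representations, which this toy skips. The names `evaluate_calls` and `CallStats` and the example variants are hypothetical.

```python
from typing import Iterable, NamedTuple, Tuple

Variant = Tuple[str, int, str, str]  # (contig, position, ref allele, alt allele)


class CallStats(NamedTuple):
    tp: int
    fp: int
    fn: int
    precision: float
    recall: float
    f1: float


def evaluate_calls(calls: Iterable[Variant], truth: Iterable[Variant]) -> CallStats:
    """Exact-match comparison of a call set against a truth set."""
    called, truth_set = set(calls), set(truth)
    tp = len(called & truth_set)   # called and present in the truth set
    fp = len(called - truth_set)   # called but absent from the truth set
    fn = len(truth_set - called)   # in the truth set but not called
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return CallStats(tp, fp, fn, precision, recall, f1)


truth = {("chr", 100, "A", "G"), ("chr", 250, "C", "T"), ("chr", 900, "G", "A")}
calls = {("chr", 100, "A", "G"), ("chr", 250, "C", "T"), ("chr", 400, "T", "C")}
stats = evaluate_calls(calls, truth)  # tp=2, fp=1, fn=1; precision = recall = 2/3
```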
Affiliation(s)
- Nathan D Olson
- Biosystems and Biomaterials Division, Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Steven P Lund
- Statistical Engineering Division, Information Technology Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Rebecca E Colman
- Division of Pathogen Genomics, Translational Genomics Research Institute, Flagstaff, AZ, USA
- Jeffrey T Foster
- Center for Microbial Genetics and Genomics, Northern Arizona University, Flagstaff, AZ, USA
- Jason W Sahl
- Division of Pathogen Genomics, Translational Genomics Research Institute, Flagstaff, AZ, USA
- Center for Microbial Genetics and Genomics, Northern Arizona University, Flagstaff, AZ, USA
- James M Schupp
- Division of Pathogen Genomics, Translational Genomics Research Institute, Flagstaff, AZ, USA
- Paul Keim
- Division of Pathogen Genomics, Translational Genomics Research Institute, Flagstaff, AZ, USA
- Center for Microbial Genetics and Genomics, Northern Arizona University, Flagstaff, AZ, USA
- Jayne B Morrow
- Biosystems and Biomaterials Division, Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Marc L Salit
- Biosystems and Biomaterials Division, Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Department of Bioengineering, Stanford University, Stanford, CA, USA
- Justin M Zook
- Biosystems and Biomaterials Division, Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
458
Pelin A, Selman M, Aris-Brosou S, Farinelli L, Corradi N. Genome analyses suggest the presence of polyploidy and recent human-driven expansions in eight global populations of the honeybee pathogen Nosema ceranae. Environ Microbiol 2015; 17:4443-58. [PMID: 25914091 DOI: 10.1111/1462-2920.12883] [Citation(s) in RCA: 63] [Impact Index Per Article: 6.3] [Received: 03/03/2015] [Revised: 04/13/2015] [Accepted: 04/15/2015] [Indexed: 12/23/2022]
Abstract
Nosema ceranae is a microsporidian pathogen whose infections have been associated with recent global declines in the populations of western honeybees (Apis mellifera). Despite the outstanding economic and ecological threat that N. ceranae may represent for honeybees worldwide, many aspects of its biology, including its mode of reproduction, propagation and ploidy, are either very unclear or unknown. In the present study, we set out to gain knowledge of these biological aspects by re-sequencing the genomes of eight isolates (i.e. each a population of spores isolated from a single beehive) of this species harvested from eight geographically distant beehives, and by investigating their level of polymorphism. Consistent with previous analyses performed using single gene sequences, our analyses uncovered very high genetic diversity within each isolate, but very little hive-specific polymorphism. Surprisingly, the nature, location and distribution of this genetic variation suggest that beehives around the globe are infected by a population of N. ceranae cells that may be polyploid (4n or more), and possibly clonal. Lastly, phylogenetic analyses based on genome-wide single-nucleotide polymorphism data extracted from these parasites and on mitochondrial sequences from their hosts all failed to support the current geographical structure of our isolates.
Affiliation(s)
- Adrian Pelin
- Canadian Institute for Advanced Research, Department of Biology; University of Ottawa, Ottawa, ON, Canada
- Mohammed Selman
- Canadian Institute for Advanced Research, Department of Biology; University of Ottawa, Ottawa, ON, Canada
- Stéphane Aris-Brosou
- Departments of Biology and of Mathematics & Statistics, University of Ottawa, Ottawa, ON, Canada
- Laurent Farinelli
- FASTERIS S.A., Ch. du Pont-du-Centenaire 109, P.O. Box 28, Plan-les-Ouates, CH-1228, Geneva, Switzerland
- Nicolas Corradi
- Canadian Institute for Advanced Research, Department of Biology; University of Ottawa, Ottawa, ON, Canada
459
De Novo Assembly of Bitter Gourd Transcriptomes: Gene Expression and Sequence Variations in Gynoecious and Monoecious Lines. PLoS One 2015; 10:e0128331. [PMID: 26047102 PMCID: PMC4457790 DOI: 10.1371/journal.pone.0128331] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 10/30/2014] [Accepted: 04/26/2015] [Indexed: 11/19/2022]
Abstract
Bitter gourd (Momordica charantia L.) is a nutritious vegetable crop of Asian origin, used as a medicinal herb in Indian and Chinese traditional medicine. Molecular breeding in bitter gourd is in its infancy due to limited molecular resources, particularly functional markers for traits such as gynoecy. We performed de novo transcriptome sequencing of bitter gourd on an Illumina next-generation sequencer, using root, flower bud, stem and leaf samples from a gynoecious line (Gy323) and a monoecious line (DRAR1). A total of 65,540 transcripts were obtained for Gy323 and 61,490 for DRAR1. Comparisons revealed SNP and SSR variation between these lines and enabled the identification of gene classes. Based on the available transcripts, we identified 80 WRKY transcription factors, several of which have been reported in responses to biotic and abiotic stresses, and 56 ARF genes, which play a pivotal role in auxin-regulated gene expression and development. The data presented will be useful in both functional studies and breeding programs in bitter gourd.
460
Mans BJ, de Klerk D, Pienaar R, de Castro MH, Latif AA. Next-generation sequencing as means to retrieve tick systematic markers, with the focus on Nuttalliella namaqua (Ixodoidea: Nuttalliellidae). Ticks Tick Borne Dis 2015; 6:450-62. [DOI: 10.1016/j.ttbdis.2015.03.013] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 11/21/2014] [Revised: 03/06/2015] [Accepted: 03/08/2015] [Indexed: 10/23/2022]
461
Draft Genome Sequence of Lachancea lanzarotensis CBS 12615T, an Ascomycetous Yeast Isolated from Grapes. GENOME ANNOUNCEMENTS 2015; 3:e00292-15. [PMID: 25883293 PMCID: PMC4400436 DOI: 10.1128/genomea.00292-15] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Indexed: 11/20/2022]
Abstract
We report the genome sequencing of the yeast Lachancea lanzarotensis CBS 12615T. The assembly comprises 24 scaffolds, for a total size of 11.46 Mbp. The annotation revealed 5,058 putative protein-coding genes. Detection of seven centromeres supports a chromosome fusion, which occurred after divergence from Lachancea thermotolerans and Lachancea kluyveri.
462
Szövényi P, Frangedakis E, Ricca M, Quandt D, Wicke S, Langdale JA. Establishment of Anthoceros agrestis as a model species for studying the biology of hornworts. BMC PLANT BIOLOGY 2015; 15:98. [PMID: 25886741 PMCID: PMC4393856 DOI: 10.1186/s12870-015-0481-x] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Received: 01/28/2015] [Accepted: 03/24/2015] [Indexed: 05/18/2023]
Abstract
BACKGROUND Plants colonized terrestrial environments approximately 480 million years ago and have contributed significantly to the diversification of life on Earth. Phylogenetic analyses position a subset of charophyte algae as the sister group to land plants, and distinguish two land plant groups that diverged around 450 million years ago: the bryophytes and the vascular plants. Relationships between liverworts, mosses, hornworts and vascular plants have proven difficult to resolve, and as such it is not clear which bryophyte lineage is the sister group to all other land plants and which is the sister to vascular plants. The lack of comparative molecular studies in representatives of all three lineages exacerbates this uncertainty. Such comparisons can be made between mosses and liverworts because representative model organisms are well established in these two bryophyte lineages. To date, however, a model hornwort species has not been available. RESULTS Here we report the establishment of Anthoceros agrestis as a model hornwort species for laboratory experiments. Axenic culture conditions for maintenance and vegetative propagation have been determined, and treatments for the induction of sexual reproduction and sporophyte development have been established. In addition, protocols have been developed for the extraction of DNA and RNA of a quality suitable for molecular analyses. Analysis of haploid-derived genome sequence data of two A. agrestis isolates revealed single nucleotide polymorphisms at multiple loci, and thus these two strains are suitable starting material for classical genetic and mapping experiments. CONCLUSIONS Methods and resources have been developed to enable A. agrestis to be used as a model species for developmental, molecular, genomic, and genetic studies. This advance provides an unprecedented opportunity to investigate the biology of hornworts.
Affiliation(s)
- Péter Szövényi
- Institute of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland.
- Institute of Systematic Botany, University of Zurich, Zurich, Switzerland.
- Swiss Institute of Bioinformatics, Quartier Sorge-Batiment Genopode, Lausanne, Switzerland.
- MTA-ELTE-MTM Ecology Research Group, ELTE, Biological Institute, Budapest, Hungary.
- Eftychios Frangedakis
- Department of Plant Sciences, University of Oxford, South Parks Rd, Oxford, UK.
- Current Address: Graduate School of Science, University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113 0033, Japan.
- Mariana Ricca
- Institute of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland.
- Swiss Institute of Bioinformatics, Quartier Sorge-Batiment Genopode, Lausanne, Switzerland.
- Dietmar Quandt
- Nees-Institut für Biodiversität der Pflanzen, University of Bonn, Meckenheimer Allee 170, D-53115, Bonn, Germany.
- Susann Wicke
- Nees-Institut für Biodiversität der Pflanzen, University of Bonn, Meckenheimer Allee 170, D-53115, Bonn, Germany.
- Institute for Evolution and Biodiversity, University of Muenster, Huefferstr. 1, 48149, Muenster, Germany.
- Jane A Langdale
- Department of Plant Sciences, University of Oxford, South Parks Rd, Oxford, UK.
463
Deng X, Naccache SN, Ng T, Federman S, Li L, Chiu CY, Delwart EL. An ensemble strategy that significantly improves de novo assembly of microbial genomes from metagenomic next-generation sequencing data. Nucleic Acids Res 2015; 43:e46. [PMID: 25586223 PMCID: PMC4402509 DOI: 10.1093/nar/gkv002] [Citation(s) in RCA: 206] [Impact Index Per Article: 20.6] [Received: 10/14/2014] [Accepted: 01/04/2015] [Indexed: 11/12/2022]
Abstract
Next-generation sequencing (NGS) approaches rapidly produce millions to billions of short reads, which allow pathogen detection and discovery in human clinical, animal and environmental samples. A major limitation of sequence homology-based identification for highly divergent microorganisms is the short length of reads generated by most highly parallel sequencing technologies. Short reads require a high level of sequence similarity to annotated genes to confidently predict gene function or homology. Such recognition of highly divergent homologues can be improved by reference-free (de novo) assembly of short overlapping sequence reads into larger contigs. We describe an ensemble strategy that integrates the sequential use of various de Bruijn graph and overlap-layout-consensus assemblers with a novel partitioned sub-assembly approach. We also propose new quality metrics that are suitable for evaluating metagenome de novo assembly. We demonstrate that this new ensemble strategy, tested using in silico spike-in, clinical and environmental NGS datasets, produced significantly better contigs than current approaches.
Affiliation(s)
- Xutao Deng
- Blood Systems Research Institute, San Francisco, CA 94118, USA
- Department of Laboratory Medicine, University of California at San Francisco, San Francisco, CA 94107, USA
- Samia N Naccache
- Department of Laboratory Medicine, University of California at San Francisco, San Francisco, CA 94107, USA
- UCSF-Abbott Viral Diagnostics and Discovery Center, San Francisco, CA 94107, USA
- Terry Ng
- Blood Systems Research Institute, San Francisco, CA 94118, USA
- Department of Laboratory Medicine, University of California at San Francisco, San Francisco, CA 94107, USA
- Scot Federman
- Department of Laboratory Medicine, University of California at San Francisco, San Francisco, CA 94107, USA
- UCSF-Abbott Viral Diagnostics and Discovery Center, San Francisco, CA 94107, USA
- Linlin Li
- Blood Systems Research Institute, San Francisco, CA 94118, USA
- Department of Laboratory Medicine, University of California at San Francisco, San Francisco, CA 94107, USA
- Charles Y Chiu
- Department of Laboratory Medicine, University of California at San Francisco, San Francisco, CA 94107, USA
- UCSF-Abbott Viral Diagnostics and Discovery Center, San Francisco, CA 94107, USA
- Department of Medicine, Division of Infectious Diseases, UCSF, San Francisco, CA 94143, USA
- Eric L Delwart
- Blood Systems Research Institute, San Francisco, CA 94118, USA
- Department of Laboratory Medicine, University of California at San Francisco, San Francisco, CA 94107, USA
464
Campana MG, Robles García NM, Tuross N. America's red gold: multiple lineages of cultivated cochineal in Mexico. Ecol Evol 2015; 5:607-17. [PMID: 25691985 PMCID: PMC4328766 DOI: 10.1002/ece3.1398] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 12/14/2014] [Revised: 12/15/2014] [Accepted: 12/18/2014] [Indexed: 01/31/2023]
Abstract
Cultivated cochineal (Dactylopius coccus) produces carminic acid, a valuable red dye used to color textiles, cosmetics, and food. Extant native D. coccus is largely restricted to two populations in the Mexican and the Andean highlands, although the insect's ultimate center of domestication remains unclear. Moreover, due to Mexican D. coccus cultivation's near demise during the 19th century, the genetic diversity of current cochineal stock is unknown. Through genomic sequencing, we identified two divergent D. coccus populations in highland Mexico: one unique to Mexico and another that was more closely related to extant Andean cochineal. Relic diversity is preserved in the crops of small-scale Mexican cochineal farmers. Conversely, larger-scale commercial producers are cultivating the Andean-like cochineal, which may reflect clandestine 20th century importation.
Affiliation(s)
- Michael G Campana
- Department of Human Evolutionary Biology, Harvard University, 11 Divinity Avenue, Cambridge, Massachusetts 02138
- Nelly M Robles García
- Proyecto Conjunto Monumental de Atzompa, Calle Reforma 501, esq. Constitución, Sala IV, Centro Histórico, Oaxaca, Oaxaca 68000, Mexico
- Noreen Tuross
- Department of Human Evolutionary Biology, Harvard University, 11 Divinity Avenue, Cambridge, Massachusetts 02138
465
Bloom Filter Trie – A Data Structure for Pan-Genome Storage. Lecture Notes in Computer Science 2015. [DOI: 10.1007/978-3-662-48221-6_16] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Indexed: 12/17/2022]
466
Shariat B, Movahedi NS, Chitsaz H, Boucher C. HyDA-Vista: towards optimal guided selection of k-mer size for sequence assembly. BMC Genomics 2014; 15 Suppl 10:S9. [PMID: 25558875 PMCID: PMC4304221 DOI: 10.1186/1471-2164-15-s10-s9] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 11/16/2022]
Abstract
Motivation Intimately tied to assembly quality is the complexity of the de Bruijn graph built by the assembler. Thus, many paradigms have been developed to decrease the complexity of the de Bruijn graph. One obvious combinatorial paradigm for this is to allow the value of k to vary, using a larger value of k where the graph is more complex and a smaller value of k where the graph would likely contain fewer spurious edges and vertices. One open problem that affects the practicality of this method is how to predict the value of k prior to building the de Bruijn graph. We show that optimal values of k can be predicted prior to assembly by using the information contained in a phylogenetically close genome, which can therefore help make the use of multiple values of k practical for genome assembly. Results We present HyDA-Vista, a genome assembler that uses homology information to choose a value of k for each read prior to de Bruijn graph construction. The chosen k is optimal if there are no sequencing errors and the coverage is sufficient. Fundamental to our method is the construction of the maximal sequence landscape, a data structure that stores, for each position in the input string, the largest repeated substring containing that position. In particular, we show that the maximal sequence landscape can be constructed in O(n + n log n) time and O(n) space. HyDA-Vista first constructs the maximal sequence landscape for a homologous genome. The reads are then aligned to this reference genome, and values of k are assigned to each read using the maximal sequence landscape and the alignments. Finally, all the reads are assembled by an iterative de Bruijn graph construction method. Our results and comparison to other assemblers demonstrate that HyDA-Vista achieves the best assembly of E. coli before repeat resolution or scaffolding. Availability HyDA-Vista is freely available [1].
The code for constructing the maximal sequence landscape and choosing the optimal value of k for each read is also separately available on the website and could be incorporated into any genome assembler.
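What the maximal sequence landscape stores can be illustrated with a naive Python sketch. It is not HyDA-Vista's O(n + n log n) construction: it sorts suffixes directly, which is quadratic or worse, and is meant only to show the structure's contents. The helper `choose_k` is a hypothetical rule of our own, not the paper's method. It picks k one larger than every repeat overlapping a read's aligned span, so any k-mer touching that span cannot occur twice in the reference.

```python
from typing import List


def longest_repeat_from(s: str) -> List[int]:
    """For each start j, the length of the longest substring of s that begins at j
    and also occurs at some other position of s (overlaps allowed)."""
    n = len(s)
    # Naive suffix array: suffix start positions sorted lexicographically.
    sa = sorted(range(n), key=lambda i: s[i:])

    def lcp(a: int, b: int) -> int:
        m = 0
        while a + m < n and b + m < n and s[a + m] == s[b + m]:
            m += 1
        return m

    best = [0] * n
    # A suffix's longest repeated prefix equals its maximum LCP with a neighbour
    # in suffix-array order, so only adjacent pairs need to be compared.
    for r in range(n - 1):
        shared = lcp(sa[r], sa[r + 1])
        best[sa[r]] = max(best[sa[r]], shared)
        best[sa[r + 1]] = max(best[sa[r + 1]], shared)
    return best


def maximal_sequence_landscape(s: str) -> List[int]:
    """landscape[i] = length of the largest repeated substring of s covering position i."""
    landscape = [0] * len(s)
    for j, length in enumerate(longest_repeat_from(s)):
        for i in range(j, j + length):
            landscape[i] = max(landscape[i], length)
    return landscape


def choose_k(landscape: List[int], start: int, end: int) -> int:
    """Hypothetical rule: one more than the longest repeat overlapping [start, end)."""
    return max(landscape[start:end], default=0) + 1


if __name__ == "__main__":
    land = maximal_sequence_landscape("ACGTACGTTT")
    print(land)                  # [4, 4, 4, 4, 4, 4, 4, 4, 2, 2]
    print(choose_k(land, 0, 4))  # 5
```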
467
The complex task of choosing a de novo assembly: Lessons from fungal genomes. Comput Biol Chem 2014; 53 Pt A:97-107. [DOI: 10.1016/j.compbiolchem.2014.08.014] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Accepted: 07/11/2014] [Indexed: 12/21/2022]
468
Draft Genome Sequence of Taylorella equigenitalis Strain MCE529, Isolated from a Belgian Warmblood Horse. Genome Announc 2014; 2(6):e01214-14. [PMID: 25428969 PMCID: PMC4246161 DOI: 10.1128/genomea.01214-14] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/20/2022]
Abstract
Taylorella equigenitalis is the causative agent of contagious equine metritis (CEM), a sexually transmitted infection of horses. We herein report the genome sequence of T. equigenitalis strain MCE529, isolated in 2009 from the urethral fossa of a 15-year-old Belgian Warmblood horse in France.
469
Melsted P, Halldórsson BV. KmerStream: streaming algorithms for k-mer abundance estimation. Bioinformatics 2014; 30:3541-7. [PMID: 25355787 DOI: 10.1093/bioinformatics/btu713] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.2] [Indexed: 12/30/2022]
Abstract
MOTIVATION Several applications in bioinformatics, such as genome assemblers and error-correction methods, rely on counting and keeping track of k-mers (substrings of length k). Histograms of k-mer frequencies can give valuable insight into the underlying distribution and indicate the error rate and genome size sampled in the sequencing experiment. RESULTS We present KmerStream, a streaming algorithm for estimating the number of distinct k-mers present in high-throughput sequencing data. The algorithm runs in time linear in the size of the input, and its space requirement is logarithmic in the size of the input. We derive a simple model that allows us to estimate the error rate of the sequencing experiment, as well as the genome size, using only the aggregate statistics reported by KmerStream. As an application, we show how KmerStream can be used to compute the error rate of a DNA sequencing experiment. We run KmerStream on a set of 2656 whole-genome-sequenced individuals and compare the error rate to quality values reported by the sequencing equipment. We discover that while the quality values alone are largely reliable as a predictor of error rate, there is considerable variability in the error rates between sequencing runs, even when accounting for reported quality values.
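To make the idea of a streaming distinct-k-mer estimate concrete, the sketch below uses a k-minimum-values (KMV) estimator, a standard streaming technique swapped in here for illustration; it is not KmerStream's own algorithm. Each k-mer is hashed to a pseudo-uniform value in [0, 1), only the t smallest distinct hashes are kept (so memory is fixed by t), and the count is estimated as (t − 1) divided by the t-th smallest hash. The hash choice, the default t, and treating a k-mer and its reverse complement as different are all assumptions.

```python
import hashlib
import heapq
from typing import Iterable, Iterator, List, Set


def kmer_hash(kmer: str) -> float:
    """Map a k-mer to a pseudo-uniform value in [0, 1) via a 64-bit BLAKE2b digest."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") / 2**64


def kmers(seq: str, k: int) -> Iterator[str]:
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def estimate_distinct_kmers(reads: Iterable[str], k: int, t: int = 256) -> float:
    """KMV estimate of the number of distinct k-mers in a stream of reads."""
    heap: List[float] = []  # max-heap via negation: the t smallest hashes seen so far
    kept: Set[float] = set()  # same values, for O(1) duplicate checks
    for read in reads:
        for km in kmers(read, k):
            h = kmer_hash(km)
            if h in kept:
                continue
            if len(heap) < t:
                heapq.heappush(heap, -h)
                kept.add(h)
            elif h < -heap[0]:
                # Push the new hash and evict the current largest of the t kept.
                evicted = -heapq.heappushpop(heap, -h)
                kept.discard(evicted)
                kept.add(h)
    if len(heap) < t:
        return float(len(heap))  # fewer than t distinct k-mers: the count is exact
    return (t - 1) / (-heap[0])
```

Larger t trades memory for accuracy; the relative error of a KMV estimate shrinks roughly as 1/√t.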
Affiliation(s)
- Páll Melsted
- Faculty of Industrial Engineering, Mechanical Engineering and Computer Science, University of Iceland, Reykjavík, Iceland; deCODE Genetics/Amgen, Reykjavík, Iceland; School of Science and Engineering, Reykjavík University, Reykjavík, Iceland
- Bjarni V Halldórsson
- Faculty of Industrial Engineering, Mechanical Engineering and Computer Science, University of Iceland, Reykjavík, Iceland; deCODE Genetics/Amgen, Reykjavík, Iceland; School of Science and Engineering, Reykjavík University, Reykjavík, Iceland
470
Jünemann S, Prior K, Albersmeier A, Albaum S, Kalinowski J, Goesmann A, Stoye J, Harmsen D. GABenchToB: a genome assembly benchmark tuned on bacteria and benchtop sequencers. PLoS One 2014; 9:e107014. [PMID: 25198770 PMCID: PMC4157817 DOI: 10.1371/journal.pone.0107014] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Received: 04/30/2014] [Accepted: 08/07/2014] [Indexed: 12/28/2022]
Abstract
De novo genome assembly is the process of reconstructing a complete genomic sequence from countless small sequencing reads. Due to the complexity of this task, numerous genome assemblers have been developed to cope with different requirements and the different kinds of data provided by sequencers within the fast-evolving field of next-generation sequencing technologies. In particular, the recently introduced generation of benchtop sequencers, like Illumina's MiSeq and Ion Torrent's Personal Genome Machine (PGM), popularized the easy, fast, and cheap sequencing of bacterial organisms among a broad range of academic and clinical institutions. With a strong pragmatic focus, we extend the line of assembly evaluation surveys by benchmarking popular de novo genome assemblers on bacterial data generated by benchtop sequencers. To this end, single-library assemblies were generated and compared to each other by metrics describing assembly contiguity and accuracy, and also by practice-oriented criteria such as computing time. In addition, we extensively analyzed the effect of the depth of coverage on the genome assemblies within reasonable ranges and the k-mer optimization problem of de Bruijn graph assemblers. Our results show that, although both MiSeq and PGM allow for good genome assemblies, they require different approaches. They not only pair with different assembler types, but also affect assemblies differently regarding the depth of coverage, where oversampling can become problematic. Assemblies vary greatly with respect to contiguity and accuracy, but also in their computing-power requirements. Consequently, no assembler can be rated best for all preconditions. Instead, the given kind of data, the demands on assembly quality, and the available computing infrastructure determine which assembler is best suited.
The data sets, scripts and all additional information needed to replicate our results are freely available at ftp://ftp.cebitec.uni-bielefeld.de/pub/GABenchToB.
Affiliation(s)
- Sebastian Jünemann
- Department for Periodontology, University of Münster, Münster, Germany
- Institute for Bioinformatics, Center for Biotechnology, Bielefeld University, Bielefeld, Germany
- Karola Prior
- Department for Periodontology, University of Münster, Münster, Germany
- Andreas Albersmeier
- Technology Platform Genomics, Center for Biotechnology, Bielefeld University, Bielefeld, Germany
- Stefan Albaum
- Bioinformatics Resource Facility, Center for Biotechnology, Bielefeld University, Bielefeld, Germany
- Jörn Kalinowski
- Technology Platform Genomics, Center for Biotechnology, Bielefeld University, Bielefeld, Germany
- Alexander Goesmann
- Bioinformatics and Systems Biology, Justus-Liebig-University Gießen, Gießen, Germany
- Jens Stoye
- Institute for Bioinformatics, Center for Biotechnology, Bielefeld University, Bielefeld, Germany
- Genome Informatics Group, Faculty of Technology, Bielefeld University, Bielefeld, Germany
- Dag Harmsen
- Department for Periodontology, University of Münster, Münster, Germany
471
Zhang Q, Pell J, Canino-Koning R, Howe AC, Brown CT. These are not the k-mers you are looking for: efficient online k-mer counting using a probabilistic data structure. PLoS One 2014; 9:e101271. [PMID: 25062443 PMCID: PMC4111482 DOI: 10.1371/journal.pone.0101271] [Citation(s) in RCA: 47] [Impact Index Per Article: 4.3] [Received: 10/14/2013] [Accepted: 06/04/2014] [Indexed: 11/19/2022]
Abstract
K-mer abundance analysis is widely used for many purposes in nucleotide sequence analysis, including data preprocessing for de novo assembly, repeat detection, and sequencing coverage estimation. We present the khmer software package for fast and memory-efficient online counting of k-mers in sequencing data sets. Unlike previous methods based on data structures such as hash tables, suffix arrays, and trie structures, khmer relies entirely on a simple probabilistic data structure, a Count-Min Sketch. The Count-Min Sketch permits online updating and retrieval of k-mer counts in memory, which is necessary to support online k-mer analysis algorithms. On sparse data sets, this data structure is considerably more memory-efficient than any exact data structure. In exchange, the use of a Count-Min Sketch introduces a systematic overcount for k-mers; moreover, only the counts, and not the k-mers, are stored. Here we analyze the speed, the memory usage, and the miscount rate of khmer for generating k-mer frequency distributions and retrieving k-mer counts for individual k-mers. We also compare the performance of khmer to several other k-mer counting packages, including Tallymer, Jellyfish, BFCounter, DSK, KMC, Turtle and KAnalyze. Finally, we examine the effectiveness of profiling sequencing error, k-mer abundance trimming, and digital normalization of reads in the context of high khmer false positive rates. khmer is implemented in C++ wrapped in a Python interface, offers a tested and robust API, and is freely available under the BSD license at github.com/ged-lab/khmer.
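The Count-Min Sketch described above fits in a few lines of Python. This is a generic textbook version written for illustration, not khmer's API: the table width, the number of rows, and the per-row hashing (BLAKE2b over a row-prefixed string) are all assumptions. Because a query takes the minimum over rows, collisions can inflate a count but never reduce it, which is the systematic overcount the abstract mentions.

```python
import hashlib
from typing import Iterable, List


class CountMinSketch:
    """Fixed-memory counter: `depth` rows of `width` counters each."""

    def __init__(self, width: int = 1 << 16, depth: int = 4) -> None:
        self.width = width
        self.depth = depth
        self.tables: List[List[int]] = [[0] * width for _ in range(depth)]

    def _index(self, row: int, kmer: str) -> int:
        # A different hash per row, derived by prefixing the row number (illustrative choice).
        digest = hashlib.blake2b(f"{row}:{kmer}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.width

    def add(self, kmer: str) -> None:
        for row in range(self.depth):
            self.tables[row][self._index(row, kmer)] += 1

    def count(self, kmer: str) -> int:
        # Every row holds at least the true count, so the minimum is the tightest bound.
        return min(self.tables[row][self._index(row, kmer)] for row in range(self.depth))


def count_kmers(reads: Iterable[str], k: int, sketch: CountMinSketch) -> None:
    """Online update: each k-mer is added as soon as its read is seen."""
    for read in reads:
        for i in range(len(read) - k + 1):
            sketch.add(read[i:i + k])
```

Memory stays at `width * depth` counters no matter how many distinct k-mers arrive; a narrower table saves memory at the cost of more collisions and larger overcounts.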
Affiliation(s)
- Qingpeng Zhang
- Department of Computer Science and Engineering, Michigan State University, East Lansing, Michigan, United States of America
- Jason Pell
- Department of Computer Science and Engineering, Michigan State University, East Lansing, Michigan, United States of America
- Rosangela Canino-Koning
- Department of Computer Science and Engineering, Michigan State University, East Lansing, Michigan, United States of America
- Adina Chuang Howe
- Department of Microbiology and Molecular Genetics, Michigan State University, East Lansing, Michigan, United States of America
- Department of Plant, Soil, and Microbial Sciences, Michigan State University, East Lansing, Michigan, United States of America
- C. Titus Brown
- Department of Computer Science and Engineering, Michigan State University, East Lansing, Michigan, United States of America
- Department of Microbiology and Molecular Genetics, Michigan State University, East Lansing, Michigan, United States of America
472
Utturkar SM, Klingeman DM, Land ML, Schadt CW, Doktycz MJ, Pelletier DA, Brown SD. Evaluation and validation of de novo and hybrid assembly techniques to derive high-quality genome sequences. Bioinformatics 2014; 30:2709-16. [PMID: 24930142 PMCID: PMC4173024 DOI: 10.1093/bioinformatics/btu391] [Citation(s) in RCA: 86] [Impact Index Per Article: 7.8] [Indexed: 11/14/2022]
Abstract
MOTIVATION To assess the potential of different types of sequence data combined with de novo and hybrid assembly approaches to improve existing draft genome sequences. RESULTS Illumina, 454 and PacBio sequencing technologies were used to generate de novo and hybrid genome assemblies for four different bacteria, which were assessed for quality using summary statistics (e.g. number of contigs, N50) and in silico evaluation tools. Differences in predictions of multiple copies of rDNA operons for each respective bacterium were evaluated by PCR and Sanger sequencing, and then the validated results were applied as an additional criterion to rank assemblies. In general, assemblies using longer PacBio reads were better able to resolve repetitive regions. In this study, the combination of Illumina and PacBio sequence data assembled through the ALLPATHS-LG algorithm gave the best summary statistics and most accurate rDNA operon number predictions. This study will aid others looking to improve existing draft genome assemblies. AVAILABILITY AND IMPLEMENTATION All assembly tools except CLC Genomics Workbench are freely available under GNU General Public License. CONTACT brownsd@ornl.gov SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
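N50, one of the summary statistics named above, is the contig length L such that contigs of length L or longer together contain at least half of the total assembly length. A short Python helper, written for illustration rather than taken from any of the evaluated tools:

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50 of an assembly given its contig lengths (0 for an empty assembly)."""
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Stop at the first contig where the cumulative length reaches half the total.
        if 2 * running >= total:
            return length
    return 0


if __name__ == "__main__":
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
```

Because N50 rewards long contigs regardless of their correctness, the study above pairs it with validation evidence (PCR and Sanger checks of rDNA operon counts) rather than ranking on contiguity alone.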
Affiliation(s)
- Sagar M Utturkar
- Graduate School of Genome Science and Technology, University of Tennessee, Knoxville, TN 37919, USA and Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
- Dawn M Klingeman
- Graduate School of Genome Science and Technology, University of Tennessee, Knoxville, TN 37919, USA and Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
- Miriam L Land
- Graduate School of Genome Science and Technology, University of Tennessee, Knoxville, TN 37919, USA and Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
- Christopher W Schadt
- Graduate School of Genome Science and Technology, University of Tennessee, Knoxville, TN 37919, USA and Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
- Mitchel J Doktycz
- Graduate School of Genome Science and Technology, University of Tennessee, Knoxville, TN 37919, USA and Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
- Dale A Pelletier
- Graduate School of Genome Science and Technology, University of Tennessee, Knoxville, TN 37919, USA and Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
- Steven D Brown
- Graduate School of Genome Science and Technology, University of Tennessee, Knoxville, TN 37919, USA and Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
473
Garmendia J, Viadas C, Calatayud L, Mell JC, Martí-Lliteras P, Euba B, Llobet E, Gil C, Bengoechea JA, Redfield RJ, Liñares J. Characterization of nontypable Haemophilus influenzae isolates recovered from adult patients with underlying chronic lung disease reveals genotypic and phenotypic traits associated with persistent infection. PLoS One 2014; 9:e97020. [PMID: 24824990 PMCID: PMC4019658 DOI: 10.1371/journal.pone.0097020] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.4] [Received: 01/17/2014] [Accepted: 04/14/2014] [Indexed: 01/09/2023]
Abstract
Nontypable Haemophilus influenzae (NTHi) has emerged as an important opportunistic pathogen causing infection in adults suffering from obstructive lung diseases. Existing evidence associates chronic infection by NTHi with the progression of the chronic respiratory disease, but specific features of NTHi associated with persistence have not been comprehensively addressed. To provide clues about adaptive strategies adopted by NTHi during persistent infection, we compared sequential persistent isolates with newly acquired isolates in sputa from six patients with chronic obstructive lung disease. Pulsed-field gel electrophoresis (PFGE) identified three patients with consecutive persistent strains and three with new strains. Phenotypic characterisation included infection of respiratory epithelial cells, bacterial self-aggregation, biofilm formation and resistance to antimicrobial peptides (AMP). Persistent isolates differed from new strains in showing low epithelial adhesion and inability to form biofilms when grown under continuous-flow culture conditions in microfermenters. Self-aggregation clustered the strains by patient, not by persistence. Increasing resistance to AMPs was observed for each series of persistent isolates; this was not associated with lipooligosaccharide decoration with phosphorylcholine or with lipid A acylation. Variation was further analyzed for the series of three persistent isolates recovered from patient 1. These isolates displayed comparable growth rate, natural transformation frequency and murine pulmonary infection. Genome sequencing of these three isolates revealed sequential acquisition of single-nucleotide variants in the AMP permease sapC, the heme acquisition systems hgpB, hgpC, hup and hxuC, the 3-deoxy-D-manno-octulosonic acid kinase kdkA, the long-chain fatty acid transporter ompP1, and the phosphoribosylamine glycine ligase purD.
Collectively, we frame a range of pathogenic traits and a repertoire of genetic variants in the context of persistent infection by NTHi.
Affiliation(s)
- Junkal Garmendia
- Instituto de Agrobiotecnología, CSIC-Universidad Pública Navarra-Gobierno Navarra, Mutilva, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Respiratorias (CIBERES), Madrid, Spain
- Laboratory Microbial Pathogenesis, Fundación Investigación Sanitaria Illes Balears, Bunyola, Spain
- Cristina Viadas
- Instituto de Agrobiotecnología, CSIC-Universidad Pública Navarra-Gobierno Navarra, Mutilva, Spain
- Laura Calatayud
- Centro de Investigación Biomédica en Red de Enfermedades Respiratorias (CIBERES), Madrid, Spain
- Microbiology Department, University Hospital Bellvitge, IDIBELL, University of Barcelona, Barcelona, Spain
- Joshua Chang Mell
- Department of Zoology, University of British Columbia, Vancouver, British Columbia, Canada
- Department of Microbiology and Immunology, Drexel University College of Medicine, Philadelphia, Pennsylvania, United States of America
- Pau Martí-Lliteras
- Centro de Investigación Biomédica en Red de Enfermedades Respiratorias (CIBERES), Madrid, Spain
- Laboratory Microbial Pathogenesis, Fundación Investigación Sanitaria Illes Balears, Bunyola, Spain
- Begoña Euba
- Instituto de Agrobiotecnología, CSIC-Universidad Pública Navarra-Gobierno Navarra, Mutilva, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Respiratorias (CIBERES), Madrid, Spain
- Enrique Llobet
- Centro de Investigación Biomédica en Red de Enfermedades Respiratorias (CIBERES), Madrid, Spain
- Laboratory Microbial Pathogenesis, Fundación Investigación Sanitaria Illes Balears, Bunyola, Spain
- Carmen Gil
- Instituto de Agrobiotecnología, CSIC-Universidad Pública Navarra-Gobierno Navarra, Mutilva, Spain
- José Antonio Bengoechea
- Centro de Investigación Biomédica en Red de Enfermedades Respiratorias (CIBERES), Madrid, Spain
- Laboratory Microbial Pathogenesis, Fundación Investigación Sanitaria Illes Balears, Bunyola, Spain
- Consejo Superior de Investigaciones Científicas (CSIC), Madrid, Spain
- Rosemary J. Redfield
- Department of Zoology, University of British Columbia, Vancouver, British Columbia, Canada
- Josefina Liñares
- Centro de Investigación Biomédica en Red de Enfermedades Respiratorias (CIBERES), Madrid, Spain
- Microbiology Department, University Hospital Bellvitge, IDIBELL, University of Barcelona, Barcelona, Spain
474
Trivedi UH, Cézard T, Bridgett S, Montazam A, Nichols J, Blaxter M, Gharbi K. Quality control of next-generation sequencing data without a reference. Front Genet 2014; 5:111. [PMID: 24834071 PMCID: PMC4018527 DOI: 10.3389/fgene.2014.00111] [Citation(s) in RCA: 50] [Impact Index Per Article: 4.5] [Received: 01/15/2014] [Accepted: 04/14/2014] [Indexed: 01/07/2023]
Abstract
Next-generation sequencing (NGS) technologies have dramatically expanded the breadth of genomics. Genome-scale data, once restricted to a small number of biomedical model organisms, can now be generated for virtually any species at remarkable speed and low cost. Yet non-model organisms often lack a suitable reference to map sequence reads against, making alignment-based quality control (QC) of NGS data more challenging than cases where a well-assembled genome is already available. Here we show that by generating a rapid, non-optimized draft assembly of raw reads, it is possible to obtain reliable and informative QC metrics, thus removing the need for a high quality reference. We use benchmark datasets generated from control samples across a range of genome sizes to illustrate that QC inferences made using draft assemblies are broadly equivalent to those made using a well-established reference, and describe QC tools routinely used in our production facility to assess the quality of NGS data from non-model organisms.
Affiliation(s)
- Urmi H Trivedi
- Edinburgh Genomics, Ashworth Laboratories, University of Edinburgh, Edinburgh, UK
- Timothée Cézard
- Edinburgh Genomics, Ashworth Laboratories, University of Edinburgh, Edinburgh, UK
- Stephen Bridgett
- Edinburgh Genomics, Ashworth Laboratories, University of Edinburgh, Edinburgh, UK
- Anna Montazam
- Edinburgh Genomics, Ashworth Laboratories, University of Edinburgh, Edinburgh, UK
- Jenna Nichols
- Edinburgh Genomics, Ashworth Laboratories, University of Edinburgh, Edinburgh, UK
- Mark Blaxter
- Edinburgh Genomics, Ashworth Laboratories, University of Edinburgh, Edinburgh, UK; Institute of Evolutionary Biology, Ashworth Laboratories, University of Edinburgh, Edinburgh, UK
- Karim Gharbi
- Edinburgh Genomics, Ashworth Laboratories, University of Edinburgh, Edinburgh, UK; Institute of Evolutionary Biology, Ashworth Laboratories, University of Edinburgh, Edinburgh, UK
475
Koren S, Treangen TJ, Hill CM, Pop M, Phillippy AM. Automated ensemble assembly and validation of microbial genomes. BMC Bioinformatics 2014; 15:126. [PMID: 24884846 PMCID: PMC4030574 DOI: 10.1186/1471-2105-15-126] [Citation(s) in RCA: 54] [Impact Index Per Article: 4.9] [Received: 02/07/2014] [Accepted: 04/24/2014] [Indexed: 11/12/2022]
Abstract
Background The continued democratization of DNA sequencing has sparked a new wave of development of genome assembly and assembly validation methods. As individual research labs, rather than centralized centers, begin to sequence the majority of new genomes, it is important to establish best practices for genome assembly. However, recent evaluations such as GAGE and the Assemblathon have concluded that there is no single best approach to genome assembly. Instead, it is preferable to generate multiple assemblies and validate them to determine which is most useful for the desired analysis; this is a labor-intensive process that is often impractical. Results To encourage best practices supported by the community, we present iMetAMOS, an automated ensemble assembly pipeline; iMetAMOS encapsulates the process of running, validating, and selecting a single assembly from multiple assemblies. iMetAMOS packages several leading open-source tools into a single binary that automates parameter selection and execution of multiple assemblers, scores the resulting assemblies based on multiple validation metrics, and annotates the assemblies for genes and contaminants. We demonstrate the utility of the ensemble process on 225 previously unassembled Mycobacterium tuberculosis genomes as well as a Rhodobacter sphaeroides benchmark dataset. On these real data, iMetAMOS reliably produces validated assemblies and identifies potential contamination without user intervention. In addition, intelligent parameter selection produces assemblies of R. sphaeroides comparable to or exceeding the quality of those from the GAGE-B evaluation, affecting the relative ranking of some assemblers. Conclusions Ensemble assembly with iMetAMOS provides users with multiple, validated assemblies for each genome.
Although computationally limited to small or mid-sized genomes, this approach is the most effective and reproducible means for generating high-quality assemblies and enables users to select an assembly best tailored to their specific needs.
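The run, score and select step can be pictured with a small, hypothetical ranking scheme: each assembly is ranked on each validation metric, and the assembly with the lowest summed rank is preferred. The metric names, their directions and the rank-sum rule are assumptions for illustration; iMetAMOS's actual scoring is not reproduced here.

```python
from typing import Dict, List

# Hypothetical metrics: True means "higher is better", False means "lower is better".
METRIC_DIRECTION: Dict[str, bool] = {"n50": True, "mismatches": False, "num_contigs": False}


def rank_sum_select(assemblies: Dict[str, Dict[str, float]]) -> List[str]:
    """Return assembly names ordered best to worst by their summed per-metric rank."""
    totals = {name: 0 for name in assemblies}
    for metric, higher_is_better in METRIC_DIRECTION.items():
        ordered = sorted(assemblies, key=lambda name: assemblies[name][metric],
                         reverse=higher_is_better)
        for rank, name in enumerate(ordered, start=1):
            totals[name] += rank
    # Ties in the summed rank are broken alphabetically so the result is deterministic.
    return sorted(totals, key=lambda name: (totals[name], name))


if __name__ == "__main__":
    candidates = {
        "asm_a": {"n50": 50_000, "mismatches": 12, "num_contigs": 80},
        "asm_b": {"n50": 120_000, "mismatches": 5, "num_contigs": 30},
        "asm_c": {"n50": 90_000, "mismatches": 20, "num_contigs": 45},
    }
    print(rank_sum_select(candidates))  # ['asm_b', 'asm_c', 'asm_a']
```

Rank sums ignore the size of the gaps between assemblies. A weighted or normalized score would be an alternative design, at the cost of choosing weights.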
Affiliation(s)
- Sergey Koren
- National Biodefense Analysis and Countermeasures Center, 110 Thomas Johnson Drive, Frederick, MD 21702, USA.
476
Heo Y, Wu XL, Chen D, Ma J, Hwu WM. BLESS: bloom filter-based error correction solution for high-throughput sequencing reads. Bioinformatics 2014; 30:1354-62. [PMID: 24451628 DOI: 10.1093/bioinformatics/btu030] [Citation(s) in RCA: 66] [Impact Index Per Article: 6.0] [Indexed: 11/12/2022]
Abstract
MOTIVATION Rapid advances in next-generation sequencing (NGS) technology have led to an exponential increase in the amount of genomic information. However, NGS reads contain far more errors than data from traditional sequencing methods, and downstream genomic analysis results can be improved by correcting the errors. Unfortunately, all previous error correction methods required a large amount of memory, making them unsuitable for processing reads from large genomes on commodity computers. RESULTS We present a novel algorithm that produces accurate correction results with much less memory compared with previous solutions. The algorithm, named BLoom-filter-based Error correction Solution for high-throughput Sequencing reads (BLESS), uses a single minimum-sized Bloom filter and is also able to tolerate a higher false-positive rate, thus allowing us to correct errors with a 40× memory usage reduction on average compared with previous methods. Meanwhile, BLESS can extend reads like DNA assemblers to correct errors at the end of reads. Evaluations using real and simulated reads showed that BLESS could generate more accurate results than existing solutions. After errors were corrected using BLESS, 69% of initially unaligned reads could be aligned correctly. Additionally, de novo assembly results became 50% longer with 66% fewer assembly errors. AVAILABILITY AND IMPLEMENTATION Freely available at http://sourceforge.net/p/bless-ec CONTACT dchen@illinois.edu SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
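The general idea of storing only "solid" k-mers in a Bloom filter and using membership queries to spot and repair errors can be sketched as follows. This is a simplified illustration, not BLESS itself. The exact `Counter`-based k-mer counting, the `min_count` threshold, the filter size, the double-hashing scheme and the brute-force single-substitution search are all assumptions chosen for brevity.

```python
import hashlib
from collections import Counter
from typing import Iterable, Iterator


class BloomFilter:
    """Bit array with `num_hashes` positions per item; may return false positives."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.m = num_bits
        self.k = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Double hashing: derive k positions from two 64-bit halves of one digest.
        d = hashlib.blake2b(item.encode(), digest_size=16).digest()
        h1 = int.from_bytes(d[:8], "big")
        h2 = int.from_bytes(d[8:], "big") | 1
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p >> 3] |= 1 << (p & 7)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(item))


def build_solid_filter(reads: Iterable[str], k: int, min_count: int = 2,
                       num_bits: int = 1 << 20, num_hashes: int = 4) -> BloomFilter:
    """Insert every k-mer seen at least `min_count` times (a "solid" k-mer)."""
    counts = Counter(r[i:i + k] for r in reads for i in range(len(r) - k + 1))
    solid = BloomFilter(num_bits, num_hashes)
    for kmer, c in counts.items():
        if c >= min_count:
            solid.add(kmer)
    return solid


def correct_read(read: str, k: int, solid: BloomFilter) -> str:
    """Return the first single-base substitution that makes every k-mer solid,
    or the read unchanged if it is already solid or no such substitution exists."""
    def all_solid(s: str) -> bool:
        return all(s[i:i + k] in solid for i in range(len(s) - k + 1))

    if all_solid(read):
        return read
    for pos in range(len(read)):
        for base in "ACGT":
            if base != read[pos]:
                candidate = read[:pos] + base + read[pos + 1:]
                if all_solid(candidate):
                    return candidate
    return read
```

Only set bits are stored, never the k-mers themselves, which is where a Bloom filter's memory savings come from; the price is a small false-positive rate that can occasionally let an erroneous k-mer pass as solid.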
Affiliation(s)
- Yun Heo
- Department of Electrical and Computer Engineering, Department of Bioengineering and Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
477
Abstract
MOTIVATION The de novo assembly of large, complex genomes is a significant challenge with currently available DNA sequencing technology. While many de novo assembly software packages are available, comparatively little attention has been paid to assisting the user with the assembly. RESULTS This article addresses the practical aspects of de novo assembly by introducing new ways to perform quality assessment on a collection of sequence reads. The software implementation calculates per-base error rates, paired-end fragment-size distributions and coverage metrics in the absence of a reference genome. Additionally, the software will estimate characteristics of the sequenced genome, such as repeat content and heterozygosity that are key determinants of assembly difficulty.
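One common reference-free way to estimate a genome characteristic from reads alone is to build a k-mer count histogram and divide the total number of trusted k-mer occurrences by the depth at the histogram's peak. This is shown here as a generic sketch, not as this software's own method. The `min_count` cutoff for discarding low-count, likely erroneous k-mers is an assumed parameter, and the sketch ignores reverse complements, heterozygosity and repeats.

```python
from collections import Counter
from typing import Dict, Iterable


def kmer_histogram(reads: Iterable[str], k: int) -> Dict[int, int]:
    """Map each multiplicity c to the number of distinct k-mers seen exactly c times."""
    counts = Counter(read[i:i + k] for read in reads for i in range(len(read) - k + 1))
    return dict(Counter(counts.values()))


def estimate_genome_size(hist: Dict[int, int], min_count: int = 2) -> float:
    """Total trusted k-mer occurrences divided by the depth of the histogram peak."""
    trusted = {c: n for c, n in hist.items() if c >= min_count}
    if not trusted:
        raise ValueError("no k-mers at or above min_count")
    peak_depth = max(trusted, key=lambda c: trusted[c])
    total_occurrences = sum(c * n for c, n in trusted.items())
    return total_occurrences / peak_depth
```

The result approximates the number of k-mer positions in the genome (its length minus k plus one). Real data spread counts around the peak, so the estimate is only as good as the separation between the error and coverage peaks.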
478
Molecular mechanism of the carotenoid biosynthesis activation in the producer Streptomyces globisporus 1912. Biotechnologia Acta 2014. [DOI: 10.15407/biotech7.06.069] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 11/17/2022]
479
Anvar SY, Khachatryan L, Vermaat M, van Galen M, Pulyakhina I, Ariyurek Y, Kraaijeveld K, den Dunnen JT, de Knijff P, ’t Hoen PAC, Laros JFJ. Determining the quality and complexity of next-generation sequencing data without a reference genome. Genome Biol 2014; 15:555. [PMID: 25514851 PMCID: PMC4298064 DOI: 10.1186/s13059-014-0555-3] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Received: 10/06/2014] [Accepted: 11/27/2014] [Indexed: 01/22/2023]
Abstract
We describe kPAL, an open-source package that facilitates an alignment-free assessment of the quality and comparability of sequencing datasets by analyzing k-mer frequencies. We show that kPAL can detect technical artefacts such as high duplication rates, library chimeras, contamination and differences in library preparation protocols. kPAL also successfully captures the complexity and diversity of microbiomes and provides a powerful means to study changes in microbial communities. Together, these features make kPAL an attractive and broadly applicable tool to determine the quality and comparability of sequence libraries even in the absence of a reference sequence. kPAL is freely available at https://github.com/LUMC/kPAL.
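The core idea of comparing libraries through k-mer frequencies can be illustrated with a short sketch. This does not use kPAL's own API. The function names (`kmer_profile`, `profile_distance`), the normalisation to relative frequencies over all 4^k k-mers and the use of plain Euclidean distance are assumptions chosen for illustration only.

```python
import math
from collections import Counter
from itertools import product


def kmer_profile(reads, k):
    """Relative-frequency vector over all 4**k k-mers, in lexicographic ACGT order.

    k-mers containing characters other than A, C, G, T (e.g. N) are skipped.
    """
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    alphabet = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(alphabet)
    return [counts[km] / total for km in alphabet]


def profile_distance(p, q):
    """Euclidean distance between two profiles (an assumed, simple metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
```

Normalising to relative frequencies lets libraries sequenced to different depths be compared directly. The package itself provides its own scaling and distance options, which should be taken from its documentation rather than from this sketch.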
Affiliation(s)
- Seyed Yahya Anvar
- Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Leiden Genome Technology Center, Leiden University Medical Center, Leiden, The Netherlands
- Lusine Khachatryan
- Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Martijn Vermaat
- Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Michiel van Galen
- Leiden Genome Technology Center, Leiden University Medical Center, Leiden, The Netherlands
- Irina Pulyakhina
- Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Yavuz Ariyurek
- Leiden Genome Technology Center, Leiden University Medical Center, Leiden, The Netherlands
- Ken Kraaijeveld
- Leiden Genome Technology Center, Leiden University Medical Center, Leiden, The Netherlands
- Department of Ecological Science, VU University Amsterdam, Amsterdam, The Netherlands
- Johan T den Dunnen
- Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Leiden Genome Technology Center, Leiden University Medical Center, Leiden, The Netherlands
- Department of Clinical Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Peter de Knijff
- Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Peter AC ’t Hoen
- Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Jeroen FJ Laros
- Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Leiden Genome Technology Center, Leiden University Medical Center, Leiden, The Netherlands