1
Bossert S, Pauly A, Danforth BN, Orr MC, Murray EA. Lessons from assembling UCEs: A comparison of common methods and the case of Clavinomia (Halictidae). Mol Ecol Resour 2024; 24:e13925. [PMID: 38183389 DOI: 10.1111/1755-0998.13925] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2023] [Revised: 12/08/2023] [Accepted: 12/21/2023] [Indexed: 01/08/2024]
Abstract
Sequence data assembly is a foundational step in high-throughput sequencing, with untold consequences for downstream analyses. Despite this, few studies have interrogated the many methods for assembling phylogenomic UCE data for their comparative efficacy, or for how outputs may be impacted. We study this by comparing the most commonly used assembly methods for UCEs in the under-studied bee lineage Nomiinae and a representative sampling of relatives. Data for 63 UCE-only and 75 mixed taxa were assembled with five methods, including ABySS, HybPiper, SPAdes, Trinity and Velvet, and then benchmarked for their relative performance in terms of locus capture parameters and phylogenetic reconstruction. Unexpectedly, Trinity and Velvet trailed the other methods in terms of locus capture and DNA matrix density, whereas SPAdes performed favourably in most assessed metrics. In comparison with SPAdes, the guided-assembly approach HybPiper generally recovered the highest quality loci but in lower numbers. Based on our results, we formally move Clavinomia to Dieunomiini and render Epinomia once more a subgenus of Dieunomia. We strongly advise that future studies more closely examine the influence of assembly approach on their results, or, minimally, use better-performing assembly methods such as SPAdes or HybPiper. In this way, we can move forward with phylogenomic studies in a more standardized, comparable manner.
Affiliation(s)
- Silas Bossert
- Department of Entomology, Washington State University, Pullman, Washington, USA
- Department of Entomology, National Museum of Natural History, Smithsonian Institution, Washington, DC, USA
- Alain Pauly
- Royal Belgian Institute of Natural Sciences, O.D. Taxonomy and Phylogeny, Brussels, Belgium
- Bryan N Danforth
- Department of Entomology, Cornell University, Ithaca, New York, USA
- Michael C Orr
- Entomologie, Staatliches Museum für Naturkunde Stuttgart, Stuttgart, Germany
- Elizabeth A Murray
- Department of Entomology, Washington State University, Pullman, Washington, USA
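Matrix density, one of the metrics this study used to benchmark assemblers, can be illustrated with a minimal sketch. It assumes density is measured at the locus level, as the fraction of filled cells in a taxon-by-locus presence matrix; the study may compute it differently (for example at the nucleotide level). The function names, taxon labels, locus IDs and recovery sets are all hypothetical and are not data from Bossert et al.

```python
from typing import Dict, Set


def matrix_density(capture: Dict[str, Set[str]], targets: Set[str]) -> float:
    """Fraction of filled cells in a taxon-by-locus presence/absence matrix.

    capture maps each taxon to the loci one assembly method recovered for it;
    targets is the full set of loci in the matrix (hypothetical names).
    """
    if not capture or not targets:
        return 0.0
    filled = sum(len(found & targets) for found in capture.values())
    return filled / (len(capture) * len(targets))


def loci_per_taxon(capture: Dict[str, Set[str]]) -> Dict[str, int]:
    """Number of loci recovered for each taxon (a simple locus-capture metric)."""
    return {taxon: len(found) for taxon, found in capture.items()}


# Invented recovery sets for three taxa and four target loci.
targets = {"uce-1", "uce-2", "uce-3", "uce-4"}
method_a = {
    "taxon_1": {"uce-1", "uce-2", "uce-3"},
    "taxon_2": {"uce-1", "uce-4"},
    "taxon_3": {"uce-2"},
}
print(matrix_density(method_a, targets))  # 6 filled cells / 12 cells -> 0.5
print(loci_per_taxon(method_a))
```

Running the same two functions over the output of each assembler gives a like-for-like comparison of how complete the resulting matrices are.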
2
Mollerup S, Worning P, Petersen A, Bartels MD. spa Typing of Methicillin-Resistant Staphylococcus aureus Based on Whole-Genome Sequencing: the Impact of the Assembler. Microbiol Spectr 2022; 10:e0218922. [PMID: 36350148 PMCID: PMC9769676 DOI: 10.1128/spectrum.02189-22] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/27/2022] [Accepted: 10/13/2022] [Indexed: 11/11/2022]
Abstract
Sequencing of the spa gene of methicillin-resistant Staphylococcus aureus (MRSA) is used for assigning spa types, e.g., to detect transmission and control outbreaks. Traditionally, spa typing is performed by Sanger sequencing but has in recent years been replaced by whole-genome sequencing (WGS) in some laboratories. Spa typing by WGS involves de novo assembly of millions of short sequencing reads into larger contiguous sequences, from which the spa type is then determined. The choice of assembly program therefore potentially impacts the spa typing result. In this study, WGS of 1,754 MRSA isolates was followed by de novo assembly using the assembly programs SPAdes (with two different sets of parameters) and SKESA. The spa types were assigned and compared to the spa types obtained by Sanger sequencing, regarding the latter as the correct spa types. SPAdes with the two different settings resulted in assembly of the correct spa type for 84.8% and 97.6% of the isolates, respectively, while SKESA assembled the correct spa type in 98.6% of cases. The misassembled spa types were generally two spa repeats shorter than the correct spa type and mainly included spa types with repetition of the same repeats. WGS-based spa typing is thus very accurate compared to Sanger sequencing when the best assembly program for this purpose is used.
IMPORTANCE spa typing of methicillin-resistant Staphylococcus aureus (MRSA) is widely used by clinicians, infection control workers, and researchers both in local outbreak investigations and as an easy way to communicate and compare MRSA types between laboratories and countries. Traditionally, spa types are determined by Sanger sequencing, but in recent years a whole-genome sequencing (WGS)-based approach has become increasingly used. In this study, we compared spa typing by WGS using different methods for assembling the genome from short sequencing reads, with Sanger sequencing as the gold standard. We find substantial differences in correct assembly of spa types between the assembly methods. Our findings are therefore important for the quality of WGS-based spa typing data being exchanged by clinical microbiology laboratories.
Affiliation(s)
- Sarah Mollerup
- Department of Clinical Microbiology, Copenhagen University Hospital - Amager and Hvidovre, Copenhagen, Denmark
- Peder Worning
- Department of Clinical Microbiology, Copenhagen University Hospital - Amager and Hvidovre, Copenhagen, Denmark
- Andreas Petersen
- Department of Bacteria, Parasites & Fungi, Statens Serum Institut, Copenhagen, Denmark
- Mette Damkjær Bartels
- Department of Clinical Microbiology, Copenhagen University Hospital - Amager and Hvidovre, Copenhagen, Denmark
- Department of Clinical Medicine, University of Copenhagen, Copenhagen, Denmark
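The failure mode reported by Mollerup et al. (WGS calls that are two repeats shorter, mostly in spa types containing repeated repeats) can be checked mechanically by comparing repeat successions. The sketch below treats the Sanger succession as ground truth, as the study did, and flags a WGS call as "shorter" when it can be obtained from the Sanger succession by deleting repeats, which is what a collapsed tandem run would produce. The repeat codes and function names are hypothetical; real spa typing uses the curated repeat nomenclature of the spa typing databases.

```python
from typing import List, Sequence, Tuple


def is_subsequence(short: Sequence[str], long: Sequence[str]) -> bool:
    """True if `short` can be obtained from `long` by deleting elements."""
    remaining = iter(long)
    # `r in remaining` advances the iterator, so element order is respected.
    return all(r in remaining for r in short)


def classify_spa_call(sanger: List[str], wgs: List[str]) -> str:
    """Compare a WGS-derived repeat succession with the Sanger reference."""
    if wgs == sanger:
        return "correct"
    missing = len(sanger) - len(wgs)
    if missing > 0 and is_subsequence(wgs, sanger):
        return f"shorter by {missing} repeat(s)"
    return "other discordance"


def percent_correct(pairs: List[Tuple[List[str], List[str]]]) -> float:
    """Share of isolates whose WGS repeat succession equals the Sanger one."""
    if not pairs:
        return 0.0
    correct = sum(1 for sanger, wgs in pairs if sanger == wgs)
    return 100.0 * correct / len(pairs)


# Hypothetical repeat codes; a tandem run of "r12" is collapsed in the WGS call.
sanger = ["r11", "r19", "r12", "r12", "r12", "r17"]
wgs = ["r11", "r19", "r12", "r17"]
print(classify_spa_call(sanger, wgs))  # shorter by 2 repeat(s)
```

The subsequence test is only a heuristic: it confirms that repeats were lost without reordering, not that the loss happened inside a tandem run.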
3
Ishengoma E, Rhode C. Using SPAdes, AUGUSTUS, and BLAST in an Automated Pipeline for Clustering Homologous Exome Sequences. Curr Protoc 2022; 2:e449. [PMID: 35612494 DOI: 10.1002/cpz1.449] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2022]
Abstract
Cross-species exome sequencing approaches provide unprecedented avenues for obtaining genetic diversity, evolutionary relationships, and functional information from a variety of organisms including non-model species. These approaches offer cost-effective opportunities to study multiple individuals or species in parallel, but also create bioinformatics challenges in the application of multiple powerful bioinformatics tools for the identification of homologous gene families across individual or species boundaries. Popular tools of this kind include SPAdes for sequence assembly, AUGUSTUS for ab initio gene prediction, and BLAST for building homologous sequence families. These tools can also be complex to install and use. Here, we present detailed steps on how to run these tools for the recovery and clustering of exon sequences from cross-species raw exome-capture data into homologous sequence families. We also present a utility pipeline, CODSEQCP, that automates these steps to cluster exon sequences, facilitating population genomics and evolutionary studies. © 2022 Wiley Periodicals LLC.
Basic Protocol 1: Reads assembly using SPAdes
Basic Protocol 2: Coding sequence extraction using AUGUSTUS
Basic Protocol 3: Sequence clustering using BLAST
Alternate Protocol: How to run CODSEQCP
Affiliation(s)
- Edson Ishengoma
- Department of Biological Sciences, Mkwawa University College of Education, University of Dar es Salaam, Tanzania
- Department of Genetics, Stellenbosch University, Stellenbosch, South Africa
- Clint Rhode
- Department of Genetics, Stellenbosch University, Stellenbosch, South Africa
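The clustering step of this protocol (grouping exon sequences into homologous families from BLAST hits) can be sketched as connected components of a hit graph, i.e. single-linkage clustering. This is an assumed approach, not necessarily what CODSEQCP implements; the sketch assumes BLAST tabular output (`-outfmt 6`) with its default 12 columns, and the identity and e-value thresholds, sequence IDs and hit lines are invented.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def parse_hits(lines: Iterable[str], min_pident: float = 70.0,
               max_evalue: float = 1e-10) -> Iterator[Tuple[str, str]]:
    """Yield (query, subject) pairs from BLAST tabular (-outfmt 6) lines.

    Assumes the default 12 columns: qseqid sseqid pident length mismatch
    gapopen qstart qend sstart send evalue bitscore. Thresholds are illustrative.
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        query, subject = cols[0], cols[1]
        pident, evalue = float(cols[2]), float(cols[10])
        if query != subject and pident >= min_pident and evalue <= max_evalue:
            yield query, subject


class UnionFind:
    """Disjoint-set structure with path halving."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def cluster(seq_ids: List[str], hits: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Single-linkage clusters: connected components of the hit graph."""
    uf = UnionFind()
    for seq_id in seq_ids:
        uf.find(seq_id)
    for a, b in hits:
        uf.union(a, b)
    groups: Dict[str, List[str]] = {}
    for seq_id in seq_ids:
        groups.setdefault(uf.find(seq_id), []).append(seq_id)
    return sorted(sorted(group) for group in groups.values())


blast_lines = [
    "sp1_g1\tsp2_g7\t92.1\t300\t24\t0\t1\t300\t1\t300\t1e-80\t400",
    "sp2_g7\tsp3_g2\t88.0\t280\t33\t1\t1\t280\t5\t284\t3e-60\t350",
    "sp1_g4\tsp3_g9\t45.0\t100\t55\t2\t1\t100\t1\t100\t0.5\t30",
]
ids = ["sp1_g1", "sp2_g7", "sp3_g2", "sp1_g4", "sp3_g9"]
print(cluster(ids, parse_hits(blast_lines)))
```

Single linkage is simple but can chain distantly related sequences together through intermediates; stricter schemes (e.g. reciprocal best hits) trade recall for precision.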
4
Abstract
Metagenomics is a segment of conventional microbial genomics dedicated to the sequencing and analysis of combined genomic DNA of entire environmental samples. The most critical step of the metagenomic data analysis is the reconstruction of individual genes and genomes of the microorganisms in the communities using metagenomic assemblers - computational programs that put together small fragments of sequenced DNA generated by sequencing instruments. Here, we describe the challenges of metagenomic assembly, a wide spectrum of applications in which metagenomic assemblies were used to better understand the ecology and evolution of microbial ecosystems, and present one of the most efficient microbial assemblers, SPAdes, which was upgraded to become applicable to metagenomics.
Affiliation(s)
- Alla L. Lapidus
- Center for Algorithmic Biotechnology, St. Petersburg State University, Saint Petersburg, Russia
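SPAdes belongs to the family of de Bruijn graph assemblers. The toy sketch below illustrates only the underlying principle of "putting together small fragments": reads are cut into k-mers, each k-mer becomes an edge between its (k-1)-mer prefix and suffix, and a contig is read off by following unambiguous successors. It is a deliberately simplified assumption-laden illustration, not SPAdes' algorithm, which additionally handles sequencing errors, reverse complements, multiple k values and (in metagenomic mode) uneven coverage. The reads and function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


def de_bruijn(reads: List[str], k: int) -> Dict[str, Set[str]]:
    """Map each (k-1)-mer to the (k-1)-mers that follow it in some read k-mer."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def start_nodes(graph: Dict[str, Set[str]]) -> List[str]:
    """Nodes with no incoming edge, i.e. candidate contig starts."""
    has_incoming = {node for successors in graph.values() for node in successors}
    return sorted(node for node in graph if node not in has_incoming)


def walk(graph: Dict[str, Set[str]], start: str) -> str:
    """Extend a contig while the current node has exactly one successor."""
    contig, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:  # stop on cycles
            break
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig


reads = ["ATGGCG", "GGCGTG", "CGTGCA"]  # overlapping toy reads
graph = de_bruijn(reads, k=4)
print([walk(graph, s) for s in start_nodes(graph)])  # ['ATGGCGTGCA']
```

A real unitig walk would also stop at nodes with several incoming edges; in metagenomes, where strains share sequence, such branch points are exactly what makes assembly hard.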
5
Chen Z, Erickson DL, Meng J. Benchmarking hybrid assembly approaches for genomic analyses of bacterial pathogens using Illumina and Oxford Nanopore sequencing. BMC Genomics 2020; 21:631. [PMID: 32928108 PMCID: PMC7490894 DOI: 10.1186/s12864-020-07041-8] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Received: 06/25/2020] [Accepted: 08/31/2020] [Indexed: 02/06/2023]
Abstract
Background We benchmarked the hybrid assembly approaches of MaSuRCA, SPAdes, and Unicycler for bacterial pathogens using Illumina and Oxford Nanopore sequencing by determining genome completeness and accuracy, antimicrobial resistance (AMR), virulence potential, multilocus sequence typing (MLST), phylogeny, and pan genome. Ten bacterial species (10 strains) were tested for simulated reads of both mediocre and low quality, whereas 11 bacterial species (12 strains) were tested for real reads.
Results Unicycler performed the best for achieving contiguous genomes, closely followed by MaSuRCA, while all SPAdes assemblies were incomplete. MaSuRCA was less tolerant of low-quality long reads than SPAdes and Unicycler. The hybrid assemblies of five antimicrobial-resistant strains with simulated reads provided AMR genotypes consistent with the reference genomes. The MaSuRCA assembly of Staphylococcus aureus with real reads contained msr(A) and tet(K), while the reference genome and SPAdes and Unicycler assemblies harbored blaZ. The AMR genotypes of the reference genomes and hybrid assemblies were consistent for the other five antimicrobial-resistant strains with real reads. The numbers of virulence genes in all hybrid assemblies were similar to those of the reference genomes, irrespective of whether simulated or real reads were used. The only exception was that the reference genome and hybrid assemblies of Pseudomonas aeruginosa with mediocre-quality long reads carried 241 virulence genes, whereas 184 virulence genes were identified in the hybrid assemblies of low-quality long reads. The MaSuRCA assemblies of Escherichia coli O157:H7 and Salmonella Typhimurium with mediocre-quality long reads contained 126 and 118 virulence genes, respectively, while 110 and 107 virulence genes were detected in their MaSuRCA assemblies of low-quality long reads, respectively. All approaches performed well in our MLST and phylogenetic analyses. The pan genomes of the hybrid assemblies of S. Typhimurium with mediocre-quality long reads were similar to that of the reference genome, while SPAdes and Unicycler were more tolerant of low-quality long reads than MaSuRCA for the pan-genome analysis. All approaches functioned well in the pan-genome analysis of Campylobacter jejuni with real reads.
Conclusions Our research demonstrates that the hybrid assembly pipeline of Unicycler is a superior approach for genomic analyses of bacterial pathogens using Illumina and Oxford Nanopore sequencing.
Affiliation(s)
- Zhao Chen
- Joint Institute for Food Safety and Applied Nutrition, Center for Food Safety and Security Systems, and Department of Nutrition and Food Science, University of Maryland, College Park, MD, 20742, USA
- David L Erickson
- Joint Institute for Food Safety and Applied Nutrition, Center for Food Safety and Security Systems, and Department of Nutrition and Food Science, University of Maryland, College Park, MD, 20742, USA
- Jianghong Meng
- Joint Institute for Food Safety and Applied Nutrition, Center for Food Safety and Security Systems, and Department of Nutrition and Food Science, University of Maryland, College Park, MD, 20742, USA
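Contiguity, on which Unicycler scored best in this benchmark, is commonly summarized by the contig count and the N50. The abstract does not say which contiguity statistics the authors used, so the sketch below is a generic illustration with invented contig lengths (in kb), not a reproduction of their evaluation.

```python
from typing import List


def n50(lengths: List[int]) -> int:
    """Length L such that contigs of length >= L cover at least half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


# Invented contig lengths (kb) for a fragmented and a closed assembly.
fragmented = [100, 80, 50, 30, 20]
closed = [4800, 90]  # e.g. one chromosome plus one plasmid
print(n50(fragmented), len(fragmented))  # 80 5
print(n50(closed), len(closed))          # 4800 2
```

N50 rewards long contigs but says nothing about correctness, which is why the study also checked AMR genes, virulence genes, MLST and phylogeny against reference genomes.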
6
Greshake Tzovaras B, Segers FHID, Bicker A, Dal Grande F, Otte J, Anvar SY, Hankeln T, Schmitt I, Ebersberger I. What Is in Umbilicaria pustulata? A Metagenomic Approach to Reconstruct the Holo-Genome of a Lichen. Genome Biol Evol 2020; 12:309-324. [PMID: 32163141 PMCID: PMC7186782 DOI: 10.1093/gbe/evaa049] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Accepted: 03/09/2020] [Indexed: 12/29/2022]
Abstract
Lichens are valuable models in symbiosis research and promising sources of biosynthetic genes for biotechnological applications. Most lichenized fungi grow slowly, resist aposymbiotic cultivation, and are poor candidates for experimentation. Obtaining contiguous, high-quality genomes for such symbiotic communities is technically challenging. Here, we present the first assembly of a lichen holo-genome from metagenomic whole-genome shotgun data comprising both PacBio long reads and Illumina short reads. The nuclear genomes of the two primary components of the lichen symbiosis, the fungus Umbilicaria pustulata (33 Mb) and the green alga Trebouxia sp. (53 Mb), were assembled at contiguities comparable to single-species assemblies. The analysis of the read coverage pattern revealed a relative abundance of fungal to algal nuclei of ∼20:1. Gap-free, circular sequences for all organellar genomes were obtained. The bacterial community is dominated by Acidobacteriaceae and encompasses strains closely related to bacteria isolated from other lichens. Gene set analyses showed no evidence of horizontal gene transfer from algae or bacteria into the fungal genome. Our data suggest a lineage-specific loss of a putative gibberellin-20-oxidase in the fungus, a gene fusion in the fungal mitochondrion, and a relocation of an algal chloroplast gene to the algal nucleus. Major technical obstacles during reconstruction of the holo-genome were coverage differences among individual genomes surpassing three orders of magnitude. Moreover, we show that GC-rich inverted repeats paired with nonrandom sequencing error in PacBio data can result in missing gene predictions. This likely poses a general problem for genome assemblies based on long reads.
Affiliation(s)
- Bastian Greshake Tzovaras
- Applied Bioinformatics Group, Institute of Cell Biology and Neuroscience, Goethe University Frankfurt, Germany
- Lawrence Berkeley National Laboratory, Berkeley, California
- Center for Research & Interdisciplinarity, Université de Paris, France
- Francisca H I D Segers
- Applied Bioinformatics Group, Institute of Cell Biology and Neuroscience, Goethe University Frankfurt, Germany
- LOEWE Center for Translational Biodiversity Genomics, Frankfurt, Germany
- Anne Bicker
- Institute for Organismic and Molecular Evolution, Molecular Genetics and Genome Analysis, Johannes Gutenberg University Mainz, Germany
- Francesco Dal Grande
- LOEWE Center for Translational Biodiversity Genomics, Frankfurt, Germany
- Senckenberg Biodiversity and Climate Research Centre (SBiK-F), Frankfurt, Germany
- Jürgen Otte
- Senckenberg Biodiversity and Climate Research Centre (SBiK-F), Frankfurt, Germany
- Seyed Yahya Anvar
- Department of Human Genetics, Leiden University Medical Center, The Netherlands
- Thomas Hankeln
- Institute for Organismic and Molecular Evolution, Molecular Genetics and Genome Analysis, Johannes Gutenberg University Mainz, Germany
- Imke Schmitt
- LOEWE Center for Translational Biodiversity Genomics, Frankfurt, Germany
- Senckenberg Biodiversity and Climate Research Centre (SBiK-F), Frankfurt, Germany
- Molecular Evolutionary Biology Group, Institute of Ecology, Diversity, and Evolution, Goethe University Frankfurt, Germany
- Ingo Ebersberger
- Applied Bioinformatics Group, Institute of Cell Biology and Neuroscience, Goethe University Frankfurt, Germany
- LOEWE Center for Translational Biodiversity Genomics, Frankfurt, Germany
- Senckenberg Biodiversity and Climate Research Centre (SBiK-F), Frankfurt, Germany
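The ~20:1 fungal-to-algal nuclei estimate and the coverage spread of more than three orders of magnitude both come from read coverage. A minimal sketch of that kind of calculation follows, assuming contigs have already been binned by organism and that coverage scales with the number of nuclear genome copies (it ignores ploidy differences). The coverage values are invented to reproduce the reported ratio and are not the study's data.

```python
import math
from statistics import median
from typing import List


def relative_abundance(cov_a: List[float], cov_b: List[float]) -> float:
    """Ratio of median per-contig read coverage of genome bin A to bin B."""
    return median(cov_a) / median(cov_b)


def orders_of_magnitude(coverages: List[float]) -> float:
    """log10 of the spread between the highest and lowest coverage."""
    return math.log10(max(coverages) / min(coverages))


fungal = [400.0, 410.0, 390.0]  # hypothetical contig coverages (x)
algal = [20.0, 21.0, 19.0]
print(relative_abundance(fungal, algal))                          # 20.0
print(round(orders_of_magnitude([2.0, 20.0, 400.0, 5000.0]), 2))  # 3.4
```

The median is used rather than the mean so that a few collapsed repeats or organellar contigs with very high coverage do not dominate the estimate.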
7
Margos G, Fedorova N, Becker NS, Kleinjan JE, Marosevic D, Krebs S, Hui L, Fingerle V, Lane RS. Borrelia maritima sp. nov., a novel species of the Borrelia burgdorferi sensu lato complex, occupying a basal position to North American species. Int J Syst Evol Microbiol 2020; 70:849-856. [PMID: 31793856 DOI: 10.1099/ijsem.0.003833] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Indexed: 11/18/2022]
Abstract
Borrelia species are vector-borne parasitic bacteria with unusual, highly fragmented genomes that include a linear chromosome and linear as well as circular plasmids that differ numerically between and within various species. Strain CA690T, which was cultivated from a questing Ixodes spinipalpis nymph in the San Francisco Bay area, CA, was determined to be genetically distinct from all other described species belonging to the Borrelia burgdorferi sensu lato complex. The genome, including plasmids, was assembled using a hybrid assembly of short Illumina reads and long reads obtained via Oxford Nanopore Technology. We found that strain CA690T has a main linear chromosome of 902,176 bp, with a BLAST identity of ≤91% to the chromosomes of other Borrelia species, as well as five linear and two circular plasmids. A phylogeny based on 37 single-copy genes of the main linear chromosome and rooted with the relapsing fever species Borrelia duttonii strain Ly revealed that strain CA690T had a sister-group relationship with, and occupied a basal position to, species occurring in North America. We propose to name this species Borrelia maritima sp. nov. The type strain, CA690T, has been deposited in two national culture collections, DSMZ (=107169) and ATCC (=TSD-160).
Affiliation(s)
- Gabriele Margos
- German National Reference Centre for Borrelia, Bavarian Health and Food Safety Authority, Veterinärstr. 2, 85764 Oberschleissheim, Germany
- Natalia Fedorova
- Alameda County Vector Control Services District, Alameda, CA, USA
- Noémie S Becker
- LMU Munich, Faculty of Biology, Division of Evolutionary Biology, Großhaderner Str. 2, Germany
- Joyce E Kleinjan
- Alameda County Vector Control Services District, Alameda, CA, USA
- Durdica Marosevic
- German National Reference Centre for Borrelia, Bavarian Health and Food Safety Authority, Veterinärstr. 2, 85764 Oberschleissheim, Germany
- Stefan Krebs
- LMU Munich, Gene Centre, Laboratory for Functional Genome Analysis, Feodor-Lynen-Strasse 25, 81377 Munich, LMU, Germany
- Lucia Hui
- Alameda County Vector Control Services District, Alameda, CA, USA
- Volker Fingerle
- German National Reference Centre for Borrelia, Bavarian Health and Food Safety Authority, Veterinärstr. 2, 85764 Oberschleissheim, Germany
- Robert S Lane
- University of California, Department of Environmental Science, Policy and Management, Berkeley, CA, USA
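The "BLAST identity ≤91%" figure is a percent identity over an alignment. The sketch below shows one common definition (identical columns divided by alignment columns, gap-only columns excluded), which is close to, but not guaranteed to match, how BLAST reports `pident`. The aligned sequences are invented toy strings.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of alignment columns where both sequences carry the same base.

    a and b must be rows of the same pairwise alignment ("-" marks a gap).
    Columns that are gaps in both rows are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    columns = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * matches / len(columns)


identity = percent_identity("ACGT-ACGTA", "ACGTTACGAA")
print(identity)          # 80.0
print(identity <= 91.0)  # True: below the reported species-level ceiling
```

Because identity depends on the alignment, its definition and the aligned region, values are only comparable when computed the same way, which is why the abstract specifies BLAST.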
8
Li Y, Zhang C, Zhang M, Li Y, Wang X, Duan Y. The complete chloroplast genome sequence of Keteleeria fortunei (Pinaceae). Mitochondrial DNA B Resour 2019; 4:3157-3158. [PMID: 33365897 PMCID: PMC7706782 DOI: 10.1080/23802359.2019.1667896] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2019] [Accepted: 08/16/2019] [Indexed: 10/26/2022]
Abstract
Here, the complete chloroplast genome of Keteleeria fortunei, a vulnerable species in China, was sequenced on a next-generation sequencing platform. Its circular genome is 117,183 bp long, with a GC content of 38.5%. A total of 101 genes were annotated, including 4 rRNA genes, 20 tRNA genes, and 71 protein-coding genes. This study should further our understanding of the genomics, conservation, and utilization of K. fortunei.
Affiliation(s)
- Yuanyuan Li
- International Cultivar Registration Center for Osmanthus, College of Biology and the Environment, Nanjing Forestry University, Nanjing, Jiangsu, China
- Co-Innovation Center for Sustainable Forestry in Southern China, Nanjing Forestry University, Nanjing, Jiangsu, China
- Cheng Zhang
- International Cultivar Registration Center for Osmanthus, College of Biology and the Environment, Nanjing Forestry University, Nanjing, Jiangsu, China
- Co-Innovation Center for Sustainable Forestry in Southern China, Nanjing Forestry University, Nanjing, Jiangsu, China
- Min Zhang
- International Cultivar Registration Center for Osmanthus, College of Biology and the Environment, Nanjing Forestry University, Nanjing, Jiangsu, China
- Co-Innovation Center for Sustainable Forestry in Southern China, Nanjing Forestry University, Nanjing, Jiangsu, China
- Yongfu Li
- International Cultivar Registration Center for Osmanthus, College of Biology and the Environment, Nanjing Forestry University, Nanjing, Jiangsu, China
- Co-Innovation Center for Sustainable Forestry in Southern China, Nanjing Forestry University, Nanjing, Jiangsu, China
- Xianrong Wang
- International Cultivar Registration Center for Osmanthus, College of Biology and the Environment, Nanjing Forestry University, Nanjing, Jiangsu, China
- Co-Innovation Center for Sustainable Forestry in Southern China, Nanjing Forestry University, Nanjing, Jiangsu, China
- Yifan Duan
- International Cultivar Registration Center for Osmanthus, College of Biology and the Environment, Nanjing Forestry University, Nanjing, Jiangsu, China
- Co-Innovation Center for Sustainable Forestry in Southern China, Nanjing Forestry University, Nanjing, Jiangsu, China
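The reported GC content (38.5% of a 117,183 bp circular genome) is the share of G and C among the genome's bases. A minimal sketch follows; it assumes ambiguous bases such as N are excluded from the denominator, a convention that varies between tools. The input strings are toy sequences, not the K. fortunei plastome.

```python
def gc_content(seq: str) -> float:
    """Percent G+C among unambiguous bases (A, C, G, T); case-insensitive."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / acgt if acgt else 0.0


print(gc_content("ATGCGCAT"))            # 50.0
print(round(gc_content("ATGCNNAT"), 2))  # 33.33 (the two Ns are ignored)
```

For a whole plastome the same function would be applied to the full assembled sequence read from its FASTA file.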
9
Margos G, Becker NS, Fingerle V, Sing A, Ramos JA, Carvalho ILD, Norte AC. Core genome phylogenetic analysis of the avian associated Borrelia turdi indicates a close relationship to Borrelia garinii. Mol Phylogenet Evol 2018; 131:93-98. [PMID: 30423440 DOI: 10.1016/j.ympev.2018.10.044] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 04/17/2018] [Revised: 08/28/2018] [Accepted: 10/31/2018] [Indexed: 02/07/2023]
Abstract
Borrelia burgdorferi sensu lato comprises a species complex of tick-transmitted bacteria that includes the agents of human Lyme borreliosis. Borrelia turdi is a genospecies of this complex that exists in cryptic transmission cycles mainly between ornithophilic tick vectors and their avian hosts. The species was originally discovered in avian transmission cycles in Asia but has increasingly been found in Europe. Next generation sequencing was used to sequence the genome of B. turdi isolates obtained from ticks feeding on birds in Portugal to better understand the evolution and phylogenetic relationship of this avian and ornithophilic tick-associated genospecies. Here we use draft genomes of these B. turdi isolates for comparative analysis and to determine the taxonomic position within the B. burgdorferi s.l. species complex. The main chromosomes showed a maximum similarity of 93% to other Borrelia species whilst most plasmids had lower similarities. All three isolates had nine or 10 plasmids and, interestingly, one plasmid with a novel partitioning protein; this plasmid was termed lp30. Phylogenetic analysis of multilocus sequence typing housekeeping genes and 113 single copy orthologous genes revealed that the isolates clustered according to their classification as B. turdi. In phylogenies generated from these 113 genes the isolates cluster together with other Eurasian genospecies and form a sister clade to the avian associated B. garinii and the rodent associated B. bavariensis. These findings show that Borrelia species maintained in cryptic ecological cycles need to be included to fully understand the complex ecology and evolutionary history of this bacterial species complex.
Affiliation(s)
- Gabriele Margos
- Bavarian Health and Food Safety Authority, German National Reference Centre for Borrelia, Veterinärstr. 2, 85764 Oberschleissheim, Germany
- Noémie S Becker
- Division of Evolutionary Biology, Faculty of Biology, LMU Munich, Großhaderner Str. 2, 82152 Planegg-Martinsried, Germany
- Volker Fingerle
- Bavarian Health and Food Safety Authority, German National Reference Centre for Borrelia, Veterinärstr. 2, 85764 Oberschleissheim, Germany
- Andreas Sing
- Bavarian Health and Food Safety Authority, German National Reference Centre for Borrelia, Veterinärstr. 2, 85764 Oberschleissheim, Germany
- Jaime Albino Ramos
- MARE - Marine and Environmental Sciences Centre, Department of Life Sciences, Largo Marquês de Pombal, Faculty of Sciences and Technology, University of Coimbra, Portugal
- Ana Claudia Norte
- National Institute of Health Dr. Ricardo Jorge, Infectious Department, Lisbon, Portugal
- MARE - Marine and Environmental Sciences Centre, Department of Life Sciences, Largo Marquês de Pombal, Faculty of Sciences and Technology, University of Coimbra, Portugal
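Phylogenies built from many single-copy orthologs, such as the 113 genes used here, are often inferred from a concatenated supermatrix. The abstract does not say whether the authors concatenated genes or how they handled missing taxa, so the sketch below is an assumed, generic version: per-gene alignments are joined taxon by taxon, taxa missing from a gene are padded with gap characters, and 1-based partition coordinates are recorded for partitioned model fitting. The sequences are invented toy strings; only the species names come from the abstract.

```python
from typing import Dict, List, Tuple


def concatenate(alignments: List[Dict[str, str]], missing: str = "-"
                ) -> Tuple[Dict[str, str], List[Tuple[int, int]]]:
    """Concatenate per-gene alignments into one supermatrix.

    Taxa absent from a gene are padded with `missing`; partitions are
    1-based inclusive (start, end) coordinates for each gene.
    """
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    rows: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    partitions: List[Tuple[int, int]] = []
    start = 1
    for aln in alignments:
        widths = {len(seq) for seq in aln.values()}
        if len(widths) != 1:
            raise ValueError("each gene alignment needs equal-length sequences")
        width = widths.pop()
        for taxon in taxa:
            rows[taxon].append(aln.get(taxon, missing * width))
        partitions.append((start, start + width - 1))
        start += width
    return {taxon: "".join(parts) for taxon, parts in rows.items()}, partitions


gene_1 = {"B_turdi": "ATG", "B_garinii": "ATC"}
gene_2 = {"B_turdi": "GGTA", "B_bavariensis": "GGTT"}
matrix, parts = concatenate([gene_1, gene_2])
print(matrix)
print(parts)  # [(1, 3), (4, 7)]
```

Some tree-inference programs distinguish missing data ("?") from alignment gaps ("-"); the `missing` parameter allows either convention.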
10
Berthet N, Descorps-Declère S, Nkili-Meyong AA, Nakouné E, Gessain A, Manuguerra JC, Kazanji M. Improved assembly procedure of viral RNA genomes amplified with Phi29 polymerase from new generation sequencing data. Biol Res 2016; 49:39. [PMID: 27605096 PMCID: PMC5015205 DOI: 10.1186/s40659-016-0099-y] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 10/30/2015] [Accepted: 08/25/2016] [Indexed: 11/05/2022]
Abstract
Background New sequencing technologies have opened the way to the discovery and characterization of pathogenic viruses in clinical samples. However, these methods can require amplification of viral RNA prior to sequencing. Among the available methods, the procedure based on Phi29 polymerase produces a large amount of amplified DNA. However, its major disadvantage is that it generates a large number of chimeric sequences, which can affect the assembly step. The pre-processing method proposed in this study strongly limits the negative impact of chimeric reads in order to obtain full-length viral genomes.
Findings Three assembly programs (ABySS, Ray and SPAdes) were tested for their ability to correctly assemble full-length viral genomes. Although our pre-processing method improved genome assembly in all cases, only its combination with SPAdes yielded each tested viral genome at full length in a single contig.
Conclusions The proposed pipeline overcomes drawbacks due to the generation of chimeric reads during amplification of viral RNA, which considerably improves the assembly of full-length viral genomes.
Electronic supplementary material The online version of this article (doi:10.1186/s40659-016-0099-y) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Nicolas Berthet
- Institut Pasteur, Epidemiology and Physiopathology of Oncogenic Viruses, 25 Rue Du Docteur Roux, 75724, Paris, France
- Centre National de La Recherche Scientifique, UMR 3569, 25 Rue Du Docteur Roux, 75724, Paris, France
- Département Zoonose Et Maladies Emergentes, Syndromes Cliniques Et Virus Associés, Centre International de Recherches Médicales de Franceville (CIRMF), BP769, Franceville, Gabon
- Andriniaina Andy Nkili-Meyong
- Département Zoonose Et Maladies Emergentes, Syndromes Cliniques Et Virus Associés, Centre International de Recherches Médicales de Franceville (CIRMF), BP769, Franceville, Gabon
- Emmanuel Nakouné
- Département de Virologie, Institut Pasteur de Bangui, BP 923, Bangui, République Centrafricaine
- Antoine Gessain
- Institut Pasteur, Epidemiology and Physiopathology of Oncogenic Viruses, 25 Rue Du Docteur Roux, 75724, Paris, France
- Centre National de La Recherche Scientifique, UMR 3569, 25 Rue Du Docteur Roux, 75724, Paris, France
- Jean-Claude Manuguerra
- Unité Environnement Et Risques Infectieux, Institut Pasteur, Cellule D'Intervention Biologique D'Urgence, 25 Rue Du Docteur Roux, 75724, Paris, France
- Mirdad Kazanji
- Département de Virologie, Institut Pasteur de Bangui, BP 923, Bangui, République Centrafricaine