1
Álvarez-Narváez S, Harrell TL, Nour I, Mohanty SK, Conrad SJ. Choosing the most suitable NGS technology to combine with a standardized viral enrichment protocol for obtaining complete avian orthoreovirus genomes from metagenomic samples. Frontiers in Bioinformatics 2025; 5:1498921. [PMID: 39967836 PMCID: PMC11833334 DOI: 10.3389/fbinf.2025.1498921] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2024] [Accepted: 01/13/2025] [Indexed: 02/20/2025] Open
Abstract
Since viruses are obligate intracellular pathogens, sequencing their genomes results in metagenomic data from both the virus and the host. Virology researchers are constantly seeking new, cost-effective strategies and bioinformatic pipelines for the retrieval of complete viral genomes from these metagenomic samples. Avian orthoreoviruses (ARVs) pose a significant and growing threat to the poultry industry and frequently cause economic losses associated with disease in production birds. Currently available commercial vaccines are ineffective against new ARV variants and ARV outbreaks are increasing worldwide, requiring whole genome sequencing (WGS) to characterize strains that evade vaccines. This study compares the effectiveness of long-read and short-read sequencing technologies for obtaining complete ARV genomes. We used eight clinical isolates of ARV, each previously processed using our published viral genome enrichment protocol. Additionally, we evaluated three assembly methods (de novo, reference-guided, and hybrid) to determine which provided the most complete and reliable whole genomes. The results suggest that our ARV genome enrichment protocol caused some fragmentation of the viral cDNA that impacted the length of the long reads (but not the short reads) and, as a result, caused a failure to produce complete genomes via de novo assembly. Overall, we observed that regardless of the sequencing technology, the best quality assemblies were generated by mapping quality-trimmed reads to a custom reference genome. The custom reference genomes were in turn constructed with the publicly available ARV genomic segments that shared the highest sequence similarity with the contigs from short-read de novo assemblies. Hence, we conclude that short-read sequencing is the most suitable technology to combine with our ARV genome enrichment protocol.
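The custom-reference step described in this abstract (choosing, for each genomic segment, the public sequence most similar to the de novo contigs) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how similarity was measured, so percent identity from a prior alignment search is assumed, and the function name, segment labels, and accession placeholders are hypothetical.

```python
def pick_reference_segments(hits):
    """Return, for each segment, the accession with the highest percent identity.

    `hits` is an iterable of (segment, accession, percent_identity) tuples,
    assumed to come from an alignment search run beforehand.
    """
    best = {}
    for segment, accession, identity in hits:
        # Keep the hit only if it beats the best one seen so far for this segment.
        if segment not in best or identity > best[segment][1]:
            best[segment] = (accession, identity)
    return {segment: accession for segment, (accession, _) in best.items()}


# Hypothetical hits; REF_A, REF_B and REF_C are placeholders, not real accessions.
hits = [
    ("S1", "REF_A", 91.2),
    ("S1", "REF_B", 96.8),
    ("M2", "REF_C", 88.0),
]
custom_reference = pick_reference_segments(hits)
```

The selected accessions would then be concatenated into one multi-segment reference for read mapping.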
Affiliation(s)
- Sonsiray Álvarez-Narváez
  - US National Poultry Research Center, United States Department of Agriculture, Agricultural Research Service, Athens, GA, United States
  - Department of Infectious Diseases, College of Veterinary Medicine, University of Georgia, Athens, GA, United States
- Telvin L. Harrell
  - US National Poultry Research Center, United States Department of Agriculture, Agricultural Research Service, Athens, GA, United States
- Islam Nour
  - US National Poultry Research Center, United States Department of Agriculture, Agricultural Research Service, Athens, GA, United States
- Sujit K. Mohanty
  - US National Poultry Research Center, United States Department of Agriculture, Agricultural Research Service, Athens, GA, United States
- Steven J. Conrad
  - US National Poultry Research Center, United States Department of Agriculture, Agricultural Research Service, Athens, GA, United States
2
Arroyo Mühr LS, Lagheden C, Hassan SS, Eklund C, Dillner J. The International Human Papillomavirus Reference Center: Standardization, collaboration, and quality assurance in HPV research and diagnostics. J Med Virol 2023; 95:e29332. [PMID: 38115556 DOI: 10.1002/jmv.29332] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 10/25/2023] [Revised: 12/07/2023] [Accepted: 12/08/2023] [Indexed: 12/21/2023]
Abstract
The International Human Papillomavirus (HPV) Reference Center (IHRC) confirms and assigns type numbers to novel HPV types, maintains a reference clone repository, and issues international proficiency panels for HPV screening and genotyping. Furthermore, the Center coordinates the Global HPV Reference Laboratory Network that promotes collaboration and international exchange of experiences among national HPV reference laboratories, to further international standardization and quality assurance in the HPV field. The established HPV types (n = 225) belong to 5 different genera: alpha (n = 65), beta (n = 54), gamma (n = 102), mu (n = 3) and nu (n = 1). Since the last published IHRC overview in 2018, 6 novel types have been established, with 5/6 belonging to the gamma genus and 1/6 to the beta genus. Also, 474 reference clones have been provided to 55 different research laboratories, and the global proficiency program for HPV genotyping has seen increasing proficiency (despite a decrease in 2019), from 68% proficiency in 2017 to 77.3% in 2022. The first proficiency study for HPV screening found an international proficiency of up to 77%. In summary, the increasing complexity of HPVs and the demands on quality assurance in the era of cervical cancer elimination require international efforts to support proficiency and recognized quality and order among HPV types.
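The per-genus counts quoted in this abstract can be checked against the stated totals with a few lines of Python. The variable names are illustrative only; the numbers are copied from the abstract.

```python
# Established HPV types per genus, as reported in the abstract.
genus_counts = {"alpha": 65, "beta": 54, "gamma": 102, "mu": 3, "nu": 1}
total_types = sum(genus_counts.values())

# Novel types established since the 2018 overview, per genus.
novel_since_2018 = {"gamma": 5, "beta": 1}
total_novel = sum(novel_since_2018.values())

print(total_types, total_novel)
```

The sums agree with the reported n = 225 established types and 6 novel types.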
Affiliation(s)
- Laila Sara Arroyo Mühr
  - Department of Clinical Science, Intervention and Technology (CLINTEC), Center for Cervical Cancer Elimination, Karolinska Institutet, Stockholm, Sweden
- Camilla Lagheden
  - Department of Clinical Science, Intervention and Technology (CLINTEC), Center for Cervical Cancer Elimination, Karolinska Institutet, Stockholm, Sweden
- Sadaf S Hassan
  - Department of Clinical Science, Intervention and Technology (CLINTEC), Center for Cervical Cancer Elimination, Karolinska Institutet, Stockholm, Sweden
- Carina Eklund
  - Department of Clinical Science, Intervention and Technology (CLINTEC), Center for Cervical Cancer Elimination, Karolinska Institutet, Stockholm, Sweden
- Joakim Dillner
  - Department of Clinical Science, Intervention and Technology (CLINTEC), Center for Cervical Cancer Elimination, Karolinska Institutet, Stockholm, Sweden
  - Center for Cervical Cancer Elimination, Karolinska University Hospital Huddinge, Stockholm, Sweden
3
Lu N, Qiao Y, An P, Luo J, Bi C, Li M, Lu Z, Tu J. Exploration of whole genome amplification generated chimeric sequences in long-read sequencing data. Brief Bioinform 2023; 24:bbad275. [PMID: 37529913 DOI: 10.1093/bib/bbad275] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2023] [Revised: 06/21/2023] [Accepted: 07/10/2023] [Indexed: 08/03/2023] Open
Abstract
MOTIVATION Multiple displacement amplification (MDA) has become the most commonly used method of whole genome amplification, generating a vast amount of DNA with higher molecular weight and greater genome coverage. Coupled with long-read sequencing, it makes it possible to sequence amplicons of over 20 kb in length. However, the formation of chimeric sequences (chimeras, expressed as structural errors in sequencing data) in MDA seriously interferes with bioinformatics analysis, and its influence on long-read sequencing data is unknown. RESULTS We sequenced the phi29 DNA polymerase-mediated MDA amplicons on the PacBio platform and analyzed chimeras within the generated data. The 3rd-ChimeraMiner has been constructed as a pipeline for recognizing and restoring chimeras into the original structures in long-read sequencing data, improving the efficiency of using TGS data. Five long-read datasets and one high-fidelity long-read dataset with various amplification folds were analyzed. The results reveal that mis-priming events in amplification occur more frequently than widely perceived, and the proportion gradually accumulates from 42% to over 78% as the amplification continues. In total, 99.92% of recognized chimeric sequences were demonstrated to be artifacts, whose structures were wrongly formed in MDA instead of existing in original genomes. By restoring chimeras to their original structures, the vast majority of supplementary alignments that introduce false-positive structural variants are recycled, removing 97% of inversions on average and contributing to the analysis of structural variation in MDA-amplified samples. The impact of chimeras in long-read sequencing data analysis should be emphasized, and the 3rd-ChimeraMiner can help to quantify and reduce the influence of chimeras. AVAILABILITY AND IMPLEMENTATION The 3rd-ChimeraMiner is available on GitHub, https://github.com/dulunar/3rdChimeraMiner.
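The abstract ties chimeras to supplementary alignments. As a rough illustration of that link, and not the 3rd-ChimeraMiner method itself, the sketch below flags reads that carry at least one supplementary alignment in SAM-formatted text. It relies only on the SAM convention that flag bit 0x800 marks a supplementary alignment; the function name and the toy records are hypothetical.

```python
SUPPLEMENTARY = 0x800  # SAM flag bit for a supplementary alignment


def chimera_candidates(sam_lines):
    """Return the names of reads that have at least one supplementary alignment."""
    names = set()
    for line in sam_lines:
        # Skip empty lines and header lines, which start with '@'.
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        read_name, flag = fields[0], int(fields[1])
        if flag & SUPPLEMENTARY:
            names.add(read_name)
    return names


# Toy records: read1 has a primary and a supplementary alignment, read2 only a primary one.
toy_sam = [
    "@HD\tVN:1.6",
    "read1\t0\tchr1\t100\t60\t50M\t*\t0\t0\t*\t*",
    "read1\t2048\tchr1\t900\t60\t50M\t*\t0\t0\t*\t*",
    "read2\t16\tchr1\t300\t60\t50M\t*\t0\t0\t*\t*",
]
candidates = chimera_candidates(toy_sam)
```

A read flagged this way is only a candidate: true structural variants also produce supplementary alignments, which is why the abstract stresses distinguishing artifacts from original genome structure.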
Affiliation(s)
- Na Lu
  - State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing 210096, China
- Yi Qiao
  - State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing 210096, China
- Pengfei An
  - State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing 210096, China
  - Monash University-Southeast University Joint Research Institute, Suzhou 215123, China
- Jiajian Luo
  - State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing 210096, China
- Changwei Bi
  - College of Information Science and Technology, Nanjing Forestry University, Nanjing 210037, China
- Musheng Li
  - Department of Physiology and Cell Biology, University of Nevada, Reno School of Medicine, Reno, NV 89511, USA
- Zuhong Lu
  - State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing 210096, China
- Jing Tu
  - State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing 210096, China
4
Ritsch M, Cassman NA, Saghaei S, Marz M. Navigating the Landscape: A Comprehensive Review of Current Virus Databases. Viruses 2023; 15:1834. [PMID: 37766241 PMCID: PMC10537806 DOI: 10.3390/v15091834] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2023] [Revised: 08/18/2023] [Accepted: 08/21/2023] [Indexed: 09/29/2023] Open
Abstract
Viruses are abundant and diverse entities that have important roles in public health, ecology, and agriculture. The identification and surveillance of viruses rely on an understanding of their genome organization, sequences, and replication strategy. Despite technological advancements in sequencing methods, our current understanding of virus diversity remains incomplete, highlighting the need to explore undiscovered viruses. Virus databases play a crucial role in providing access to sequences, annotations and other metadata, and analysis tools for studying viruses. However, there has not been a comprehensive review of virus databases in the last five years. This study aimed to fill this gap by identifying 24 active virus databases and included an extensive evaluation of their content, functionality and compliance with the FAIR principles. In this study, we thoroughly assessed the search capabilities of five database catalogs, which serve as comprehensive repositories housing a diverse array of databases and offering essential metadata. Moreover, we conducted a comprehensive review of different types of errors, encompassing taxonomy, names, missing information, sequences, sequence orientation, and chimeric sequences, with the intention of empowering users to effectively tackle these challenges. We expect this review to aid users in selecting suitable virus databases and other resources, and to help databases manage errors and improve their adherence to the FAIR principles. The databases listed here represent the current knowledge of viruses and will help users find databases of interest based on content, functionality, and scope. The use of virus databases is integral to gaining new insights into the biology, evolution, and transmission of viruses, and developing new strategies to manage virus outbreaks and preserve global health.
Affiliation(s)
- Muriel Ritsch
  - RNA Bioinformatics and High-Throughput Analysis, Friedrich Schiller University Jena, 07743 Jena, Germany
  - European Virus Bioinformatics Center, 07743 Jena, Germany
- Noriko A. Cassman
  - RNA Bioinformatics and High-Throughput Analysis, Friedrich Schiller University Jena, 07743 Jena, Germany
  - European Virus Bioinformatics Center, 07743 Jena, Germany
- Shahram Saghaei
  - RNA Bioinformatics and High-Throughput Analysis, Friedrich Schiller University Jena, 07743 Jena, Germany
  - European Virus Bioinformatics Center, 07743 Jena, Germany
- Manja Marz
  - RNA Bioinformatics and High-Throughput Analysis, Friedrich Schiller University Jena, 07743 Jena, Germany
  - European Virus Bioinformatics Center, 07743 Jena, Germany
  - German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, 04103 Leipzig, Germany
  - FLI Leibniz Institute for Age Research, 07745 Jena, Germany
5
Lee YC, Ke HM, Liu YC, Lee HH, Wang MC, Tseng YC, Kikuchi T, Tsai IJ. Single-worm long-read sequencing reveals genome diversity in free-living nematodes. Nucleic Acids Res 2023; 51:8035-8047. [PMID: 37526286 PMCID: PMC10450198 DOI: 10.1093/nar/gkad647] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 04/19/2023] [Revised: 07/10/2023] [Accepted: 07/21/2023] [Indexed: 08/02/2023] Open
Abstract
Obtaining sufficient genetic material from a limited biological source is currently the primary operational bottleneck in studies investigating biodiversity and genome evolution. In this study, we employed multiple displacement amplification (MDA) and Smartseq2 to amplify nanograms of genomic DNA and mRNA, respectively, from individual Caenorhabditis elegans. Although reduced genome coverage was observed in repetitive regions, we produced assemblies covering 98% of the reference genome using long-read sequences generated with Oxford Nanopore Technologies (ONT). Annotation with the sequenced transcriptome coupled with the available assembly revealed that gene predictions were more accurate, complete and contained far fewer false positives than de novo transcriptome assembly approaches. We sampled and sequenced the genomes and transcriptomes of 13 nematodes from early-branching species in Chromadoria, Dorylaimia and Enoplia. The basal Chromadoria and Enoplia species had larger genome sizes, ranging from 136.6 to 738.8 Mb, compared with those in the other clades. Nine mitogenomes were fully assembled, and displayed a complete lack of synteny to other species. Phylogenomic analyses based on the new annotations revealed strong support for Enoplia as sister to the rest of Nematoda. Our result demonstrates the robustness of MDA in combination with ONT, paving the way for the study of genome diversity in the phylum Nematoda and beyond.
Affiliation(s)
- Yi-Chien Lee
  - Biodiversity Research Center, Academia Sinica, Taipei 115, Taiwan
  - Biodiversity Program, Taiwan International Graduate Program, Academia Sinica and National Taiwan Normal University, Taipei, Taiwan
  - Department of Life Science, National Taiwan Normal University, 116 Wenshan, Taipei, Taiwan
- Huei-Mien Ke
  - Department of Microbiology, Soochow University, Taipei, Taiwan
- Yu-Ching Liu
  - Biodiversity Research Center, Academia Sinica, Taipei 115, Taiwan
- Hsin-Han Lee
  - Biodiversity Research Center, Academia Sinica, Taipei 115, Taiwan
- Min-Chen Wang
  - Marine Research Station (MRS), Institute of Cellular and Organismic Biology, Academia Sinica, 262 I-Lan County, Taiwan
- Yung-Che Tseng
  - Marine Research Station (MRS), Institute of Cellular and Organismic Biology, Academia Sinica, 262 I-Lan County, Taiwan
- Taisei Kikuchi
  - Department of Integrated Biosciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba 277-8562, Japan
- Isheng Jason Tsai
  - Biodiversity Research Center, Academia Sinica, Taipei 115, Taiwan
  - Biodiversity Program, Taiwan International Graduate Program, Academia Sinica and National Taiwan Normal University, Taipei, Taiwan
6
Lu N, Qiao Y, Lu Z, Tu J. Chimera: The spoiler in multiple displacement amplification. Comput Struct Biotechnol J 2023; 21:1688-1696. [PMID: 36879882 PMCID: PMC9984789 DOI: 10.1016/j.csbj.2023.02.034] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 08/25/2022] [Revised: 02/18/2023] [Accepted: 02/18/2023] [Indexed: 02/24/2023] Open
Abstract
Multiple displacement amplification (MDA), based on isothermal random priming and high-fidelity phi29 DNA polymerase-mediated processive extension, has revolutionized the field of whole genome amplification by enabling the amplification of minute amounts of DNA, such as from a single cell, generating vast amounts of DNA with high genome coverage. Despite its advantages, MDA has its own challenges, one of the greatest being the formation of chimeric sequences (chimeras), which are present in all MDA products and seriously disturb downstream analysis. In this review, we provide a comprehensive overview of current research on MDA chimeras. We first review the mechanisms of chimera formation and chimera detection methods. We then systematically summarize the characteristics of chimeras, including overlap, chimeric distance, chimeric density, and chimeric rate, as found in independently published sequencing data. Finally, we review the methods used to process chimeric sequences and their impact on the efficiency of data utilization. The information presented in this review will be useful for those interested in understanding the challenges with MDA and in improving its performance.
Affiliation(s)
- Na Lu
  - State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing 210096, China
- Yi Qiao
  - State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing 210096, China
- Zuhong Lu
  - State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing 210096, China
- Jing Tu
  - State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing 210096, China
7
Bassi C, Guerriero P, Pierantoni M, Callegari E, Sabbioni S. Novel Virus Identification through Metagenomics: A Systematic Review. Life (Basel, Switzerland) 2022; 12:life12122048. [PMID: 36556413 PMCID: PMC9784588 DOI: 10.3390/life12122048] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 10/24/2022] [Revised: 11/25/2022] [Accepted: 12/01/2022] [Indexed: 12/12/2022]
Abstract
Metagenomic Next Generation Sequencing (mNGS) allows the evaluation of complex microbial communities, avoiding isolation and cultivation of each microbial species, and does not require prior knowledge of the microbial sequences present in the sample. Applications of mNGS include virome characterization, new virus discovery and full-length viral genome reconstruction, either from virus preparations enriched in culture or directly from clinical and environmental specimens. Here, we systematically reviewed studies that describe novel virus identification through mNGS from samples of different origin (plant, animal and environment). Without imposing time limits to the search, 379 publications were identified that met the search parameters. Sample types, geographical origin, enrichment and nucleic acid extraction methods, sequencing platforms, bioinformatic analytical steps and identified viral families were described. The review highlights mNGS as a feasible method for novel virus discovery from samples of different origins, describes which kind of heterogeneous experimental and analytical protocols are currently used and provides useful information such as the different commercial kits used for the purification of nucleic acids and bioinformatics analytical pipelines.
Affiliation(s)
- Cristian Bassi
  - Department of Translational Medicine, University of Ferrara, 44121 Ferrara, Italy
  - Laboratorio per Le Tecnologie delle Terapie Avanzate (LTTA), University of Ferrara, 44121 Ferrara, Italy
- Paola Guerriero
  - Department of Translational Medicine, University of Ferrara, 44121 Ferrara, Italy
  - Laboratorio per Le Tecnologie delle Terapie Avanzate (LTTA), University of Ferrara, 44121 Ferrara, Italy
- Marina Pierantoni
  - Department of Translational Medicine, University of Ferrara, 44121 Ferrara, Italy
- Elisa Callegari
  - Department of Translational Medicine, University of Ferrara, 44121 Ferrara, Italy
- Silvia Sabbioni
  - Laboratorio per Le Tecnologie delle Terapie Avanzate (LTTA), University of Ferrara, 44121 Ferrara, Italy
  - Department of Life Science and Biotechnology, University of Ferrara, 44121 Ferrara, Italy
  - Correspondence: Tel.: +39-053-245-5319
8
Slizovskiy IB, Oliva M, Settle JK, Zyskina LV, Prosperi M, Boucher C, Noyes NR. Target-enriched long-read sequencing (TELSeq) contextualizes antimicrobial resistance genes in metagenomes. Microbiome 2022; 10:185. [PMID: 36324140 PMCID: PMC9628182 DOI: 10.1186/s40168-022-01368-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 12/31/2021] [Accepted: 09/02/2022] [Indexed: 06/16/2023]
Abstract
BACKGROUND Metagenomic data can be used to profile high-importance genes within microbiomes. However, current metagenomic workflows produce data that suffer from low sensitivity and an inability to accurately reconstruct partial or full genomes, particularly those in low abundance. These limitations preclude colocalization analysis, i.e., characterizing the genomic context of genes and functions within a metagenomic sample. Genomic context is especially crucial for functions associated with horizontal gene transfer (HGT) via mobile genetic elements (MGEs), for example antimicrobial resistance (AMR). To overcome this current limitation of metagenomics, we present a method for comprehensive and accurate reconstruction of antimicrobial resistance genes (ARGs) and MGEs from metagenomic DNA, termed target-enriched long-read sequencing (TELSeq). RESULTS Using technical replicates of diverse sample types, we compared TELSeq performance to that of non-enriched PacBio and short-read Illumina sequencing. TELSeq achieved much higher ARG recovery (>1,000-fold) and sensitivity than the other methods across diverse metagenomes, revealing an extensive resistome profile comprising many low-abundance ARGs, including some with public health importance. Using the long reads generated by TELSeq, we identified numerous MGEs and cargo genes flanking the low-abundance ARGs, indicating that these ARGs could be transferred across bacterial taxa via HGT. CONCLUSIONS TELSeq can provide a nuanced view of the genomic context of microbial resistomes and thus has wide-ranging applications in public, animal, and human health, as well as environmental surveillance and monitoring of AMR. Thus, this technique represents a fundamental advancement for microbiome research and application.
Affiliation(s)
- Ilya B Slizovskiy
  - Food-Centric Corridor, Infectious Disease Laboratory, Department of Veterinary Population Medicine, College of Veterinary Medicine, University of Minnesota, St. Paul, MN, USA
- Marco Oliva
  - Department of Computer and Information Science and Engineering, Herbert Wertheim College of Engineering, University of Florida, Gainesville, FL, USA
- Jonathen K Settle
  - Department of Computer and Information Science and Engineering, Herbert Wertheim College of Engineering, University of Florida, Gainesville, FL, USA
- Lidiya V Zyskina
  - Program in Human-Computer Interaction, College of Information Studies, University of Maryland, College Park, MD, USA
- Mattia Prosperi
  - Data Intelligence Systems Lab, Department of Epidemiology, College of Public Health and Health Professions and College of Medicine, University of Florida, Gainesville, FL, USA
- Christina Boucher
  - Department of Computer and Information Science and Engineering, Herbert Wertheim College of Engineering, University of Florida, Gainesville, FL, USA
- Noelle R Noyes
  - Food-Centric Corridor, Infectious Disease Laboratory, Department of Veterinary Population Medicine, College of Veterinary Medicine, University of Minnesota, St. Paul, MN, USA
9
Neri U, Wolf YI, Roux S, Camargo AP, Lee B, Kazlauskas D, Chen IM, Ivanova N, Zeigler Allen L, Paez-Espino D, Bryant DA, Bhaya D, Krupovic M, Dolja VV, Kyrpides NC, Koonin EV, Gophna U. Expansion of the global RNA virome reveals diverse clades of bacteriophages. Cell 2022; 185:4023-4037.e18. [PMID: 36174579 DOI: 10.1016/j.cell.2022.08.023] [Citation(s) in RCA: 132] [Impact Index Per Article: 44.0] [Received: 02/15/2022] [Revised: 05/16/2022] [Accepted: 08/24/2022] [Indexed: 01/26/2023]
Abstract
High-throughput RNA sequencing offers broad opportunities to explore the Earth RNA virome. Mining 5,150 diverse metatranscriptomes uncovered >2.5 million RNA virus contigs. Analysis of >330,000 RNA-dependent RNA polymerases (RdRPs) shows that this expansion corresponds to a 5-fold increase of the known RNA virus diversity. Gene content analysis revealed multiple protein domains previously not found in RNA viruses and implicated in virus-host interactions. Extended RdRP phylogeny supports the monophyly of the five established phyla and reveals two putative additional bacteriophage phyla and numerous putative additional classes and orders. The dramatically expanded phylum Lenarviricota, consisting of bacterial and related eukaryotic viruses, now accounts for a third of the RNA virome. Identification of CRISPR spacer matches and bacteriolytic proteins suggests that subsets of picobirnaviruses and partitiviruses, previously associated with eukaryotes, infect prokaryotic hosts.
Affiliation(s)
- Uri Neri
  - The Shmunis School of Biomedicine and Cancer Research, Tel Aviv University, Tel Aviv 6997801, Israel
- Yuri I Wolf
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Simon Roux
  - Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Antonio Pedro Camargo
  - Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Benjamin Lee
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA; Nuffield Department of Medicine, University of Oxford, Oxford OX3 7BN, UK
- Darius Kazlauskas
  - Institute of Biotechnology, Life Sciences Center, Vilnius University, Saulėtekio av. 7, Vilnius 10257, Lithuania
- I Min Chen
  - Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Natalia Ivanova
  - Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Lisa Zeigler Allen
  - Microbial and Environmental Genomics, J. Craig Venter Institute, La Jolla, CA, USA; Marine Biology Research Division, Scripps Institution of Oceanography, La Jolla, CA, USA
- David Paez-Espino
  - Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Donald A Bryant
  - Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
- Devaki Bhaya
  - Department of Plant Biology, Carnegie Institution for Science, Stanford, CA 94305, USA
- Mart Krupovic
  - Institut Pasteur, Université Paris Cité, CNRS UMR 6047, Archaeal Virology Unit, 75015 Paris, France
- Valerian V Dolja
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA; Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR 97331, USA
- Nikos C Kyrpides
  - Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Eugene V Koonin
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Uri Gophna
  - The Shmunis School of Biomedicine and Cancer Research, Tel Aviv University, Tel Aviv 6997801, Israel
10
Avershina E, Frye SA, Ali J, Taxt AM, Ahmad R. Ultrafast and Cost-Effective Pathogen Identification and Resistance Gene Detection in a Clinical Setting Using Nanopore Flongle Sequencing. Front Microbiol 2022; 13:822402. [PMID: 35369431 PMCID: PMC8970966 DOI: 10.3389/fmicb.2022.822402] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 11/25/2021] [Accepted: 01/27/2022] [Indexed: 02/04/2023] Open
Abstract
Rapid bacterial identification and antimicrobial resistance gene (ARG) detection are crucial for fast optimization of antibiotic treatment, especially for septic patients where each hour of delayed antibiotic prescription might have lethal consequences. This work investigates whether the Oxford Nanopore Technologies (ONT) Flongle sequencing platform is suitable for real-time sequencing directly from blood cultures to identify bacteria and detect resistance-encoding genes. For the analysis, we used pure bacterial cultures of four clinical isolates of Escherichia coli and Klebsiella pneumoniae and two blood samples spiked with either E. coli or K. pneumoniae that had been cultured overnight. We sequenced both the whole genome and plasmids isolated from these bacteria using two different sequencing kits. Generally, Flongle data allow rapid bacterial ID and resistome detection based on the first 1,000–3,000 generated sequences (10 min to 3 h from the sequencing start), although ARG variant identification did not always correspond to ONT MinION and Illumina sequencing-based data. Flongle data are sufficient for 99.9% genome coverage within at most 20,000 (clinical isolates) or 50,000 (positive blood cultures) sequences generated. The SQK-LSK110 Ligation kit resulted in higher genome coverage and more accurate bacterial identification than the SQK-RBK004 Rapid Barcode kit.
Affiliation(s)
- Ekaterina Avershina
  - Department of Biotechnology, Inland Norway University of Applied Sciences, Hamar, Norway
- Stephan A Frye
  - Division of Laboratory Medicine, Department of Microbiology, Oslo University Hospital, Oslo, Norway
- Jawad Ali
  - Department of Biotechnology, Inland Norway University of Applied Sciences, Hamar, Norway
- Arne M Taxt
  - Division of Laboratory Medicine, Department of Microbiology, Oslo University Hospital, Oslo, Norway
- Rafi Ahmad
  - Department of Biotechnology, Inland Norway University of Applied Sciences, Hamar, Norway
  - Faculty of Health Sciences, Institute of Clinical Medicine, UiT - The Arctic University of Norway, Tromsø, Norway
11
CStone: A de novo transcriptome assembler for short-read data that identifies non-chimeric contigs based on underlying graph structure. PLoS Comput Biol 2021; 17:e1009631. [PMID: 34813594 PMCID: PMC8651127 DOI: 10.1371/journal.pcbi.1009631] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 06/23/2021] [Revised: 12/07/2021] [Accepted: 11/11/2021] [Indexed: 11/19/2022] Open
Abstract
With the exponential growth of sequence information stored over the last decade, including that of de novo assembled contigs from RNA-Seq experiments, quantification of chimeric sequences has become essential when assembling read data. In transcriptomics, de novo assembled chimeras can closely resemble underlying transcripts, but patterns such as those seen between co-evolving sites, or mapped read counts, become obscured. We have created a de Bruijn graph-based de novo assembler for RNA-Seq data that utilizes a classification system to describe the complexity of the underlying graphs from which contigs are created. Each contig is labelled with one of three levels, indicating whether or not ambiguous paths exist. A by-product of this is information on the range of complexity of the underlying gene families present. As a demonstration of CStone's ability to assemble high-quality contigs, and to label them in this manner, both simulated and real data were used. For simulated data, ten million read pairs were generated from cDNA libraries representing four species: Drosophila melanogaster, Panthera pardus, Rattus norvegicus and Serinus canaria. These were assembled using CStone, Trinity and rnaSPAdes, the latter two being high-quality, well-established de novo assemblers. For real data, two RNA-Seq datasets, each consisting of ≈30 million read pairs and representing two adult D. melanogaster whole-body samples, were used. The contigs that CStone produced were comparable in quality to those of Trinity and rnaSPAdes in terms of length, sequence identity of aligned regions and the range of cDNA transcripts represented, whilst providing additional information on chimerism. Here we describe the details of CStone's assembly and classification process, and propose that similar classification systems can be incorporated into other de novo assembly tools. Within a related side study, we explore the effects that chimeras within reference sets have on the identification of differentially expressed genes. CStone is available at: https://sourceforge.net/projects/cstone/.
Within transcriptome reference sets, non-chimeric sequences are representations of transcribed genes, while artificially generated chimeric ones are mosaics of two or more pieces of DNA incorrectly pieced together. One area where such sets are utilized is in the quantification of gene expression patterns, where RNA-Seq reads are mapped to the sequences within, and subsequent count values reflect expression levels. Artificial chimeras can have a negative impact on count values by erroneously increasing variation in relation to the reads being mapped. Reference sets can be created from de novo assembled contigs, but chimeras can be introduced during the assembly process via the required traversal of graphs, representing gene families, constructed from the RNA-Seq data. Graph complexity determines how likely chimeras are to arise. We have created CStone, a de novo assembler that utilizes a classification system to describe such complexity. Contigs created by CStone are labelled in a manner that indicates whether or not they are non-chimeric. This encourages contig-dependent results to be presented with increased objectivity by maintaining the context of ambiguity associated with the assembly process. CStone has been tested extensively. Additionally, we have quantified the relationship between chimeras within reference sets and the identification of differentially expressed genes.
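To make the idea of graph complexity in the abstract above concrete, the following is a minimal sketch, not CStone's actual implementation. It builds a toy de Bruijn graph from reads and reports whether any node branches, which is one simple way an ambiguous path can arise. The function names `build_de_bruijn` and `label_graph`, the two-way label (the abstract mentions three levels, whose definitions are not given here), and the example reads are all assumptions made for illustration.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a toy de Bruijn graph whose nodes are (k-1)-mers.

    Returns two dicts mapping each node to its set of successors
    and to its set of predecessors.
    """
    succ = defaultdict(set)
    pred = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            left, right = kmer[:-1], kmer[1:]
            succ[left].add(right)
            pred[right].add(left)
    return succ, pred


def label_graph(succ, pred):
    """Hypothetical two-level label: 'ambiguous' if any node has more
    than one successor or more than one predecessor, else 'unambiguous'."""
    nodes = set(succ) | set(pred)
    for node in nodes:
        if len(succ.get(node, ())) > 1 or len(pred.get(node, ())) > 1:
            return "ambiguous"
    return "unambiguous"


if __name__ == "__main__":
    # A single read with no repeated (k-1)-mers gives a simple linear path.
    linear = build_de_bruijn(["ACGTAC"], k=4)
    print(label_graph(*linear))    # unambiguous

    # Two reads that share a prefix and then diverge create a branch at CGT.
    branched = build_de_bruijn(["ACGTT", "ACGTA"], k=4)
    print(label_graph(*branched))  # ambiguous
```

In this toy setting, a contig drawn from a graph labelled "unambiguous" could only have been assembled one way, whereas a contig from an "ambiguous" graph required a choice at a branch and so might be chimeric.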
12
Dovrolis N, Kassela K, Konstantinidis K, Kouvela A, Veletza S, Karakasiliotis I. ZWA: Viral genome assembly and characterization hindrances from virus-host chimeric reads; a refining approach. PLoS Comput Biol 2021; 17:e1009304. [PMID: 34370725 PMCID: PMC8376068 DOI: 10.1371/journal.pcbi.1009304] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/19/2021] [Revised: 08/19/2021] [Accepted: 07/24/2021] [Indexed: 11/19/2022] Open
Abstract
Viral metagenomics, also known as virome studies, have yielded an unprecedented number of novel sequences, essential in recognizing and characterizing the etiological agent and the origin of emerging infectious diseases. Several tools and pipelines have been developed, to date, for the identification and assembly of viral genomes. Assembly pipelines often result in viral genomes contaminated with host genetic material, some of which are currently deposited in public databases. In the current report, we present a group of deposited sequences that encompass ribosomal RNA (rRNA) contamination. We highlight the detrimental role of chimeric next-generation sequencing reads, between host rRNA sequences and viral sequences, in virus genome assembly, and we present the hindrances these reads may pose to current methodologies. We have further developed a refining pipeline, the Zero Waste Algorithm (ZWA), that assists in the assembly of low-abundance viral genomes. ZWA performs context-dependent trimming of chimeric reads, precisely removing their rRNA moiety. These otherwise discarded reads were fed to the assembly pipeline and assisted in the construction of larger and cleaner contigs, making a substantial impact on current assembly methodologies. The ZWA pipeline may significantly enhance virus genome assembly from low-abundance samples and virus metagenomics approaches in which a small number of reads determine genome quality and integrity.
For years now the study of viruses and their genetic composition has been important in their identification and classification. Especially in these times of pandemic turmoil, accurate knowledge of a virus' exact genetic composition can help identify its strengths and weaknesses, allowing us to track its evolution and assist in the development of vaccines and antiviral agents. The reconstruction of these genomic sequences is called the assembly process, a bioinformatics approach which can be complicated and full of pitfalls. This work identifies one such issue, concerning artifacts introduced in viral genomes from the new technologies of nucleic acid sequencing. The proposed algorithm helps alleviate this problem by tentatively removing these problematic regions while keeping the vast majority of the genetic information required to produce a more complete viral genome. This work is anticipated to assist in the submission of viral genomes of higher integrity and accuracy to the public databases used for novel virus identification and characterization.
Affiliation(s)
- Nikolas Dovrolis
  - Laboratory of Biology, Department of Medicine, Democritus University of Thrace, Alexandroupolis, Greece
  - * E-mail: (ND); (IK)
- Katerina Kassela
  - Laboratory of Biology, Department of Medicine, Democritus University of Thrace, Alexandroupolis, Greece
- Adamantia Kouvela
  - Laboratory of Biology, Department of Medicine, Democritus University of Thrace, Alexandroupolis, Greece
- Stavroula Veletza
  - Laboratory of Biology, Department of Medicine, Democritus University of Thrace, Alexandroupolis, Greece
- Ioannis Karakasiliotis
  - Laboratory of Biology, Department of Medicine, Democritus University of Thrace, Alexandroupolis, Greece
  - * E-mail: (ND); (IK)
13
Human Papillomavirus Detection by Whole-Genome Next-Generation Sequencing: Importance of Validation and Quality Assurance Procedures. Viruses 2021; 13:v13071323. [PMID: 34372528 PMCID: PMC8310033 DOI: 10.3390/v13071323] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 05/06/2021] [Revised: 06/04/2021] [Accepted: 06/18/2021] [Indexed: 12/27/2022] Open
Abstract
Next-generation sequencing (NGS) yields powerful opportunities for studying human papillomavirus (HPV) genomics for applications in epidemiology, public health, and clinical diagnostics. HPV genotypes, variants, and point mutations can be investigated in clinical materials and described in unprecedented detail. However, both the NGS laboratory analysis and the bioinformatical approach require numerous steps and checks to ensure robust interpretation of results. Here, we provide a step-by-step review of recommendations for validation and quality assurance procedures for each step in the typical NGS workflow, with a focus on whole-genome sequencing approaches. The use of directed pilots and protocols to ensure optimization of sequencing data yield, followed by curated bioinformatical procedures, is particularly emphasized. Finally, the storage and sharing of data sets are discussed. The development of international standards for quality assurance should be a goal for the HPV NGS community, similar to what has been developed for other areas of sequencing efforts, including microbiology and molecular pathology. We thus propose that it is time for NGS to be included in the global efforts on quality assurance and improvement of HPV-based testing and diagnostics.
14
Misclassifications in human papillomavirus databases. Virology 2021; 558:57-66. [PMID: 33730650 DOI: 10.1016/j.virol.2021.03.002] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 12/12/2020] [Revised: 02/23/2021] [Accepted: 03/04/2021] [Indexed: 01/05/2023]
Abstract
We assessed the quality of human papillomavirus (HPV) sequences in GenBank by analyzing the possible presence of chimeras, "wrong-assembled" contigs and errors in taxonomy using an open-source script (HPVChimera_Gb) that compared 25 638 HPV-related nucleotide sequences in GenBank with the 221 numbered HPV types and another 220 complete HPV sequences. There were 110 sequences with taxonomy/naming errors (sequences reported as another HPV type than the one they corresponded to) and 1318 possibly chimeric sequences. Manual analysis found plausible explanations for most of them (e.g. sequence covering an integration site) but 114 sequences appeared to be chimeras (96/114 were already flagged as "unverified" by GenBank) and 13 had taxonomy/naming errors. When comparing all correct HPV sequences in GenBank, there appeared to exist about 800 unique putative HPV types. Systematic and regular work towards eliminating chimeric sequences and taxonomy/naming errors could increase the quality and order in HPV research.