1
Song H, Tithi SS, Brown C, Aylward FO, Jensen R, Zhang L. Virseqimprover: an integrated pipeline for viral contig error correction, extension, and annotation. PeerJ 2025; 13:e18515. [PMID: 39807156 PMCID: PMC11727651 DOI: 10.7717/peerj.18515] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2024] [Accepted: 10/21/2024] [Indexed: 01/16/2025] Open
Abstract
Despite the recent surge of viral metagenomic studies, it remains a significant challenge to recover complete virus genomes from metagenomic data. The majority of viral contigs generated from de novo assembly programs are highly fragmented, presenting significant challenges to downstream analysis and inference. To address this issue, we have developed Virseqimprover, a computational pipeline that can extend assembled contigs to complete or nearly complete genomes while maintaining extension quality. Virseqimprover first examines whether there is any chimeric sequence based on read coverage, breaks the sequence into segments if there is, then extends the longest segment with uniform depth of coverage, and repeats these procedures until the sequence cannot be extended. Finally, Virseqimprover annotates the gene content of the resulting sequence. Results show that Virseqimprover has good performances on correcting and extending viral contigs to their full lengths, hence can be a useful tool to improve the completeness and minimize the assembly errors of viral contigs. Both a web server and a conda package for Virseqimprover are provided to the research community free of charge.
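The coverage-based chimera check described in this abstract can be illustrated with a minimal Python sketch: split a contig wherever the per-base read depth shifts sharply, then keep the longest segment as the candidate for extension. The window size, the fold-change threshold, and the function names below are illustrative assumptions, not Virseqimprover's actual parameters or code.

```python
def fold_change(depth, i, window):
    """Ratio between mean depth just before and just after position i."""
    left = sum(depth[i - window:i]) / window + 1    # +1 avoids division by zero
    right = sum(depth[i:i + window]) / window + 1
    return max(left, right) / min(left, right)


def split_by_coverage(depth, window=4, max_fold=3.0):
    """Return (start, end) segments of roughly uniform depth.

    depth: list of per-base read depths along one contig.
    A breakpoint is called where the fold change exceeds max_fold and is
    the largest value within one window on either side (a local peak).
    Both thresholds are hypothetical defaults chosen for illustration.
    """
    scores = {
        i: fold_change(depth, i, window)
        for i in range(window, len(depth) - window + 1)
    }
    breakpoints = [
        i for i, score in scores.items()
        if score > max_fold
        and score == max(scores.get(j, 0.0)
                         for j in range(i - window, i + window + 1))
    ]
    edges = [0] + breakpoints + [len(depth)]
    return list(zip(edges[:-1], edges[1:]))


def longest_segment(segments):
    """Pick the segment that would be passed on to the extension step."""
    return max(segments, key=lambda s: s[1] - s[0])


# Toy contig: 12 bases at depth 30 joined to 6 bases at depth 200.
depth = [30] * 12 + [200] * 6
segments = split_by_coverage(depth)
print(segments, longest_segment(segments))
```

In a real pipeline the depth vector would come from mapping reads back to the contig, and the split-and-extend steps would repeat until the sequence stops growing, as the abstract states.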
Affiliation(s)
- Haoqiu Song
  - Department of Computer Science, Virginia Polytechnic Institute and State University (Virginia Tech), Blacksburg, VA, United States of America
- Saima Sultana Tithi
  - Department of Cell & Molecular Biology, St. Jude Children's Research Hospital, Memphis, TN, United States of America
- Connor Brown
  - Department of Civil and Environmental Engineering, Virginia Polytechnic Institute and State University (Virginia Tech), Blacksburg, VA, United States of America
- Frank O Aylward
  - Department of Biological Sciences, Virginia Polytechnic Institute and State University (Virginia Tech), Blacksburg, VA, United States of America
- Roderick Jensen
  - Department of Biological Sciences, Virginia Polytechnic Institute and State University (Virginia Tech), Blacksburg, VA, United States of America
- Liqing Zhang
  - Department of Computer Science, Virginia Polytechnic Institute and State University (Virginia Tech), Blacksburg, VA, United States of America
2
Ejaz MR, Badr K, Hassan ZU, Al-Thani R, Jaoua S. Metagenomic approaches and opportunities in arid soil research. The Science of the Total Environment 2024; 953:176173. [PMID: 39260494 DOI: 10.1016/j.scitotenv.2024.176173] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2024] [Revised: 09/04/2024] [Accepted: 09/07/2024] [Indexed: 09/13/2024]
Abstract
Arid soils present unique challenges and opportunities for studying microbial diversity and bioactive potential due to the extreme environmental conditions they bear. This review article investigates soil metagenomics as an emerging tool to explore complex microbial dynamics and unexplored bioactive potential in harsh environments. Utilizing advanced metagenomic techniques, diverse microbial populations that grow under extreme conditions such as high temperatures, salinity, high pH levels, and exposure to metals and radiation can be studied. The use of extremophiles to discover novel natural products and biocatalysts emphasizes the role of functional metagenomics in identifying enzymes and secondary metabolites for industrial and pharmaceutical purposes. Metagenomic sequencing uncovers a complex network of microbial diversity, offering significant potential for discovering new bioactive compounds. Functional metagenomics, connecting taxonomic diversity to genetic capabilities, provides a pathway to identify microbes' mechanisms to synthesize valuable secondary metabolites and other bioactive substances. Contrary to the common perception of desert soil as barren land, the metagenomic analysis reveals a rich diversity of life forms adept at extreme survival. It provides valuable findings into their resilience and potential applications in biotechnology. Moreover, the challenges associated with metagenomics in arid soils, such as low microbial biomass, high DNA degradation rates, and DNA extraction inhibitors and strategies to overcome these issues, outline the latest advancements in extraction methods, high-throughput sequencing, and bioinformatics. The importance of metagenomics for investigating diverse environments opens the way for future research to develop sustainable solutions in agriculture, industry, and medicine. Extensive studies are necessary to utilize the full potential of these powerful microbial communities. 
This research will significantly improve our understanding of microbial ecology and biotechnology in arid environments.
Affiliation(s)
- Muhammad Riaz Ejaz
  - Environmental Science Program, Department of Biological and Environmental Sciences, College of Arts and Science, Qatar University, P.O. Box 2713, Doha, Qatar
- Kareem Badr
  - Environmental Science Program, Department of Biological and Environmental Sciences, College of Arts and Science, Qatar University, P.O. Box 2713, Doha, Qatar
- Zahoor Ul Hassan
  - Environmental Science Program, Department of Biological and Environmental Sciences, College of Arts and Science, Qatar University, P.O. Box 2713, Doha, Qatar
- Roda Al-Thani
  - Environmental Science Program, Department of Biological and Environmental Sciences, College of Arts and Science, Qatar University, P.O. Box 2713, Doha, Qatar
- Samir Jaoua
  - Environmental Science Program, Department of Biological and Environmental Sciences, College of Arts and Science, Qatar University, P.O. Box 2713, Doha, Qatar
3
Liu X, Liu Y, Liu J, Zhang H, Shan C, Guo Y, Gong X, Cui M, Li X, Tang M. Correlation between the gut microbiome and neurodegenerative diseases: a review of metagenomics evidence. Neural Regen Res 2024; 19:833-845. [PMID: 37843219 PMCID: PMC10664138 DOI: 10.4103/1673-5374.382223] [Citation(s) in RCA: 16] [Impact Index Per Article: 16.0] [Received: 02/23/2023] [Revised: 04/19/2023] [Accepted: 06/17/2023] [Indexed: 10/17/2023] Open
Abstract
A growing body of evidence suggests that the gut microbiota contributes to the development of neurodegenerative diseases via the microbiota-gut-brain axis. As a contributing factor, microbiota dysbiosis always occurs in pathological changes of neurodegenerative diseases, such as Alzheimer's disease, Parkinson's disease, and amyotrophic lateral sclerosis. High-throughput sequencing technology has helped to reveal that the bidirectional communication between the central nervous system and the enteric nervous system is facilitated by the microbiota's diverse microorganisms, and for both neuroimmune and neuroendocrine systems. Here, we summarize the bioinformatics analysis and wet-biology validation for the gut metagenomics in neurodegenerative diseases, with an emphasis on multi-omics studies and the gut virome. The pathogen-associated signaling biomarkers for identifying brain disorders and potential therapeutic targets are also elucidated. Finally, we discuss the role of diet, prebiotics, probiotics, postbiotics and exercise interventions in remodeling the microbiome and reducing the symptoms of neurodegenerative diseases.
Affiliation(s)
- Xiaoyan Liu
  - School of Life Sciences, Jiangsu University, Zhenjiang, Jiangsu Province, China
- Yi Liu
  - School of Life Sciences, Jiangsu University, Zhenjiang, Jiangsu Province, China
  - Institute of Animal Husbandry, Jiangsu Academy of Agricultural Sciences, Nanjing, Jiangsu Province, China
- Junlin Liu
  - School of Life Sciences, Jiangsu University, Zhenjiang, Jiangsu Province, China
- Hantao Zhang
  - School of Life Sciences, Jiangsu University, Zhenjiang, Jiangsu Province, China
- Chaofan Shan
  - School of Life Sciences, Jiangsu University, Zhenjiang, Jiangsu Province, China
- Yinglu Guo
  - School of Life Sciences, Jiangsu University, Zhenjiang, Jiangsu Province, China
- Xun Gong
  - Department of Rheumatology & Immunology, Affiliated Hospital of Jiangsu University, Zhenjiang, Jiangsu Province, China
- Mengmeng Cui
  - Department of Neurology, The Second Affiliated Hospital of Shandong First Medical University, Taian, Shandong Province, China
- Xiubin Li
  - Department of Neurology, The Second Affiliated Hospital of Shandong First Medical University, Taian, Shandong Province, China
- Min Tang
  - School of Life Sciences, Jiangsu University, Zhenjiang, Jiangsu Province, China
4
Buzzanca D, Kerkhof PJ, Alessandria V, Rantsiou K, Houf K. Arcobacteraceae comparative genome analysis demonstrates genome heterogeneity and reduction in species isolated from animals and associated with human illness. Heliyon 2023; 9:e17652. [PMID: 37449094 PMCID: PMC10336517 DOI: 10.1016/j.heliyon.2023.e17652] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2022] [Revised: 05/30/2023] [Accepted: 06/25/2023] [Indexed: 07/18/2023] Open
Abstract
The Arcobacteraceae family groups Gram-negative bacterial species previously included in the family Campylobacteraceae. These species of which some are considered foodborne pathogens, have been isolated from different environmental niches and hosts. They have been isolated from various types of foods, though predominantly from food of animal origin, as well as from stool of humans with enteritis. Their different abilities to survive in different hosts and environments suggest an evolutionary pressure with consequent variation in their genome content. Moreover, their different physiological and genomic characteristics led to the recent proposal to create new genera within this family, which is however criticized due to the lack of discriminatory features and biological and clinical relevance. Aims of the present study were to assess the Arcobacteraceae pangenome, and to characterize existing similarities and differences in 20 validly described species. For this, analysis has been conducted on the genomes of the corresponding type strains obtained by Illumina sequencing, applying several bioinformatic tools. Results of the present study do not support the proposed division into different genera and revealed the presence of pangenome partitions with numbers comparable to other Gram-negative bacteria genera, such as Campylobacter. Different gene class compositions in animal and human-associated species are present, including a higher percentage of virulence-related gene classes such as cell motility genes. The adaptation to environmental and/or host conditions of some species was identified by the presence of specific genes. Furthermore, a division into pathogenic and non-pathogenic species is suggested, which can support future research on food safety and public health.
Affiliation(s)
- Davide Buzzanca
  - Department of Veterinary and Biosciences, Faculty of Veterinary Medicine, Ghent University, Heidestraat 19, Merelbeke, Belgium
  - Department of Agricultural, Forest and Food Sciences (DISAFA), University of Turin, Largo Paolo Braccini 2, 10095 Grugliasco (TO), Italy
- Pieter-Jan Kerkhof
  - Department of Veterinary and Biosciences, Faculty of Veterinary Medicine, Ghent University, Heidestraat 19, Merelbeke, Belgium
- Valentina Alessandria
  - Department of Agricultural, Forest and Food Sciences (DISAFA), University of Turin, Largo Paolo Braccini 2, 10095 Grugliasco (TO), Italy
- Kalliopi Rantsiou
  - Department of Agricultural, Forest and Food Sciences (DISAFA), University of Turin, Largo Paolo Braccini 2, 10095 Grugliasco (TO), Italy
- Kurt Houf
  - Department of Veterinary and Biosciences, Faculty of Veterinary Medicine, Ghent University, Heidestraat 19, Merelbeke, Belgium
  - Laboratory of Microbiology, Department of Biochemistry and Microbiology, Faculty of Sciences, Ghent University, Karel Lodewijk Ledeganckstraat 35, 9000 Ghent, Belgium
5
Du Y, Fuhrman JA, Sun F. ViralCC retrieves complete viral genomes and virus-host pairs from metagenomic Hi-C data. Nat Commun 2023; 14:502. [PMID: 36720887 PMCID: PMC9889337 DOI: 10.1038/s41467-023-35945-y] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 10/22/2022] [Accepted: 01/09/2023] [Indexed: 02/01/2023] Open
Abstract
The introduction of high-throughput chromosome conformation capture (Hi-C) into metagenomics enables reconstructing high-quality metagenome-assembled genomes (MAGs) from microbial communities. Despite recent advances in recovering eukaryotic, bacterial, and archaeal genomes using Hi-C contact maps, few of Hi-C-based methods are designed to retrieve viral genomes. Here we introduce ViralCC, a publicly available tool to recover complete viral genomes and detect virus-host pairs using Hi-C data. Compared to other Hi-C-based methods, ViralCC leverages the virus-host proximity structure as a complementary information source for the Hi-C interactions. Using mock and real metagenomic Hi-C datasets from several different microbial ecosystems, including the human gut, cow fecal, and wastewater, we demonstrate that ViralCC outperforms existing Hi-C-based binning methods as well as state-of-the-art tools specifically dedicated to metagenomic viral binning. ViralCC can also reveal the taxonomic structure of viruses and virus-host pairs in microbial communities. When applied to a real wastewater metagenomic Hi-C dataset, ViralCC constructs a phage-host network, which is further validated using CRISPR spacer analyses. ViralCC is an open-source pipeline available at https://github.com/dyxstat/ViralCC .
Affiliation(s)
- Yuxuan Du
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Jed A Fuhrman
  - Department of Biological Sciences, University of Southern California, Los Angeles, CA, USA
- Fengzhu Sun
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
6
Tithi SS, Aylward FO, Jensen RV, Zhang L. FastViromeExplorer-Novel: Recovering Draft Genomes of Novel Viruses and Phages in Metagenomic Data. Journal of Computational Biology: A Journal of Computational Molecular Cell Biology 2023; 30:391-408. [PMID: 36607772 DOI: 10.1089/cmb.2022.0397] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/07/2023]
Abstract
Despite the recent surge of viral metagenomic studies, recovering complete virus/phage genomes from metagenomic data is still extremely difficult and most viral contigs generated from de novo assembly programs are highly fragmented, posing serious challenges to downstream analysis and inference. In this study, we develop FastViromeExplorer (FVE)-novel, a computational pipeline for reconstructing complete or near-complete viral draft genomes from metagenomic data. The FVE-novel deploys FVE to efficiently map metagenomic reads to viral reference genomes, performs de novo assembly of the mapped reads to generate contigs, and extends the contigs through iterative assembly to produce final viral scaffolds. We applied FVE-novel to an ocean metagenomic sample and obtained 268 viral scaffolds that potentially come from novel viruses. Through manual examination and validation of the 10 longest scaffolds, we successfully recovered 4 complete viral genomes, 2 are novel as they cannot be found in the existing databases and the other 2 are related to known phages. This hybrid reference-based and de novo assembly approach used by FVE-novel represents a powerful new approach for uncovering near-complete viral genomes in metagenomic data.
Affiliation(s)
- Frank O Aylward
  - Department of Biological Sciences, Virginia Tech, Blacksburg, Virginia, USA
- Roderick V Jensen
  - Department of Biological Sciences, Virginia Tech, Blacksburg, Virginia, USA
- Liqing Zhang
  - Department of Computer Science, Virginia Tech, Blacksburg, Virginia, USA
7
Gupta AK, Kumar M. Benchmarking and Assessment of Eight De Novo Genome Assemblers on Viral Next-Generation Sequencing Data, Including the SARS-CoV-2. OMICS: A Journal of Integrative Biology 2022; 26:372-381. [PMID: 35759429 DOI: 10.1089/omi.2022.0042] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/15/2023]
Abstract
Viral genomics has become crucial in clinical diagnostics and ecology, not to mention to stem the COVID-19 pandemic. Whole-genome sequencing (WGS) is pivotal in gaining an improved understanding of viral evolution, genomic epidemiology, infectious outbreaks, pathobiology, clinical management, and vaccine development. Genome assembly is one of the crucial steps in WGS data analyses. A series of different assemblers has been developed with the advent of high-throughput next-generation sequencing (NGS). Various studies have reported the evaluation of these assembly tools on distinct datasets; however, these lack data from viral origin. In this study, we performed a comparative evaluation and benchmarking of eight de novo assemblers: SOAPdenovo, Velvet, assembly by short sequences (ABySS), iterative De Bruijn graph assembler (IDBA), SPAdes, Edena, iterative virus assembler, and VICUNA on the viral NGS data from distinct Illumina (GAIIx, Hiseq, Miseq, and Nextseq) platforms. WGS data of diverse viruses, that is, severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2), dengue virus 3, human immunodeficiency virus 1, hepatitis B virus, human herpesvirus 8, human papillomavirus 16, rhinovirus A, and West Nile virus, were utilized to assess these assemblers. Performance metrics such as genome fraction recovery, assembly lengths, NG50, N50, contig length, contig numbers, mismatches, and misassemblies were analyzed. Overall, three assemblers, that is, SPAdes, IDBA, and ABySS, performed consistently well, including for genome assembly of SARS-CoV-2. These assembly methods should be considered and recommended for future studies of viruses. The study also suggests that implementing two or more assembly approaches should be considered in viral NGS studies, especially in clinical settings. 
Taken together, the benchmarking of eight de novo genome assemblers reported in this study can inform future public health and ecology research concerning the viruses, the COVID-19 pandemic, and viral outbreaks.
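Two of the contiguity metrics named in this abstract, N50 and NG50, have simple standard definitions that a short Python sketch can make concrete. N50 is the length of the contig at which the running total of contig lengths, sorted longest first, reaches half of the assembly size; NG50 uses half of a reference or estimated genome size instead. The function name and the toy contig lengths are illustrative assumptions and are not taken from the study.

```python
def n50(contig_lengths, genome_size=None):
    """Return N50, or NG50 when genome_size is given.

    contig_lengths: iterable of contig lengths in bases.
    genome_size: reference or estimated genome length; when None the
    total assembly length is used, which yields the ordinary N50.
    Returns 0 when the contigs cover less than half of genome_size.
    """
    lengths = sorted(contig_lengths, reverse=True)
    total = genome_size if genome_size is not None else sum(lengths)
    target = total / 2
    running = 0
    for length in lengths:
        running += length
        if running >= target:
            return length
    return 0


# Toy assembly: six contigs totalling 290 bases.
contigs = [80, 70, 50, 40, 30, 20]
print(n50(contigs))                   # N50 against the assembly size
print(n50(contigs, genome_size=400))  # NG50 against a 400-base genome
```

Because NG50 is anchored to the genome size rather than to the assembly size, it stays comparable across assemblers that recover different fractions of the same genome.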
Affiliation(s)
- Amit Kumar Gupta
  - Virology Unit and Bioinformatics Centre, Institute of Microbial Technology, Council of Scientific and Industrial Research (CSIR), Chandigarh, India
- Manoj Kumar
  - Virology Unit and Bioinformatics Centre, Institute of Microbial Technology, Council of Scientific and Industrial Research (CSIR), Chandigarh, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, India
8
Mostafa-Hedeab G, Allayeh AK, Elhady HA, Eledrdery AY, Mraheil MA, Mostafa A. Viral Eco-Genomic Tools: Development and Implementation for Aquatic Biomonitoring. International Journal of Environmental Research and Public Health 2022; 19:7707. [PMID: 35805367 PMCID: PMC9265447 DOI: 10.3390/ijerph19137707] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/21/2022] [Revised: 06/21/2022] [Accepted: 06/21/2022] [Indexed: 12/17/2022]
Abstract
Enteric viruses (EVs) occurrence within aquatic environments varies and leads to significant risk on public health of humans, animals, and diversity of aquatic taxa. Early and efficacious recognition of cultivable and fastidious EVs in aquatic systems are important to ensure the sanitary level of aquatic water and implement required treatment strategies. Herein, we provided a comprehensive overview of the conventional and up-to-date eco-genomic tools for aquatic biomonitoring of EVs, aiming to develop better water pollution monitoring tools. In combination with bioinformatics techniques, genetic tools including cloning sequencing analysis, DNA microarray, next-generation sequencing (NGS), and metagenomic sequencing technologies are implemented to make informed decisions about the global burden of waterborne EVs-associated diseases. The data presented in this review are helpful to recommend that: (1) Each viral pollution detection method has its own merits and demerits; therefore, it would be advantageous for viral pollution evaluation to be integrated as a complementary platform. (2) The total viral genome pool extracted from aquatic environmental samples is a real reflection of pollution status of the aquatic eco-systems; therefore, it is recommended to conduct regular sampling through the year to establish an updated monitoring system for EVs, and quantify viral peak concentrations, viral typing, and genotyping. (3) Despite that conventional detection methods are cheaper, it is highly recommended to implement molecular-based technologies to complement aquatic ecosystems biomonitoring due to numerous advantages including high-throughput capability. (4) Continuous implementation of the eco-genetic detection tools for monitoring the EVs in aquatic ecosystems is recommended.
Affiliation(s)
- Gomaa Mostafa-Hedeab
  - Pharmacology Department and Health Research Unit, Medical College, Jouf University, Sakaka 11564, Saudi Arabia
- Abdou Kamal Allayeh
  - Water Pollution Department, Virology Laboratory, National Research Centre, Dokki, Giza 12622, Egypt
- Abozer Y. Eledrdery
  - Department of Clinical Laboratory Sciences, College of Applied Medical Sciences, Jouf University, Sakaka 11564, Saudi Arabia
- Mobarak Abu Mraheil
  - German Center for Infection Research (DZIF), Institute of Medical Microbiology, Justus-Liebig University, 35392 Giessen, Germany
- Ahmed Mostafa
  - Center of Scientific Excellence for Influenza Viruses, National Research Centre, Giza 12622, Egypt
9
Song S, Ma L, Xu X, Shi H, Li X, Liu Y, Hao P. Rapid screening and identification of viral pathogens in metagenomic data. BMC Med Genomics 2021; 14:289. [PMID: 34903237 PMCID: PMC8668262 DOI: 10.1186/s12920-021-01138-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/07/2021] [Accepted: 11/16/2021] [Indexed: 11/17/2022] Open
Abstract
BACKGROUND: Virus screening and viral genome reconstruction are urgent and crucial for the rapid identification of viral pathogens, i.e., tracing the source and understanding the pathogenesis when a viral outbreak occurs. Next-generation sequencing (NGS) provides an efficient and unbiased way to identify viral pathogens in host-associated and environmental samples without prior knowledge. Despite the availability of software, data analysis still requires human operations. A mature pipeline is urgently needed when thousands of viral pathogen and viral genome reconstruction samples need to be rapidly identified.
RESULTS: In this paper, we present a rapid and accurate workflow to screen metagenomics sequencing data for viral pathogens and other compositions, as well as enable a reference-based assembler to reconstruct viral genomes. Moreover, we tested our workflow on several metagenomics datasets, including a SARS-CoV-2 patient sample with NGS data, pangolins tissues with NGS data, Middle East Respiratory Syndrome (MERS)-infected cells with NGS data, etc. Our workflow demonstrated high accuracy and efficiency when identifying target viruses from large scale NGS metagenomics data. Our workflow was flexible when working with a broad range of NGS datasets from small (kb) to large (100 Gb). This took from a few minutes to a few hours to complete each task. At the same time, our workflow automatically generates reports that incorporate visualized feedback (e.g., metagenomics data quality statistics, host and viral sequence compositions, details about each of the identified viral pathogens and their coverages, and reassembled viral pathogen sequences based on their closest references).
CONCLUSIONS: Overall, our system enabled the rapid screening and identification of viral pathogens from metagenomics data, providing an important piece to support viral pathogen research during a pandemic. The visualized report contains information from raw sequence quality to a reconstructed viral sequence, which allows non-professional people to screen their samples for viruses by themselves (Additional file 1).
Affiliation(s)
- Shiyang Song
  - Key Laboratory of Molecular Virology and Immunology, Institut Pasteur of Shanghai, Center for Biosafety Mega-Science, Chinese Academy of Sciences, Shanghai, 200031, China
- Liangxiao Ma
  - Bio-Med Big Data Center, Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, Shanghai, 20031, China
- Xintian Xu
  - Key Laboratory of Molecular Virology and Immunology, Institut Pasteur of Shanghai, Center for Biosafety Mega-Science, Chinese Academy of Sciences, Shanghai, 200031, China
- Han Shi
  - Key Laboratory of Synthetic Biology, CAS Center for Excellence in Molecular Plant Sciences, Chinese Academy of Sciences, Shanghai, 200032, China
- Xuan Li
  - Key Laboratory of Synthetic Biology, CAS Center for Excellence in Molecular Plant Sciences, Chinese Academy of Sciences, Shanghai, 200032, China
- Yuanhua Liu
  - Key Laboratory of Molecular Virology and Immunology, Institut Pasteur of Shanghai, Center for Biosafety Mega-Science, Chinese Academy of Sciences, Shanghai, 200031, China
- Pei Hao
  - Key Laboratory of Molecular Virology and Immunology, Institut Pasteur of Shanghai, Center for Biosafety Mega-Science, Chinese Academy of Sciences, Shanghai, 200031, China
10
Abstract
Mediators of the initiation, development, and recurrence of periodontitis include the oral microbiome embedded in subgingival plaque and the host immune response to a dysbiosis within this dynamic and complex microbial community. Although mediators have been studied extensively, researchers in the field have been unable to fully ascribe certain clinical presentations of periodontitis to their nature. Emergence of high-throughput sequencing technologies has resulted in better characterization of the microbial oral dysbiosis that extends beyond the extensively studied putative bacterial periodontopathogens to a shift in the oral virome composition during disease conditions. Although the biological dark matter inserted by retroviruses was once believed to be nonfunctional, research has revealed that it encodes historical viral-eukaryotic interactions and influences host development. The objective of this review is to evaluate the proposed association of herpesviruses to the etiology and pathogenesis of periodontal disease and survey the highly abundant prokaryotic viruses to delineate their potential roles in biofilm dynamics, as well as their interactions with putative bacterial periodontopathogens and eukaryotic cells. The findings suggest that potential novel periodontal therapies targeting or utilizing the oral virome can alleviate certain clinical presentations of periodontitis. Perhaps it is time to embrace the viral dark matter within the periodontal environment to fully comprehend the pathogenesis and systemic implications of periodontitis.
Affiliation(s)
- April Martínez
  - Orofacial Sciences Department, School of Dentistry, University of California, San Francisco, San Francisco, California, USA
- Ryutaro Kuraji
  - Orofacial Sciences Department, School of Dentistry, University of California, San Francisco, San Francisco, California, USA
  - Department of Life Science Dentistry, The Nippon Dental University, Tokyo, Japan
  - Department of Periodontology, The Nippon Dental University School of Life Dentistry at Tokyo, Tokyo, Japan
- Yvonne L. Kapila
  - Orofacial Sciences Department, School of Dentistry, University of California, San Francisco, San Francisco, California, USA
11
Arisdakessian CG, Nigro OD, Steward GF, Poisson G, Belcaid M. CoCoNet: an efficient deep learning tool for viral metagenome binning. Bioinformatics 2021; 37:2803-2810. [PMID: 33822891 DOI: 10.1093/bioinformatics/btab213] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 06/18/2020] [Revised: 03/24/2021] [Accepted: 04/02/2021] [Indexed: 02/02/2023] Open
Abstract
MOTIVATION: Metagenomic approaches hold the potential to characterize microbial communities and unravel the intricate link between the microbiome and biological processes. Assembly is one of the most critical steps in metagenomics experiments. It consists of transforming overlapping DNA sequencing reads into sufficiently accurate representations of the community's genomes. This process is computationally difficult and commonly results in genomes fragmented across many contigs. Computational binning methods are used to mitigate fragmentation by partitioning contigs based on their sequence composition, abundance or chromosome organization into bins representing the community's genomes. Existing binning methods have been principally tuned for bacterial genomes and do not perform favorably on viral metagenomes.
RESULTS: We propose Composition and Coverage Network (CoCoNet), a new binning method for viral metagenomes that leverages the flexibility and the effectiveness of deep learning to model the co-occurrence of contigs belonging to the same viral genome and provide a rigorous framework for binning viral contigs. Our results show that CoCoNet substantially outperforms existing binning methods on viral datasets.
AVAILABILITY AND IMPLEMENTATION: CoCoNet was implemented in Python and is available for download on PyPi (https://pypi.org/). The source code is hosted on GitHub at https://github.com/Puumanamana/CoCoNet and the documentation is available at https://coconet.readthedocs.io/en/latest/index.html. CoCoNet does not require extensive resources to run. For example, binning 100k contigs took about 4 h on 10 Intel CPU Cores (2.4 GHz), with a memory peak at 27 GB (see Supplementary Fig. S9). To process a large dataset, CoCoNet may need to be run on a high RAM capacity server. Such servers are typically available in high-performance or cloud computing settings.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
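The sequence-composition signal that binning methods rely on can be illustrated with a small Python sketch. It computes a normalized k-mer frequency profile for each contig and compares two profiles by cosine similarity. This is only the generic composition feature, not CoCoNet's neural network; the function names, the default k = 4, and the toy sequences are assumptions made for illustration, and a real binner would combine this with coverage (abundance) information.

```python
from collections import Counter
from itertools import product
from math import sqrt


def kmer_profile(seq, k=4):
    """Normalized k-mer frequency vector over the ACGT alphabet.

    k-mers containing other symbols (for example N) are ignored.
    The vector has 4**k entries in lexicographic k-mer order.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[kmer] for kmer in kmers)
    return [counts[kmer] / total if total else 0.0 for kmer in kmers]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Two toy contigs with similar composition and one that differs.
a = kmer_profile("ACGTACGTACGTACGT", k=2)
b = kmer_profile("CGTACGTACGTACGTA", k=2)
c = kmer_profile("AAAAAAAAAAAAAAAA", k=2)
print(cosine_similarity(a, b), cosine_similarity(a, c))
```

Contigs from the same genome tend to have similar profiles, so a high similarity is evidence for placing them in the same bin, while a low one argues against it.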
Affiliation(s)
- Cédric G Arisdakessian
  - Department of Information and Computer Sciences, University of Hawai'i at Mānoa, Honolulu, HI 96822, USA
- Olivia D Nigro
  - Department of Natural Science, Hawai'i Pacific University, Honolulu, HI 96813, USA
- Grieg F Steward
  - Department of Oceanography, University of Hawai'i at Mānoa, Honolulu, HI 96822, USA
- Guylaine Poisson
  - Department of Information and Computer Sciences, University of Hawai'i at Mānoa, Honolulu, HI 96822, USA
- Mahdi Belcaid
  - Department of Information and Computer Sciences, University of Hawai'i at Mānoa, Honolulu, HI 96822, USA
  - Hawai'i Institute of Marine Biology, University of Hawai'i at Mānoa, Honolulu, HI 96816, USA
12
|
Kayani MUR, Huang W, Feng R, Chen L. Genome-resolved metagenomics using environmental and clinical samples. Brief Bioinform 2021; 22:bbab030. [PMID: 33758906 PMCID: PMC8425419 DOI: 10.1093/bib/bbab030] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 09/21/2020] [Revised: 11/29/2020] [Accepted: 01/20/2021] [Indexed: 12/25/2022] Open
Abstract
Recent advances in high-throughput sequencing technologies and computational methods have added a new dimension to metagenomic data analysis: genome-resolved metagenomics. In general terms, it refers to the recovery of draft or high-quality microbial genomes and their taxonomic classification and functional annotation. In recent years, several studies have utilized the genome-resolved metagenome analysis approach and identified previously unknown microbial species from human and environmental metagenomes. In this review, we describe genome-resolved metagenome analysis as a series of four necessary steps: (i) preprocessing of the sequencing reads, (ii) de novo metagenome assembly, (iii) genome binning and (iv) taxonomic and functional analysis of the recovered genomes. For each of these four steps, we discuss the most commonly used tools and the currently available pipelines to guide the scientific community in the recovery and subsequent analyses of genomes from any metagenome sample. Furthermore, we discuss the tools required for validating assembly quality as well as for improving the quality of the recovered genomes. We also highlight the currently available pipelines that can be used to automate the whole analysis without requiring advanced bioinformatics knowledge. Finally, we highlight the most widely adopted and actively maintained tools and pipelines that can help the scientific community make decisions before commencing an analysis.
Affiliation(s)
- Masood ur Rehman Kayani
  - Center for Microbiota and Immunological Diseases, Shanghai General Hospital, Shanghai Institute of Immunology, Shanghai Jiao Tong University, School of Medicine, Shanghai 2,000,025, China
- Wanqiu Huang
  - Shanghai Institute of Immunology, Shanghai Jiao Tong University, School of Medicine, Shanghai 200,000, China
- Ru Feng
  - Center for Microbiota and Immunological Diseases, Shanghai General Hospital, Shanghai Institute of Immunology, Shanghai Jiao Tong University, School of Medicine, Shanghai 2,000,025, China
- Lei Chen
  - Center for Microbiota and Immunological Diseases, Shanghai General Hospital, Shanghai Institute of Immunology, Shanghai Jiao Tong University, School of Medicine, Shanghai 2,000,025, China
|
13
|
Ayling M, Clark MD, Leggett RM. New approaches for metagenome assembly with short reads. Brief Bioinform 2021; 21:584-594. [PMID: 30815668 PMCID: PMC7299287 DOI: 10.1093/bib/bbz020] [Citation(s) in RCA: 110] [Impact Index Per Article: 27.5] [Received: 11/12/2018] [Revised: 01/31/2019] [Accepted: 02/01/2019] [Indexed: 02/07/2023] Open
Abstract
In recent years, the use of longer range read data combined with advances in assembly algorithms has stimulated big improvements in the contiguity and quality of genome assemblies. However, these advances have not directly transferred to metagenomic data sets, as assumptions made by the single genome assembly algorithms do not apply when assembling multiple genomes at varying levels of abundance. The development of dedicated assemblers for metagenomic data was a relatively late innovation and for many years, researchers had to make do using tools designed for single genomes. This has changed in the last few years and we have seen the emergence of a new type of tool built using different principles. In this review, we describe the challenges inherent in metagenomic assemblies and compare the different approaches taken by these novel assembly tools.
Affiliation(s)
- Martin Ayling
  - Earlham Institute, Norwich Research Park, Norwich, UK
|
14
|
Kim HR, Jang I, Kim SH, Kwon YK. Viral Metagenomic Analysis of Japanese Quail (Coturnix japonica) with Enteritis in the Republic of Korea. Avian Dis 2021; 65:40-45. [PMID: 34339120 DOI: 10.1637/aviandiseases-d-20-00081] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2020] [Accepted: 08/27/2020] [Indexed: 11/05/2022]
Abstract
We performed viral metagenomics analysis of Japanese quail affected with enteritis to elucidate the viral etiology. Metagenomics generated 21,066,442 sequence reads via high-throughput sequencing, with a mean length of 136 nt. Enrichment in viral sequences suggested that at least three viruses were present in quail samples. Coronavirus and picornavirus were identified and are known as pathogens causing quail enteritis that match the observed morphology. Abundant reads of coronavirus from quail samples yielded four fragment sequences exhibiting six genomes of avian coronavirus. Sequence analysis showed that this quail coronavirus was related to turkey coronavirus and chicken infectious bronchitis virus. Quail picornavirus 8177 bp in size was identified and was similar to the QPV1/HUN/01 virus detected in quails without clinical symptoms in Hungary with 84.6% nucleotide and 94.6% amino acid identity. Our results are useful for understanding the genetic diversity of quail viruses. Further studies must be performed to determine whether quail coronavirus and quail picornavirus are pathogens of the digestive tract of quails.
Affiliation(s)
- Hye-Ryoung Kim
  - Avian Disease Division, Animal and Plant Quarantine Agency, Gimcheon-si, Gyeongsangbuk-do, 39660, Republic of Korea
- Il Jang
  - Avian Disease Division, Animal and Plant Quarantine Agency, Gimcheon-si, Gyeongsangbuk-do, 39660, Republic of Korea
- Si-Hyeon Kim
  - Avian Disease Division, Animal and Plant Quarantine Agency, Gimcheon-si, Gyeongsangbuk-do, 39660, Republic of Korea
- Yong-Kuk Kwon
  - Avian Disease Division, Animal and Plant Quarantine Agency, Gimcheon-si, Gyeongsangbuk-do, 39660, Republic of Korea
|
15
|
Townsend EM, Kelly L, Muscatt G, Box JD, Hargraves N, Lilley D, Jameson E. The Human Gut Phageome: Origins and Roles in the Human Gut Microbiome. Front Cell Infect Microbiol 2021; 11:643214. [PMID: 34150671 PMCID: PMC8213399 DOI: 10.3389/fcimb.2021.643214] [Citation(s) in RCA: 46] [Impact Index Per Article: 11.5] [Received: 12/17/2020] [Accepted: 05/19/2021] [Indexed: 12/14/2022] Open
Abstract
The investigation of the microbial populations of the human body, known as the microbiome, has led to a revolutionary field of science and to a new understanding of its impacts on human development and health. The majority of microbiome research to date has focussed on bacteria and other kingdoms of life, such as fungi. Trailing behind these is the interrogation of the gut viruses, specifically the phageome. Bacteriophages, viruses that infect bacterial hosts, are known to dictate the dynamics and diversity of bacterial populations in a number of ecosystems. However, the phageome of the human gut, while of apparent importance, remains an area of many unknowns. In this paper we discuss the role of bacteriophages within the human gut microbiome. We examine the methods used to study bacteriophage populations, how these have evolved over time and what we now understand about the phageome. We review phageome development in infancy and the factors that may influence phage populations in adult life. The role and action of the phageome are then discussed both at a biological level and in the broader context of human health and disease.
Affiliation(s)
- Eleanor M Townsend
  - School of Life Sciences, The University of Warwick, Coventry, United Kingdom
- Lucy Kelly
  - School of Life Sciences, The University of Warwick, Coventry, United Kingdom
- George Muscatt
  - School of Life Sciences, The University of Warwick, Coventry, United Kingdom
- Joshua D Box
  - School of Life Sciences, The University of Warwick, Coventry, United Kingdom
- Nicole Hargraves
  - School of Life Sciences, The University of Warwick, Coventry, United Kingdom
- Daniel Lilley
  - Warwick Medical School, The University of Warwick, Coventry, United Kingdom
- Eleanor Jameson
  - School of Life Sciences, The University of Warwick, Coventry, United Kingdom
|
16
|
Abstract
Colorectal cancer (CRC) is a leading cause of cancer-related deaths in both the USA and the world. Recent research has demonstrated the involvement of the gut microbiota in CRC development and progression. Microbial biomarkers of disease have focused primarily on the bacterial component of the microbiome; however, the viral portion of the microbiome, consisting of both bacteriophages and eukaryotic viruses, together known as the virome, has been less studied. Here we review the recent advancements in high-throughput sequencing (HTS) technologies and bioinformatics, which have enabled scientists to better understand how viruses might influence the development of colorectal cancer. We discuss the contemporary findings revealing modulations in the virome and their correlation with CRC development and progression. While a variety of challenges still face viral HTS detection in clinical specimens, we consider herein numerous next steps for future basic and clinical research. Clinicians need to move away from a single infectious agent model for disease etiology by grasping new, more encompassing etiological paradigms, in which communities of various microbial components interact with each other and the host. The reporting and indexing of patient health information, socioeconomic data, and other relevant metadata will enable identification of predictive variables and covariates of viral presence and CRC development. Altogether, the virome has a more profound role in carcinogenesis and cancer progression than once thought, and viruses, specific for either human cells or bacteria, are clinically relevant in understanding CRC pathology, patient prognosis, and treatment development.
|
17
|
García-López R, Pérez-Brocal V, Moya A. Beyond cells - The virome in the human holobiont. Microbial Cell (Graz, Austria) 2019; 6:373-396. [PMID: 31528630 PMCID: PMC6717880 DOI: 10.15698/mic2019.09.689] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 01/21/2019] [Revised: 03/14/2019] [Accepted: 04/03/2019] [Indexed: 01/01/2023]
Abstract
Viromics, or viral metagenomics, is a relatively new and burgeoning field of research that studies the complete collection of viruses forming part of the microbiota in any given niche. It has strong foundations rooted in over a century of discoveries in the field of virology and recent advances in molecular biology and sequencing technologies. Historically, most studies have deconstructed the concept of viruses into a simplified perception of viral agents as mere pathogens, which demerits the scope of large-scale viromic analyses. Viruses are, in fact, much more than regular parasites. They are by far the most dynamic and abundant entity and the greatest killers on the planet, as well as the most effective geo-transforming genetic engineers and resource recyclers, acting on all life strata in any habitat. Yet, most of this uncanny viral world remains vastly unexplored to date, greatly hindered by the bewildering complexity inherent to such studies and the methodological and conceptual limitations. Viromic studies are just starting to address some of these issues but they still lag behind microbial metagenomics. In recent years, however, higher-throughput analysis and resequencing have rekindled interest in a field that is just starting to show its true potential. In this review, we take a look at the scientific and technological developments that led to the advent of viral and bacterial metagenomics with a particular, but not exclusive, focus on human viromics from an ecological perspective. We also address some of the most relevant challenges that current viral studies face and ponder on the future directions of the field.
Affiliation(s)
- Rodrigo García-López
  - Institute of Evolutionary Systems Biology (I2Sysbio), Universitat de València and CSIC, València, Spain
  - CIBER in Epidemiology and Public Health (CIBEResp), Madrid, Spain
  - Fundación para el Fomento de la Investigación Sanitaria y Biomédica de la Comunidad Valenciana (FISABIO), València, Spain
- Vicente Pérez-Brocal
  - Institute of Evolutionary Systems Biology (I2Sysbio), Universitat de València and CSIC, València, Spain
  - CIBER in Epidemiology and Public Health (CIBEResp), Madrid, Spain
  - Fundación para el Fomento de la Investigación Sanitaria y Biomédica de la Comunidad Valenciana (FISABIO), València, Spain
- Andrés Moya
  - Institute of Evolutionary Systems Biology (I2Sysbio), Universitat de València and CSIC, València, Spain
  - CIBER in Epidemiology and Public Health (CIBEResp), Madrid, Spain
  - Fundación para el Fomento de la Investigación Sanitaria y Biomédica de la Comunidad Valenciana (FISABIO), València, Spain
|
18
|
Nash MV, Anesio AM, Barker G, Tranter M, Varliero G, Eloe-Fadrosh EA, Nielsen T, Turpin-Jelfs T, Benning LG, Sánchez-Baracaldo P. Metagenomic insights into diazotrophic communities across Arctic glacier forefields. FEMS Microbiol Ecol 2019; 94:5036517. [PMID: 29901729 PMCID: PMC6054269 DOI: 10.1093/femsec/fiy114] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Received: 10/27/2017] [Accepted: 06/11/2018] [Indexed: 11/30/2022] Open
Abstract
Microbial nitrogen fixation is crucial for building labile nitrogen stocks and facilitating higher plant colonisation in oligotrophic glacier forefield soils. Here, the diazotrophic bacterial community structure across four Arctic glacier forefields was investigated using metagenomic analysis. In total, 70 soil metagenomes were used for taxonomic interpretation based on 185 nitrogenase (nif) sequences, extracted from assembled contigs. The low number of recovered genes highlights the need for deeper sequencing in some diverse samples, to uncover the complete microbial populations. A key group of forefield diazotrophs, found throughout the forefields, was identified using a nifH phylogeny, associated with nifH Cluster I and III. Sequences related most closely to groups including Alphaproteobacteria, Betaproteobacteria, Cyanobacteria and Firmicutes. Using multiple nif genes in a Last Common Ancestor analysis revealed a diverse range of diazotrophs across the forefields. Key organisms identified across the forefields included Nostoc, Geobacter, Polaromonas and Frankia. Nitrogen fixers that are symbiotic with plants were also identified, through the presence of root associated diazotrophs, which fix nitrogen in return for reduced carbon. Additional nitrogen fixers identified in forefield soils were metabolically diverse, including fermentative and sulphur cycling bacteria, halophiles and anaerobes.
Affiliation(s)
- Maisie V Nash
  - School of Geographical Sciences, University of Bristol, UK
- Gary Barker
  - School of Life Sciences, University of Bristol, UK
- Martyn Tranter
  - School of Geographical Sciences, University of Bristol, UK
- Torben Nielsen
  - DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, US
- Liane G Benning
  - GFZ German Research Centre for Geosciences, Telegrafenberg, 14473 Potsdam, Germany
  - School of Earth and Environment, University of Leeds, LS2 9JT, Leeds, UK
  - Department of Earth Sciences, Free University of Berlin, Malteserstr. 74-100, Building A, 12249, Berlin, Germany
|
19
|
Sutton TDS, Clooney AG, Ryan FJ, Ross RP, Hill C. Choice of assembly software has a critical impact on virome characterisation. Microbiome 2019; 7:12. [PMID: 30691529 PMCID: PMC6350398 DOI: 10.1186/s40168-019-0626-5] [Citation(s) in RCA: 97] [Impact Index Per Article: 16.2] [Received: 10/16/2018] [Accepted: 01/14/2019] [Indexed: 05/19/2023]
Abstract
BACKGROUND The viral component of microbial communities plays a vital role in driving bacterial diversity, facilitating nutrient turnover and shaping community composition. Despite their importance, the vast majority of viral sequences are poorly annotated and share little or no homology to reference databases. As a result, investigation of the viral metagenome (virome) relies heavily on de novo assembly of short sequencing reads to recover compositional and functional information. Metagenomic assembly is particularly challenging for virome data, often resulting in fragmented assemblies and poor recovery of viral community members. Despite the essential role of assembly in virome analysis and difficulties posed by these data, current assembly comparisons have been limited to subsections of virome studies or bacterial datasets. DESIGN This study presents the most comprehensive virome assembly comparison to date, featuring 16 metagenomic assembly approaches which have featured in human virome studies. Assemblers were assessed using four independent virome datasets, namely, simulated reads, two mock communities, viromes spiked with a known phage and human gut viromes. RESULTS Assembly performance varied significantly across all test datasets, with SPAdes (meta) performing consistently well. Performance of MIRA and VICUNA varied, highlighting the importance of using a range of datasets when comparing assembly programs. It was also found that while some assemblers addressed the challenges of virome data better than others, all assemblers had limitations. Low read coverage and genomic repeats resulted in assemblies with poor genome recovery, high degrees of fragmentation and low-accuracy contigs across all assemblers. These limitations must be considered when setting thresholds for downstream analysis and when drawing conclusions from virome data.
Affiliation(s)
- Thomas D S Sutton
  - APC Microbiome Ireland, Cork, Ireland
  - School for Microbiology, University College Cork, Cork, Ireland
- Adam G Clooney
  - APC Microbiome Ireland, Cork, Ireland
  - School for Microbiology, University College Cork, Cork, Ireland
- Feargal J Ryan
  - APC Microbiome Ireland, Cork, Ireland
  - School for Microbiology, University College Cork, Cork, Ireland
  - Present Address: South Australian Health and Medical Research Institute, Adelaide, Australia
- R Paul Ross
  - APC Microbiome Ireland, Cork, Ireland
  - School for Microbiology, University College Cork, Cork, Ireland
  - Teagasc Food Research Centre, Fermoy, Cork, Ireland
- Colin Hill
  - APC Microbiome Ireland, Cork, Ireland
  - School for Microbiology, University College Cork, Cork, Ireland
|
20
|
Fedonin GG, Fantin YS, Favorov AV, Shipulin GA, Neverov AD. VirGenA: a reference-based assembler for variable viral genomes. Brief Bioinform 2019; 20:15-25. [PMID: 28968771 PMCID: PMC6488938 DOI: 10.1093/bib/bbx079] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Indexed: 12/19/2022] Open
Abstract
Characterization of the within-host genetic diversity of viral pathogens is required for selecting effective treatment of some important viral infections, e.g. HIV, HBV and HCV. Although low-frequency variants can be detected technically, there are conflicting data regarding their clinical significance, partially because of the difficulty of distinguishing them from experimental artifacts. The issue of cross-contamination is relevant for all highly sensitive techniques, including deep sequencing: even trace contamination leads to a significant increase in false positives among identified SNVs. Detecting infections by multiple genotypes of some viruses, the incidence of which can be considerable, especially in risk groups, is also clinically significant in some cases. We developed a new viral reference-guided assembler, VirGenA, that can separate mixtures of strains of different intraspecies genetic groups (genotypes, subtypes, clades, etc.) and assemble a separate consensus sequence for each group in a mixture. It produced long assemblies for mixture components of extremely low frequencies (<1%), allowing detection of cross-contamination of samples by divergent genotypes. We tested VirGenA on both clinical and simulated data. On both types of data, VirGenA shows results better than or similar to those of the existing de novo assemblers. A cross-platform implementation (including source code) is freely available at https://github.com/gFedonin/VirGenA/releases.
Affiliation(s)
- Gennady G Fedonin
  - Department of Molecular Diagnostics, Central Research Institute for Epidemiology
- Yury S Fantin
  - Department of Molecular Diagnostics, Central Research Institute for Epidemiology
- Alexander V Favorov
  - Division of Biostatistics and Bioinformatics, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University
- German A Shipulin
  - Department of Molecular Diagnostics, Central Research Institute for Epidemiology
- Alexey D Neverov
  - Department of Molecular Diagnostics, Central Research Institute for Epidemiology
|
21
|
Hamza IA, Bibby K. Critical issues in application of molecular methods to environmental virology. J Virol Methods 2019; 266:11-24. [PMID: 30659861 DOI: 10.1016/j.jviromet.2019.01.008] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Received: 10/07/2018] [Revised: 01/15/2019] [Accepted: 01/16/2019] [Indexed: 12/16/2022]
Abstract
Waterborne diseases have significant public health and socioeconomic implications worldwide. Many viral pathogens, namely enteric viruses, are commonly associated with water-related diseases. In addition, recently discovered human-associated viruses have been shown to be causative agents of gastroenteritis or other clinical symptoms. A wide range of analytical methods is available for virus detection in environmental water samples. Viral isolation has historically been carried out via propagation on permissive cell lines; however, some enteric viruses are difficult or impossible to propagate on existing cell lines. Real-time polymerase chain reaction (qPCR) screening of viral nucleic acid is routinely used to investigate virus contamination in water because of its high sensitivity and specificity. Additionally, the introduction of metagenomic approaches into environmental virology has facilitated the discovery of viruses that cannot be grown in cell culture. This review (i) highlights the applications of molecular techniques in environmental virology, such as PCR and its modifications, to overcome the critical issues associated with the inability to discriminate between infectious and nonviable viruses, (ii) outlines the strengths and weaknesses of Nucleic Acid Sequence Based Amplification (NASBA) and microarrays, (iii) discusses the role of digital PCR as an emerging water quality monitoring assay and its advantages over qPCR, and (iv) addresses viral metagenomics in terms of detecting emerging viral pathogens and viral diversity in the aquatic environment. Many challenges remain in selecting methods to detect classic and emerging viruses in environmental samples. While the existing techniques have revealed the importance and diversity of viruses in the water environment, further developments are necessary to enable more rapid and accurate methodologies for viral water quality monitoring and regulation.
Affiliation(s)
- Ibrahim Ahmed Hamza
  - Department of Water Pollution Research, National Research Centre, Cairo, Egypt
- Kyle Bibby
  - Department of Civil & Environmental Engineering & Earth Sciences, University of Notre Dame, USA
|
22
|
Astudillo-de la Vega H, Alonso-Luna O, Ali-Pérez J, López-Camarillo C, Ruiz-Garcia E. Oncobiome at the Forefront of a Novel Molecular Mechanism to Understand the Microbiome and Cancer. Advances in Experimental Medicine and Biology 2019; 1168:147-156. [PMID: 31713170 DOI: 10.1007/978-3-030-24100-1_10] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 02/07/2023]
Abstract
The microbiome comprises all the genetic material within a microbiota, which is tenfold greater than that of our own cells. The microbiota includes a wide variety of microorganisms such as bacteria, viruses, protozoans, fungi, and archaea, and this ecosystem is specific to each body site of every individual. Balanced microbial communities can positively contribute to training the immune system and maintaining immune homeostasis. Dysbiosis is a change in the normal microbiome composition that can initiate chronic inflammation, epithelial barrier breaches, and overgrowth of harmful bacteria. Next-generation sequencing methods have revolutionized the study of the microbiome. With bioinformatic tools to manage large volumes of new information, it became possible to assess species diversity and measure dynamic fluctuations in microbial communities. The burden of infections associated with human cancer is increasing but is underappreciated by the cancer research community. The rich microbial content of normal and tumoral tissue could reflect diverse physiological or pathological states. Genomic research has brought a new focus on the interplay between the human microbiome and carcinogenesis, which has been termed the 'oncobiome'. Interactions among the microbiota in all epithelia induce changes in host immune interactions and can be a cause of cancer. Microbes have been shown to have systemic effects on the host that influence the efficacy of anticancer drugs. Metagenomics allows investigation of the composition of the microbial community. Metatranscriptome analysis applies RNA sequencing to microbial samples to determine which species are present. Cancer can be caused by changes in the microbiome. The roles of individual microbial species in cancer progression were identified long ago for various tissue types. The identification of microbiomes associated with drug resistance in the treatment of cancer patients has been the subject of numerous microbiome studies. In certain cancers, the complexity of genetic alterations becomes irrelevant for explaining the origin, cause, or oncogenic maintenance under the oncogene addiction theory.
Affiliation(s)
- H Astudillo-de la Vega
  - Translational Research Laboratory in Cancer & Cellular Therapy, Hospital de Oncologia, Siglo XXI, IMSS, Mexico City, Mexico
- O Alonso-Luna
  - Laboratorio de NGS, Nanopharmacia Diagnostica de la Ciudad de Mexico, Mexico City, Mexico
- J Ali-Pérez
  - Laboratorio de Oncogenomica, Nanopharmacia Diagnostica de la Ciudad de Mexico, Mexico City, Mexico
- C López-Camarillo
  - Posgrado en Ciencias Genomicas, Universidad Autonoma de la Ciudad de Mexico, Mexico City, Mexico
- E Ruiz-Garcia
  - Department of Gastrointestinal Medical Oncology & Translational Medicine Laboratory, Instituto Nacional de Cancerologia, Mexico City, Mexico
|
23
|
Abstract
Viruses are the most abundant and diverse biological entities on Earth. Several viral metagenomes from different ecological niches are now available and have been used to characterize new viral particles and to determine their diversity. However, viral metagenomic data have the disadvantage of being high-dimensional, compositional, and sparse. This type of data renders many conventional multivariate statistical analyses inoperative. Fortunately, different libraries and statistical packages have been developed to deal with this problem and to perform the relevant ecological and statistical analyses. In the present chapter, simulated viral metagenomes, based on real human gut-associated viral metagenomes, are analyzed using different R and Python packages. The example presented here includes estimating and comparing different indexes of diversity, evenness, and richness; performing ordination and statistical analyses using different dissimilarity metrics; determining the optimal cluster configuration; and performing biomarker discovery. The scripts and the simulated datasets are available at https://github.com/jorgevazcast/Viromic-diversity.
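As a rough illustration of the kinds of indexes this abstract names, the following minimal sketch computes richness, Shannon diversity, Pielou evenness, and Bray-Curtis dissimilarity from abundance counts. It is not taken from the chapter's scripts; the function names and the idea of passing plain lists of counts are assumptions made here for illustration only.

```python
import math


def richness(counts):
    """Number of taxa observed at least once (hypothetical helper)."""
    return sum(1 for c in counts if c > 0)


def shannon(counts):
    """Shannon diversity H' = -sum(p * ln p) over taxa with non-zero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def pielou_evenness(counts):
    """Pielou's J = H' / ln(S); defined as 0.0 when fewer than two taxa are present."""
    s = richness(counts)
    return shannon(counts) / math.log(s) if s > 1 else 0.0


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two samples with the same taxon order."""
    denominator = sum(a) + sum(b)
    if denominator == 0:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / denominator
```

Sparse, compositional data of the kind the abstract describes usually call for additional care (for example, rarefaction or log-ratio transforms) that this sketch deliberately leaves out.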
|
24
|
Chen Q, Lan C, Zhao L, Wang J, Chen B, Chen YPP. Recent advances in sequence assembly: principles and applications. Brief Funct Genomics 2018; 16:361-378. [PMID: 28453648 DOI: 10.1093/bfgp/elx006] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Indexed: 12/20/2022] Open
Abstract
The application of advanced sequencing technologies and the rapid growth of various sequence data have led to increasing interest in DNA sequence assembly. However, repeats and polymorphism occur frequently in genomes, and each of these has different impacts on assembly. Further, many new applications for sequencing, such as metagenomics regarding multiple species, have emerged in recent years. These not only give rise to higher complexity but also prevent short-read assembly in an efficient way. This article reviews the theoretical foundations that underlie current mapping-based assembly and de novo-based assembly, and highlights the key issues and feasible solutions that need to be considered. It focuses on how individual processes, such as optimal k-mer determination and error correction in assembly, rely on intelligent strategies or high-performance computation. We also survey primary algorithms/software and offer a discussion on the emerging challenges in assembly.
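To make the k-mer step mentioned in this abstract concrete, here is a minimal sketch of k-mer counting and de Bruijn graph construction, the basic idea behind many de novo short-read assemblers. It is a generic illustration, not the method of any tool surveyed in the review; the function names and the `min_count` error filter are assumptions introduced here.

```python
from collections import Counter, defaultdict


def count_kmers(reads, k):
    """Count every length-k substring across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def de_bruijn_edges(kmer_counts, min_count=1):
    """Link each k-mer's (k-1)-prefix to its (k-1)-suffix.

    K-mers seen fewer than min_count times are skipped, a crude stand-in
    for the error-correction step (hypothetical threshold).
    """
    graph = defaultdict(set)
    for kmer, n in kmer_counts.items():
        if n >= min_count:
            graph[kmer[:-1]].add(kmer[1:])
    return graph
```

The choice of k is the trade-off the abstract alludes to: small values keep low-coverage genomes connected but collapse repeats, while large values resolve repeats at the cost of fragmenting the graph.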
|
25
|
Bengtsson-Palme J, Larsson DGJ, Kristiansson E. Using metagenomics to investigate human and environmental resistomes. J Antimicrob Chemother 2018; 72:2690-2703. [PMID: 28673041 DOI: 10.1093/jac/dkx199] [Citation(s) in RCA: 67] [Impact Index Per Article: 9.6] [Indexed: 12/12/2022] Open
Abstract
Antibiotic resistance is a global health concern declared by the WHO as one of the largest threats to modern healthcare. In recent years, metagenomic DNA sequencing has started to be applied as a tool to study antibiotic resistance in different environments, including the human microbiota. However, a multitude of methods exist for metagenomic data analysis, and not all methods are suitable for the investigation of resistance genes, particularly if the desired outcome is an assessment of risks to human health. In this review, we outline the current state of methods for sequence handling, mapping to databases of resistance genes, statistical analysis and metagenomic assembly. In addition, we provide an overview of important considerations related to the analysis of resistance genes, and recommend some of the currently used tools and methods that are best equipped to inform research and clinical practice related to antibiotic resistance.
Affiliation(s)
- Johan Bengtsson-Palme
  - Department of Infectious Diseases, Institute of Biomedicine, The Sahlgrenska Academy, University of Gothenburg, Guldhedsgatan 10, SE-41346, Gothenburg, Sweden
  - Centre for Antibiotic Resistance Research (CARe) at University of Gothenburg, Box 440, SE-40530, Gothenburg, Sweden
- D G Joakim Larsson
  - Department of Infectious Diseases, Institute of Biomedicine, The Sahlgrenska Academy, University of Gothenburg, Guldhedsgatan 10, SE-41346, Gothenburg, Sweden
  - Centre for Antibiotic Resistance Research (CARe) at University of Gothenburg, Box 440, SE-40530, Gothenburg, Sweden
- Erik Kristiansson
  - Centre for Antibiotic Resistance Research (CARe) at University of Gothenburg, Box 440, SE-40530, Gothenburg, Sweden
  - Department of Mathematical Sciences, Chalmers University of Technology, SE-41296, Gothenburg, Sweden
|
26
|
Nooij S, Schmitz D, Vennema H, Kroneman A, Koopmans MPG. Overview of Virus Metagenomic Classification Methods and Their Biological Applications. Front Microbiol 2018; 9:749. [PMID: 29740407 PMCID: PMC5924777 DOI: 10.3389/fmicb.2018.00749] [Citation(s) in RCA: 83] [Impact Index Per Article: 11.9] [Received: 12/08/2017] [Accepted: 04/03/2018] [Indexed: 12/20/2022] Open
Abstract
Metagenomics poses opportunities for clinical and public health virology applications by offering a way to assess complete taxonomic composition of a clinical sample in an unbiased way. However, the techniques required are complicated and analysis standards have yet to develop. This, together with the wealth of different tools and workflows that have been proposed, poses a barrier for new users. We evaluated 49 published computational classification workflows for virus metagenomics in a literature review. To this end, we described the methods of existing workflows by breaking them up into five general steps and assessed their ease-of-use and validation experiments. Performance scores of previous benchmarks were summarized and correlations between methods and performance were investigated. We indicate the potential suitability of the different workflows for (1) time-constrained diagnostics, (2) surveillance and outbreak source tracing, (3) detection of remote homologies (discovery), and (4) biodiversity studies. We provide two decision trees for virologists to help select a workflow for medical or biodiversity studies, as well as directions for future developments in clinical viral metagenomics.
Affiliation(s)
- Sam Nooij
  - Emerging and Endemic Viruses, Centre for Infectious Disease Control, National Institute for Public Health and the Environment (RIVM), Bilthoven, Netherlands
  - Viroscience Laboratory, Erasmus University Medical Centre, Rotterdam, Netherlands
- Dennis Schmitz
  - Emerging and Endemic Viruses, Centre for Infectious Disease Control, National Institute for Public Health and the Environment (RIVM), Bilthoven, Netherlands
  - Viroscience Laboratory, Erasmus University Medical Centre, Rotterdam, Netherlands
- Harry Vennema
  - Emerging and Endemic Viruses, Centre for Infectious Disease Control, National Institute for Public Health and the Environment (RIVM), Bilthoven, Netherlands
- Annelies Kroneman
  - Emerging and Endemic Viruses, Centre for Infectious Disease Control, National Institute for Public Health and the Environment (RIVM), Bilthoven, Netherlands
- Marion P G Koopmans
  - Emerging and Endemic Viruses, Centre for Infectious Disease Control, National Institute for Public Health and the Environment (RIVM), Bilthoven, Netherlands
  - Viroscience Laboratory, Erasmus University Medical Centre, Rotterdam, Netherlands
|
27
|
Reducing inherent biases introduced during DNA viral metagenome analyses of municipal wastewater. PLoS One 2018; 13:e0195350. [PMID: 29614100 PMCID: PMC5882159 DOI: 10.1371/journal.pone.0195350] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 12/11/2017] [Accepted: 03/20/2018] [Indexed: 01/21/2023] Open
Abstract
Metagenomics is a powerful tool for characterizing viral composition within environmental samples, but sample and molecular processing steps can bias the estimation of viral community structure. The objective of this study is to understand the inherent variability introduced when conducting viral metagenomic analyses of wastewater and provide a bioinformatic strategy to accurately analyze sequences for viral community analyses. A standard approach using a combination of ultrafiltration, membrane filtration, and DNase treatment, and multiple displacement amplification (MDA) produced DNA preparations without any bacterial derived genes. Results showed recoveries in wastewater matrix ranged between 60–100%. A bias towards small single stranded DNA (ssDNA; polyomavirus) virus types vs larger double stranded DNA (dsDNA; adenovirus) viruses was also observed with a total estimated recovery of small circular viruses to be as much as 173-fold higher. Notably, ssDNA abundance decreased with sample dilution while large dsDNA genomes (e.g., Caudovirales) initially increased in abundance with dilution before gradually decreasing with further dilution in wastewater samples. The present study revealed the inherent biases associated with different components of viral metagenomic methods applied to wastewater. Overall, these results provide a well-characterized approach for effectively conducting viral metagenomics analysis of wastewater and reveal that dilution can effectively mitigate MDA bias.
|
28
|
Du R, Fang Z. Statistical correction for functional metagenomic profiling of a microbial community with short NGS reads. J Appl Stat 2018; 45:2521-2535. [PMID: 30505061 DOI: 10.1080/02664763.2018.1426741] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
Abstract
By sequence homology search, the list of all the functions found and the counts of reads being aligned to them present the functional profile of a metagenomic sample. However, a significant obstacle has been observed in this approach due to the short read length associated with many next generation sequencing technologies. This includes artificial families, cross-annotations, length bias and conservation bias. The widely applied cutoff methods, such as BLAST E-value, are not able to solve the problems. Following the published successful procedures on the artificial families and the cross-annotation issue, we propose in this paper to use zero-truncated Poisson and Binomial (ZTP-Bin) hierarchical modelling to correct the length bias and the conservation bias. Goodness-of-fit of the modelling and cross-validation for the prediction using a bioinformatic simulated sample show the validity of this approach. Evaluated on an in vitro-simulated data set, the proposed modelling method outperforms other traditional methods. All three steps were then sequentially applied on real-life metagenomic samples to show that the proposed framework will lead to a more accurate functional profile of a short read metagenomic sample.
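For readers unfamiliar with the zero-truncated Poisson component named in this abstract, its standard textbook form is given below. This is only the generic distribution; how the authors combine it with the Binomial layer in their ZTP-Bin hierarchy is not stated in the abstract and is not reproduced here. The symbols $X$ (a count that is observed only when it is positive) and $\lambda$ (the rate of the underlying Poisson) are notation assumed for illustration.

```latex
\[
P(X = k \mid X > 0) \;=\; \frac{\lambda^{k} e^{-\lambda}}{k!\,\bigl(1 - e^{-\lambda}\bigr)},
\qquad k = 1, 2, \dots
\]
\[
\mathbb{E}[X \mid X > 0] \;=\; \frac{\lambda}{1 - e^{-\lambda}}
\]
```

The denominator $1 - e^{-\lambda}$ renormalizes the ordinary Poisson probabilities after the zero class is removed, which is why the conditional mean is always larger than $\lambda$.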
Affiliation(s)
- Ruofei Du
  - Biostatistics Shared Resource, University of New Mexico Comprehensive Cancer Center, Albuquerque, USA
- Zhide Fang
  - Biostatistics Program, School of Public Health, Louisiana State University Health Sciences Center, New Orleans, USA
|
29
|
Tithi SS, Aylward FO, Jensen RV, Zhang L. FastViromeExplorer: a pipeline for virus and phage identification and abundance profiling in metagenomics data. PeerJ 2018; 6:e4227. [PMID: 29340239 PMCID: PMC5768174 DOI: 10.7717/peerj.4227] [Citation(s) in RCA: 49] [Impact Index Per Article: 7.0] [Received: 11/08/2017] [Accepted: 12/12/2017] [Indexed: 11/20/2022] Open
Abstract
With the increase in the availability of metagenomic data generated by next generation sequencing, there is an urgent need for fast and accurate tools for identifying viruses in host-associated and environmental samples. In this paper, we developed a stand-alone pipeline called FastViromeExplorer for the detection and abundance quantification of viruses and phages in large metagenomic datasets by performing rapid searches of virus and phage sequence databases. Both simulated and real data from human microbiome and ocean environmental samples are used to validate FastViromeExplorer as a reliable tool to quickly and accurately identify viruses and their abundances in large datasets.
Affiliation(s)
- Saima Sultana Tithi
  - Department of Computer Science, Virginia Polytechnic Institute and State University (Virginia Tech), Blacksburg, VA, United States of America
- Frank O Aylward
  - Department of Biological Sciences, Virginia Polytechnic Institute and State University (Virginia Tech), Blacksburg, VA, United States of America
- Roderick V Jensen
  - Department of Biological Sciences, Virginia Polytechnic Institute and State University (Virginia Tech), Blacksburg, VA, United States of America
- Liqing Zhang
  - Department of Computer Science, Virginia Polytechnic Institute and State University (Virginia Tech), Blacksburg, VA, United States of America
|
30
|
Bodewes R. Novel viruses in birds: Flying through the roof or is a cage needed? Vet J 2018; 233:55-62. [PMID: 29486880 DOI: 10.1016/j.tvjl.2017.12.023] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 04/15/2017] [Revised: 09/28/2017] [Accepted: 12/28/2017] [Indexed: 01/17/2023]
Abstract
Emerging viral diseases continue to have a major global impact on human beings and animals. To be able to take adequate measures in case of an outbreak of an emerging disease, rapid detection of the causative agent is a crucial first step. In this review, various aspects of virus discovery are discussed, with a special focus on recently discovered viruses in birds. Novel viruses with a potential major impact have been discovered in domestic and wild bird species in recent years using various virus discovery methods. Only a few studies report the detection of novel viruses in endangered bird species, although increased knowledge about viruses circulating in these species is important. Additional studies focusing on the exact role of a novel virus in disease and on the impact of a novel virus on bird populations are often lacking. Intensive collaboration between different disciplines is needed to obtain useful information about the role of these novel viruses.
Affiliation(s)
- R Bodewes
  - Department of Farm Animal Health, Faculty of Veterinary Medicine, Utrecht University, Utrecht, The Netherlands
|
31
|
Roux S, Emerson JB, Eloe-Fadrosh EA, Sullivan MB. Benchmarking viromics: an in silico evaluation of metagenome-enabled estimates of viral community composition and diversity. PeerJ 2017; 5:e3817. [PMID: 28948103 PMCID: PMC5610896 DOI: 10.7717/peerj.3817] [Citation(s) in RCA: 185] [Impact Index Per Article: 23.1] [Received: 06/24/2017] [Accepted: 08/26/2017] [Indexed: 12/20/2022] Open
Abstract
Background: Viral metagenomics (viromics) is increasingly used to obtain uncultivated viral genomes, evaluate community diversity, and assess ecological hypotheses. While viromic experimental methods are relatively mature and widely accepted by the research community, robust bioinformatics standards remain to be established. Here we used in silico mock viral communities to evaluate the viromic sequence-to-ecological-inference pipeline, including (i) read pre-processing and metagenome assembly, (ii) thresholds applied to estimate viral relative abundances based on read mapping to assembled contigs, and (iii) normalization methods applied to the matrix of viral relative abundances for alpha and beta diversity estimates.
Results: Tools specifically designed for metagenomes, specifically metaSPAdes, MEGAHIT, and IDBA-UD, were the most effective at assembling viromes. Read pre-processing, such as partitioning, had virtually no impact on assembly output, but may be useful when hardware is limited. Viral populations with 2–5× coverage typically assembled well, whereas lesser coverage led to fragmented assembly. Strain heterogeneity within populations hampered assembly, especially when strains were closely related (average nucleotide identity, or ANI ≥97%) and when the most abundant strain represented <50% of the population. Viral community composition assessments based on read recruitment were generally accurate when the following thresholds for detection were applied: (i) ≥10 kb contig lengths to define populations, (ii) coverage defined from reads mapping at ≥90% identity, and (iii) ≥75% of contig length with ≥1× coverage. Finally, although data are limited to the most abundant viruses in a community, alpha and beta diversity patterns were robustly estimated (±10%) when comparing samples of similar sequencing depth, but more divergent (up to 80%) when sequencing depth was uneven across the dataset. In the latter cases, the use of normalization methods specifically developed for metagenomes provided the best estimates.
Conclusions: These simulations provide benchmarks for selecting analysis cut-offs and establish that an optimized sample-to-ecological-inference viromics pipeline is robust for making ecological inferences from natural viral communities. Continued development to better accessing RNA, rare, and/or diverse viral populations and improved reference viral genome availability will alleviate many of viromics remaining limitations.
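The three detection thresholds quoted in this abstract (contig length of at least 10 kb, coverage counted only from reads mapping at 90% identity or more, and at least 75% of the contig covered at 1× or more) can be sketched as a simple filter. This is a minimal illustration and not the authors' pipeline: the function names, the `(start, end, identity)` alignment tuples, and the 0-based half-open coordinates are all assumptions made for the example.

```python
MIN_LENGTH_BP = 10_000        # threshold (i): contig length >= 10 kb
MIN_IDENTITY = 0.90           # threshold (ii): reads mapping at >= 90% identity
MIN_COVERED_FRACTION = 0.75   # threshold (iii): >= 75% of contig at >= 1x


def covered_bp(alignments, min_identity=MIN_IDENTITY):
    """Count contig positions covered by at least one qualifying read.

    `alignments` is an iterable of (start, end, identity) tuples with
    0-based half-open coordinates (a hypothetical input format).
    """
    intervals = sorted(
        (start, end) for start, end, identity in alignments
        if identity >= min_identity
    )
    total = 0
    cur_start = cur_end = None
    for start, end in intervals:
        if cur_end is None or start > cur_end:
            # Close the previous merged interval and open a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent interval: extend the current one.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def is_detected(contig_length, alignments):
    """Apply the three thresholds to one contig."""
    if contig_length < MIN_LENGTH_BP:
        return False
    return covered_bp(alignments) / contig_length >= MIN_COVERED_FRACTION


if __name__ == "__main__":
    reads = [(0, 5000, 0.95), (4000, 9000, 0.92), (9000, 12000, 0.80)]
    print(covered_bp(reads))          # third read is below 90% identity
    print(is_detected(12_000, reads))
```

Merging intervals before summing avoids double-counting positions covered by several reads, which matters because threshold (iii) is about breadth of coverage rather than depth.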
Affiliation(s)
- Simon Roux
  - Department of Microbiology, Ohio State University, Columbus, OH, United States of America
- Joanne B Emerson
  - Department of Microbiology, Ohio State University, Columbus, OH, United States of America
- Emiley A Eloe-Fadrosh
  - Joint Genome Institute, Department of Energy, Walnut Creek, CA, United States of America
- Matthew B Sullivan
  - Department of Microbiology, Ohio State University, Columbus, OH, United States of America
  - Department of Civil, Environmental and Geodetic Engineering, Ohio State University, Columbus, OH, United States of America
|
32
|
White DJ, Wang J, Hall RJ. Assessing the Impact of Assemblers on Virus Detection in a De Novo Metagenomic Analysis Pipeline. J Comput Biol 2017; 24:874-881. [PMID: 28414526 PMCID: PMC5610382 DOI: 10.1089/cmb.2017.0008] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Indexed: 11/12/2022] Open
Abstract
Applying high-throughput sequencing to pathogen discovery is a relatively new field, the objective of which is to find disease-causing agents when little or no background information on disease is available. Key steps in the process are the generation of millions of sequence reads from an infected tissue sample, followed by assembly of these reads into longer, contiguous stretches of nucleotide sequences, and then identification of the contigs by matching them to known databases, such as those stored at GenBank or Ensembl. This technique, that is, de novo metagenomics, is particularly useful when the pathogen is viral and strong discriminatory power can be achieved. However, recently, we found that striking differences in results can be achieved when different assemblers were used. In this study, we test formally the impact of five popular assemblers (MIRA, VELVET, METAVELVET, SPADES, and OMEGA) on the detection of a novel virus and assembly of its whole genome in a data set for which we have confirmed the presence of the virus by empirical laboratory techniques, and compare the overall performance between assemblers. Our results show that if results from only one assembler are considered, biologically important reads can easily be overlooked. The impacts of these results on the field of pathogen discovery are considered.
Affiliation(s)
- Jing Wang
  - Institute of Environmental Science and Research at the National Centre for Biosecurity and Infectious Disease, Upper Hutt, New Zealand
- Richard J. Hall
  - Animal Health Laboratory, Investigation and Diagnostic Centres and Response, Ministry for Primary Industries—Manatū Ahu Matua, Upper Hutt, New Zealand
|
33
|
van der Walt AJ, van Goethem MW, Ramond JB, Makhalanyane TP, Reva O, Cowan DA. Assembling metagenomes, one community at a time. BMC Genomics 2017; 18:521. [PMID: 28693474 PMCID: PMC5502489 DOI: 10.1186/s12864-017-3918-9] [Citation(s) in RCA: 76] [Impact Index Per Article: 9.5] [Received: 04/04/2017] [Accepted: 07/02/2017] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Metagenomics allows unprecedented access to uncultured environmental microorganisms. The analysis of metagenomic sequences facilitates gene prediction and annotation, and enables the assembly of draft genomes, including uncultured members of a community. However, while several platforms have been developed for this critical step, there is currently no clear framework for the assembly of metagenomic sequence data.
RESULTS: To assist with selection of an appropriate metagenome assembler we evaluated the capabilities of nine prominent assembly tools on nine publicly-available environmental metagenomes, as well as three simulated datasets. Overall, we found that SPAdes provided the largest contigs and highest N50 values across 6 of the 9 environmental datasets, followed by MEGAHIT and metaSPAdes. MEGAHIT emerged as a computationally inexpensive alternative to SPAdes, assembling the most complex dataset using less than 500 GB of RAM and within 10 hours.
CONCLUSIONS: We found that assembler choice ultimately depends on the scientific question, the available resources and the bioinformatic competence of the researcher. We provide a concise workflow for the selection of the best assembly tool.
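The N50 statistic used in this abstract to compare assemblers has a standard definition: the length of the shortest contig in the smallest set of longest contigs that together contain at least half of the assembled bases. A minimal sketch follows; the function name `n50` and the plain list-of-lengths input are assumptions for illustration, and the study's own tooling for computing the metric is not described in the abstract.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths in bp.

    Contigs are visited from longest to shortest; the N50 is the length
    of the contig at which the running total first reaches half of the
    total assembly size.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(contig_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


if __name__ == "__main__":
    # Total is 290 bp; 80 + 70 = 150 is the first running sum >= 145.
    print(n50([80, 70, 50, 40, 30, 20]))
```

Because N50 depends only on the length distribution, a higher value indicates a more contiguous assembly but says nothing about correctness, which is one reason assembler comparisons usually report it alongside other measures.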
Affiliation(s)
- Andries Johannes van der Walt
  - Centre for Microbial Ecology and Genomics (CMEG), Department of Genetics, University of Pretoria, Natural Sciences Building 2, Lynnwood Road, Pretoria, 0028 South Africa
  - Centre for Bioinformatics and Computational Biology, Department of Biochemistry, University of Pretoria, Pretoria, South Africa
- Marc Warwick van Goethem
  - Centre for Microbial Ecology and Genomics (CMEG), Department of Genetics, University of Pretoria, Natural Sciences Building 2, Lynnwood Road, Pretoria, 0028 South Africa
- Jean-Baptiste Ramond
  - Centre for Microbial Ecology and Genomics (CMEG), Department of Genetics, University of Pretoria, Natural Sciences Building 2, Lynnwood Road, Pretoria, 0028 South Africa
- Thulani Peter Makhalanyane
  - Centre for Microbial Ecology and Genomics (CMEG), Department of Genetics, University of Pretoria, Natural Sciences Building 2, Lynnwood Road, Pretoria, 0028 South Africa
- Oleg Reva
  - Centre for Bioinformatics and Computational Biology, Department of Biochemistry, University of Pretoria, Pretoria, South Africa
- Don Arthur Cowan
  - Centre for Microbial Ecology and Genomics (CMEG), Department of Genetics, University of Pretoria, Natural Sciences Building 2, Lynnwood Road, Pretoria, 0028 South Africa
|
34
|
Ren J, Ahlgren NA, Lu YY, Fuhrman JA, Sun F. VirFinder: a novel k-mer based tool for identifying viral sequences from assembled metagenomic data. MICROBIOME 2017. [PMID: 28683828 DOI: 10.1186/s40168-017-0283-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 05/11/2023]
|
35
|
Ren J, Ahlgren NA, Lu YY, Fuhrman JA, Sun F. VirFinder: a novel k-mer based tool for identifying viral sequences from assembled metagenomic data. MICROBIOME 2017; 5:69. [PMID: 28683828 PMCID: PMC5501583 DOI: 10.1186/s40168-017-0283-5] [Citation(s) in RCA: 368] [Impact Index Per Article: 46.0] [Received: 01/31/2017] [Accepted: 06/05/2017] [Indexed: 05/19/2023]
Abstract
BACKGROUND: Identifying viral sequences in mixed metagenomes containing both viral and host contigs is a critical first step in analyzing the viral component of samples. Current tools for distinguishing prokaryotic virus and host contigs primarily use gene-based similarity approaches. Such approaches can significantly limit results especially for short contigs that have few predicted proteins or lack proteins with similarity to previously known viruses.
METHODS: We have developed VirFinder, the first k-mer frequency based, machine learning method for virus contig identification that entirely avoids gene-based similarity searches. VirFinder instead identifies viral sequences based on our empirical observation that viruses and hosts have discernibly different k-mer signatures. VirFinder's performance in correctly identifying viral sequences was tested by training its machine learning model on sequences from host and viral genomes sequenced before 1 January 2014 and evaluating on sequences obtained after 1 January 2014.
RESULTS: VirFinder had significantly better rates of identifying true viral contigs (true positive rates (TPRs)) than VirSorter, the current state-of-the-art gene-based virus classification tool, when evaluated with either contigs subsampled from complete genomes or assembled from a simulated human gut metagenome. For example, for contigs subsampled from complete genomes, VirFinder had 78-, 2.4-, and 1.8-fold higher TPRs than VirSorter for 1, 3, and 5 kb contigs, respectively, at the same false positive rates as VirSorter (0, 0.003, and 0.006, respectively), thus VirFinder works considerably better for small contigs than VirSorter. VirFinder furthermore identified several recently sequenced virus genomes (after 1 January 2014) that VirSorter did not and that have no nucleotide similarity to previously sequenced viruses, demonstrating VirFinder's potential advantage in identifying novel viral sequences. Application of VirFinder to a set of human gut metagenomes from healthy and liver cirrhosis patients reveals higher viral diversity in healthy individuals than cirrhosis patients. We also identified contig bins containing crAssphage-like contigs with higher abundance in healthy patients and a putative Veillonella genus prophage associated with cirrhosis patients.
CONCLUSIONS: This innovative k-mer based tool complements gene-based approaches and will significantly improve prokaryotic viral sequence identification, especially for metagenomic-based studies of viral ecology.
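To make the idea of a "k-mer signature" in this abstract concrete, the sketch below computes a normalized k-mer frequency vector for a single sequence. It is a generic illustration only: the abstract does not state VirFinder's value of k, its feature construction, or its classifier, so the function name `kmer_frequencies`, the default `k=4`, and the choice to skip windows containing ambiguous bases are all assumptions.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(sequence, k=4):
    """Return a dict mapping every possible DNA k-mer to its relative frequency.

    Windows containing characters other than A, C, G, T (for example N)
    are skipped. All 4**k k-mers appear as keys so that vectors from
    different sequences line up position by position.
    """
    sequence = sequence.upper()
    valid = set("ACGT")
    counts = Counter()
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if set(kmer) <= valid:
            counts[kmer] += 1
    total = sum(counts.values())
    all_kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return {
        kmer: (counts[kmer] / total if total else 0.0)
        for kmer in all_kmers
    }


if __name__ == "__main__":
    freqs = kmer_frequencies("ACGTACGT", k=2)
    # Seven overlapping 2-mers: AC, CG, GT occur twice each and TA once.
    print(freqs["AC"], freqs["TA"], freqs["AA"])
```

Vectors of this kind can be fed to any standard classifier; because they need no gene prediction, they remain usable on short contigs, which is the advantage the abstract emphasizes.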
Affiliation(s)
- Jie Ren
  - Molecular and Computational Biology Program, University of Southern California, 1050 Childs Way, Los Angeles, CA, 90089, USA
- Nathan A Ahlgren
  - Department of Biological Sciences, University of Southern California, 3616 Trousdale Pkwy, Los Angeles, CA, 90089, USA
  - Present address: Biology Department, Clark University, 950 Main St, Worcester, MA, 01610, USA
- Yang Young Lu
  - Molecular and Computational Biology Program, University of Southern California, 1050 Childs Way, Los Angeles, CA, 90089, USA
- Jed A Fuhrman
  - Department of Biological Sciences, University of Southern California, 3616 Trousdale Pkwy, Los Angeles, CA, 90089, USA
- Fengzhu Sun
  - Molecular and Computational Biology Program, University of Southern California, 1050 Childs Way, Los Angeles, CA, 90089, USA
  - Center for Computational Systems Biology, Fudan University, 200433, Shanghai, China
|
36
|
Bovo S, Mazzoni G, Ribani A, Utzeri VJ, Bertolini F, Schiavo G, Fontanesi L. A viral metagenomic approach on a non-metagenomic experiment: Mining next generation sequencing datasets from pig DNA identified several porcine parvoviruses for a retrospective evaluation of viral infections. PLoS One 2017; 12:e0179462. [PMID: 28662150 PMCID: PMC5491021 DOI: 10.1371/journal.pone.0179462] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 03/20/2017] [Accepted: 05/29/2017] [Indexed: 12/14/2022] Open
Abstract
Shot-gun next generation sequencing (NGS) on whole DNA extracted from specimens collected from mammals often produces reads that are not mapped (i.e. unmapped reads) on the host reference genome and that are usually discarded as by-products of the experiments. In this study, we mined Ion Torrent reads obtained by sequencing DNA isolated from archived blood samples collected from 100 performance tested Italian Large White pigs. Two reduced representation libraries were prepared from two DNA pools constructed each from 50 equimolar DNA samples. Bioinformatic analyses were carried out to mine unmapped reads on the reference pig genome that were obtained from the two NGS datasets. In silico analyses included read mapping and sequence assembly approaches for a viral metagenomic analysis using the NCBI Viral Genome Resource. Our approach identified sequences matching several viruses of the Parvoviridae family: porcine parvovirus 2 (PPV2), PPV4, PPV5 and PPV6 and porcine bocavirus 1-H18 isolate (PBoV1-H18). The presence of these viruses was confirmed by PCR and Sanger sequencing of individual DNA samples. PPV2, PPV4, PPV5, PPV6 and PBoV1-H18 were all identified in samples collected in 1998-2007, 1998-2000, 1997-2000, 1998-2004 and 2003, respectively. For most of these viruses (PPV4, PPV5, PPV6 and PBoV1-H18) previous studies reported their first occurrence much later (from 5 to more than 10 years) than our identification period and in different geographic areas. Our study provided a retrospective evaluation of apparently asymptomatic parvovirus infected pigs providing information that could be important to define occurrence and prevalence of different parvoviruses in South Europe. This study demonstrated the potential of mining NGS datasets non-originally derived by metagenomics experiments for viral metagenomics analyses in a livestock species.
Affiliation(s)
- Samuele Bovo
  - Department of Agricultural and Food Sciences (DISTAL), Division of Animal Sciences, University of Bologna, Bologna, Italy
  - Department of Biological, Geological, and Environmental Sciences (BiGeA), Biocomputing Group, University of Bologna, Bologna, Italy
- Gianluca Mazzoni
  - Department of Agricultural and Food Sciences (DISTAL), Division of Animal Sciences, University of Bologna, Bologna, Italy
  - Department of Veterinary Clinical and Animal Sciences, University of Copenhagen, Copenhagen, Denmark
- Anisa Ribani
  - Department of Agricultural and Food Sciences (DISTAL), Division of Animal Sciences, University of Bologna, Bologna, Italy
- Valerio Joe Utzeri
  - Department of Agricultural and Food Sciences (DISTAL), Division of Animal Sciences, University of Bologna, Bologna, Italy
- Francesca Bertolini
  - Department of Agricultural and Food Sciences (DISTAL), Division of Animal Sciences, University of Bologna, Bologna, Italy
  - Department of Animal Science, Iowa State University, Iowa, United States of America
- Giuseppina Schiavo
  - Department of Agricultural and Food Sciences (DISTAL), Division of Animal Sciences, University of Bologna, Bologna, Italy
- Luca Fontanesi
  - Department of Agricultural and Food Sciences (DISTAL), Division of Animal Sciences, University of Bologna, Bologna, Italy
|
37
|
Nieuwenhuijse DF, Koopmans MPG. Metagenomic Sequencing for Surveillance of Food- and Waterborne Viral Diseases. Front Microbiol 2017; 8:230. [PMID: 28261185 PMCID: PMC5309255 DOI: 10.3389/fmicb.2017.00230] [Citation(s) in RCA: 45] [Impact Index Per Article: 5.6] [Received: 11/16/2016] [Accepted: 02/01/2017] [Indexed: 12/25/2022] Open
Abstract
A plethora of viruses can be transmitted by the food- and waterborne route. However, their recognition is challenging because of the variety of viruses, heterogeneity of symptoms, the lack of awareness of clinicians, and limited surveillance efforts. Classical food- and waterborne viral disease outbreaks are mainly caused by caliciviruses, but the source of the virus is often not known and the foodborne mode of transmission is difficult to discriminate from human-to-human transmission. Atypical food- and waterborne viral disease can be caused by viruses such as hepatitis A and hepatitis E. In addition, a source of novel emerging viruses with a potential to spread via the food- and waterborne route is the repeated interaction of humans with wildlife. Wildlife-to-human adaptation may give rise to self-limiting outbreaks in some cases, but when fully adjusted to the human host can be devastating. Metagenomic sequencing has been investigated as a promising solution for surveillance purposes as it detects all viruses in a single protocol, delivers additional genomic information for outbreak tracing, and detects novel unknown viruses. Nevertheless, several issues must be addressed to apply metagenomic sequencing in surveillance. First, sample preparation is difficult since the genomic material of viruses is generally overshadowed by host- and bacterial genomes. Second, several data analysis issues hamper the efficient, robust, and automated processing of metagenomic data. Third, interpretation of metagenomic data is hard, because of the lack of general knowledge of the virome in the food chain and the environment. Further developments in virus-specific nucleic acid extraction methods, bioinformatic data processing applications, and unifying data visualization tools are needed to gain insightful surveillance knowledge from suspect food samples.
|
38
|
Hesse U, van Heusden P, Kirby BM, Olonade I, van Zyl LJ, Trindade M. Virome Assembly and Annotation: A Surprise in the Namib Desert. Front Microbiol 2017; 8:13. [PMID: 28167933 PMCID: PMC5253355 DOI: 10.3389/fmicb.2017.00013] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 09/26/2016] [Accepted: 01/03/2017] [Indexed: 11/13/2022]
Abstract
Sequencing, assembly, and annotation of environmental virome samples is challenging. Methodological biases and differences in species abundance result in fragmentary read coverage; sequence reconstruction is further complicated by the mosaic nature of viral genomes. In this paper, we focus on biocomputational aspects of virome analysis, emphasizing latent pitfalls in sequence annotation. Using simulated viromes that mimic environmental data challenges we assessed the performance of five assemblers (CLC-Workbench, IDBA-UD, SPAdes, RayMeta, ABySS). Individual analyses of relevant scaffold length fractions revealed shortcomings of some programs in reconstruction of viral genomes with excessive read coverage (IDBA-UD, RayMeta), and in accurate assembly of scaffolds ≥50 kb (SPAdes, RayMeta, ABySS). The CLC-Workbench assembler performed best in terms of genome recovery (including highly covered genomes) and correct reconstruction of large scaffolds; and was used to assemble a virome from a copper rich site in the Namib Desert. We found that scaffold network analysis and cluster-specific read reassembly improved reconstruction of sequences with excessive read coverage, and that strict data filtering for non-viral sequences prior to downstream analyses was essential. In this study we describe novel viral genomes identified in the Namib Desert copper site virome. Taxonomic affiliations of diverse proteins in the dataset and phylogenetic analyses of circovirus-like proteins indicated links to the marine habitat. Considering additional evidence from this dataset we hypothesize that viruses may have been carried from the Atlantic Ocean into the Namib Desert by fog and wind, highlighting the impact of the extended environment on an investigated niche in metagenome studies.
Affiliation(s)
- Uljana Hesse
- Institute for Microbial Biotechnology and Metagenomics, University of the Western Cape, Bellville, South Africa
- South African National Bioinformatics Institute, University of the Western Cape, Bellville, South Africa
| | - Peter van Heusden
- South African National Bioinformatics Institute, University of the Western Cape, Bellville, South Africa
| | - Bronwyn M. Kirby
- Institute for Microbial Biotechnology and Metagenomics, University of the Western Cape, Bellville, South Africa
| | - Israel Olonade
- Institute for Microbial Biotechnology and Metagenomics, University of the Western Cape, Bellville, South Africa
| | - Leonardo J. van Zyl
- Institute for Microbial Biotechnology and Metagenomics, University of the Western Cape, Bellville, South Africa
| | - Marla Trindade
- Institute for Microbial Biotechnology and Metagenomics, University of the Western Cape, Bellville, South Africa
| |
|
39
|
Survey of (Meta)genomic Approaches for Understanding Microbial Community Dynamics. Indian J Microbiol 2016; 57:23-38. [PMID: 28148977 DOI: 10.1007/s12088-016-0629-x] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 07/01/2016] [Accepted: 10/27/2016] [Indexed: 01/06/2023]
Abstract
Advances in next-generation sequencing technologies have driven rapid evolution of the fields of genomics and metagenomics within a short time and at low cost. While metagenomics and genomics can be used separately to reveal culture-independent and culture-based microbial evolution, respectively, (meta)genomics together can be used to demonstrate results at the population level, revealing in-depth complex community interactions for specific ecotypes. The field of metagenomics, which started by answering "who is out there?" based on the 16S rRNA gene, has evolved immensely, with precise organismal reconstruction at the species/strain level from deeply covered metagenome data outweighing the need to isolate bacteria, of which 99% are de facto non-cultivable. In this review we have underlined the appeal of metagenomic-derived genomes in providing insights into evolutionary patterns, growth dynamics, genome/gene-specific sweeps, and durability of environmental pressures. We have demonstrated the use of culture-based genomics and environmental shotgun metagenome data together to elucidate environment-specific genome modulations via metagenomic recruitments in terms of gene loss/gain and accessory and core-genome extent. We further illustrated the benefit of (meta)genomics in the understanding of infectious diseases by deducing the relationship between human microbiota and clinical microbiology. This review summarizes the technological advances in (meta)genomic strategies that use genome and metagenome datasets together to increase the resolution of microbial population studies.
|
40
|
Grazziotin AL, Koonin EV, Kristensen DM. Prokaryotic Virus Orthologous Groups (pVOGs): a resource for comparative genomics and protein family annotation. Nucleic Acids Res 2016; 45:D491-D498. [PMID: 27789703 PMCID: PMC5210652 DOI: 10.1093/nar/gkw975] [Citation(s) in RCA: 245] [Impact Index Per Article: 27.2] [Received: 08/30/2016] [Revised: 10/08/2016] [Accepted: 10/21/2016] [Indexed: 11/13/2022]
Abstract
Viruses are the most abundant and diverse biological entities on earth, and while most of this diversity remains completely unexplored, advances in genome sequencing have provided unprecedented glimpses into the virosphere. The Prokaryotic Virus Orthologous Groups (pVOGs, formerly called Phage Orthologous Groups, POGs) resource has aided in this task over the past decade by using automated methods to keep pace with the rapid increase in genomic data. The uses of pVOGs include functional annotation of viral proteins, identification of genes and viruses in uncharacterized DNA samples, phylogenetic analysis, large-scale comparative genomics projects, and more. The pVOGs database represents a comprehensive set of orthologous gene families shared across multiple complete genomes of viruses that infect bacterial or archaeal hosts (viruses of eukaryotes will be added at a future date). The pVOGs are constructed within the Clusters of Orthologous Groups (COGs) framework that is widely used for orthology identification in prokaryotes. Since the previous release of the POGs, the size has tripled to nearly 3000 genomes and 300 000 proteins, and the number of conserved orthologous groups doubled to 9518. User-friendly webpages are available, including multiple sequence alignments and HMM profiles for each VOG. These changes provide major improvements to the pVOGs database, at a time of rapid advances in virus genomics. The pVOGs database is hosted jointly at the University of Iowa at http://dmk-brain.ecn.uiowa.edu/pVOGs and the NCBI at ftp://ftp.ncbi.nlm.nih.gov/pub/kristensen/pVOGs/home.html.
Affiliation(s)
- Ana Laura Grazziotin
- Department of Biomedical Engineering, College of Engineering, University of Iowa, Iowa City, IA 52242, USA
| | - Eugene V Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
| | - David M Kristensen
- Department of Biomedical Engineering, College of Engineering, University of Iowa, Iowa City, IA 52242, USA
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
| |
|
41
|
Dann LM, Rosales S, McKerral J, Paterson JS, Smith RJ, Jeffries TC, Oliver RL, Mitchell JG. Marine and giant viruses as indicators of a marine microbial community in a riverine system. Microbiologyopen 2016; 5:1071-1084. [PMID: 27506856 PMCID: PMC5221468 DOI: 10.1002/mbo3.392] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 04/25/2016] [Revised: 06/13/2016] [Accepted: 06/17/2016] [Indexed: 12/30/2022]
Abstract
Viral communities are important for ecosystem function as they are involved in critical biogeochemical cycles and controlling host abundance. This study investigates riverine viral communities around a small rural town that influences local water inputs. Myoviridae, Siphoviridae, Phycodnaviridae, Mimiviridae, Herpesviridae, and Podoviridae were the most abundant families. Viral species upstream and downstream of the town were similar, with Synechococcus phage, salinus, Prochlorococcus phage, Mimivirus A, and Human herpes 6A virus most abundant, contributing 4.9-38.2% of average abundance within the metagenomic profiles, with Synechococcus and Prochlorococcus present in metagenomes as the expected hosts for the phage. Overall, the majority of abundant viral species were, or were most similar to, those of marine origin. At over 60 km from the river mouth, the presence of marine communities provides some support for the Baas-Becking hypothesis "everything is everywhere, but, the environment selects." We conclude that marine microbial species may occur more frequently in freshwater systems than previously assumed, and hence may play important roles in some freshwater ecosystems within tens to a hundred kilometers from the sea.
Affiliation(s)
- Lisa M Dann
- School of Biological Sciences at Flinders University, Adelaide, South Australia, Australia
| | - Stephanie Rosales
- Department of Microbiology, Oregon State University, Corvallis, Oregon, USA
| | - Jody McKerral
- School of Computer Science, Engineering and Mathematics, Flinders University, Adelaide, Australia
| | - James S Paterson
- School of Biological Sciences at Flinders University, Adelaide, South Australia, Australia
| | - Renee J Smith
- School of Biological Sciences at Flinders University, Adelaide, South Australia, Australia
| | - Thomas C Jeffries
- Hawkesbury Institute for the Environment, Western Sydney University, Penrith, New South Wales, Australia
| | - Rod L Oliver
- Land and Water Research Division at the Commonwealth Scientific and Industrial Research Organisation (CSIRO), Adelaide, South Australia, Australia
| | - James G Mitchell
- School of Biological Sciences at Flinders University, Adelaide, South Australia, Australia
| |
|
42
|
Hiraoka S, Yang CC, Iwasaki W. Metagenomics and Bioinformatics in Microbial Ecology: Current Status and Beyond. Microbes Environ 2016; 31:204-12. [PMID: 27383682 PMCID: PMC5017796 DOI: 10.1264/jsme2.me16024] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.9] [Indexed: 12/30/2022]
Abstract
Metagenomic approaches are now commonly used in microbial ecology to study microbial communities in more detail, including many strains that cannot be cultivated in the laboratory. Bioinformatic analyses make it possible to mine huge metagenomic datasets and discover general patterns that govern microbial ecosystems. However, the findings of typical metagenomic and bioinformatic analyses still do not completely describe the ecology and evolution of microbes in their environments. Most analyses still depend on straightforward sequence similarity searches against reference databases. We herein review the current state of metagenomics and bioinformatics in microbial ecology and discuss future directions for the field. New techniques will allow us to go beyond routine analyses and broaden our knowledge of microbial ecosystems. We need to enrich reference databases, promote platforms that enable meta- or comprehensive analyses of diverse metagenomic datasets, devise methods that utilize long-read sequence information, and develop more powerful bioinformatic methods to analyze data from diverse perspectives.
Affiliation(s)
- Satoshi Hiraoka
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, the University of Tokyo
| | | | | |
|
43
|
Tangherlini M, Dell'Anno A, Zeigler Allen L, Riccioni G, Corinaldesi C. Assessing viral taxonomic composition in benthic marine ecosystems: reliability and efficiency of different bioinformatic tools for viral metagenomic analyses. Sci Rep 2016; 6:28428. [PMID: 27329207 PMCID: PMC4916513 DOI: 10.1038/srep28428] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Received: 01/04/2016] [Accepted: 06/02/2016] [Indexed: 11/09/2022]
Abstract
In benthic deep-sea ecosystems, which represent the largest biome on Earth, viruses have a recognised key ecological role, but their diversity is still largely unknown. Identifying the taxonomic composition of viruses is crucial for understanding virus-host interactions, their role in food web functioning and evolutionary processes. Here, we compared the performance of various bioinformatic tools (BLAST, MG-RAST, NBC, VMGAP, MetaVir, VIROME) for analysing the viral taxonomic composition in simulated viromes and viral metagenomes from different benthic deep-sea ecosystems. The analyses of simulated viromes indicate that all the BLAST tools, followed by MetaVir and VMGAP, are more reliable in the affiliation of viral sequences and strains. When analysing the environmental viromes, tBLASTx, MetaVir, VMGAP and VIROME showed a similar efficiency of sequence annotation; however, MetaVir and tBLASTx identified a higher number of viral strains. These latter tools also identified a wider range of viral families than the others, providing a wider view of viral taxonomic diversity in benthic deep-sea ecosystems. Our findings highlight strengths and weaknesses of available bioinformatic tools for investigating the taxonomic diversity of viruses in benthic ecosystems in order to improve our comprehension of viral diversity in the oceans and its relationships with host diversity and ecosystem functioning.
Affiliation(s)
- M Tangherlini
- Department of Environmental and Life Sciences, Polytechnic University of Marche, Via Brecce Bianche, 60131 Ancona, Italy
| | - A Dell'Anno
- Department of Environmental and Life Sciences, Polytechnic University of Marche, Via Brecce Bianche, 60131 Ancona, Italy
| | - L Zeigler Allen
- Microbial and Environmental Genomics, J Craig Venter Institute, San Diego, CA, USA
| | - G Riccioni
- Department of Environmental and Life Sciences, Polytechnic University of Marche, Via Brecce Bianche, 60131 Ancona, Italy
| | - C Corinaldesi
- Department of Environmental and Life Sciences, Polytechnic University of Marche, Via Brecce Bianche, 60131 Ancona, Italy
| |
|
44
|
Laffy PW, Wood-Charlson EM, Turaev D, Weynberg KD, Botté ES, van Oppen MJH, Webster NS, Rattei T. HoloVir: A Workflow for Investigating the Diversity and Function of Viruses in Invertebrate Holobionts. Front Microbiol 2016; 7:822. [PMID: 27375564 PMCID: PMC4899465 DOI: 10.3389/fmicb.2016.00822] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.3] [Received: 12/31/2015] [Accepted: 05/16/2016] [Indexed: 11/13/2022]
Abstract
Abundant bioinformatics resources are available for the study of complex microbial metagenomes; however, their utility in viral metagenomics is limited. HoloVir is a robust and flexible data analysis pipeline that provides an optimized and validated workflow for taxonomic and functional characterization of viral metagenomes derived from invertebrate holobionts. Simulated viral metagenomes comprising varying levels of viral diversity and abundance were used to determine the optimal assembly and gene prediction strategy, and multiple sequence assembly methods and gene prediction tools were tested in order to optimize our analysis workflow. HoloVir performs pairwise comparisons of single read and predicted gene datasets against the viral RefSeq database to assign taxonomy, and additional comparison to phage-specific and cellular markers is undertaken to support the taxonomic assignments and identify potential cellular contamination. Broad functional classification of the predicted genes is provided by assignment of COG microbial functional category classifications using EggNOG, and higher resolution functional analysis is achieved by searching for enrichment of specific Swiss-Prot keywords within the viral metagenome. Application of HoloVir to viral metagenomes from the coral Pocillopora damicornis and the sponge Rhopaloeides odorabile demonstrated that HoloVir provides a valuable tool to characterize holobiont viral communities across species, environments, or experiments.
Affiliation(s)
- Patrick W. Laffy
- Australian Institute of Marine Science, Townsville, QLD, Australia
| | - Elisha M. Wood-Charlson
- Center for Microbial Oceanography: Research and Education, University of Hawai‘i at Mānoa, Honolulu, HI, USA
| | - Dmitrij Turaev
- Division of Computational Systems Biology, Department of Microbiology and Ecosystem Science, University of Vienna, Vienna, Austria
| | | | | | - Madeleine J. H. van Oppen
- Australian Institute of Marine Science, Townsville, QLD, Australia
- School of Biosciences, University of Melbourne, Melbourne, VIC, Australia
| | | | - Thomas Rattei
- Division of Computational Systems Biology, Department of Microbiology and Ecosystem Science, University of Vienna, Vienna, Austria
| |
|
45
|
Gupta A, Kumar S, Prasoodanan VPK, Harish K, Sharma AK, Sharma VK. Reconstruction of Bacterial and Viral Genomes from Multiple Metagenomes. Front Microbiol 2016; 7:469. [PMID: 27148174 PMCID: PMC4828583 DOI: 10.3389/fmicb.2016.00469] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 11/25/2015] [Accepted: 03/21/2016] [Indexed: 11/13/2022]
Abstract
Several metagenomic projects have been accomplished or are in progress. However, in most cases, it is not feasible to generate complete genomic assemblies of species from the metagenomic sequencing of a complex environment. Only a few studies have reported the reconstruction of bacterial genomes from complex metagenomes. In this work, a Binning-Assembly approach is proposed and demonstrated for the reconstruction of bacterial and viral genomes from 72 human gut metagenomic datasets. A total of 1156 bacterial genomes belonging to 219 bacterial families and 279 viral genomes belonging to 84 viral families could be identified. Draft genome sequences that were more than 80% complete could be reconstructed for a total of 126 bacterial and 11 viral genomes. Selected draft assembled genomes could be validated with 99.8% accuracy using their ORFs. The study provides useful information on the assembly expected for a species given its number of reads and abundance. This approach, along with spiking, was also demonstrated to be useful in improving the draft assembly of a bacterial genome. The Binning-Assembly approach can be successfully used to reconstruct bacterial and viral genomes from multiple metagenomic datasets obtained from similar environments.
Affiliation(s)
- Ankit Gupta
- Metagenomics and Systems Biology Group, Department of Biological Sciences, Indian Institute of Science Education and Research Bhopal, India
| | - Sanjiv Kumar
- Department of Medicine, University of Connecticut Health Center, Farmington, CT, USA
| | - Vishnu P K Prasoodanan
- Metagenomics and Systems Biology Group, Department of Biological Sciences, Indian Institute of Science Education and Research Bhopal, India
| | - K Harish
- Metagenomics and Systems Biology Group, Department of Biological Sciences, Indian Institute of Science Education and Research Bhopal, India
| | - Ashok K Sharma
- Metagenomics and Systems Biology Group, Department of Biological Sciences, Indian Institute of Science Education and Research Bhopal, India
| | - Vineet K Sharma
- Metagenomics and Systems Biology Group, Department of Biological Sciences, Indian Institute of Science Education and Research Bhopal, India
| |
|
46
|
Sangwan N, Xia F, Gilbert JA. Recovering complete and draft population genomes from metagenome datasets. MICROBIOME 2016; 4:8. [PMID: 26951112 PMCID: PMC4782286 DOI: 10.1186/s40168-016-0154-5] [Citation(s) in RCA: 159] [Impact Index Per Article: 17.7] [Received: 06/29/2015] [Accepted: 02/05/2016] [Indexed: 05/03/2023]
Abstract
Assembly of metagenomic sequence data into microbial genomes is of fundamental value to improving our understanding of microbial ecology and metabolism by elucidating the functional potential of hard-to-culture microorganisms. Here, we provide a synthesis of available methods to bin metagenomic contigs into species-level groups and highlight how genetic diversity, sequencing depth, and coverage influence binning success. Despite the computational cost of applying them to deeply sequenced complex metagenomes (e.g., soil), covarying patterns of contig coverage across multiple datasets significantly improve the binning process. We also discuss and compare current genome validation methods and reveal how these methods tackle the problem of chimeric genome bins, i.e., bins containing sequences from multiple species. Finally, we explore how population genome assembly can be used to uncover biogeographic trends and to characterize the effect of in situ functional constraints on genome-wide evolution.
Affiliation(s)
- Naseer Sangwan
- Biosciences Division (BIO), Argonne National Laboratory, 9700 South Cass Avenue, Argonne, IL, 60439, USA.
- Department of Surgery, University of Chicago, 5841 South Maryland Avenue, MC 5029, Chicago, IL, 60637, USA.
| | - Fangfang Xia
- Computing, Environment and Life Sciences, Argonne National Laboratory, 9700 South Cass Avenue, Argonne, IL, 60439, USA.
| | - Jack A Gilbert
- Biosciences Division (BIO), Argonne National Laboratory, 9700 South Cass Avenue, Argonne, IL, 60439, USA.
- Department of Ecology and Evolution, University of Chicago, 1101 E 57th Street, Chicago, IL, 60637, USA.
- Department of Surgery, University of Chicago, 5841 South Maryland Avenue, MC 5029, Chicago, IL, 60637, USA.
- Marine Biological Laboratory, 7 MBL Street, Woods Hole, MA, 02543, USA.
| |
|
47
|
A diarrheic chicken simultaneously co-infected with multiple picornaviruses: Complete genome analysis of avian picornaviruses representing up to six genera. Virology 2016; 489:63-74. [DOI: 10.1016/j.virol.2015.12.002] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.7] [Received: 10/13/2015] [Revised: 11/24/2015] [Accepted: 12/03/2015] [Indexed: 12/23/2022]
|
48
|
Abstract
The characterization of the human blood-associated viral community (also called the blood virome) is essential for epidemiological surveillance and for anticipating new potential threats to blood transfusion safety. Currently, the risk of blood-borne transmission of well-known viruses (HBV, HCV, HIV and HTLV) can be considered under control in high-resource countries. However, other unknown or unsuspected viruses may be transmitted to recipients by blood-derived products. This is particularly relevant considering that a significant proportion of transfused patients are immunocompromised and more frequently subject to fatal outcomes. Several measures to prevent transfusion transmission of unknown viruses have been implemented, including the exclusion of at-risk donors, leukocyte reduction of donor blood, and physicochemical treatment of the different blood components. However, up to now there is no universal method for pathogen inactivation that would be applicable to all types of blood components and equally effective for all viral families. In addition, some of the available procedures for inactivating viral genomes are recognized to be less effective on non-enveloped viruses and inadequate to inactivate higher viral titers in plasma pools or derivatives. Given this, there is a need to implement new methodologies for the discovery of unknown viruses that may affect blood transfusion. Viral metagenomics combined with High Throughput Sequencing appears to be a promising approach for the identification and global surveillance of new and/or unexpected viruses that could impair blood transfusion safety.
Affiliation(s)
- V Sauvage
- Département d'études des agents transmissibles par le sang, Institut national de la transfusion sanguine (INTS), Centre national de référence des hépatites virales B et C et du VIH en transfusion, 75015 Paris, France.
| | - M Eloit
- PathoQuest, bâtiment François-Jacob, 25, rue du Dr-Roux, 75015 Paris, France; Inserm U1117, Biology of Infection Unit, Laboratory of Pathogen Discovery, Institut Pasteur, 28, rue du Docteur-Roux, 75724 Paris, France
| |
|
49
|
Ramazzotti M, Berná L, Donati C, Cavalieri D. riboFrame: An Improved Method for Microbial Taxonomy Profiling from Non-Targeted Metagenomics. Front Genet 2015; 6:329. [PMID: 26635865 PMCID: PMC4646959 DOI: 10.3389/fgene.2015.00329] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 07/27/2015] [Accepted: 10/30/2015] [Indexed: 02/01/2023]
Abstract
Non-targeted metagenomics offers the unprecedented possibility of simultaneously investigating the microbial profile and the genetic capabilities of a sample by a direct analysis of its entire DNA content. The assessment of the microbial taxonomic composition is frequently obtained by mapping reads to genomic databases that, although growing, are still limited and biased. Here we present riboFrame, a novel procedure for microbial profiling based on the identification and classification of 16S rDNA sequences in non-targeted metagenomics datasets. Reads overlapping the 16S rDNA genes are identified using Hidden Markov Models and a taxonomic assignment is obtained by naïve Bayesian classification. All reads identified as ribosomal are coherently positioned in the 16S rDNA gene, allowing the use of the topology of the gene (i.e., the secondary structure and the location of variable regions) to guide the abundance analysis. We tested and verified the effectiveness of our method on simulated ribosomal data, on simulated metagenomes and on a real dataset. riboFrame exploits the taxonomic potential of the 16S rDNA gene in the context of non-targeted metagenomics, giving an accurate perspective on the microbial profile in metagenomic samples.
Affiliation(s)
- Matteo Ramazzotti
- Dipartimento di Scienze Biomediche Sperimentali e Cliniche, Università degli Studi di Firenze, Firenze, Italy
| | - Luisa Berná
- Unidad de Biología Molecular, Institut Pasteur de Montevideo, Montevideo, Uruguay
| | - Claudio Donati
- Centre for Research and Innovation, Fondazione Edmund Mach, San Michele all'Adige, Italy
| | - Duccio Cavalieri
- Centre for Research and Innovation, Fondazione Edmund Mach, San Michele all'Adige, Italy
| |
|
50
|
Genome-wide identification and characterization of reference genes with different transcript abundances for Streptomyces coelicolor. Sci Rep 2015; 5:15840. [PMID: 26527303 PMCID: PMC4630627 DOI: 10.1038/srep15840] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 08/17/2015] [Accepted: 10/01/2015] [Indexed: 12/22/2022]
Abstract
The lack of reliable reference genes (RGs) in the genus Streptomyces hampers efforts to obtain precise data on transcript levels. To address this issue, we aimed to identify reliable RGs in the model organism Streptomyces coelicolor. A pool of potential RGs containing 1,471 genes was first identified by determining the intersection of genes with stable transcript levels from four time-series transcriptome microarray datasets of S. coelicolor M145 cultivated in different conditions. Then, following a strict rational selection scheme including homology analysis, disturbance analysis, function analysis and transcript abundance analysis, 13 candidates were selected from the 1,471 genes. Based on real-time quantitative reverse transcription PCR assays, SCO0710, SCO6185, SCO1544, SCO3183 and SCO4758 were identified as the top five genes with the most stable transcript levels among the 13 candidates. Further analyses showed these five genes also maintained stable transcript levels in different S. coelicolor strains, as well as in Streptomyces avermitilis MA-4680 and Streptomyces clavuligerus NRRL 3585, suggesting they could fulfill the requirements of accurate data normalization in streptomycetes. Moreover, the systematic strategy employed in this work could serve as a reference for selecting reliable RGs in other microorganisms.
|