1. Kang X, Luo X, Schönhuth A. StrainXpress: strain aware metagenome assembly from short reads. Nucleic Acids Res 2022; 50:e101. [PMID: 35776122 PMCID: PMC9508831 DOI: 10.1093/nar/gkac543] [Received: 09/20/2021] [Revised: 05/27/2022] [Accepted: 06/30/2022] [Indexed: 12/05/2022]
Abstract
Next-generation sequencing-based metagenomics has made it possible to identify microorganisms in characteristic habitats without the need for lengthy cultivation. Importantly, clinically relevant phenomena such as resistance to medication, virulence or interactions with the environment can already vary within a species. Therefore, a major current challenge is to reconstruct individual genomes from the sequencing reads at the level of strains, and not just at the level of species. However, strains of one species may differ by only a small number of variants, which makes them difficult to distinguish. Despite considerable recent progress, related approaches have so far remained fragmentary. Here, we present StrainXpress, a comprehensive solution to the problem of strain-aware metagenome assembly from next-generation sequencing reads. In experiments, StrainXpress reconstructs strain-specific genomes from metagenomes involving more than 1000 strains and successfully handles poorly covered strains. The amount of reconstructed strain-specific sequence exceeds that of current state-of-the-art approaches by 26.75% on average across all data sets (first quartile: 18.51%, median: 26.60%, third quartile: 35.05%).
Affiliation(s)
- Xiongbin Kang
- Genome Data Science, Faculty of Technology, Bielefeld University, Bielefeld, 33615, Germany
- Xiao Luo
- Genome Data Science, Faculty of Technology, Bielefeld University, Bielefeld, 33615, Germany
- Alexander Schönhuth
- Genome Data Science, Faculty of Technology, Bielefeld University, Bielefeld, 33615, Germany
2. Luo X, Kang X, Schönhuth A. Enhancing Long-Read-Based Strain-Aware Metagenome Assembly. Front Genet 2022; 13:868280. [PMID: 35646097 PMCID: PMC9136235 DOI: 10.3389/fgene.2022.868280] [Received: 02/02/2022] [Accepted: 04/01/2022] [Indexed: 11/18/2022]
Abstract
Microbial communities are usually highly diverse and, owing to the rapid evolution of microorganisms, often include multiple strains of the participating species. In such complex microecosystems, different strains may have different biological functions. While reconstructing individual genomes at the strain level is vital for accurately deciphering the composition of microbial communities, the problem has so far remained largely unresolved. Next-generation sequencing has been routinely used in metagenome assembly, but its short read length makes it difficult to generate strain-specific genome sequences. By contrast, long-read sequencing technologies have recently provided unprecedented opportunities for haplotype- or strain-resolved genome assembly. Here, we propose MetaBooster and MetaBooster-HiFi, two pipelines for strain-aware metagenome assembly from PacBio CLR and Oxford Nanopore long-read sequencing data. Benchmarking experiments on both simulated and real sequencing data demonstrate that either the MetaBooster or the MetaBooster-HiFi pipeline drastically outperforms state-of-the-art de novo metagenome assemblers in terms of all relevant metagenome assembly criteria, including genome fraction, contig length, and error rates.
Affiliation(s)
- Xiao Luo
- Genome Data Science, Faculty of Technology, Bielefeld University, Bielefeld, Germany
- Life Science and Health, Centrum Wiskunde and Informatica, Amsterdam, Netherlands
- Xiongbin Kang
- Genome Data Science, Faculty of Technology, Bielefeld University, Bielefeld, Germany
- Alexander Schönhuth
- Genome Data Science, Faculty of Technology, Bielefeld University, Bielefeld, Germany
- Life Science and Health, Centrum Wiskunde and Informatica, Amsterdam, Netherlands
3. Music of metagenomics-a review of its applications, analysis pipeline, and associated tools. Funct Integr Genomics 2021; 22:3-26. [PMID: 34657989 DOI: 10.1007/s10142-021-00810-y] [Received: 03/29/2021] [Revised: 09/25/2021] [Accepted: 10/03/2021] [Indexed: 10/20/2022]
Abstract
This humble effort highlights the intricate details of metagenomics in a simple, poetic, and rhythmic way. The paper reinforces the significance of the research area, provides details about major analytical methods, examines the taxonomy and assembly of genomes, emphasizes some tools, and concludes by celebrating the richness of the ecosystem populated by the "metagenome."
4. Ayling M, Clark MD, Leggett RM. New approaches for metagenome assembly with short reads. Brief Bioinform 2021; 21:584-594. [PMID: 30815668 PMCID: PMC7299287 DOI: 10.1093/bib/bbz020] [Received: 11/12/2018] [Revised: 01/31/2019] [Accepted: 02/01/2019] [Indexed: 02/07/2023]
Abstract
In recent years, the use of longer range read data combined with advances in assembly algorithms has driven substantial improvements in the contiguity and quality of genome assemblies. However, these advances have not directly transferred to metagenomic data sets, as assumptions made by single-genome assembly algorithms do not apply when assembling multiple genomes at varying levels of abundance. The development of dedicated assemblers for metagenomic data was a relatively late innovation, and for many years researchers had to make do with tools designed for single genomes. This has changed in the last few years, and we have seen the emergence of a new type of tool built using different principles. In this review, we describe the challenges inherent in metagenomic assemblies and compare the different approaches taken by these novel assembly tools.
Affiliation(s)
- Martin Ayling
- Earlham Institute, Norwich Research Park, Norwich, UK
5. Balvert M, Luo X, Hauptfeld E, Schönhuth A, Dutilh BE. OGRE: Overlap Graph-based metagenomic Read clustEring. Bioinformatics 2021; 37:905-912. [PMID: 32871010 PMCID: PMC8128468 DOI: 10.1093/bioinformatics/btaa760] [Received: 12/13/2019] [Revised: 08/19/2020] [Accepted: 08/25/2020] [Indexed: 11/13/2022]
Abstract
Motivation The microbes that live in an environment can be identified from their combined genomic material, also referred to as the metagenome. Sequencing a metagenome can produce large volumes of sequencing reads. A promising approach to reducing the size of metagenomic datasets is to cluster reads into groups based on their overlaps. Clustering reads is valuable for facilitating downstream analyses, including computationally intensive strain-aware assembly. As current read clustering approaches cannot handle the large datasets arising from high-throughput metagenome sequencing, a novel read clustering approach is needed. In this article, we propose OGRE, an Overlap Graph-based Read clustEring procedure for high-throughput sequencing data, with a focus on shotgun metagenomes. Results We show that for small datasets OGRE outperforms other read binners in terms of the number of species included in a cluster, also referred to as cluster purity, and the fraction of all reads that is placed in one of the clusters. Furthermore, OGRE is able to process metagenomic datasets that are too large for other read binners into clusters with high cluster purity. Conclusion OGRE is the only method that can successfully cluster reads into species-specific clusters for large metagenomic datasets without running into computation-time or memory issues. Availability and implementation Code is made available on GitHub (https://github.com/Marleen1/OGRE). Supplementary information Supplementary data are available at Bioinformatics online.
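The abstract describes clustering reads based on their overlaps. As a rough illustration only, the Python sketch below groups reads into the connected components of a graph whose edges join reads that share an exact suffix-prefix overlap of at least `min_overlap` bases. This is not OGRE's algorithm: the function names, the exact-match overlap rule and the naive all-pairs comparison are assumptions made for brevity, and such a quadratic scan is exactly what does not scale to the large datasets the abstract targets.

```python
from itertools import combinations


def suffix_prefix_overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a equal to a prefix of b (>= min_len), else 0."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def cluster_reads(reads: list[str], min_overlap: int = 5) -> list[list[int]]:
    """Group read indices into connected components of an exact-overlap graph."""
    parent = list(range(len(reads)))  # union-find forest over read indices

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(i: int, j: int) -> None:
        parent[find(i)] = find(j)

    # Naive O(n^2) scan: add an edge if the reads overlap in either order.
    for i, j in combinations(range(len(reads)), 2):
        a, b = reads[i], reads[j]
        if (suffix_prefix_overlap(a, b, min_overlap)
                or suffix_prefix_overlap(b, a, min_overlap)):
            union(i, j)

    clusters: dict[int, list[int]] = {}
    for i in range(len(reads)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


reads = ["ACGTACGTTTCC", "GTTTCCAAGG", "CCAAGGTTAA", "TTTTTGGGGG", "GGGGGCCCCC"]
print(cluster_reads(reads))  # [[0, 1, 2], [3, 4]]
```

Reads 0, 1 and 2 chain together through 6-base overlaps and reads 3 and 4 share a 5-base overlap, so two clusters result; with `min_overlap=7` every read stays in its own cluster. A union-find structure is used so that clusters merge transitively as edges are found.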
Affiliation(s)
- Marleen Balvert
- Life Sciences & Health, Centrum Wiskunde & Informatica, Amsterdam 1098 XG, The Netherlands; Theoretical Biology & Bioinformatics, Utrecht University, Utrecht 3512 JE, The Netherlands; Department of Econometrics & Operations Research, Tilburg University, Tilburg 5000 LE, The Netherlands
- Xiao Luo
- Life Sciences & Health, Centrum Wiskunde & Informatica, Amsterdam 1098 XG, The Netherlands
- Ernestina Hauptfeld
- Theoretical Biology & Bioinformatics, Utrecht University, Utrecht 3512 JE, The Netherlands; Laboratorium of Microbiology, Wageningen University & Research, Wageningen 6700 HB, The Netherlands
- Alexander Schönhuth
- Life Sciences & Health, Centrum Wiskunde & Informatica, Amsterdam 1098 XG, The Netherlands; Theoretical Biology & Bioinformatics, Utrecht University, Utrecht 3512 JE, The Netherlands
- Bas E Dutilh
- Theoretical Biology & Bioinformatics, Utrecht University, Utrecht 3512 JE, The Netherlands
6. Lapidus AL, Korobeynikov AI. Metagenomic Data Assembly - The Way of Decoding Unknown Microorganisms. Front Microbiol 2021; 12:613791. [PMID: 33833738 PMCID: PMC8021871 DOI: 10.3389/fmicb.2021.613791] [Received: 10/03/2020] [Accepted: 03/03/2021] [Indexed: 01/08/2023]
Abstract
Metagenomics is a segment of conventional microbial genomics dedicated to the sequencing and analysis of the combined genomic DNA of entire environmental samples. The most critical step of metagenomic data analysis is the reconstruction of individual genes and genomes of the microorganisms in the communities using metagenomic assemblers, computational programs that put together the small fragments of DNA generated by sequencing instruments. Here, we describe the challenges of metagenomic assembly and a wide spectrum of applications in which metagenomic assemblies were used to better understand the ecology and evolution of microbial ecosystems, and we present one of the most efficient microbial assemblers, SPAdes, which was upgraded to become applicable to metagenomics.
Affiliation(s)
- Alla L. Lapidus
- Center for Algorithmic Biotechnology, St. Petersburg State University, Saint Petersburg, Russia
7. Deng Z, Delwart E. ContigExtender: a new approach to improving de novo sequence assembly for viral metagenomics data. BMC Bioinformatics 2021; 22:119. [PMID: 33706720 PMCID: PMC7953547 DOI: 10.1186/s12859-021-04038-2] [Received: 10/01/2019] [Accepted: 02/21/2021] [Indexed: 11/10/2022]
Abstract
BACKGROUND Metagenomics is the study of microbial genomes for pathogen detection and discovery in human clinical, animal, and environmental samples via Next-Generation Sequencing (NGS). Metagenome de novo sequence assembly is a crucial analytical step in which longer contigs, ideally whole chromosomes/genomes, are formed from shorter NGS reads. However, the contigs generated by de novo assembly are often highly fragmented and rarely longer than a few kilobase pairs (kb). Therefore, a time-consuming extension process is routinely performed on the de novo assembled contigs. RESULTS To facilitate this process, we propose ContigExtender, a new tool for metagenome contig extension after de novo assembly. ContigExtender employs a novel recursive extension strategy that explores multiple extension paths to produce highly accurate longer contigs. We demonstrate that ContigExtender outperforms existing tools on synthetic, animal, and human metagenomics datasets. CONCLUSIONS A novel software tool, ContigExtender, has been developed to assist and enhance the performance of metagenome de novo assembly. ContigExtender effectively extends contigs from a variety of sources and can be incorporated into most viral metagenomics analysis pipelines for a wide variety of applications, including pathogen detection and viral discovery.
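ContigExtender is described as recursively exploring multiple extension paths. The Python sketch below illustrates that general idea under simplifying assumptions that are ours, not the tool's: the hypothetical helper `kmer_successors` records which base follows each k-mer in the reads, and the hypothetical `extend_right` recursively tries every branch from the contig's last k-mer and keeps the longest result, stopping when a k-mer repeats (a cycle) or a step budget runs out. ContigExtender's actual read handling and path scoring are not reproduced here.

```python
from collections import defaultdict


def kmer_successors(reads: list[str], k: int) -> dict[str, set[str]]:
    """Map each k-mer to the set of bases that follow it somewhere in the reads."""
    succ: dict[str, set[str]] = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k):
            succ[read[i:i + k]].add(read[i + k])
    return succ


def extend_right(contig: str, succ: dict[str, set[str]], k: int,
                 max_steps: int = 200) -> str:
    """Recursively explore every branch and return the longest right extension."""
    def explore(seq: str, seen: frozenset[str], steps: int) -> str:
        tail = seq[-k:]
        if steps == max_steps or tail in seen:
            return seq  # stop at the step budget or when a k-mer repeats
        best = seq
        for base in sorted(succ.get(tail, ())):  # try each possible next base
            candidate = explore(seq + base, seen | {tail}, steps + 1)
            if len(candidate) > len(best):
                best = candidate
        return best

    return explore(contig, frozenset(), 0)


reads = ["ACGTTGCA", "TTGCATGG", "GCATCCC"]
succ = kmer_successors(reads, 4)
print(extend_right("ACGT", succ, 4))  # ACGTTGCATCCC
```

At the k-mer `GCAT` the path branches into `C` and `G`; both are explored and the longer branch wins. Exhaustive branching grows exponentially on repeat-rich data, so a practical tool would need pruning or read-support scores; `max_steps` also keeps the recursion below Python's default recursion limit.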
Affiliation(s)
- Zachary Deng
- Vitalant Research Institute, San Francisco, CA, 94118, USA.
- Department of Laboratory Medicine, University of California at San Francisco, San Francisco, CA, 94107, USA.
- Eric Delwart
- Vitalant Research Institute, San Francisco, CA, 94118, USA.
- Department of Laboratory Medicine, University of California at San Francisco, San Francisco, CA, 94107, USA.
8. Prayogo FA, Budiharjo A, Kusumaningrum HP, Wijanarka W, Suprihadi A, Nurhayati N. Metagenomic applications in exploration and development of novel enzymes from nature: a review. J Genet Eng Biotechnol 2020; 18:39. [PMID: 32749574 PMCID: PMC7403272 DOI: 10.1186/s43141-020-00043-9] [Received: 01/15/2020] [Accepted: 06/10/2020] [Indexed: 12/14/2022]
Abstract
BACKGROUND Microbial communities play an essential role in various fields, especially the industrial sector. Microbes produce metabolites in the form of enzymes, which are among the essential compounds for industrial processes. Unfortunately, numerous microbes still cannot be identified and cultivated because of the limitations of culture-based methods. The metagenomic approach offers researchers a way to overcome these problems. Metagenomics is a strategy for directly analyzing the genomes of microbial communities in the environment. Applying metagenomics to the exploration of novel enzymes is essential because it allows researchers to obtain data on up to 99% of microbial diversity and on various types of genes encoding enzymes that have not yet been identified. Basic methods in metagenomics have been developed and are commonly used in various studies. Researchers, especially young researchers, need a basic understanding of metagenomics to support the success of their research. SHORT CONCLUSION Therefore, this review was done to provide a deep understanding of metagenomics. It also discusses the application and basic methods of metagenomics in the exploration of novel enzymes, especially in the latest research. Several types of enzymes, such as cellulases, proteases, and lipases, which have been explored using metagenomics, are reviewed in this article.
Affiliation(s)
- Fitra Adi Prayogo
- Department of Biology, Faculty of Science and Mathematics, Diponegoro University, Semarang City, 50275 Indonesia
- Anto Budiharjo
- Biotechnology Study Program, Faculty of Science and Mathematics, Diponegoro University, Jl. Prof. Sudharto SH, Semarang, 50275 Indonesia
- Molecular and Applied Microbiology Laboratory, Center Central Laboratory of Research and Service - Diponegoro University, Jl. Prof. Sudharto SH, Semarang, 50275 Indonesia
- Wijanarka Wijanarka
- Biotechnology Study Program, Faculty of Science and Mathematics, Diponegoro University, Jl. Prof. Sudharto SH, Semarang, 50275 Indonesia
- Agung Suprihadi
- Biotechnology Study Program, Faculty of Science and Mathematics, Diponegoro University, Jl. Prof. Sudharto SH, Semarang, 50275 Indonesia
- Nurhayati Nurhayati
- Biotechnology Study Program, Faculty of Science and Mathematics, Diponegoro University, Jl. Prof. Sudharto SH, Semarang, 50275 Indonesia
9. Baaijens JA, Schönhuth A. Overlap graph-based generation of haplotigs for diploids and polyploids. Bioinformatics 2020; 35:4281-4289. [PMID: 30994902 DOI: 10.1093/bioinformatics/btz255] [Received: 07/26/2018] [Revised: 03/18/2019] [Accepted: 04/11/2019] [Indexed: 01/05/2023]
Abstract
MOTIVATION Haplotype-aware genome assembly plays an important role in genetics, medicine and various other disciplines, yet generation of haplotype-resolved de novo assemblies remains a major challenge. Beyond distinguishing between errors and true sequential variants, one needs to assign the true variants to the different genome copies. Recent work has pointed out that the enormous quantities of traditional NGS read data have so far been greatly underexploited for haplotig computation, which reflects that methodology for reference-independent haplotig computation has not yet reached maturity. RESULTS We present POLYploid genome fitTEr (POLYTE) as a new approach to de novo generation of haplotigs for diploid and polyploid genomes of known ploidy. Our method follows an iterative scheme where in each iteration reads or contigs are joined, based on their interplay in terms of an underlying haplotype-aware overlap graph. Along the iterations, contigs grow while preserving their haplotype identity. Benchmarking experiments on both real and simulated data demonstrate that POLYTE establishes new standards in terms of error-free reconstruction of haplotype-specific sequence. As a consequence, POLYTE outperforms state-of-the-art approaches in various relevant aspects, where advantages become particularly distinct in polyploid settings. AVAILABILITY AND IMPLEMENTATION POLYTE is freely available as part of the HaploConduct package at https://github.com/HaploConduct/HaploConduct, implemented in Python and C++. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Alexander Schönhuth
- Centrum Wiskunde & Informatica, XG Amsterdam, The Netherlands; Theoretical Biology and Bioinformatics, Utrecht University, CH Utrecht, The Netherlands
10. Temperature and Nutrient Levels Correspond with Lineage-Specific Microdiversification in the Ubiquitous and Abundant Freshwater Genus Limnohabitans. Appl Environ Microbiol 2020; 86:AEM.00140-20. [PMID: 32169939 DOI: 10.1128/aem.00140-20] [Received: 01/21/2020] [Accepted: 03/10/2020] [Indexed: 11/20/2022]
Abstract
Most freshwater bacterial communities are characterized by a few dominant taxa that are often ubiquitous across freshwater biomes worldwide. Our understanding of the genomic diversity within these taxonomic groups is limited to a subset of taxa. Here, we investigated the genomic diversity that enables Limnohabitans, a freshwater genus that plays a key role in funneling carbon from primary producers to higher trophic levels, to achieve abundance and ubiquity. We reconstructed eight putative Limnohabitans metagenome-assembled genomes (MAGs) from stations located along broad environmental gradients in Lake Michigan, part of Earth's largest surface freshwater system. De novo strain inference analysis resolved a total of 23 strains from these MAGs, which strongly partitioned into two habitat-specific clusters with cooccurring strains from different lineages. The largest number of strains belonged to the abundant LimB lineage, for which robust in situ strain delineation had not previously been achieved. Our data show that temperature and nutrient levels may be important environmental parameters associated with microdiversification within the Limnohabitans genus. In addition, strains predominant in low- and high-phosphorus conditions had larger genomic divergence than strains abundant under different temperatures. Comparative genomics and gene expression analysis yielded evidence for the ability of LimB populations to exhibit cellular motility and chemotaxis, a phenotype not yet associated with available Limnohabitans isolates. Our findings broaden historical marker gene-based surveys of Limnohabitans microdiversification and provide in situ evidence of genome diversity and its functional implications across freshwater gradients.
IMPORTANCE Limnohabitans is an important bacterial taxonomic group for cycling carbon in freshwater ecosystems worldwide. Here, we examined the genomic diversity of different Limnohabitans lineages. We focused on the LimB lineage of this genus, which is globally distributed and often abundant, and whose abundance has been shown to be largely invariant to environmental change. Our data show that the LimB lineage actually comprises multiple cooccurring populations whose composition and genomic characteristics are associated with variations in temperature and nutrient levels. The gene expression profiles of this lineage suggest the importance of chemotaxis and motility, traits that had not yet been associated with the Limnohabitans genus, in adapting to environmental conditions.
11. Dovrolis N, Kolios G, Spyrou GM, Maroulakou I. Computational profiling of the gut-brain axis: microflora dysbiosis insights to neurological disorders. Brief Bioinform 2020; 20:825-841. [PMID: 29186317 DOI: 10.1093/bib/bbx154] [Received: 07/17/2017] [Revised: 10/17/2017] [Indexed: 12/14/2022]
Abstract
Almost 2500 years after Hippocrates' observations on health and its direct association with the gastrointestinal tract, a paradigm shift has recently occurred, making the gut and its symbionts (bacteria, fungi, archaea and viruses) a point of convergence for studies. It is now well established that the compositional diversity of the gut microflora regulates the host's health via its genes (the microbiome) and provides preliminary insights into disease progression and regulation. The microbiome's involvement is evident in immunological and physiological studies that link changes in its biodiversity to its contributions to the host's phenotype, but also in neurological investigations, substantiating the aptly named gut-brain axis. The definitive mechanisms of this last bidirectional interaction will be our main focus because they present researchers with a new conundrum. In this review, we survey the current literature for computational analysis methodologies that address the need for a better understanding of microbiome-gut-brain interactions and of neurological disorder onset and progression, through cross-disciplinary systems biology applications. We will present bioinformatics tools used in exploring these synergies that help build and interpret microbial 16S ribosomal RNA data sets, produced by shotgun and high-throughput sequencing of healthy and neurological disorder samples stored in biological databases. These approaches provide alternative means for researchers to form hypotheses for their inquiries faster, more cheaply and with precision. These studies ultimately aim to integrate combined metagenomics and metabolomics assessments. An accurate characterization of the microbiome and its functionality can support new diagnostic, prognostic and therapeutic strategies for neurological disorders, customized for each individual host.
12. Guo J, Quensen JF, Sun Y, Wang Q, Brown CT, Cole JR, Tiedje JM. Review, Evaluation, and Directions for Gene-Targeted Assembly for Ecological Analyses of Metagenomes. Front Genet 2019; 10:957. [PMID: 31749830 PMCID: PMC6843070 DOI: 10.3389/fgene.2019.00957] [Received: 04/23/2019] [Accepted: 09/09/2019] [Indexed: 12/28/2022]
Abstract
Shotgun metagenomics has greatly advanced our understanding of microbial communities over the last decade. Metagenomic analyses often include assembly and genome binning, computationally daunting tasks, especially for big data from complex environments such as soil and sediments. In many studies, however, only a subset of genes and pathways involved in specific functions is of interest; thus, it is not necessary to attempt global assembly. In addition, methods that target genes can be computationally more efficient and produce more accurate assemblies by leveraging rich databases, especially for genes of broad interest, such as those involved in biogeochemical cycles, biodegradation, and antibiotic resistance or used as phylogenetic markers. Here, we review six gene-targeted assemblers with unique algorithms for extracting and/or assembling targeted genes: Xander, MegaGTA, SAT-Assembler, HMM-GRASPx, GenSeed-HMM, and MEGAN. We tested these tools using two datasets with known genomes (a synthetic community of artificial reads derived from the genomes of 17 bacteria, and shotgun sequence data from a mock community with 48 bacterial and 16 archaeal genomes) and a large soil shotgun metagenomic dataset. We compared assemblies of a universal single-copy gene (rplB) and two N cycle genes (nifH and nirK). We measured their computational efficiency, sensitivity, specificity, and chimera rate and found that Xander and MegaGTA, which both use a probabilistic graph structure to model the genes, have the best overall performance on all three datasets, although MEGAN, a reference-matching assembler, had better sensitivity with synthetic and mock community members chosen from its reference collection. Also, Xander and MegaGTA are the only tools that include post-assembly scripts tuned for common molecular ecology and diversity analyses.
Additionally, we provide a mathematical model for estimating the probability of assembling targeted genes in a metagenome, which can be used to estimate the required sequencing depth.
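The abstract mentions a mathematical model for the probability of assembling targeted genes; that model is not reproduced here. As a generic stand-in, the Python sketch below uses the textbook Poisson approximation to read depth (in the spirit of Lander and Waterman). If an organism contributes a fraction `abundance` of `n_reads` reads of length `read_len` and has a genome of `genome_len` bases, its mean depth is `c = n_reads * abundance * read_len / genome_len`, and the chance that a given gene position is covered by at least `m` reads is P(X ≥ m) for X ~ Poisson(c). All function and parameter names are hypothetical, the probability is per position rather than for a whole assembled gene, and the model ignores sequencing errors, coverage bias and strain variation.

```python
import math


def expected_coverage(n_reads: int, read_len: int, abundance: float,
                      genome_len: int) -> float:
    """Mean depth on one genome, assuming its reads are spread uniformly."""
    return n_reads * abundance * read_len / genome_len


def prob_depth_at_least(coverage: float, min_depth: int) -> float:
    """P(X >= min_depth) for X ~ Poisson(coverage)."""
    cdf = sum(math.exp(-coverage) * coverage ** i / math.factorial(i)
              for i in range(min_depth))
    return 1.0 - cdf


def coverage_needed(min_depth: int, target_prob: float, tol: float = 1e-9) -> float:
    """Smallest mean depth c with P(X >= min_depth) >= target_prob, by bisection."""
    if not 0.0 < target_prob < 1.0:
        raise ValueError("target_prob must lie strictly between 0 and 1")
    lo, hi = 0.0, 1.0
    while prob_depth_at_least(hi, min_depth) < target_prob:
        hi *= 2.0  # widen the bracket until it contains the answer
    while hi - lo > tol:  # P(X >= m) increases with c, so bisection applies
        mid = (lo + hi) / 2.0
        if prob_depth_at_least(mid, min_depth) >= target_prob:
            hi = mid
        else:
            lo = mid
    return hi


def reads_needed(read_len: int, abundance: float, genome_len: int,
                 min_depth: int, target_prob: float) -> int:
    """Total reads so that a target-genome position reaches min_depth with target_prob."""
    c = coverage_needed(min_depth, target_prob)
    return math.ceil(c * genome_len / (abundance * read_len))


# Hypothetical scenario: 150 bp reads, organism at 1% abundance, 5 Mb genome,
# and at least one read over a given gene position with 95% probability.
print(reads_needed(150, 0.01, 5_000_000, 1, 0.95))
```

For `min_depth=1` the required mean depth is simply `ln(20)` (about 3.0), so the hypothetical scenario above calls for roughly 10 million reads in total; raising `min_depth` or lowering `abundance` increases that number sharply.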
Affiliation(s)
- Jiarong Guo
- Center for Microbial Ecology, Michigan State University, East Lansing, MI, United States
- John F. Quensen
- Center for Microbial Ecology, Michigan State University, East Lansing, MI, United States
- Yanni Sun
- Department of Electrical Engineering, City University of Hong Kong, Kowloon, Hong Kong
- Qiong Wang
- Center for Microbial Ecology, Michigan State University, East Lansing, MI, United States
- C. Titus Brown
- Department of Population Health and Reproduction, University of California, Davis, Davis, CA, United States
- James R. Cole
- Center for Microbial Ecology, Michigan State University, East Lansing, MI, United States
- James M. Tiedje
- Center for Microbial Ecology, Michigan State University, East Lansing, MI, United States
13.
Abstract
Computer-assisted technologies of the genomic structure, biological function, and evolution of viruses remain a largely neglected area of research. The attention of bioinformaticians to this challenging field is currently unsatisfactory with respect to its medical and biological importance. The power of new genome sequencing technologies, together with new tools to handle "big data", provides unprecedented opportunities to address fundamental questions in virology. Here, we present an overview of the current technologies, challenges, and advantages of Next-Generation Sequencing (NGS) in relation to the field of virology. We present how viral sequences can be detected de novo from current short-read NGS data. Furthermore, we discuss the challenges and applications of viral quasispecies and how secondary structures, commonly formed by RNA viruses, can be computationally predicted. The phylogenetic analysis of viruses, another ubiquitous field in virology, is an essential element of describing viral epidemics and challenges current algorithms. Recently, the first specialized virus-bioinformatics organizations have been established. We need to bring together virologists and bioinformaticians and provide a platform for the implementation of interdisciplinary collaborative projects at local and international scales. Above all, there is an urgent need for dedicated software tools to tackle various challenges in virology.
Affiliation(s)
- Martin Hölzer
- RNA Bioinformatics and High Throughput Analysis, Faculty of Mathematics and Computer Science, Friedrich Schiller University Jena, Jena, Germany; European Virus Bioinformatics Center (EVBC), Jena, Germany
- Manja Marz
- RNA Bioinformatics and High Throughput Analysis, Faculty of Mathematics and Computer Science, Friedrich Schiller University Jena, Jena, Germany; European Virus Bioinformatics Center (EVBC), Jena, Germany; FLI Leibniz Institute for Age Research, Jena, Germany.