51
Wang Z, Wang Z, Lu YY, Sun F, Zhu S. SolidBin: improving metagenome binning with semi-supervised normalized cut. Bioinformatics 2020; 35:4229-4238. [PMID: 30977806 DOI: 10.1093/bioinformatics/btz253] [Citation(s) in RCA: 42] [Impact Index Per Article: 8.4] [Received: 12/16/2018] [Revised: 03/14/2019] [Accepted: 04/05/2019] [Indexed: 12/19/2022]
Abstract
MOTIVATION Metagenomic contig binning is an important computational problem in metagenomic research, which aims to cluster contigs from the same genome into the same group. Unlike classical clustering problems, contig binning can utilize known relationships among some of the contigs or the taxonomic identity of some contigs. However, the current state-of-the-art contig binning methods do not make full use of additional biological information beyond the coverage and sequence composition of the contigs. RESULTS We developed a novel contig binning method, Semi-supervised Spectral Normalized Cut for Binning (SolidBin), based on semi-supervised spectral clustering. Using sequence feature similarity and/or additional biological information, such as the reliable taxonomy assignments of some contigs, SolidBin constructs two types of prior information: must-link and cannot-link constraints. Must-link constraints mean that the pair of contigs should be clustered into the same group, while cannot-link constraints mean that the pair of contigs should be clustered into different groups. These constraints are then integrated into a classical spectral clustering approach, normalized cut, for improved contig binning. The performance of SolidBin is compared with five state-of-the-art genome binners, CONCOCT, COCACOLA, MaxBin, MetaBAT and BMC3C, on five next-generation sequencing benchmark datasets including simulated multi- and single-sample datasets and real multi-sample datasets. The experimental results show that SolidBin achieved the best performance in terms of F-score, Adjusted Rand Index and Normalized Mutual Information, especially on the real datasets and the single-sample dataset. AVAILABILITY AND IMPLEMENTATION https://github.com/sufforest/SolidBin. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
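The idea of feeding must-link and cannot-link constraints into a normalized cut can be illustrated with a small sketch. This is not the SolidBin implementation: the function names, the Gaussian affinity, the toy feature vectors and the choice to write constraints directly into the affinity matrix (1 for must-link, 0 for cannot-link) are all assumptions made for illustration, and only a two-way split is shown.

```python
import numpy as np

def constrained_affinity(features, must_link, cannot_link, sigma=1.0):
    """Gaussian affinity matrix with pairwise constraints written into it.

    Illustrative assumption: must-link pairs get the maximal affinity 1.0,
    cannot-link pairs get 0.0 (the edge is removed).
    """
    diffs = features[:, None, :] - features[None, :, :]
    W = np.exp(-np.sum(diffs ** 2, axis=2) / (2 * sigma ** 2))
    for i, j in must_link:
        W[i, j] = W[j, i] = 1.0
    for i, j in cannot_link:
        W[i, j] = W[j, i] = 0.0
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W

def normalized_cut_bipartition(W):
    """Relaxed two-way normalized cut.

    Splits nodes by the sign of the second-smallest eigenvector of the
    symmetric normalized Laplacian, mapped back with D^(-1/2).
    """
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L_sym)  # eigenvalues in ascending order
    fiedler = d_inv_sqrt * eigvecs[:, 1]
    return (fiedler > 0).astype(int)

# Toy "contig" features (hypothetical): two loose groups in 2-D.
features = np.array([
    [0.0, 0.0], [0.2, 0.0], [0.0, 0.2],
    [2.0, 2.0], [2.2, 2.0], [2.0, 2.2],
])
W = constrained_affinity(features, must_link=[(0, 1)], cannot_link=[(0, 3)])
labels = normalized_cut_bipartition(W)
print(labels)
```

Which group receives label 0 or 1 depends on the arbitrary sign of the eigenvector, so only the grouping itself is meaningful.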
Affiliation(s)
- Ziye Wang
- Centre for Computational Systems Biology, School of Mathematical Sciences, Shanghai, China; School of Computer Science and Shanghai Key Lab of Intelligent Information Processing, Shanghai, China; Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
- Zhengyang Wang
- School of Computer Science and Shanghai Key Lab of Intelligent Information Processing, Shanghai, China
- Yang Young Lu
- Quantitative and Computational Biology Program, Department of Biological Sciences, University of Southern California, Los Angeles, CA, USA
- Fengzhu Sun
- Centre for Computational Systems Biology, School of Mathematical Sciences, Shanghai, China; Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China; Quantitative and Computational Biology Program, Department of Biological Sciences, University of Southern California, Los Angeles, CA, USA
- Shanfeng Zhu
- School of Computer Science and Shanghai Key Lab of Intelligent Information Processing, Shanghai, China; Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China; Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence (Fudan University), Ministry of Education, China
52
Linard B, Swenson K, Pardi F. Rapid alignment-free phylogenetic identification of metagenomic sequences. Bioinformatics 2020; 35:3303-3312. [PMID: 30698645 DOI: 10.1093/bioinformatics/btz068] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 06/27/2018] [Revised: 01/18/2019] [Accepted: 01/29/2019] [Indexed: 12/20/2022]
Abstract
MOTIVATION Taxonomic classification is at the core of environmental DNA analysis. When a phylogenetic tree can be built as a prior hypothesis to such classification, phylogenetic placement (PP) provides the most informative type of classification because each query sequence is assigned to its putative origin in the tree. This is useful whenever precision is sought (e.g. in diagnostics). However, likelihood-based PP algorithms struggle to scale with the ever-increasing throughput of DNA sequencing. RESULTS We have developed RAPPAS (Rapid Alignment-free Phylogenetic Placement via Ancestral Sequences) which uses an alignment-free approach, removing the hurdle of query sequence alignment as a preliminary step to PP. Our approach relies on the precomputation of a database of k-mers that may be present with non-negligible probability in relatives of the reference sequences. The placement is performed by inspecting the stored phylogenetic origins of the k-mers in the query, and their probabilities. The database can be reused for the analysis of several different metagenomes. Experiments show that the first implementation of RAPPAS is already faster than competing likelihood-based PP algorithms, while keeping similar accuracy for short reads. RAPPAS scales PP for the era of routine metagenomic diagnostics. AVAILABILITY AND IMPLEMENTATION Program and sources freely available for download at https://github.com/blinard-BIOINFO/RAPPAS. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
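A rough sketch of placement by k-mer lookup, as described in the abstract above, is given here. It is not the RAPPAS algorithm or its database format: the dictionary layout, the branch names, the probabilities, the `default_prob` fallback for unseen k-mers and the log-sum scoring are all hypothetical choices made to keep the example small.

```python
import math

def kmers(seq, k):
    """All overlapping k-mers of a sequence, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def place_query(query, kmer_db, k, default_prob=1e-6):
    """Score every branch by the summed log-probabilities of the query k-mers.

    kmer_db maps k-mer -> {branch: probability}. A k-mer that is not stored
    for a branch contributes log(default_prob) to that branch.
    """
    branches = {b for hits in kmer_db.values() for b in hits}
    scores = {b: 0.0 for b in branches}
    for kmer in kmers(query, k):
        hits = kmer_db.get(kmer, {})
        for b in branches:
            scores[b] += math.log(hits.get(b, default_prob))
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical precomputed database for k = 3.
toy_db = {
    "ACG": {"branch_A": 0.9, "branch_B": 0.1},
    "CGT": {"branch_A": 0.8},
    "GTT": {"branch_B": 0.7},
}
best_branch, branch_scores = place_query("ACGT", toy_db, k=3)
print(best_branch, branch_scores)
```

Because the database is built once from the reference tree and only looked up at query time, the same `toy_db` could be reused for any number of queries, which mirrors the reuse across metagenomes mentioned in the abstract.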
Affiliation(s)
- Benjamin Linard
- LIRMM, University of Montpellier, CNRS, Montpellier, France; ISEM, University of Montpellier, CNRS, IRD, EPHE, CIRAD, INRAP, Montpellier, France; AGAP, University of Montpellier, CIRAD, INRA, Montpellier Supagro, Montpellier, France
- Krister Swenson
- LIRMM, University of Montpellier, CNRS, Montpellier, France; Institut de Biologie Computationnelle, Montpellier, France
- Fabio Pardi
- LIRMM, University of Montpellier, CNRS, Montpellier, France; Institut de Biologie Computationnelle, Montpellier, France
53
Shang J, Sun Y. CHEER: HierarCHical taxonomic classification for viral mEtagEnomic data via deep leaRning. Methods 2020; 189:95-103. [PMID: 32454212 PMCID: PMC7255349 DOI: 10.1016/j.ymeth.2020.05.018] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 02/03/2020] [Revised: 05/05/2020] [Accepted: 05/17/2020] [Indexed: 02/07/2023]
Abstract
The fast accumulation of viral metagenomic data has contributed significantly to new RNA virus discovery. However, the short read size, complex composition, and large data size can all make taxonomic analysis difficult. In particular, commonly used alignment-based methods are not ideal choices for detecting new viral species. In this work, we present a novel hierarchical classification model named CHEER, which can conduct read-level taxonomic classification from order to genus for new species. By combining k-mer embedding-based encoding, hierarchically organized CNNs, and a carefully trained rejection layer, CHEER is able to assign correct taxonomic labels to reads from new species. We tested CHEER on both simulated and real sequencing data. The results show that CHEER can achieve higher accuracy than popular alignment-based and alignment-free taxonomic assignment tools. The source code, scripts, and pre-trained parameters for CHEER are available via GitHub: https://github.com/KennthShang/CHEER.
Affiliation(s)
- Jiayu Shang
- Electrical Engineering Dept., City University of Hong Kong, Kowloon, Hong Kong Special Administrative Region
- Yanni Sun
- Electrical Engineering Dept., City University of Hong Kong, Kowloon, Hong Kong Special Administrative Region
54
Levy Karin E, Mirdita M, Söding J. MetaEuk-sensitive, high-throughput gene discovery, and annotation for large-scale eukaryotic metagenomics. Microbiome 2020; 8:48. [PMID: 32245390 PMCID: PMC7126354 DOI: 10.1186/s40168-020-00808-x] [Citation(s) in RCA: 136] [Impact Index Per Article: 27.2] [Received: 11/15/2019] [Accepted: 02/14/2020] [Indexed: 05/10/2023]
Abstract
BACKGROUND Metagenomics is revolutionizing the study of microorganisms and their involvement in biological, biomedical, and geochemical processes, allowing us to investigate by direct sequencing a tremendous diversity of organisms without the need for prior cultivation. Unicellular eukaryotes play essential roles in most microbial communities as chief predators, decomposers, phototrophs, bacterial hosts, symbionts, and parasites to plants and animals. Investigating their roles is therefore of great interest to ecology, biotechnology, human health, and evolution. However, the generally lower sequencing coverage, their more complex gene and genome architectures, and a lack of eukaryote-specific experimental and computational procedures have kept them on the sidelines of metagenomics. RESULTS MetaEuk is a toolkit for high-throughput, reference-based discovery, and annotation of protein-coding genes in eukaryotic metagenomic contigs. It performs fast searches with 6-frame-translated fragments covering all possible exons and optimally combines matches into multi-exon proteins. We used a benchmark of seven diverse, annotated genomes to show that MetaEuk is highly sensitive even under conditions of low sequence similarity to the reference database. To demonstrate MetaEuk's power to discover novel eukaryotic proteins in large-scale metagenomic data, we assembled contigs from 912 samples of the Tara Oceans project. MetaEuk predicted >12,000,000 protein-coding genes in 8 days on ten 16-core servers. Most of the discovered proteins are highly diverged from known proteins and originate from very sparsely sampled eukaryotic supergroups. CONCLUSION The open-source (GPLv3) MetaEuk software (https://github.com/soedinglab/metaeuk) enables large-scale eukaryotic metagenomics through reference-based, sensitive taxonomic and functional annotation.
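The six-frame translation step mentioned in the abstract can be sketched as follows. This is not MetaEuk code: the function names, the `min_len` cutoff and the choice to delimit fragments at stop codons are assumptions for illustration, and the standard genetic code is used throughout.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def translate(seq):
    """Translate complete codons; unknown codons (e.g. with N) become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )

def six_frame_fragments(contig, min_len=2):
    """Return (strand, frame offset, peptide) for every stop-free stretch.

    Assumption: candidate fragments are the stretches between stop codons
    in each of the six reading frames, kept if at least min_len residues.
    """
    contig = contig.upper()
    fragments = []
    for strand, seq in (("+", contig), ("-", reverse_complement(contig))):
        for offset in range(3):
            protein = translate(seq[offset:])
            for piece in protein.split("*"):
                if len(piece) >= min_len:
                    fragments.append((strand, offset, piece))
    return fragments

fragments = six_frame_fragments("ATGGCCTAA")
print(fragments)
```

In a real pipeline each fragment would also carry its contig coordinates so that matches to reference proteins can later be chained into multi-exon gene calls; that bookkeeping is omitted here.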
Affiliation(s)
- Eli Levy Karin
- Quantitative and Computational Biology, Max-Planck Institute for Biophysical Chemistry, 37077, Göttingen, Germany
- Milot Mirdita
- Quantitative and Computational Biology, Max-Planck Institute for Biophysical Chemistry, 37077, Göttingen, Germany
- Johannes Söding
- Quantitative and Computational Biology, Max-Planck Institute for Biophysical Chemistry, 37077, Göttingen, Germany
55
Blanco-Míguez A, Fdez-Riverola F, Sánchez B, Lourenço A. Resources and tools for the high-throughput, multi-omic study of intestinal microbiota. Brief Bioinform 2020; 20:1032-1056. [PMID: 29186315 DOI: 10.1093/bib/bbx156] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 07/26/2017] [Revised: 10/23/2017] [Indexed: 12/18/2022]
Abstract
The human gut microbiome impacts several aspects of human health and disease, including digestion, drug metabolism and the propensity to develop various inflammatory, autoimmune and metabolic diseases. Many of the molecular processes that play a role in the activity and dynamics of the microbiota go beyond species and genic composition, and thus their understanding requires advanced bioinformatics support. This article aims to provide an up-to-date view of the resources and software tools that are being developed and used in human gut microbiome research, in particular data integration and systems-level analysis efforts. These efforts demonstrate the power of standardized and reproducible computational workflows for integrating and analysing varied omics data and gaining deeper insights into microbial community structure and function as well as host-microbe interactions.
Affiliation(s)
- Anália Lourenço
- Dpto. de Informática - Universidade de Vigo, ESEI - Escuela Superior de Ingeniería Informática, Edificio politécnico, Campus Universitario As Lagoas s/n, 32004 Ourense, Spain
56
Amses KR, Davis WJ, James TY. SCGid: a consensus approach to contig filtering and genome prediction from single-cell sequencing libraries of uncultured eukaryotes. Bioinformatics 2020; 36:1994-2000. [PMID: 31764940 PMCID: PMC7141854 DOI: 10.1093/bioinformatics/btz866] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 05/03/2019] [Revised: 10/09/2019] [Accepted: 11/22/2019] [Indexed: 11/13/2022]
Abstract
MOTIVATION Whole-genome sequencing of uncultured eukaryotic genomes is complicated by difficulties in acquiring sufficient amounts of tissue. Single-cell genomics (SCG) by multiple displacement amplification provides a technical workaround, yielding whole-genome libraries which can be assembled de novo. Downsides of multiple displacement amplification include coverage biases and exacerbation of contamination. These factors affect assembly continuity and fidelity, complicating discrimination of genomes from contamination and noise by available tools. Uncultured eukaryotes and their relatives are often underrepresented in large sequence data repositories, further impairing identification and separation. RESULTS We compare the ability of filtering approaches to remove contamination and resolve eukaryotic draft genomes from SCG metagenomes, finding significant variation in outcomes. To address these inconsistencies, we introduce a consensus approach that is codified in the SCGid software package. SCGid filters assemblies in parallel using different approaches, yielding three intermediate drafts from which a consensus is drawn. Using genuine and mock SCG metagenomes, we show that our approach corrects for variation among draft genomes predicted by individual approaches and outperforms them in recapitulating published drafts in a fast and repeatable way, providing a useful alternative to available methods and manual curation. AVAILABILITY AND IMPLEMENTATION The SCGid package is implemented in Python and R. Source code is available at http://www.github.com/amsesk/SCGid under the GNU GPL 3.0 license. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Kevin R Amses
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI 48109, USA
- William J Davis
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI 48109, USA
- Timothy Y James
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI 48109, USA
57
Aluthge ND, Van Sambeek DM, Carney-Hinkle EE, Li YS, Fernando SC, Burkey TE. BOARD INVITED REVIEW: The pig microbiota and the potential for harnessing the power of the microbiome to improve growth and health. J Anim Sci 2019; 97:3741-3757. [PMID: 31250899 DOI: 10.1093/jas/skz208] [Citation(s) in RCA: 36] [Impact Index Per Article: 6.0] [Received: 04/02/2019] [Accepted: 06/24/2019] [Indexed: 12/14/2022]
Abstract
A variety of microorganisms inhabit the gastrointestinal tract of animals, including bacteria, archaea, fungi, protozoa, and viruses. Pioneers in gut microbiology have stressed the critical importance of diet:microbe interactions and how these interactions may contribute to health status. As scientists have overcome the limitations of culture-based microbiology, the importance of these interactions has become clearer, even to the extent that the gut microbiota has emerged as an important immunologic and metabolic organ. Recent advances in metagenomics and metabolomics have helped scientists to demonstrate that interactions among the diet, the gut microbiota, and the host have profound effects on animal health and disease. However, although scientists have now accumulated a great deal of data with respect to what organisms comprise the gastrointestinal landscape, there is a need to look more closely at causative effects of the microbiome. This review is intended to provide: 1) a review of what is currently known with respect to the dynamics of microbial colonization of the porcine gastrointestinal tract; 2) a review of the impact of nutrient:microbe effects on growth and health; 3) examples of the therapeutic potential of prebiotics, probiotics, and synbiotics; and 4) a discussion about what the future holds with respect to microbiome research opportunities and challenges. Taken together, by considering what is currently known in the four aforementioned areas, our overarching goal is to set the stage for narrowing the path towards discovering how the porcine gut microbiota (individually and collectively) may affect specific host phenotypes.
Affiliation(s)
- Nirosh D Aluthge
- Department of Animal Science, University of Nebraska, Lincoln, NE
- Yanshuo S Li
- Department of Animal Science, University of Nebraska, Lincoln, NE
- Thomas E Burkey
- Department of Animal Science, University of Nebraska, Lincoln, NE
58
Lugli GA, Milani C, Mancabelli L, Turroni F, van Sinderen D, Ventura M. A microbiome reality check: limitations of in silico-based metagenomic approaches to study complex bacterial communities. Environmental Microbiology Reports 2019; 11:840-847. [PMID: 31668006 DOI: 10.1111/1758-2229.12805] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 08/30/2019] [Accepted: 10/28/2019] [Indexed: 06/10/2023]
Abstract
In recent years, whole shotgun metagenomics (WSM) has become an established technology for compositional analysis of complex microbial communities, an approach that relies heavily on bioinformatic pipelines to process and interpret the raw sequencing data. However, the use of such in silico pipelines for the microbial taxonomic classification of short sequences may lead to significant errors in the compositional outputs deduced from such sequencing data. To investigate the performance of such in silico pipelines, we applied two commonly used bioinformatic tools, MetaPhlAn2 and Kraken2, to two metagenomic data sets originating from human and animal faecal samples. Using these programs, which taxonomically classify WSM data based on marker genes, we observed a tendency to depict a lower complexity of the microbial communities. Here, we assess the limitations of these commonly employed bioinformatic pipelines and, based on our findings, propose that such analyses should ideally be combined with experimentally based microbiological validations.
Affiliation(s)
- Gabriele Andrea Lugli
- Laboratory of Probiogenomics, Department of Chemistry, Life Sciences, and Environmental Sustainability, University of Parma, Parma, Italy
- Christian Milani
- Laboratory of Probiogenomics, Department of Chemistry, Life Sciences, and Environmental Sustainability, University of Parma, Parma, Italy
- Leonardo Mancabelli
- Laboratory of Probiogenomics, Department of Chemistry, Life Sciences, and Environmental Sustainability, University of Parma, Parma, Italy
- Francesca Turroni
- Laboratory of Probiogenomics, Department of Chemistry, Life Sciences, and Environmental Sustainability, University of Parma, Parma, Italy
- Microbiome Research Hub, University of Parma, Parma, Italy
- Douwe van Sinderen
- APC Microbiome Institute and School of Microbiology, Bioscience Institute, National University of Ireland, Cork, Ireland
- Marco Ventura
- Laboratory of Probiogenomics, Department of Chemistry, Life Sciences, and Environmental Sustainability, University of Parma, Parma, Italy
- Microbiome Research Hub, University of Parma, Parma, Italy
59
Vilne B, Meistere I, Grantiņa-Ieviņa L, Ķibilds J. Machine Learning Approaches for Epidemiological Investigations of Food-Borne Disease Outbreaks. Front Microbiol 2019; 10:1722. [PMID: 31447800 PMCID: PMC6691741 DOI: 10.3389/fmicb.2019.01722] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 03/07/2019] [Accepted: 07/12/2019] [Indexed: 12/14/2022]
Abstract
Foodborne diseases (FBDs) are infections of the gastrointestinal tract caused by foodborne pathogens (FBPs) such as bacteria [Salmonella, Listeria monocytogenes and Shiga toxin-producing E. coli (STEC)] and several viruses, but also parasites and some fungi. Artificial intelligence (AI) and its sub-discipline machine learning (ML) are re-emerging and gaining ever-increasing popularity in the scientific community and industry, and could lead to actionable knowledge in a diverse range of sectors including epidemiological investigations of FBD outbreaks and antimicrobial resistance (AMR). As genotyping using whole-genome sequencing (WGS) is becoming more accessible and affordable, it is increasingly used as a routine tool for the detection of pathogens, and has the potential to differentiate between outbreak strains that are closely related, identify virulence/resistance genes and provide improved understanding of transmission events within hours to days. In most cases, the computational pipeline of WGS data analysis can be divided into four (though not necessarily consecutive) major steps: de novo genome assembly, genome characterization, comparative genomics, and inference of phylogeny or phylogenomics. In each step, ML could be used to increase the speed and potentially the accuracy (provided increasing amounts of high-quality input data) of identification of the source of ongoing outbreaks, leading to more efficient treatment and prevention of additional cases. In this review, we explore whether ML or any other form of AI algorithms have already been proposed for the respective tasks and compare those with mechanistic model-based approaches.
Affiliation(s)
- Baiba Vilne
- Institute of Food Safety, Animal Health and Environment “BIOR”, Riga, Latvia
- SIA net-OMICS, Riga, Latvia
- Irēna Meistere
- Institute of Food Safety, Animal Health and Environment “BIOR”, Riga, Latvia
- Juris Ķibilds
- Institute of Food Safety, Animal Health and Environment “BIOR”, Riga, Latvia
60
Hybrid metagenomic assembly enables high-resolution analysis of resistance determinants and mobile elements in human microbiomes. Nat Biotechnol 2019; 37:937-944. [PMID: 31359005 DOI: 10.1038/s41587-019-0191-2] [Citation(s) in RCA: 199] [Impact Index Per Article: 33.2] [Received: 10/12/2018] [Accepted: 06/11/2019] [Indexed: 12/15/2022]
Abstract
Characterization of microbiomes has been enabled by high-throughput metagenomic sequencing. However, existing methods are not designed to combine reads from short- and long-read technologies. We present a hybrid metagenomic assembler named OPERA-MS that integrates assembly-based metagenome clustering with repeat-aware, exact scaffolding to accurately assemble complex communities. Evaluation using defined in vitro and virtual gut microbiomes revealed that OPERA-MS assembles metagenomes with greater base pair accuracy than long-read (>5×; Canu), higher contiguity than short-read (~10× NGA50; MEGAHIT, IDBA-UD, metaSPAdes) and fewer assembly errors than non-metagenomic hybrid assemblers (2×; hybridSPAdes). OPERA-MS provides strain-resolved assembly in the presence of multiple genomes of the same species, high-quality reference genomes for rare species (<1%) with ~9× long-read coverage and near-complete genomes with higher coverage. We used OPERA-MS to assemble 28 gut metagenomes of antibiotic-treated patients, and showed that the inclusion of long nanopore reads produces more contiguous assemblies (200× improvement over short-read assemblies), including more than 80 closed plasmid or phage sequences and a new 263 kbp jumbo phage. High-quality hybrid assemblies enable an exquisitely detailed view of the gut resistome in human patients.
61
Breitwieser FP, Lu J, Salzberg SL. A review of methods and databases for metagenomic classification and assembly. Brief Bioinform 2019; 20:1125-1136. [PMID: 29028872 PMCID: PMC6781581 DOI: 10.1093/bib/bbx120] [Citation(s) in RCA: 297] [Impact Index Per Article: 49.5] [Received: 06/07/2017] [Revised: 08/22/2017] [Indexed: 12/13/2022]
Abstract
Microbiome research has grown rapidly over the past decade, with a proliferation of new methods that seek to make sense of large, complex data sets. Here, we survey two of the primary types of methods for analyzing microbiome data: read classification and metagenomic assembly, and we review some of the challenges facing these methods. All of the methods rely on public genome databases, and we also discuss the content of these databases and how their quality has a direct impact on our ability to interpret a microbiome sample.
Affiliation(s)
- Steven L Salzberg
- Corresponding author: Steven L. Salzberg, Center for Computational Biology, Johns Hopkins University, 1900 E. Monument St., Baltimore, MD, 21205, USA. E-mail:
63
Uncovering bacterial and functional diversity in macroinvertebrate mitochondrial-metagenomic datasets by differential centrifugation. Sci Rep 2019; 9:10257. [PMID: 31312027 PMCID: PMC6635389 DOI: 10.1038/s41598-019-46717-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 11/21/2018] [Accepted: 07/04/2019] [Indexed: 12/20/2022]
Abstract
PCR-free techniques such as meta-mitogenomics (MMG) can recover the taxonomic composition of macroinvertebrate communities, but suffer from low efficiency, as >90% of sequencing data is uninformative due to the great abundance of nuclear DNA that cannot be identified with current reference databases. Current MMG studies do not routinely check data for information on macroinvertebrate-associated bacteria and gene functions. However, this could greatly increase the efficiency of MMG studies by revealing as yet overlooked diversity within ecosystems and making currently unused data available for ecological studies. By analysing six ‘mock’ communities, each containing three macroinvertebrate taxa, we tested whether this additional data on bacterial taxa and functional potential of communities can be extracted from MMG datasets. Further, we tested whether differential centrifugation, which is known to greatly increase the efficiency of macroinvertebrate MMG studies by enriching for mitochondria, affects the inferred bacterial community composition. Our results show that macroinvertebrate MMG datasets contain a high number of mostly endosymbiont bacterial taxa and associated gene functions. Centrifugation reduced both the absolute and relative abundance of highly abundant Gammaproteobacteria, thereby facilitating detection of rare taxa and functions. When analysing both taxa and gene functions, the number of features obtained from the MMG dataset increased 31-fold (‘enriched’) and 234-fold (‘not enriched’), respectively. We conclude that analysing MMG datasets for bacteria and gene functions greatly increases the amount of information available and facilitates the use of shotgun metagenomic techniques for future studies on biodiversity.
64
Seiler E, Trappe K, Renard BY. Where did you come from, where did you go: Refining metagenomic analysis tools for horizontal gene transfer characterisation. PLoS Comput Biol 2019; 15:e1007208. [PMID: 31335917 PMCID: PMC6677323 DOI: 10.1371/journal.pcbi.1007208] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 01/11/2019] [Revised: 08/02/2019] [Accepted: 06/24/2019] [Indexed: 12/22/2022]
Abstract
Horizontal gene transfer (HGT) has changed the way we regard evolution. Instead of waiting for the next generation to establish new traits, bacteria in particular are able to take a shortcut via HGT that enables them to pass on genes from one individual to another, even across species boundaries. The tool Daisy offers the first HGT detection approach based on read mapping that provides complementary evidence compared to existing methods. However, Daisy relies on the acceptor and donor organisms involved in the HGT being known. We introduce DaisyGPS, a mapping-based pipeline that is able to identify acceptor and donor reference candidates of an HGT event based on sequencing reads. Acceptor and donor identification is akin to species identification in metagenomic samples based on sequencing reads, a problem addressed by metagenomic profiling tools. However, acceptor and donor references have certain properties such that these methods cannot be directly applied. DaisyGPS uses MicrobeGPS, a metagenomic profiling tool tailored towards estimating the genomic distance between organisms in the sample and the reference database. We enhance the underlying scoring system of MicrobeGPS to account for the sequence patterns in terms of mapping coverage of an acceptor and donor involved in an HGT event, and report a ranked list of reference candidates. These candidates can then be further evaluated by tools like Daisy to establish HGT regions. We successfully validated our approach on both simulated and real data, and show its benefits in an investigation of an outbreak involving methicillin-resistant Staphylococcus aureus data.
Collapse
Affiliation(s)
- Enrico Seiler
- Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
- Efficient Algorithms for Omics Data, Max Planck Institute for Molecular Genetics, and Algorithmic Bioinformatics, Institute for Bioinformatics, Freie Universität Berlin, Berlin, Germany
| | - Kathrin Trappe
- Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
| | - Bernhard Y. Renard
- Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
|
65
|
Coil DA, Jospin G, Darling AE, Wallis C, Davis IJ, Harris S, Eisen JA, Holcombe LJ, O’Flynn C. Genomes from bacteria associated with the canine oral cavity: A test case for automated genome-based taxonomic assignment. PLoS One 2019; 14:e0214354. [PMID: 31181071 PMCID: PMC6557473 DOI: 10.1371/journal.pone.0214354] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/11/2019] [Accepted: 05/27/2019] [Indexed: 11/18/2022] Open
Abstract
Taxonomy for bacterial isolates is commonly assigned via sequence analysis. However, the most common sequence-based approaches (e.g. 16S rRNA gene-based phylogeny or whole genome comparisons) are still labor intensive and subjective to varying degrees. Here we present a set of 33 bacterial genomes, isolated from the canine oral cavity. Taxonomy of these isolates was first assigned by PCR amplification of the 16S rRNA gene, Sanger sequencing, and taxonomy assignment using BLAST. After genome sequencing, taxonomy was revisited through a manual process using a combination of average nucleotide identity (ANI), concatenated marker gene phylogenies, and 16S rRNA gene phylogenies. This taxonomy was then compared to the automated taxonomic assignment given by the recently proposed Genome Taxonomy Database (GTDB). We found the results of all three methods to be similar (25 out of the 33 had matching genera), but the GTDB approach required fewer subjective decisions, and required far less labor. The primary differences in the non-identical taxonomic assignments involved cases where GTDB has proposed taxonomic revisions.
Affiliation(s)
- David A. Coil
- Genome Center, University of California, Davis, CA, United States of America
| | - Guillaume Jospin
- Genome Center, University of California, Davis, CA, United States of America
| | - Aaron E. Darling
- The Ithree Institute, University of Technology Sydney, Ultimo NSW, Australia
| | - Corrin Wallis
- The Waltham Centre for Pet Nutrition, Melton Mowbray, Leicestershire, United Kingdom
| | - Ian J. Davis
- The Waltham Centre for Pet Nutrition, Melton Mowbray, Leicestershire, United Kingdom
| | - Stephen Harris
- The Waltham Centre for Pet Nutrition, Melton Mowbray, Leicestershire, United Kingdom
| | - Jonathan A. Eisen
- Genome Center, University of California, Davis, CA, United States of America
- Evolution and Ecology, Medical Microbiology and Immunology, University of California, Davis, Davis, CA, United States of America
| | - Lucy J. Holcombe
- The Waltham Centre for Pet Nutrition, Melton Mowbray, Leicestershire, United Kingdom
| | - Ciaran O’Flynn
- The Waltham Centre for Pet Nutrition, Melton Mowbray, Leicestershire, United Kingdom
|
66
|
Yoon G, Gaynanova I, Müller CL. Microbial Networks in SPRING - Semi-parametric Rank-Based Correlation and Partial Correlation Estimation for Quantitative Microbiome Data. Front Genet 2019; 10:516. [PMID: 31244881 PMCID: PMC6563871 DOI: 10.3389/fgene.2019.00516] [Citation(s) in RCA: 58] [Impact Index Per Article: 9.7] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/18/2019] [Accepted: 05/13/2019] [Indexed: 12/15/2022] Open
Abstract
High-throughput microbial sequencing techniques, such as targeted amplicon-based and metagenomic profiling, provide low-cost genomic survey data of microbial communities in their natural environment, ranging from marine ecosystems to host-associated habitats. While standard microbiome profiling data can provide sparse relative abundances of operational taxonomic units or genes, recent advances in experimental protocols give a more quantitative picture of microbial communities by pairing sequencing-based techniques with orthogonal measurements of microbial cell counts from the same sample. These tandem measurements provide absolute microbial count data albeit with a large excess of zeros due to limited sequencing depth. In this contribution we consider the fundamental statistical problem of estimating correlations and partial correlations from such quantitative microbiome data. To this end, we propose a semi-parametric rank-based approach to correlation estimation that can naturally deal with the excess zeros in the data. Combining this estimator with sparse graphical modeling techniques leads to the Semi-Parametric Rank-based approach for INference in Graphical model (SPRING). SPRING enables inference of statistical microbial association networks from quantitative microbiome data which can serve as high-level statistical summary of the underlying microbial ecosystem and can provide testable hypotheses for functional species-species interactions. Due to the absence of verified microbial associations we also introduce a novel quantitative microbiome data generation mechanism which mimics empirical marginal distributions of measured count data while simultaneously allowing user-specified dependencies among the variables. SPRING shows superior network recovery performance on a wide range of realistic benchmark problems with varying network topologies and is robust to misspecifications of the total cell count estimate. 
To highlight SPRING's broad applicability we infer taxon-taxon associations from the American Gut Project data and genus-genus associations from a recent quantitative gut microbiome dataset. We believe that, as quantitative microbiome profiling data will become increasingly available, the semi-parametric estimators for correlation and partial correlation estimation introduced here provide an important tool for reliable statistical analysis of quantitative microbiome data.
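The rank-based idea behind this abstract can be illustrated with a minimal sketch. It is not the SPRING estimator: it assumes fully continuous data under a Gaussian copula and uses the standard relation between Kendall's tau and the latent correlation, r = sin(π/2 · τ). The handling of excess zeros that the abstract describes is not reproduced here, and the function names are hypothetical.

```python
import math

def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) pairs over all pairs."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            score += (prod > 0) - (prod < 0)  # +1 concordant, -1 discordant, 0 tie
    return score / (n * (n - 1) / 2)

def latent_correlation(x, y):
    """Map tau to a latent Pearson correlation (continuous case only; assumption)."""
    return math.sin(math.pi / 2 * kendall_tau_a(x, y))

# Toy abundances for two taxa across five samples (made-up numbers).
taxon_a = [3.0, 10.0, 25.0, 40.0, 90.0]
taxon_b = [1.0, 4.0, 2.0, 8.0, 16.0]
r = latent_correlation(taxon_a, taxon_b)
```

Because only ranks enter the estimate, any monotone transformation of either variable leaves the result unchanged, which is the property that makes rank-based estimators attractive for count data.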
Affiliation(s)
- Grace Yoon
- Department of Statistics, Texas A&M University, College Station, TX, United States
| | - Irina Gaynanova
- Department of Statistics, Texas A&M University, College Station, TX, United States
| | - Christian L. Müller
- Center for Computational Mathematics, Flatiron Institute, New York, NY, United States
|
67
|
Miller IJ, Rees ER, Ross J, Miller I, Baxa J, Lopera J, Kerby RL, Rey FE, Kwan JC. Autometa: automated extraction of microbial genomes from individual shotgun metagenomes. Nucleic Acids Res 2019; 47:e57. [PMID: 30838416 PMCID: PMC6547426 DOI: 10.1093/nar/gkz148] [Citation(s) in RCA: 51] [Impact Index Per Article: 8.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/31/2018] [Revised: 02/15/2019] [Accepted: 02/21/2019] [Indexed: 12/28/2022] Open
Abstract
Shotgun metagenomics is a powerful, high-resolution technique enabling the study of microbial communities in situ. However, species-level resolution is only achieved after a process of 'binning' where contigs predicted to originate from the same genome are clustered. Such culture-independent sequencing frequently unearths novel microbes, and so various methods have been devised for reference-free binning. As novel microbiomes of increasing complexity are explored, sometimes associated with non-model hosts, robust automated binning methods are required. Existing methods struggle with eukaryotic contamination and cannot handle highly complex single metagenomes. We therefore developed an automated binning pipeline, termed 'Autometa', to address these issues. This command-line application integrates sequence homology, nucleotide composition, coverage and the presence of single-copy marker genes to separate microbial genomes from non-model host genomes and other eukaryotic contaminants, before deconvoluting individual genomes from single metagenomes. The method is able to effectively separate over 1000 genomes from a metagenome, allowing the study of previously intractably complex environments at the level of single species. Autometa is freely available at https://bitbucket.org/jason_c_kwan/autometa and as a docker image at https://hub.docker.com/r/jasonkwan/autometa under the GNU Affero General Public License 3 (AGPL 3).
Affiliation(s)
- Ian J Miller
- Division of Pharmaceutical Sciences, School of Pharmacy, University of Wisconsin–Madison, 777 Highland Avenue, Madison, WI 53705, USA
| | - Evan R Rees
- Division of Pharmaceutical Sciences, School of Pharmacy, University of Wisconsin–Madison, 777 Highland Avenue, Madison, WI 53705, USA
| | - Jennifer Ross
- Division of Pharmaceutical Sciences, School of Pharmacy, University of Wisconsin–Madison, 777 Highland Avenue, Madison, WI 53705, USA
| | - Izaak Miller
- Division of Pharmaceutical Sciences, School of Pharmacy, University of Wisconsin–Madison, 777 Highland Avenue, Madison, WI 53705, USA
| | - Jared Baxa
- Division of Pharmaceutical Sciences, School of Pharmacy, University of Wisconsin–Madison, 777 Highland Avenue, Madison, WI 53705, USA
| | - Juan Lopera
- Division of Pharmaceutical Sciences, School of Pharmacy, University of Wisconsin–Madison, 777 Highland Avenue, Madison, WI 53705, USA
| | - Robert L Kerby
- Department of Bacteriology, University of Wisconsin–Madison, 1550 Linden Drive, Madison, WI 53706, USA
| | - Federico E Rey
- Department of Bacteriology, University of Wisconsin–Madison, 1550 Linden Drive, Madison, WI 53706, USA
| | - Jason C Kwan
- Division of Pharmaceutical Sciences, School of Pharmacy, University of Wisconsin–Madison, 777 Highland Avenue, Madison, WI 53705, USA
|
68
|
Chang Y, Desirò A, Na H, Sandor L, Lipzen A, Clum A, Barry K, Grigoriev IV, Martin FM, Stajich JE, Smith ME, Bonito G, Spatafora JW. Phylogenomics of Endogonaceae and evolution of mycorrhizas within Mucoromycota. New Phytol 2019; 222:511-525. [PMID: 30485448 DOI: 10.1111/nph.15613] [Citation(s) in RCA: 51] [Impact Index Per Article: 8.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/09/2018] [Accepted: 10/29/2018] [Indexed: 06/09/2023]
Abstract
Endogonales (Mucoromycotina), composed of Endogonaceae and Densosporaceae, is the only known non-Dikarya order with ectomycorrhizal members. They also form mycorrhizal-like associations with some nonspermatophyte plants. It has been recently proposed that Endogonales were among the earliest mycorrhizal partners with land plants. It remains unknown whether Endogonales possess genomes with mycorrhizal-lifestyle signatures and whether Endogonales originated around the same time as land plants did. We sampled sporocarp tissue from four Endogonaceae collections and performed shotgun genome sequencing. After binning the metagenome data, we assembled and annotated the Endogonaceae genomes. We performed comparative analysis on plant-cell-wall-degrading enzymes (PCWDEs) and small secreted proteins (SSPs). We inferred phylogenetic placement of Endogonaceae and estimated the ages of Endogonaceae and Endogonales with expanded taxon sampling. Endogonaceae have large genomes with high repeat content, low diversity of PCWDEs, but without elevated SSP/secretome ratios. Dating analysis estimated that Endogonaceae originated at the Permian-Triassic boundary and Endogonales originated in the mid-late Silurian. Mycoplasma-related endobacterium sequences were identified in three Endogonaceae genomes. Endogonaceae genomes possess typical signatures of mycorrhizal lifestyle. The early origin of Endogonales suggests that the mycorrhizal association between Endogonales and plants might have played an important role during the colonization of land by plants.
Affiliation(s)
- Ying Chang
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, 97331, USA
| | - Alessandro Desirò
- Department of Plant, Soil and Microbial Sciences, Michigan State University, East Lansing, MI, 48824, USA
| | - Hyunsoo Na
- US Department of Energy Joint Genome Institute, Walnut Creek, CA, 94598, USA
| | - Laura Sandor
- US Department of Energy Joint Genome Institute, Walnut Creek, CA, 94598, USA
| | - Anna Lipzen
- US Department of Energy Joint Genome Institute, Walnut Creek, CA, 94598, USA
| | - Alicia Clum
- US Department of Energy Joint Genome Institute, Walnut Creek, CA, 94598, USA
| | - Kerrie Barry
- US Department of Energy Joint Genome Institute, Walnut Creek, CA, 94598, USA
| | - Igor V Grigoriev
- US Department of Energy Joint Genome Institute, Walnut Creek, CA, 94598, USA
- Department of Plant and Microbial Biology, University of California Berkeley, Berkeley, CA, 94720, USA
| | - Francis M Martin
- Institut national de la recherche agronomique, Laboratoire d'excellence ARBRE, Centre INRA-Grand Est, Unité mixte de recherche Inra-Université de Lorraine "Interactions Arbres/Microorganismes", 54280, Champenoux, France
| | - Jason E Stajich
- Department of Microbiology and Plant Pathology, University of California, Riverside, CA, 92521, USA
| | - Matthew E Smith
- Department of Plant Pathology, University of Florida, Gainesville, FL, 32611, USA
| | - Gregory Bonito
- Department of Plant, Soil and Microbial Sciences, Michigan State University, East Lansing, MI, 48824, USA
| | - Joseph W Spatafora
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, 97331, USA
|
69
|
Abstract
Stable isotope probing (SIP) provides researchers a culture-independent method to retrieve nucleic acids from active microbial populations performing a specific metabolic activity in complex ecosystems. In recent years, the use of the SIP method in microbial ecology studies has been accelerated. This is partly due to the advances in sequencing and bioinformatics tools, which enable fast and reliable analysis of DNA and RNA from the SIP experiments. One of these sequencing tools, metagenomics, has contributed significantly to the body of knowledge by providing data not only on taxonomy but also on the key functional genes in specific metabolic pathways and their relative abundances. In this chapter, we provide a general background on the application of the SIP-metagenomics approach in microbial ecology and a workflow for the analysis of metagenomic datasets using the most up-to-date bioinformatics tools.
Affiliation(s)
- Eileen Kröber
- Microbial Biogeochemistry, RA Landscape Functioning, ZALF Leibniz Centre for Landscape Research, Müncheberg, Germany
| | - Özge Eyice
- School of Biological and Chemical Sciences, Queen Mary University of London, London, UK.
|
70
|
Exploring Foodborne Pathogen Ecology and Antimicrobial Resistance in the Light of Shotgun Metagenomics. Methods Mol Biol 2018; 1918:229-245. [PMID: 30580413 DOI: 10.1007/978-1-4939-9000-9_19] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [Subscribe] [Scholar Register] [Indexed: 10/27/2022]
Abstract
In this chapter, applications of shotgun metagenomics for taxonomic profiling and functional investigation of food microbial communities, with a focus on antimicrobial resistance (AMR), are reviewed in the light of the latest data in the field. The potential of the metagenomic approach, along with the challenges to its wider and routine use in food safety, is discussed.
|
71
|
Zahariev M, Chen W, Visagie CM, Lévesque CA. Cluster oligonucleotide signatures for rapid identification by sequencing. BMC Bioinformatics 2018; 19:395. [PMID: 30522439 PMCID: PMC6284311 DOI: 10.1186/s12859-018-2363-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/16/2017] [Accepted: 09/09/2018] [Indexed: 01/19/2023] Open
Abstract
BACKGROUND Oligonucleotide signatures (signatures) have been widely used for studying microbial diversity and function in wet-lab settings, but using them for accurate in silico identification of organisms from high-throughput sequencing (HTS) data is only a proof of concept. Existing signature design programs for sequence signatures (signatures matching exactly one sequence) or clade signatures (signatures matching every sequence in a phylogenetic clade) are not able to identify all possible polymorphic sites for sequences with high similarity and perform poorly when handling large genome sequencing datasets. RESULTS We introduce cluster signatures: subsequences that match perfectly and exclusively any group of sequences in a data set. Cluster signatures provide complete recall for primer/probe design and increased discrimination between sequences beyond that of clade signatures. Using cluster signatures for in silico identification of HTS targets achieves good precision/recall and running time performance. This method has been implemented into an open source tool, the Automated Oligonucleotide Design Pipeline (aodp), included in supplementary material and available at: https://bitbucket.org/wenchen_aafc/aodp_v2.0_release. CONCLUSIONS Cluster signatures provide a rapid and universal analysis tool to identify all possible short diagnostic DNA markers and variants from any DNA sequencing dataset. They are particularly useful in discriminating genetic material from closely related organisms and in detecting deleterious mutations in highly or perfectly conserved genomic sites.
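The definition of a cluster signature given in this abstract (a subsequence that matches perfectly and exclusively one group of sequences) can be sketched in a few lines. This is an illustration of the definition only, not the aodp implementation: it assumes a single fixed oligonucleotide length `k`, looks at the forward strand only, and the function name and toy sequences are hypothetical.

```python
from collections import defaultdict

def cluster_signatures(sequences, k):
    """Group k-mers by the exact set of sequences in which they occur."""
    owners = defaultdict(set)  # k-mer -> names of sequences containing it
    for name, seq in sequences.items():
        for i in range(len(seq) - k + 1):
            owners[seq[i:i + k]].add(name)
    signatures = defaultdict(set)  # group of names -> k-mers exclusive to that group
    for kmer, names in owners.items():
        signatures[frozenset(names)].add(kmer)
    return dict(signatures)

# Toy data set (made-up sequences).
toy = {"a": "ACGTAC", "b": "ACGTTT", "c": "GGGTTT"}
sigs = cluster_signatures(toy, 3)
```

In this toy example, sequence `b` has no signature of its own at k = 3, but it is still identifiable as the only member shared by the groups {a, b} and {b, c}, which is the kind of extra discrimination that group-level signatures allow.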
Affiliation(s)
- Manuel Zahariev
- Ottawa R&D Centre, Agriculture & Agri-Food Canada, 960 Carling Ave., Ottawa, ON, K1A 0C6 Canada
| | - Wen Chen
- Skwez Technology Corp, Box 3674, Garibaldi Highlands, BC, V0N 1T0 Canada
| | - Cobus M. Visagie
- The Agricultural Research Council – PPRI, P/Bag X134, Queenswood, 0121 South Africa
| | - C. André Lévesque
- Sidney Laboratory Project - Science, Canadian Food Inspection Agency, Floor 2E, Room 233, 59 Camelot Drive, Ottawa, ON, K1A 0Y9 Canada
|
72
|
Bernard G, Pathmanathan JS, Lannes R, Lopez P, Bapteste E. Microbial Dark Matter Investigations: How Microbial Studies Transform Biological Knowledge and Empirically Sketch a Logic of Scientific Discovery. Genome Biol Evol 2018; 10:707-715. [PMID: 29420719 PMCID: PMC5830969 DOI: 10.1093/gbe/evy031] [Citation(s) in RCA: 58] [Impact Index Per Article: 8.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 02/05/2018] [Indexed: 02/07/2023] Open
Abstract
Microbes are the oldest and most widespread, phylogenetically and metabolically diverse life forms on Earth. However, they were discovered only 334 years ago, and their diversity began to be seriously investigated even later. For these reasons, microbial studies that unveil novel microbial lineages and processes affecting or involving microbes deeply (and repeatedly) transform knowledge in biology. Considering the quantitative prevalence of taxonomically and functionally unassigned sequences in environmental genomics data sets, and that of uncultured microbes on the planet, we propose that unraveling the microbial dark matter should be identified as a central priority for biologists. Based on former empirical findings of microbial studies, we sketch a logic of discovery with the potential to further highlight the microbial unknowns.
Affiliation(s)
- Guillaume Bernard
- Sorbonne Universités, UPMC Université Paris 06, Institut de Biologie Paris-Seine (IBPS), France
| | - Jananan S Pathmanathan
- Sorbonne Universités, UPMC Université Paris 06, Institut de Biologie Paris-Seine (IBPS), France
| | - Romain Lannes
- Sorbonne Universités, UPMC Université Paris 06, Institut de Biologie Paris-Seine (IBPS), France
| | - Philippe Lopez
- Sorbonne Universités, UPMC Université Paris 06, Institut de Biologie Paris-Seine (IBPS), France
| | - Eric Bapteste
- Sorbonne Universités, UPMC Université Paris 06, Institut de Biologie Paris-Seine (IBPS), France
|
73
|
Poussin C, Sierro N, Boué S, Battey J, Scotti E, Belcastro V, Peitsch MC, Ivanov NV, Hoeng J. Interrogating the microbiome: experimental and computational considerations in support of study reproducibility. Drug Discov Today 2018; 23:1644-1657. [PMID: 29890228 DOI: 10.1016/j.drudis.2018.06.005] [Citation(s) in RCA: 43] [Impact Index Per Article: 6.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/07/2017] [Revised: 05/03/2018] [Accepted: 06/06/2018] [Indexed: 12/12/2022]
Abstract
The microbiome is an important factor in human health and disease and is investigated to develop novel therapeutics. Metagenomics leverages advances in sequencing technologies and computational analysis to identify and quantify the microorganisms present in a sample. This field has, however, not yet reached maturity and the international metagenomics community, aware of the current limitations and of the necessity for standardization, has started investigating sources of variability in experimental and computational workflows. The first studies have already resulted in the identification of crucial steps and factors affecting metagenomics data quality, quantification and interpretation. This review summarizes experimental and computational considerations for interrogating the microbiome and establishing reproducible and robust analysis workflows.
Affiliation(s)
- Carine Poussin
- PMI R&D, Philip Morris Products S.A., Quai Jeanrenaud 5, CH-2000 Neuchâtel, Switzerland
| | - Nicolas Sierro
- PMI R&D, Philip Morris Products S.A., Quai Jeanrenaud 5, CH-2000 Neuchâtel, Switzerland
| | - Stéphanie Boué
- PMI R&D, Philip Morris Products S.A., Quai Jeanrenaud 5, CH-2000 Neuchâtel, Switzerland
| | - James Battey
- PMI R&D, Philip Morris Products S.A., Quai Jeanrenaud 5, CH-2000 Neuchâtel, Switzerland
| | - Elena Scotti
- PMI R&D, Philip Morris Products S.A., Quai Jeanrenaud 5, CH-2000 Neuchâtel, Switzerland
| | - Vincenzo Belcastro
- PMI R&D, Philip Morris Products S.A., Quai Jeanrenaud 5, CH-2000 Neuchâtel, Switzerland
| | - Manuel C Peitsch
- PMI R&D, Philip Morris Products S.A., Quai Jeanrenaud 5, CH-2000 Neuchâtel, Switzerland
| | - Nikolai V Ivanov
- PMI R&D, Philip Morris Products S.A., Quai Jeanrenaud 5, CH-2000 Neuchâtel, Switzerland
| | - Julia Hoeng
- PMI R&D, Philip Morris Products S.A., Quai Jeanrenaud 5, CH-2000 Neuchâtel, Switzerland.
|
74
|
Muller EE, Faust K, Widder S, Herold M, Martínez Arbas S, Wilmes P. Using metabolic networks to resolve ecological properties of microbiomes. Curr Opin Syst Biol 2018. [DOI: 10.1016/j.coisb.2017.12.004] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/18/2022]
|
75
|
Surveillance of Foodborne Pathogens: Towards Diagnostic Metagenomics of Fecal Samples. Genes (Basel) 2018; 9:genes9010014. [PMID: 29300319 PMCID: PMC5793167 DOI: 10.3390/genes9010014] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/06/2017] [Revised: 12/05/2017] [Accepted: 12/19/2017] [Indexed: 01/08/2023] Open
Abstract
Diagnostic metagenomics is a rapidly evolving laboratory tool for culture-independent tracing of foodborne pathogens. The method has the potential to become a generic platform for detection of most pathogens and many sample types. Today, however, it is still at an early and experimental stage. Studies show that metagenomic methods, from sample storage and DNA extraction to library preparation and shotgun sequencing, have a great influence on data output. To construct protocols that extract the complete metagenome but with minimal bias is an ongoing challenge. Many different software strategies for data analysis are being developed, and several studies applying diagnostic metagenomics to human clinical samples have been published, detecting, and sometimes, typing bacterial infections. It is possible to obtain a draft genome of the pathogen and to develop methods that can theoretically be applied in real-time. Finally, diagnostic metagenomics can theoretically be better geared than conventional methods to detect co-infections. The present review focuses on the current state of test development, as well as practical implementation of diagnostic metagenomics to trace foodborne bacterial infections in fecal samples from animals and humans.
|
76
|
Herath D, Tang SL, Tandon K, Ackland D, Halgamuge SK. CoMet: a workflow using contig coverage and composition for binning a metagenomic sample with high precision. BMC Bioinformatics 2017; 18:571. [PMID: 29297295 PMCID: PMC5751405 DOI: 10.1186/s12859-017-1967-3] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022] Open
Abstract
Background In metagenomics, the separation of nucleotide sequences belonging to an individual or closely matched populations is termed binning. Binning helps the evaluation of underlying microbial population structure as well as the recovery of individual genomes from a sample of uncultivable microbial organisms. Both supervised and unsupervised learning methods have been employed in binning; however, characterizing a metagenomic sample containing multiple strains remains a significant challenge. In this study, we designed and implemented a new workflow, Coverage and composition based binning of Metagenomes (CoMet), for binning contigs in a single metagenomic sample. CoMet utilizes coverage values and the compositional features of metagenomic contigs. The binning strategy in CoMet includes the initial grouping of contigs in guanine-cytosine (GC) content-coverage space and refinement of bins in tetranucleotide frequencies space in a purely unsupervised manner. With CoMet, the clustering algorithm DBSCAN is employed for binning contigs. The performances of CoMet were compared against four existing approaches for binning a single metagenomic sample, including MaxBin, Metawatt, MyCC (default) and MyCC (coverage) using multiple datasets including a sample comprised of multiple strains. Results Binning methods based on both compositional features and coverages of contigs had higher performances than the method which is based only on compositional features of contigs. CoMet yielded higher or comparable precision in comparison to the existing binning methods on benchmark datasets of varying complexities. MyCC (coverage) had the highest ranking score in F1-score. However, the performances of CoMet were higher than MyCC (coverage) on the dataset containing multiple strains. Furthermore, CoMet recovered contigs of more species and was 18 - 39% higher in precision than the compared existing methods in discriminating species from the sample of multiple strains. 
CoMet resulted in higher precision than MyCC (default) and MyCC (coverage) on a real metagenome. Conclusions The approach proposed with CoMet for binning contigs improves the precision of binning while characterizing more species in a single metagenomic sample and in a sample containing multiple strains. The F1-scores obtained from different binning strategies vary with different datasets; however, CoMet yields the highest F1-score with a sample comprised of multiple strains. Electronic supplementary material The online version of this article (doi:10.1186/s12859-017-1967-3) contains supplementary material, which is available to authorized users.
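The first stage described in this abstract, grouping contigs in GC content-coverage space with DBSCAN, can be sketched as follows. This is a plain textbook DBSCAN written for illustration, not the CoMet code: the log10 scaling of coverage, the `eps` and `min_pts` values, and the toy contigs are all assumptions, and the later refinement in tetranucleotide-frequency space is not shown.

```python
import math

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    def neighbours(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:  # j is a core point, so keep expanding
                queue.extend(nbrs)
    return labels

# Toy contigs as (GC fraction, mean coverage); values are made up.
contigs = [(0.35, 10), (0.36, 11), (0.34, 9.5),
           (0.62, 100), (0.63, 95), (0.61, 105),
           (0.50, 1000)]
points = [(gc, math.log10(cov)) for gc, cov in contigs]
bins = dbscan(points, eps=0.1, min_pts=3)
```

A density-based method suits this setting because the number of genomes is not known in advance and contigs that fit no dense region are left unbinned rather than forced into a cluster, which favours precision over recall.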
Affiliation(s)
- Damayanthi Herath
- Department of Mechanical Engineering, The University of Melbourne, Parkville, Melbourne, 3010, Australia
- Department of Computer Engineering, University of Peradeniya, Prof. E. O. E. Pereira Mawatha, Peradeniya, 20400, Sri Lanka
| | - Sen-Lin Tang
- Biodiversity Research Center, Academia Sinica, Nan-Kang, Taipei, 11529, Taiwan
| | - Kshitij Tandon
- Biodiversity Research Center, Academia Sinica, Nan-Kang, Taipei, 11529, Taiwan
- Institute of Bioinformatics and Structural Biology, National Tsing Hua University, Hsinchu, 300, Taiwan
- Bioinformatics Program, Institute of Information Science, Taiwan International Graduate Program, Academia Sinica, Taipei, 115, Taiwan
| | - David Ackland
- Department of Biomedical Engineering, The University of Melbourne, Victoria, 3010, Australia
| | - Saman Kumara Halgamuge
- Research School of Engineering, College of Engineering and Computer Science, The Australian National University, Canberra ACT, 2601, Australia
|
77
|
Wang Y, Wang K, Lu YY, Sun F. Improving contig binning of metagenomic data using $d_2^S$ oligonucleotide frequency dissimilarity. BMC Bioinformatics 2017; 18:425. [PMID: 28931373 PMCID: PMC5607646 DOI: 10.1186/s12859-017-1835-1] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/03/2017] [Accepted: 09/11/2017] [Indexed: 04/27/2023] Open
Abstract
BACKGROUND Metagenomics sequencing provides deep insights into microbial communities. To investigate their taxonomic structure, binning assembled contigs into discrete clusters is critical. Many binning algorithms have been developed, but their performance is not always satisfactory, especially for complex microbial communities, calling for further development. RESULTS According to previous studies, relative sequence compositions are similar across different regions of the same genome, but they differ between distinct genomes. Generally, current tools have used the normalized frequency of k-tuples directly, but this represents an absolute, not relative, sequence composition. Therefore, we attempted to model contigs using relative k-tuple composition, followed by measuring dissimilarity between contigs using $d_2^S$. The $d_2^S$ was designed to measure the dissimilarity between two long sequences or Next-Generation Sequencing data with the Markov models of the background genomes. This method was effective in revealing group and gradient relationships between genomes, metagenomes and metatranscriptomes. With many binning tools available, we do not try to bin contigs from scratch. Instead, we developed d2SBin to adjust contigs among bins based on the output of existing binning tools for a single metagenomic sample. The tool is taxonomy-free and depends only on k-tuples. To evaluate the performance of d2SBin, five widely used binning tools with different strategies of sequence composition or the hybrid of sequence composition and abundance were selected to bin six synthetic and real datasets, after which d2SBin was applied to adjust the binning results. Our experiments showed that d2SBin consistently achieves the best performance with tuple length k = 6 under the independent identically distributed (i.i.d.) background model.
Using the metrics of recall, precision and ARI (Adjusted Rand Index), d2SBin improves the binning performance in 28 out of 30 testing experiments (6 datasets with 5 binning tools). The d2SBin is available at https://github.com/kunWangkun/d2SBin. CONCLUSIONS Experiments showed that $d_2^S$ accurately measures the dissimilarity between contigs of metagenomic reads and that relative sequence composition is more reasonable to bin the contigs. The d2SBin can be applied to any existing contig-binning tools for single metagenomic samples to obtain better binning results.
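The contrast drawn in this abstract between absolute and relative k-tuple composition can be made concrete with a small sketch. It centres each k-tuple count by its expectation under an i.i.d. background estimated from the sequence itself, then combines the centred counts in a self-normalised correlation. The formula is written from the general alignment-free literature rather than taken from this tool's code, so treat it as an assumption. The function names are hypothetical, input is assumed to contain only A, C, G and T, and returning 0.5 when one sequence carries no signal is a choice made here.

```python
import math
from collections import Counter
from itertools import product

def centred_counts(seq, k):
    """k-tuple counts minus their expectation under an i.i.d. background."""
    n = len(seq) - k + 1
    counts = Counter(seq[i:i + k] for i in range(n))
    freq = {b: seq.count(b) / len(seq) for b in "ACGT"}
    centred = {}
    for letters in product("ACGT", repeat=k):
        word = "".join(letters)
        expected = n * math.prod(freq[b] for b in word)
        centred[word] = counts[word] - expected
    return centred

def d2s(seq_x, seq_y, k):
    """Dissimilarity in [0, 1]; 0 means identical relative composition."""
    x, y = centred_counts(seq_x, k), centred_counts(seq_y, k)
    num = norm_x = norm_y = 0.0
    for w in x:
        scale = math.sqrt(x[w] ** 2 + y[w] ** 2)
        if scale == 0.0:
            continue  # this word carries no signal in either sequence
        num += x[w] * y[w] / scale
        norm_x += x[w] ** 2 / scale
        norm_y += y[w] ** 2 / scale
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.5  # assumption: treat a signal-free sequence as uncorrelated
    return 0.5 * (1.0 - num / math.sqrt(norm_x * norm_y))

# Toy contigs (made-up sequences); the abstract reports k = 6 working best on real data.
contig_a = "ACGTTGCAACGGTA"
contig_b = "TTTTGGGGCCCCAAAA"
distance = d2s(contig_a, contig_b, 2)
```

Subtracting the expected count is what makes the composition relative: two contigs with different base frequencies but the same over- and under-represented words come out as similar, whereas raw k-tuple frequencies would separate them.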
Affiliation(s)
- Ying Wang
  - Department of Automation, Xiamen University, Xiamen, Fujian 361005, China
- Kun Wang
  - Department of Automation, Xiamen University, Xiamen, Fujian 361005, China
- Yang Young Lu
  - Molecular and Computational Biology Program, University of Southern California, Los Angeles, CA 90089, USA
- Fengzhu Sun
  - Molecular and Computational Biology Program, University of Southern California, Los Angeles, CA 90089, USA
  - Center for Computational Systems Biology, Fudan University, Shanghai 200433, China
|
78
|
Interpreting Microbial Biosynthesis in the Genomic Age: Biological and Practical Considerations. Mar Drugs 2017; 15:md15060165. [PMID: 28587290 PMCID: PMC5484115 DOI: 10.3390/md15060165] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 04/28/2017] [Revised: 05/22/2017] [Accepted: 05/31/2017] [Indexed: 02/06/2023] Open
Abstract
Genome mining has become an increasingly powerful, scalable, and economically accessible tool for the study of natural product biosynthesis and drug discovery. However, there remain important biological and practical problems that can complicate or obscure biosynthetic analysis in genomic and metagenomic sequencing projects. Here, we focus on limitations of available technology as well as computational and experimental strategies to overcome them. We review the unique challenges and approaches in the study of symbiotic and uncultured systems, as well as those associated with biosynthetic gene cluster (BGC) assembly and product prediction. Finally, to explore sequencing parameters that affect the recovery and contiguity of large and repetitive BGCs assembled de novo, we simulate Illumina and PacBio sequencing of the Salinispora tropica genome focusing on assembly of the salinilactam (slm) BGC.
|
79
|
Alvarenga DO, Fiore MF, Varani AM. A Metagenomic Approach to Cyanobacterial Genomics. Front Microbiol 2017; 8:809. [PMID: 28536564 PMCID: PMC5422444 DOI: 10.3389/fmicb.2017.00809] [Citation(s) in RCA: 63] [Impact Index Per Article: 7.9] [Received: 03/14/2017] [Accepted: 04/20/2017] [Indexed: 01/08/2023] Open
Abstract
Cyanobacteria, or oxyphotobacteria, are primary producers that establish ecological interactions with a wide variety of organisms. Although their associations with eukaryotes have received most attention, interactions with bacterial and archaeal symbionts have also been occurring for billions of years. Due to these associations, obtaining axenic cultures of cyanobacteria is usually difficult, and most isolation efforts result in unicyanobacterial cultures containing a number of associated microbes, hence composing a microbial consortium. With rising numbers of cyanobacterial blooms due to climate change, demand for genomic evaluations of these microorganisms is increasing. However, standard genomic techniques call for the sequencing of axenic cultures, an approach that not only adds months or even years for culture purification, but also appears to be impossible for some cyanobacteria, which is reflected in the relatively low number of publicly available genomic sequences of this phylum. Under the framework of metagenomics, on the other hand, cumbersome techniques for achieving axenic growth can be circumvented and individual genomes can be successfully obtained from microbial consortia. This review focuses on approaches for the genomic and metagenomic assessment of non-axenic cyanobacterial cultures that bypass requirements for axenity. These methods enable researchers to achieve faster and less costly genomic characterizations of cyanobacterial strains and raise additional information about their associated microorganisms. While non-axenic cultures may have been previously frowned upon in cyanobacteriology, latest advancements in metagenomics have provided new possibilities for in vitro studies of oxyphotobacteria, renewing the value of microbial consortia as a reliable and functional resource for the rapid assessment of bloom-forming cyanobacteria.
Affiliation(s)
- Danillo O. Alvarenga
  - Faculdade de Ciências Agrárias e Veterinárias, Universidade Estadual Paulista (UNESP), Jaboticabal, Brazil
  - Centro de Energia Nuclear na Agricultura, Universidade de São Paulo (USP), Piracicaba, Brazil
- Marli F. Fiore
  - Centro de Energia Nuclear na Agricultura, Universidade de São Paulo (USP), Piracicaba, Brazil
- Alessandro M. Varani
  - Faculdade de Ciências Agrárias e Veterinárias, Universidade Estadual Paulista (UNESP), Jaboticabal, Brazil
|