1
|
Koslicki D, White S, Ma C, Novikov A. YACHT: an ANI-based statistical test to detect microbial presence/absence in a metagenomic sample. Bioinformatics 2024; 40:btae047. [PMID: 38268451 PMCID: PMC10868342 DOI: 10.1093/bioinformatics/btae047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Revised: 01/05/2024] [Accepted: 01/22/2024] [Indexed: 01/26/2024]
Abstract
MOTIVATION In metagenomics, the study of environmentally associated microbial communities from their sampled DNA, one of the most fundamental computational tasks is determining which genomes from a reference database are present or absent in a given sample metagenome. Existing tools generally return point estimates with no associated confidence or uncertainty. As a result, practitioners have difficulty interpreting the results from these tools, particularly for low-abundance organisms, which often reside in the "noisy tail" of incorrect predictions. Furthermore, few tools account for the fact that reference databases are often incomplete and rarely, if ever, contain exact replicas of genomes present in an environmentally derived metagenome. RESULTS We present solutions for these issues by introducing the algorithm YACHT: Yes/No Answers to Community membership via Hypothesis Testing. This approach introduces a statistical framework that accounts for sequence divergence between the reference and sample genomes, in terms of ANI, as well as incomplete sequencing depth, thus providing a hypothesis test for determining the presence or absence of a reference genome in a sample. After introducing our approach, we quantify its statistical power and how this changes with varying parameters. Subsequently, we perform extensive experiments using both simulated and real data to confirm the accuracy and scalability of this approach. AVAILABILITY AND IMPLEMENTATION The source code implementing this approach is available via Conda and at https://github.com/KoslickiLab/YACHT. We also provide the code for reproducing experiments at https://github.com/KoslickiLab/YACHT-reproducibles.
Affiliation(s)
- David Koslicki
- Department of Computer Science and Engineering, Pennsylvania State University, State College, PA 16802, United States
- Department of Biology, Pennsylvania State University, State College, PA 16802, United States
- Huck Institutes of the Life Sciences, Pennsylvania State University, State College, PA 16802, USA
- One Health Microbiome Center, Pennsylvania State University, State College, PA 16802, United States
- Stephen White
- Department of Mathematics, Pennsylvania State University, State College, PA 16802, United States
- Chunyu Ma
- Huck Institutes of the Life Sciences, Pennsylvania State University, State College, PA 16802, USA
- Alexei Novikov
- Department of Mathematics, Pennsylvania State University, State College, PA 16802, United States
|
2
|
Chanda D, De D. Meta-analysis reveals obesity associated gut microbial alteration patterns and reproducible contributors of functional shift. Gut Microbes 2024; 16:2304900. [PMID: 38265338 PMCID: PMC10810176 DOI: 10.1080/19490976.2024.2304900] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/24/2023] [Accepted: 01/09/2024] [Indexed: 01/25/2024]
Abstract
Cohort-specific studies associating gut microbiota with obesity are often contradictory; thus, the replicability of the signature remains questionable. Moreover, the species that drive obesity-associated functional shifts and their replicability remain unexplored. Thus, we aimed to address these questions by analyzing gut microbial metagenome sequencing data to develop an in-depth understanding of obese host-gut microbiota interactions using 3329 samples (Obese, n = 1494; Control, n = 1835) from 17 different countries, including both 16S rRNA gene and metagenomic sequence data. Fecal metagenomic data from diverse geographical locations were curated, profiled, and pooled using a machine learning-based approach to identify robust global signatures of obesity. Furthermore, gut microbial species and pathways were systematically integrated through the genomic content of the species to identify contributors to obesity-associated functional shifts. The community structure of the obese gut microbiome was evaluated, and a reproducible depletion of diversity was observed in the obese compared to the lean gut. From this, we infer that the loss of diversity in the obese gut is responsible for perturbations in the healthy microbial functional repertoire. We identified 25 highly predictive species and 37 pathway associations as signatures of obesity, which were validated with remarkably high accuracy (AUC, Species: 0.85, and pathway: 0.80) with an independent validation dataset. We observed a reduction in short-chain fatty acid (SCFA) producers (several Alistipes species, Odoribacter splanchnicus, etc.) and depletion of promoters of gut barrier integrity (Akkermansia muciniphila and Bifidobacterium longum) in obese guts. Our analysis underlines SCFAs and purine/pyrimidine biosynthesis, carbohydrate metabolism pathways in control individuals, and amino acid, enzyme cofactor, and peptidoglycan biosynthesis pathway enrichment in obese individuals. We also mapped the contributors to important obesity-associated functional shifts and observed that these are both dataset-specific and shared across the datasets. In summary, a comprehensive analysis of diverse datasets unveils species specifically contributing to functional shifts and consistent gut microbial patterns associated with obesity.
Affiliation(s)
- Deep Chanda
- Laboratory of Cellular Differentiation & Metabolic Disorder, Department of Biotechnology, National Institute of Technology, Durgapur, India
- Debojyoti De
- Laboratory of Cellular Differentiation & Metabolic Disorder, Department of Biotechnology, National Institute of Technology, Durgapur, India
|
3
|
Koslicki D, White S, Ma C, Novikov A. YACHT: an ANI-based statistical test to detect microbial presence/absence in a metagenomic sample. bioRxiv: The Preprint Server for Biology 2023:2023.04.18.537298. [PMID: 37131762 PMCID: PMC10153212 DOI: 10.1101/2023.04.18.537298] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
In metagenomics, the study of environmentally associated microbial communities from their sampled DNA, one of the most fundamental computational tasks is determining which genomes from a reference database are present or absent in a given sample metagenome. While tools exist to answer this question, all existing approaches to date return point estimates with no associated confidence or uncertainty. As a result, practitioners have difficulty interpreting the results from these tools, particularly for low-abundance organisms, which often reside in the "noisy tail" of incorrect predictions. Furthermore, no tools to date account for the fact that reference databases are often incomplete and rarely, if ever, contain exact replicas of genomes present in an environmentally derived metagenome. In this work, we present solutions for these issues by introducing the algorithm YACHT: Yes/No Answers to Community membership via Hypothesis Testing. This approach introduces a statistical framework that accounts for sequence divergence between the reference and sample genomes, in terms of average nucleotide identity, as well as incomplete sequencing depth, thus providing a hypothesis test for determining the presence or absence of a reference genome in a sample. After introducing our approach, we quantify its statistical power and show theoretically how this changes with varying parameters. Subsequently, we perform extensive experiments using both simulated and real data to confirm the accuracy and scalability of this approach. Code implementing this approach, as well as all experiments performed, is available at https://github.com/KoslickiLab/YACHT.
Affiliation(s)
- David Koslicki
- Department of Computer Science and Engineering, The Pennsylvania State University
- Department of Biology, The Pennsylvania State University
- Huck Institutes of the Life Sciences, The Pennsylvania State University
- The Microbiome Center, The Pennsylvania State University
- Stephen White
- Department of Mathematics, The Pennsylvania State University
- Chunyu Ma
- Huck Institutes of the Life Sciences, The Pennsylvania State University
- Alexei Novikov
- Department of Mathematics, The Pennsylvania State University
|
4
|
Meslier V, Quinquis B, Da Silva K, Plaza Oñate F, Pons N, Roume H, Podar M, Almeida M. Benchmarking second and third-generation sequencing platforms for microbial metagenomics. Sci Data 2022; 9:694. [PMID: 36369227 PMCID: PMC9652401 DOI: 10.1038/s41597-022-01762-z] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 06/02/2022] [Accepted: 10/04/2022] [Indexed: 11/13/2022]
Abstract
Shotgun metagenomic sequencing is a common approach for studying the taxonomic diversity and metabolic potential of complex microbial communities. Current methods primarily use second generation short read sequencing, yet advances in third generation long read technologies provide opportunities to overcome some of the limitations of short read sequencing. Here, we compared seven platforms, encompassing second generation sequencers (Illumina HiSeq 300, MGI DNBSEQ-G400 and DNBSEQ-T7, ThermoFisher Ion GeneStudio S5 and Ion Proton P1) and third generation sequencers (Oxford Nanopore Technologies MinION R9 and Pacific Biosciences Sequel II). We constructed three uneven synthetic microbial communities composed of genomic DNA from up to 87 microbial strains per mock, spanning 29 bacterial and archaeal phyla, and representing the most complex and diverse synthetic communities used for sequencing technology comparisons. Our results demonstrate that third generation sequencing has advantages over second generation platforms in analyzing complex microbial communities, but requires careful sequencing library preparation for optimal quantitative metagenomic analysis. Our sequencing data also provide a valuable resource for testing and benchmarking bioinformatics software for metagenomics.
Affiliation(s)
- Victoria Meslier
- Université Paris-Saclay, INRAE, MetaGenoPolis, 78350, Jouy-en-Josas, France
- Benoit Quinquis
- Université Paris-Saclay, INRAE, MetaGenoPolis, 78350, Jouy-en-Josas, France
- Kévin Da Silva
- Université Paris-Saclay, INRAE, MetaGenoPolis, 78350, Jouy-en-Josas, France
- Nicolas Pons
- Université Paris-Saclay, INRAE, MetaGenoPolis, 78350, Jouy-en-Josas, France
- Hugo Roume
- Université Paris-Saclay, INRAE, MetaGenoPolis, 78350, Jouy-en-Josas, France
- Mircea Podar
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA
- Mathieu Almeida
- Université Paris-Saclay, INRAE, MetaGenoPolis, 78350, Jouy-en-Josas, France
|
5
|
Wu Z, Wang Y, Zeng J, Zhou Y. Constructing metagenome-assembled genomes for almost all components in a real bacterial consortium for binning benchmarking. BMC Genomics 2022; 23:746. [DOI: 10.1186/s12864-022-08967-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2022] [Accepted: 10/25/2022] [Indexed: 11/11/2022]
Abstract
Background
Many binning approaches have been developed for recovering metagenome-assembled genomes (MAGs), and they are evaluated by two main strategies. Evaluation by comparison to known genomes prevails over evaluation using single-copy genes. However, there is still no dataset with all known genomes for a real (not simulated) bacterial consortium.
Results
Here, we continue investigating the real bacterial consortium F1RT, which we previously enriched and sequenced, because its low complexity makes it likely that all MAGs can be recovered. The improved F1RT metagenome, reassembled here with metaSPAdes, utilizes about 98.62% of reads, and a series of analyses of the remaining reads suggests that F1RT is very unlikely to contain other low-abundance organisms, demonstrating that almost all MAGs are successfully assembled. Then, 4 isolates are obtained and individually sequenced. Based on the 4 isolate genomes and the entire metagenome, an elaborate in-house pipeline is developed to construct all F1RT MAGs. A series of assessments supports the high reliability of this reconstruction. Our findings further show that this dataset has several properties that are challenging for binning and is thus suitable for comparing currently available advanced binning tools or benchmarking novel binners. Using this dataset, 8 advanced binning algorithms are assessed, giving useful insights for developing novel approaches. In addition, compared with our previous study, two novel MAGs termed FC8 and FC9 are discovered here, and 7 MAGs are solidly recovered for species without any available genomes.
Conclusion
To our knowledge, this is the first dataset constructed with almost all known MAGs for a non-simulated consortium. We hope that this dataset will be used routinely to complement mock datasets for evaluating binning methods and to facilitate future binning and metagenomic studies.
|
6
|
Iquebal MA, Jagannadham J, Jaiswal S, Prabha R, Rai A, Kumar D. Potential Use of Microbial Community Genomes in Various Dimensions of Agriculture Productivity and Its Management: A Review. Front Microbiol 2022; 13:708335. [PMID: 35655999 PMCID: PMC9152772 DOI: 10.3389/fmicb.2022.708335] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 05/11/2021] [Accepted: 03/17/2022] [Indexed: 12/12/2022]
Abstract
Agricultural productivity is highly influenced by its associated microbial community. With advancements in omics technology, metagenomics is known to play a vital role in microbial world studies by unlocking the uncultured microbial populations present in the environment. Metagenomics is a diagnostic tool to target unique signature loci of plant and animal pathogens as well as beneficial microorganisms from samples. Here, we reviewed various aspects of metagenomics from experimental methods to techniques used for sequencing, as well as diversified computational resources, including databases and software tools. We focus extensively on the application of metagenomics in agriculture, covering various areas, including pathogen and plant disease identification, disease resistance breeding, plant pest control, weed management, abiotic stress management, post-harvest management, discoveries in agriculture, source of novel molecules/compounds, biosurfactants and natural product, identification of biosynthetic molecules, use in genetically modified crops, and antibiotic-resistant genes. Metagenome-wide association studies in agriculture on crop productivity rates, intercropping analysis, and agronomic fields are also analyzed. This article is the first comprehensive study of its kind from an agricultural perspective, focusing on a wide range of applications of metagenomics and its association studies.
Affiliation(s)
- Mir Asif Iquebal
- Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India
- Jaisri Jagannadham
- Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India
- Sarika Jaiswal
- Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India
- Ratna Prabha
- Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India
- Anil Rai
- Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India
- Dinesh Kumar
- Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India
- School of Interdisciplinary and Applied Sciences, Central University of Haryana, Mahendergarh, Haryana, India
|
7
|
Bani A, Randall KC, Clark DR, Gregson BH, Henderson DK, Losty EC, Ferguson RM. Mind the gaps: What do we know about how multiple chemical stressors impact freshwater aquatic microbiomes? Adv Ecol Res 2022. [DOI: 10.1016/bs.aecr.2022.09.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/11/2022]
|
8
|
Balaji A, Sapoval N, Seto C, Leo Elworth R, Fu Y, Nute MG, Savidge T, Segarra S, Treangen TJ. KOMB: K-core based de novo characterization of copy number variation in microbiomes. Comput Struct Biotechnol J 2022; 20:3208-3222. [PMID: 35832621 PMCID: PMC9249589 DOI: 10.1016/j.csbj.2022.06.019] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/31/2022] [Revised: 06/08/2022] [Accepted: 06/09/2022] [Indexed: 11/29/2022]
Abstract
Characterizing metagenomes via kmer-based, database-dependent taxonomic classification has yielded key insights into underlying microbiome dynamics. However, novel approaches are needed to track community dynamics and genomic flux within metagenomes, particularly in response to perturbations. We describe KOMB, a novel method for tracking genome level dynamics within microbiomes. KOMB utilizes K-core decomposition to identify Structural variations (SVs), specifically, population-level Copy Number Variation (CNV) within microbiomes. K-core decomposition partitions the graph into shells containing nodes of induced degree at least K, yielding reduced computational complexity compared to prior approaches. Through validation on a synthetic community, we show that KOMB recovers and profiles repetitive genomic regions in the sample. KOMB is shown to identify functionally-important regions in Human Microbiome Project datasets, and was used to analyze longitudinal data and identify keystone taxa in Fecal Microbiota Transplantation (FMT) samples. In summary, KOMB represents a novel graph-based, taxonomy-oblivious, and reference-free approach for tracking CNV within microbiomes. KOMB is open source and available for download at https://gitlab.com/treangenlab/komb.
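The K-core decomposition mentioned in this abstract can be illustrated with a minimal peeling sketch in Python. This is not KOMB's implementation: the function names `core_numbers` and `shells`, the edge-list input, and the toy graph are all assumptions made for illustration. A node's core number is the largest K for which it belongs to a subgraph whose nodes all have degree at least K within that subgraph, and a shell groups the nodes that share a core number.

```python
from collections import defaultdict


def core_numbers(edges):
    """Return {node: core number} for an undirected graph given as an edge list.

    Hypothetical helper for illustration only. Nodes that appear in no edge
    are not represented; self-loops are ignored.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    degree = {node: len(nbrs) for node, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Peel the node with the smallest degree among the nodes still present.
        node = min(remaining, key=lambda n: degree[n])
        # The core number never decreases as peeling proceeds.
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        for nbr in adj[node]:
            if nbr in remaining:
                degree[nbr] -= 1
    return core


def shells(edges):
    """Group nodes by core number: {K: sorted list of nodes in the K-shell}."""
    grouped = defaultdict(list)
    for node, k in core_numbers(edges).items():
        grouped[k].append(node)
    return {k: sorted(nodes) for k, nodes in grouped.items()}


if __name__ == "__main__":
    # Toy graph (an assumption): a triangle a-b-c with a pendant node d on a.
    toy_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("a", "d")]
    print(shells(toy_edges))
```

This simple version rescans the remaining nodes on every step, so it is quadratic in the number of nodes; bucket-based peeling reaches linear time in the number of edges and would be the natural choice for large assembly graphs.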
Affiliation(s)
- Advait Balaji
- Department of Computer Science, Rice University, Houston, TX, USA
- Nicolae Sapoval
- Department of Computer Science, Rice University, Houston, TX, USA
- Charlie Seto
- Department of Pathology and Immunology, Baylor College of Medicine, Houston, TX, USA
- R.A. Leo Elworth
- Department of Computer Science, Rice University, Houston, TX, USA
- Yilei Fu
- Department of Computer Science, Rice University, Houston, TX, USA
- Michael G. Nute
- Department of Computer Science, Rice University, Houston, TX, USA
- Tor Savidge
- Department of Pathology and Immunology, Baylor College of Medicine, Houston, TX, USA
- Santiago Segarra
- Department of Electrical and Computer Engineering, Rice University, Houston, TX, USA
- Corresponding author.
- Todd J. Treangen
- Department of Computer Science, Rice University, Houston, TX, USA
- Corresponding author.
|
9
|
Abdulsada Z, Kibbee R, Princz J, DeRosa M, Örmeci B. Transformation of Silver Nanoparticles (AgNPs) during Lime Treatment of Wastewater Sludge and Their Impact on Soil Bacteria. Nanomaterials 2021; 11:nano11092330. [PMID: 34578645 PMCID: PMC8465233 DOI: 10.3390/nano11092330] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/25/2021] [Revised: 08/30/2021] [Accepted: 08/31/2021] [Indexed: 11/16/2022]
Abstract
This study investigated the impact of lime stabilization on the fate and transformation of AgNPs. It also evaluated the changes in the population and diversity of the five most relevant bacterial phyla in soil after applying lime-stabilized sludge containing AgNPs. The study was performed by spiking an environmentally relevant concentration of AgNPs (2 mg AgNPs/g TS) in sludge, applying lime stabilization to increase pH to above 12 for two hours, and applying lime-treated sludge to soil samples. Transmission electron microscopy (TEM) and energy-dispersive X-ray spectroscopy (EDS) were used to investigate the morphological and compositional changes of AgNPs during lime stabilization. After the application of lime stabilized sludge to the soil, soil samples were periodically analyzed for total genomic DNA and changes in bacterial phyla diversity using quantitative polymerase chain reaction (qPCR). The results showed that lime treatment effectively removed AgNPs from the aqueous phase, and AgNPs were deposited on the lime molecules. The results revealed that AgNPs did not significantly impact the presence and diversity of the assessed phyla in the soil. However, lime stabilized sludge with AgNPs affected the abundance of each phylum over time. No significant effects on the soil total organic carbon (TOC), heterotrophic plate count (HPC), and percentage of the live cells were observed.
Affiliation(s)
- Zainab Abdulsada
- Department of Civil and Environmental Engineering, Carleton University, Ottawa, ON K1S 5B6, Canada
- Department of Environmental Engineering, University of Baghdad, Karrada, Al-Jadriya, Baghdad, Iraq
- Richard Kibbee
- Department of Civil and Environmental Engineering, Carleton University, Ottawa, ON K1S 5B6, Canada
- Juliska Princz
- Environment and Climate Change Canada, 335 River Road South, Ottawa, ON K1V 1C7, Canada
- Maria DeRosa
- Department of Chemistry, Carleton University, Ottawa, ON K1S 5B6, Canada
- Banu Örmeci
- Department of Civil and Environmental Engineering, Carleton University, Ottawa, ON K1S 5B6, Canada
- Correspondence: Tel.: +1-613-520-2600 (ext. 4144)
|
10
|
Zhang Z, Zhang L. METAMVGL: a multi-view graph-based metagenomic contig binning algorithm by integrating assembly and paired-end graphs. BMC Bioinformatics 2021; 22:378. [PMID: 34294039 PMCID: PMC8296540 DOI: 10.1186/s12859-021-04284-4] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 06/13/2021] [Accepted: 07/06/2021] [Indexed: 12/14/2022]
Abstract
Background Due to the complexity of microbial communities, de novo assembly of next generation sequencing data is commonly unable to produce complete microbial genomes. Metagenome assembly binning has become an essential step that groups the fragmented contigs into clusters representing microbial genomes, based on the contigs’ nucleotide compositions and read depths. These features work well on long contigs but are not stable for short ones. Contigs can be linked by sequence overlap (assembly graph) or by the paired-end reads aligned to them (PE graph), where linked contigs have a high chance of being derived from the same cluster. Results We developed METAMVGL, a multi-view graph-based metagenomic contig binning algorithm that integrates both assembly and PE graphs. It rescues short contigs and corrects binning errors from dead ends. METAMVGL learns the two graphs’ weights automatically and predicts the contig labels in a uniform multi-view label propagation framework. In experiments, we observed that METAMVGL made use of significantly more high-confidence edges from the combined graph and linked dead ends to the main graph. It also outperformed many state-of-the-art contig binning algorithms, including MaxBin2, MetaBAT2, MyCC, CONCOCT, SolidBin and GraphBin, on metagenomic sequencing data from simulation, two mock communities and Sharon infant fecal samples. Conclusions Our findings demonstrate that METAMVGL improves short contig binning and outperforms the other existing contig binning tools on metagenomic sequencing data from simulation, mock communities and infant fecal samples. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-021-04284-4.
Affiliation(s)
- Zhenmiao Zhang
- Department of Computer Science, Hong Kong Baptist University, Hong Kong, SAR, China
- Lu Zhang
- Department of Computer Science, Hong Kong Baptist University, Hong Kong, SAR, China
|
11
|
Yi H, Lin Y, Lin C, Jin W. Kssd: sequence dimensionality reduction by k-mer substring space sampling enables real-time large-scale datasets analysis. Genome Biol 2021; 22:84. [PMID: 33726811 PMCID: PMC7962209 DOI: 10.1186/s13059-021-02303-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/26/2020] [Accepted: 02/23/2021] [Indexed: 11/10/2022]
Abstract
Here, we develop k-mer substring space decomposition (Kssd), a sketching technique which is significantly faster and more accurate than current sketching methods. We show that it is the only method that can be used for large-scale dataset comparisons at population resolution on simulated and real data. Using Kssd, we prioritize references for all 1,019,179 bacteria whole genome sequencing (WGS) runs from NCBI Sequence Read Archive and find misidentification or contamination in 6164 of these. Additionally, we analyze WGS and exome runs of samples from the 1000 Genomes Project.
Affiliation(s)
- Huiguang Yi
- Shenzhen Key Laboratory of Gene Regulation and Systems Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, 518055 Guangdong China
- Institute of Life Sciences, Southeast University, Nanjing, 210096 Jiangsu China
- Yanling Lin
- Shenzhen Key Laboratory of Gene Regulation and Systems Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, 518055 Guangdong China
- Chengqi Lin
- Institute of Life Sciences, Southeast University, Nanjing, 210096 Jiangsu China
- Wenfei Jin
- Shenzhen Key Laboratory of Gene Regulation and Systems Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, 518055 Guangdong China
|
12
|
Rain-Franco A, de Moraes GP, Beier S. Cryopreservation and Resuscitation of Natural Aquatic Prokaryotic Communities. Front Microbiol 2021; 11:597653. [PMID: 33584565 PMCID: PMC7877341 DOI: 10.3389/fmicb.2020.597653] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 08/21/2020] [Accepted: 12/30/2020] [Indexed: 11/25/2022]
Abstract
Experimental reproducibility in aquatic microbial ecology is critical to predict the dynamics of microbial communities. However, controlling the initial composition of naturally occurring microbial communities that will be used as the inoculum in experimental setups is challenging, because a proper method for the preservation of those communities is lacking. To provide a feasible method for preservation and resuscitation of natural aquatic prokaryote assemblages, we developed a cryopreservation procedure applied to natural aquatic prokaryotic communities. We studied the impact of inoculum size, processing time, and storage time on the success of resuscitation. We further assessed the effect of different growth media supplemented with dissolved organic matter (DOM) prepared from naturally occurring microorganisms on the recovery of the initially cryopreserved communities obtained from two sites that have contrasting trophic status and environmental heterogeneity. Our results demonstrated that the variability of the resuscitation process among replicates decreased with increasing inoculum size. The degree of similarity between initial and resuscitated communities was influenced by both the growth medium and origin of the community. We further demonstrated that depending on the inoculum source, 45-72% of the abundant species in the initially natural microbial communities could be detected as viable cells after cryopreservation. Processing time and long-term storage up to 12 months did not significantly influence the community composition after resuscitation. However, based on our results, we recommend keeping handling time to a minimum and ensuring identical incubation conditions for repeated resuscitations from cryopreserved aliquots at different time points. Given our results, we recommend cryopreservation as a promising tool to advance experimental research in the field of microbial ecology.
Affiliation(s)
- Angel Rain-Franco
- UMR 7621 Laboratoire d’Océanographie Microbienne, Observatoire Océanologique de Banyuls-sur-Mer, Sorbonne Université, Banyuls-sur-Mer, France
- Guilherme Pavan de Moraes
- UMR 7621 Laboratoire d’Océanographie Microbienne, Observatoire Océanologique de Banyuls-sur-Mer, Sorbonne Université, Banyuls-sur-Mer, France
- Graduate Program in Ecology and Natural Resources (PPGERN), Laboratory of Phycology, Department of Botany, Universidade Federal de São Carlos, São Carlos, Brazil
- Department of Biological Oceanography, Leibniz Institute for Baltic Sea Research Warnemünde, Rostock, Germany
- Sara Beier
- UMR 7621 Laboratoire d’Océanographie Microbienne, Observatoire Océanologique de Banyuls-sur-Mer, Sorbonne Université, Banyuls-sur-Mer, France
- Department of Biological Oceanography, Leibniz Institute for Baltic Sea Research Warnemünde, Rostock, Germany
|
13
|
Stevick RJ, Post AF, Gómez-Chiarri M. Functional plasticity in oyster gut microbiomes along a eutrophication gradient in an urbanized estuary. Anim Microbiome 2021; 3:5. [PMID: 33499983 PMCID: PMC7934548 DOI: 10.1186/s42523-020-00066-0] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 07/05/2020] [Accepted: 11/29/2020] [Indexed: 01/04/2023]
Abstract
Background Oysters in coastal environments are subject to fluctuating environmental conditions that may impact the ecosystem services they provide. Oyster-associated microbiomes are responsible for some of these services, particularly nutrient cycling in benthic habitats. The effects of climate change on host-associated microbiome composition are well-known, but functional changes and how they may impact host physiology and ecosystem functioning are poorly characterized. We investigated how environmental parameters affect oyster-associated microbial community structure and function along a trophic gradient in Narragansett Bay, Rhode Island, USA. Adult eastern oyster, Crassostrea virginica, gut and seawater samples were collected at 5 sites along this estuarine nutrient gradient in August 2017. Samples were analyzed by 16S rRNA gene sequencing to characterize bacterial community structures and metatranscriptomes were sequenced to determine oyster gut microbiome responses to local environments. Results There were significant differences in bacterial community structure between the eastern oyster gut and water samples, suggesting selection of certain taxa by the oyster host. Increasing salinity, pH, and dissolved oxygen, and decreasing nitrate, nitrite and phosphate concentrations were observed along the North to South gradient. Transcriptionally active bacterial taxa were similar for the different sites, but expression of oyster-associated microbial genes involved in nutrient (nitrogen and phosphorus) cycling varied throughout the Bay, reflecting the local nutrient regimes and prevailing environmental conditions. Conclusions The observed shifts in microbial community composition and function inform how estuarine conditions affect host-associated microbiomes and their ecosystem services. 
As the effects of estuarine acidification are expected to increase due to the combined effects of eutrophication, coastal pollution, and climate change, it is important to determine relationships between host health, microbial community structure, and environmental conditions in benthic communities. Supplementary Information The online version contains supplementary material available at 10.1186/s42523-020-00066-0.
Affiliation(s)
- Rebecca J Stevick
  - Graduate School of Oceanography, University of Rhode Island, Narragansett, RI, USA
- Anton F Post
  - Division of Research, Florida Atlantic University, Boca Raton, FL, USA
- Marta Gómez-Chiarri
  - Department of Fisheries, Animal and Veterinary Sciences, University of Rhode Island, Kingston, RI, USA
|
14
|
Thornton CN, Tanner WD, VanDerslice JA, Brazelton WJ. Localized effect of treated wastewater effluent on the resistome of an urban watershed. Gigascience 2020; 9:5992824. [PMID: 33215210 PMCID: PMC7677451 DOI: 10.1093/gigascience/giaa125] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 04/26/2020] [Revised: 07/14/2020] [Indexed: 11/14/2022] Open
Abstract
Background Wastewater treatment is an essential tool for maintaining water quality in urban environments. While the treatment of wastewater can remove most bacterial cells, some will inevitably survive treatment to be released into natural environments. Previous studies have investigated antibiotic resistance within wastewater treatment plants, but few studies have explored how a river’s complete set of antibiotic resistance genes (the “resistome”) is affected by the release of treated effluent into surface waters. Results Here we used high-throughput, deep metagenomic sequencing to investigate the effect of treated wastewater effluent on the resistome of an urban river and the downstream distribution of effluent-associated antibiotic resistance genes and mobile genetic elements. Treated effluent release was found to be associated with increased abundance and diversity of antibiotic resistance genes and mobile genetic elements. The impact of wastewater discharge on the river’s resistome diminished with increasing distance from effluent discharge points. The resistome at river locations that were not immediately downstream from any wastewater discharge points was dominated by a single integron carrying genes associated with resistance to sulfonamides and quaternary ammonium compounds. Conclusions Our study documents variations in the resistome of an urban watershed from headwaters to a major confluence in an urban center. Greater abundances and diversity of antibiotic resistance genes are associated with human fecal contamination in river surface water, but the fecal contamination effect seems to be localized, with little measurable effect in downstream waters. The diverse composition of antibiotic resistance genes throughout the watershed suggests the influence of multiple environmental and biological factors.
Affiliation(s)
- Christopher N Thornton
  - School of Biological Sciences, University of Utah, 257 South 1400 East, Rm. 201, 84112, Salt Lake City, UT, USA
- Windy D Tanner
  - Department of Family and Preventive Medicine, University of Utah, 257 South 1400 East, Rm. 201, 84112, Salt Lake City, UT, USA
- James A VanDerslice
  - Department of Family and Preventive Medicine, University of Utah, 257 South 1400 East, Rm. 201, 84112, Salt Lake City, UT, USA
- William J Brazelton
  - School of Biological Sciences, University of Utah, 257 South 1400 East, Rm. 201, 84112, Salt Lake City, UT, USA
|
15
|
Seyler L, Kujawinski EB, Azua-Bustos A, Lee MD, Marlow J, Perl SM, Cleaves II HJ. Metabolomics as an Emerging Tool in the Search for Astrobiologically Relevant Biomarkers. Astrobiology 2020; 20:1251-1261. [PMID: 32551936 PMCID: PMC7116171 DOI: 10.1089/ast.2019.2135] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 05/14/2023]
Abstract
It is now routinely possible to sequence and recover microbial genomes from environmental samples. To the degree it is feasible to assign transcriptional and translational functions to these genomes, it should be possible, in principle, to largely understand the complete molecular inputs and outputs of a microbial community. However, gene-based tools alone are presently insufficient to describe the full suite of chemical reactions and small molecules that compose a living cell. Metabolomic tools have developed quickly and now enable rapid detection and identification of small molecules within biological and environmental samples. The convergence of these technologies will soon facilitate the detection of novel enzymatic activities, novel organisms, and potentially extraterrestrial life-forms on solar system bodies. This review explores the methodological problems and scientific opportunities facing researchers who hope to apply metabolomic methods in astrobiology-related fields, and how present challenges might be overcome.
Affiliation(s)
- Lauren Seyler
  - Department of Marine Chemistry and Geochemistry, Woods Hole Oceanographic Institution, Woods Hole, Massachusetts, USA
  - Blue Marble Space Institute of Science, Seattle, Washington, USA
  - Address correspondence to: Lauren Seyler, Department of Marine Chemistry and Geochemistry, Woods Hole Oceanographic Institution, 86 Water Street, Woods Hole, MA 02543, USA
- Elizabeth B. Kujawinski
  - Department of Marine Chemistry and Geochemistry, Woods Hole Oceanographic Institution, Woods Hole, Massachusetts, USA
- Armando Azua-Bustos
  - Department of Planetology and Habitability, Centro de Astrobiología (CSIC-INTA), Madrid, Spain
  - Instituto de Ciencias Biomédicas, Facultad de Ciencias de la Salud, Universidad Autónoma de Chile, Santiago, Chile
- Michael D. Lee
  - Blue Marble Space Institute of Science, Seattle, Washington, USA
  - Exobiology Branch, NASA Ames Research Center, Moffett Field, California, USA
- Jeffrey Marlow
  - Blue Marble Space Institute of Science, Seattle, Washington, USA
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts, USA
  - Department of Biology, Boston University, Boston, Massachusetts, USA
- Scott M. Perl
  - Geological and Planetary Sciences, California Institute of Technology/NASA Jet Propulsion Laboratory, Pasadena, California, USA
  - Mineral Sciences, Los Angeles Natural History Museum, Los Angeles, California, USA
- Henderson James Cleaves II
  - Blue Marble Space Institute of Science, Seattle, Washington, USA
  - Earth-Life Science Institute, Tokyo Institute of Technology, Tokyo, Japan
  - School of Natural Sciences, Institute for Advanced Study, Princeton, New Jersey, USA
  - Geographical Research Laboratory, Carnegie Institution of Washington
|
16
|
Awan MG, Deslippe J, Buluc A, Selvitopi O, Hofmeyr S, Oliker L, Yelick K. ADEPT: a domain independent sequence alignment strategy for gpu architectures. BMC Bioinformatics 2020; 21:406. [PMID: 32933482 PMCID: PMC7493400 DOI: 10.1186/s12859-020-03720-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 05/20/2020] [Accepted: 08/21/2020] [Indexed: 12/28/2022] Open
Abstract
Background Bioinformatic workflows frequently make use of automated genome assembly and protein clustering tools. At the core of most of these tools, a significant portion of execution time is spent in determining the optimal local alignment between two sequences. This task is performed with the Smith-Waterman algorithm, which is a dynamic-programming-based method. With the advent of modern sequencing technologies and the increasing size of both genome and protein databases, a need for faster Smith-Waterman implementations has emerged. Multiple SIMD strategies for the Smith-Waterman algorithm are available for CPUs. However, with the move of HPC facilities towards accelerator-based architectures, a need for an efficient GPU-accelerated strategy has emerged. Existing GPU-based strategies have either been optimized for a specific type of characters (nucleotides or amino acids) or for only a handful of application use-cases. Results In this paper, we present ADEPT, a new sequence alignment strategy for GPU architectures that is domain independent, supporting alignment of sequences from both genomes and proteins. Our proposed strategy uses GPU-specific optimizations that do not rely on the nature of the sequence. We demonstrate the feasibility of this strategy by implementing the Smith-Waterman algorithm and comparing it to similar CPU strategies as well as the fastest known GPU methods for each domain. ADEPT’s driver enables it to scale across multiple GPUs and allows easy integration into software pipelines which utilize large-scale computational systems. We have shown that the ADEPT-based Smith-Waterman algorithm demonstrates a peak performance of 360 GCUPS and 497 GCUPS for protein-based and DNA-based datasets respectively on a single GPU node (8 GPUs) of the Cori Supercomputer. Overall, ADEPT shows 10x faster performance in a node-to-node comparison against a corresponding SIMD CPU implementation.
Conclusions ADEPT demonstrates performance that is comparable to or better than existing GPU strategies. We demonstrated the efficacy of ADEPT in supporting existing bioinformatics software pipelines by integrating ADEPT in MetaHipMer, a high-performance de novo metagenome assembler, and PASTIS, a high-performance protein similarity graph construction pipeline. Our results show 10% and 30% performance boosts in MetaHipMer and PASTIS respectively.
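For readers unfamiliar with the dynamic-programming recurrence that the abstract above refers to, the following is a minimal CPU sketch of Smith-Waterman local alignment scoring. It is not ADEPT's GPU strategy; the function name and the scoring parameters (match +2, mismatch -1, linear gap -1) are assumptions chosen for illustration only.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score of strings a and b.

    Illustrative scoring scheme (assumed, not from the paper):
    match=+2, mismatch=-1, linear gap penalty=-1.
    """
    prev = [0] * (len(b) + 1)  # previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Diagonal move: align a[i-1] with b[j-1].
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below zero; that is what makes the alignment local.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("ACACACTA", "AGCACACA"))  # 12
```

Each inner-loop iteration is one cell update, which is the unit counted by the GCUPS (giga cell updates per second) figures quoted in the abstract.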
Affiliation(s)
- Muaaz G Awan
  - Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, USA
- Jack Deslippe
  - Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, USA
- Aydin Buluc
  - Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, USA
- Oguz Selvitopi
  - Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, USA
- Steven Hofmeyr
  - Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, USA
- Leonid Oliker
  - Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, USA
- Katherine Yelick
  - Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, USA
|
17
|
Dvorkina T, Antipov D, Korobeynikov A, Nurk S. SPAligner: alignment of long diverged molecular sequences to assembly graphs. BMC Bioinformatics 2020; 21:306. [PMID: 32703258 PMCID: PMC7379835 DOI: 10.1186/s12859-020-03590-7] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 06/02/2020] [Accepted: 06/08/2020] [Indexed: 12/27/2022] Open
Abstract
BACKGROUND Graph-based representation of genome assemblies has recently been used in different contexts, from improved reconstruction of plasmid sequences and refined analysis of metagenomic data to read error correction and reference-free haplotype reconstruction. While many of these applications rely heavily on the alignment of long nucleotide sequences to assembly graphs, the first general-purpose software tools for finding such alignments have been released only recently, and their deficiencies and limitations are yet to be discovered. Moreover, existing tools cannot perform alignment of amino acid sequences, which could prove useful in various contexts, in particular the analysis of metagenomic sequencing data. RESULTS In this work we present SPAligner (Saint-Petersburg Aligner), a novel tool for aligning long diverged nucleotide and amino acid sequences to assembly graphs. We demonstrate that SPAligner is an efficient solution for mapping third-generation sequencing reads onto assembly graphs of various complexity and also show how it can facilitate the identification of known genes in complex metagenomic datasets. CONCLUSIONS Our work will help accelerate the development of graph-based approaches to the problem of aligning sequences to genome assemblies. SPAligner is implemented as a part of the SPAdes tools library and is available on GitHub.
Affiliation(s)
- Tatiana Dvorkina
  - Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, St. Petersburg State University, St. Petersburg, Russia
- Dmitry Antipov
  - Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, St. Petersburg State University, St. Petersburg, Russia
- Anton Korobeynikov
  - Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, St. Petersburg State University, St. Petersburg, Russia
  - Department of Statistical Modelling, St. Petersburg State University, St. Petersburg, Russia
- Sergey Nurk
  - Genome Informatics Section, NHGRI, National Institutes of Health, Bethesda MD, USA
|
18
|
Podar PT, Yang Z, Björnsdóttir SH, Podar M. Comparative Analysis of Microbial Diversity Across Temperature Gradients in Hot Springs From Yellowstone and Iceland. Front Microbiol 2020; 11:1625. [PMID: 32760379 PMCID: PMC7372906 DOI: 10.3389/fmicb.2020.01625] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 02/09/2020] [Accepted: 06/22/2020] [Indexed: 11/21/2022] Open
Abstract
Geothermal hot springs are a natural setting to study microbial adaptation to a wide range of temperatures reaching up to boiling. Temperature gradients lead to distinct microbial communities that inhabit their optimum niches. We sampled three alkaline, high temperature (80-100°C) hot springs in Yellowstone and Iceland that had cooling outflows and whose microbial communities had not been studied previously. The microbial composition in sediments and mats was determined by DNA sequencing of rRNA gene amplicons. Over three dozen phyla of Archaea and Bacteria were identified, representing over 1700 distinct organisms. We observed a significant non-linear reduction in the number of microbial taxa as the temperature increased from warm (38°C) to boiling. At high taxonomic levels, the community structure was similar between the Yellowstone and Iceland hot springs. We identified potential endemism at the genus level, especially in thermophilic phototrophs, which may have been potentially driven by distinct environmental conditions and dispersal limitations.
Affiliation(s)
- Peter T. Podar
  - Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, United States
- Zamin Yang
  - Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, United States
- Mircea Podar
  - Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, United States
|
19
|
Brown CT, Moritz D, O'Brien MP, Reidl F, Reiter T, Sullivan BD. Exploring neighborhoods in large metagenome assembly graphs using spacegraphcats reveals hidden sequence diversity. Genome Biol 2020; 21:164. [PMID: 32631445 PMCID: PMC7336657 DOI: 10.1186/s13059-020-02066-4] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 06/14/2019] [Accepted: 05/29/2020] [Indexed: 11/10/2022] Open
Abstract
Genomes computationally inferred from large metagenomic data sets are often incomplete and may be missing functionally important content and strain variation. We introduce an information retrieval system for large metagenomic data sets that exploits the sparsity of DNA assembly graphs to efficiently extract subgraphs surrounding an inferred genome. We apply this system to recover missing content from genome bins and show that substantial genomic sequence variation is present in a real metagenome. Our software implementation is available at https://github.com/spacegraphcats/spacegraphcats under the 3-Clause BSD License.
Affiliation(s)
- C Titus Brown
  - Department of Population Health and Reproduction, University of California Davis, Davis, USA
- Dominik Moritz
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, USA
- Felix Reidl
  - Department of Computer Science, NC State University, Raleigh, USA
- Taylor Reiter
  - Department of Population Health and Reproduction, University of California Davis, Davis, USA
- Blair D Sullivan
  - Department of Computer Science, NC State University, Raleigh, USA
|
20
|
Elworth RAL, Wang Q, Kota PK, Barberan CJ, Coleman B, Balaji A, Gupta G, Baraniuk RG, Shrivastava A, Treangen T. To Petabytes and beyond: recent advances in probabilistic and signal processing algorithms and their application to metagenomics. Nucleic Acids Res 2020; 48:5217-5234. [PMID: 32338745 PMCID: PMC7261164 DOI: 10.1093/nar/gkaa265] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 12/18/2019] [Revised: 03/20/2020] [Accepted: 04/04/2020] [Indexed: 02/01/2023] Open
Abstract
As computational biologists continue to be inundated by ever increasing amounts of metagenomic data, the need for data analysis approaches that keep up with the pace of sequence archives has remained a challenge. In recent years, the accelerated pace of genomic data availability has been accompanied by the application of a wide array of highly efficient approaches from other fields to the field of metagenomics. For instance, sketching algorithms such as MinHash have seen a rapid and widespread adoption. These techniques handle increasingly large datasets with minimal sacrifices in quality for tasks such as sequence similarity calculations. Here, we briefly review the fundamentals of the most impactful probabilistic and signal processing algorithms. We also highlight more recent advances to augment previous reviews in these areas that have taken a broader approach. We then explore the application of these techniques to metagenomics, discuss their pros and cons, and speculate on their future directions.
Affiliation(s)
- Qi Wang
  - Systems, Synthetic, and Physical Biology (SSPB) Graduate Program, Houston, TX 77005, USA
- Pavan K Kota
  - Department of Bioengineering, Houston, TX 77005, USA
- C J Barberan
  - Department of Electrical and Computer Engineering, Rice University, Houston, TX 77005, USA
- Benjamin Coleman
  - Department of Electrical and Computer Engineering, Rice University, Houston, TX 77005, USA
- Advait Balaji
  - Department of Computer Science, Houston, TX 77005, USA
- Gaurav Gupta
  - Department of Electrical and Computer Engineering, Rice University, Houston, TX 77005, USA
- Richard G Baraniuk
  - Department of Electrical and Computer Engineering, Rice University, Houston, TX 77005, USA
- Anshumali Shrivastava
  - Department of Computer Science, Houston, TX 77005, USA
  - Department of Electrical and Computer Engineering, Rice University, Houston, TX 77005, USA
- Todd J Treangen
  - Department of Computer Science, Houston, TX 77005, USA
  - Systems, Synthetic, and Physical Biology (SSPB) Graduate Program, Houston, TX 77005, USA
|
21
|
Xue Y, Lanzén A, Jonassen I. Reconstructing ribosomal genes from large scale total RNA meta-transcriptomic data. Bioinformatics 2020; 36:3365-3371. [PMID: 32167532 PMCID: PMC7267836 DOI: 10.1093/bioinformatics/btaa177] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 11/17/2019] [Revised: 02/01/2020] [Accepted: 03/10/2020] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION Technological advances in meta-transcriptomics have enabled a deeper understanding of the structure and function of microbial communities. 'Total RNA' meta-transcriptomics, the sequencing of total reverse-transcribed RNA, provides a unique opportunity to investigate both the structure and function of active microbial communities from all three domains of life simultaneously. A major step of this approach is the reconstruction of full-length taxonomic marker genes such as the small subunit ribosomal RNA. However, current tools for this purpose are mainly targeted towards analysis of amplicon and metagenomic data and thus lack the ability to handle the massive and complex datasets typically resulting from total RNA experiments. RESULTS In this work, we introduce MetaRib, a new tool for reconstructing ribosomal gene sequences from total RNA meta-transcriptomic data. MetaRib is based on the popular rRNA assembly program EMIRGE, together with several improvements. We address the challenge posed by large complex datasets by integrating sub-assembly, dereplication and mapping in an iterative approach, with additional post-processing steps. We applied the method to both simulated and real-world datasets. Our results show that MetaRib can deal with larger datasets and recover more rRNA genes, achieving around a 60-fold speedup and a higher F1 score than EMIRGE on simulated datasets. On the real-world dataset, it shows similar trends but recovers more contigs than a previous analysis based on random sub-sampling, while enabling the comparison of individual contig abundances across samples for the first time. AVAILABILITY AND IMPLEMENTATION The source code of MetaRib is freely available at https://github.com/yxxue/MetaRib. CONTACT yaxin.xue@uib.no or Inge.Jonassen@uib.no. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yaxin Xue
  - Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
- Anders Lanzén
  - AZTI-Tecnalia, Herrera Kaia, 20110 Pasaia, Spain
  - Ikerbasque, Basque Foundation for Science, 48011 Bilbao, Spain
- Inge Jonassen
  - Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
|
22
|
Cholet F, Ijaz UZ, Smith CJ. Reverse transcriptase enzyme and priming strategy affect quantification and diversity of environmental transcripts. Environ Microbiol 2020; 22:2383-2402. [PMID: 32285609 DOI: 10.1111/1462-2920.15017] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 09/16/2019] [Accepted: 04/08/2020] [Indexed: 12/28/2022]
Abstract
Reverse-transcriptase-quantitative PCR (RT-Q-PCR) and RT-PCR amplicon sequencing provide a convenient, target-specific, high-sensitivity approach for gene expression studies and are widely used in environmental microbiology. Yet, the effectiveness and reproducibility of the reverse transcription step have not been evaluated. Therefore, we tested a combination of four commercial reverse transcriptases with two priming techniques to faithfully transcribe 16S rRNA and amoA transcripts from marine sediments. Both enzyme and priming strategy greatly affected quantification of the exact same target, with differences of up to 600-fold. Furthermore, the choice of RT system significantly changed the communities recovered. For 16S rRNA, both enzyme and priming had a significant effect, with enzyme having a stronger impact than priming. Conversely, for amoA only the change in priming strategy resulted in significant differences between the same samples. Specifically, more OTUs and better coverage of amoA transcript diversity were obtained with GS priming, indicating this approach was better at recovering the diversity of amoA transcripts. Moreover, sequencing of RNA mock communities revealed that, even though transcript α diversities (i.e., OTU counts within a sample) can be biased by the RT, the comparison of β diversities (i.e., differences in OTU counts between samples) is reliable, as those biases are reproducible between environments.
Affiliation(s)
- Fabien Cholet
  - Infrastructure and Environment Research Division, James Watt School of Engineering, University of Glasgow, Glasgow, Scotland, G12 8LT, UK
- Umer Z Ijaz
  - Infrastructure and Environment Research Division, James Watt School of Engineering, University of Glasgow, Glasgow, Scotland, G12 8LT, UK
- Cindy J Smith
  - Infrastructure and Environment Research Division, James Watt School of Engineering, University of Glasgow, Glasgow, Scotland, G12 8LT, UK
|
23
|
Vannier N, Bittebiere AK, Mony C, Vandenkoornhuyse P. Root endophytic fungi impact host plant biomass and respond to plant composition at varying spatio-temporal scales. Fungal Ecol 2020. [DOI: 10.1016/j.funeco.2019.100907] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 11/25/2022]
|
24
|
Obiol A, Giner CR, Sánchez P, Duarte CM, Acinas SG, Massana R. A metagenomic assessment of microbial eukaryotic diversity in the global ocean. Mol Ecol Resour 2020; 20. [PMID: 32065492 DOI: 10.1111/1755-0998.13147] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Received: 10/09/2019] [Revised: 01/31/2020] [Accepted: 02/10/2020] [Indexed: 01/23/2023]
Abstract
Surveying microbial diversity and function is accomplished by combining complementary molecular tools. Among them, metagenomics is a PCR free approach that contains all genetic information from microbial assemblages and is today performed at a relatively large scale and reasonable cost, mostly based on very short reads. Here, we investigated the potential of metagenomics to provide taxonomic reports of marine microbial eukaryotes. We prepared a curated database with reference sequences of the V4 region of 18S rDNA clustered at 97% similarity and used this database to extract and classify metagenomic reads. More than half of them were unambiguously affiliated to a unique reference whilst the rest could be assigned to a given taxonomic group. The overall diversity reported by metagenomics was similar to that obtained by amplicon sequencing of the V4 and V9 regions of the 18S rRNA gene, although either one or both of these amplicon surveys performed poorly for groups like Excavata, Amoebozoa, Fungi and Haptophyta. We then studied the diversity of picoeukaryotes and nanoeukaryotes using 91 metagenomes from surface down to bathypelagic layers in different oceans, unveiling a clear taxonomic separation between size fractions and depth layers. Finally, we retrieved long rDNA sequences from assembled metagenomes that improved phylogenetic reconstructions of particular groups. Overall, this study shows metagenomics as an excellent resource for taxonomic exploration of marine microbial eukaryotes.
Affiliation(s)
- Aleix Obiol
  - Department of Marine Biology and Oceanography, Institut de Ciències del Mar (ICM-CSIC), Barcelona, Spain
- Caterina R Giner
  - Department of Marine Biology and Oceanography, Institut de Ciències del Mar (ICM-CSIC), Barcelona, Spain
- Pablo Sánchez
  - Department of Marine Biology and Oceanography, Institut de Ciències del Mar (ICM-CSIC), Barcelona, Spain
- Carlos M Duarte
  - Red Sea Research Center (RSRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Silvia G Acinas
  - Department of Marine Biology and Oceanography, Institut de Ciències del Mar (ICM-CSIC), Barcelona, Spain
- Ramon Massana
  - Department of Marine Biology and Oceanography, Institut de Ciències del Mar (ICM-CSIC), Barcelona, Spain
|
25
|
Shabana I, Al-Enazi A. Investigation of plasmid-mediated resistance in E. coli isolated from healthy and diarrheic sheep and goats. Saudi J Biol Sci 2020; 27:788-796. [PMID: 32127753 PMCID: PMC7042619 DOI: 10.1016/j.sjbs.2020.01.009] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 10/04/2019] [Revised: 12/18/2019] [Accepted: 01/06/2020] [Indexed: 11/06/2022] Open
Abstract
Escherichia coli is a zoonotic bacterium, and the emergence of antimicrobial-resistant strains has become a critical issue in both human and animal health globally. This study therefore aimed to investigate plasmid-mediated resistance in E. coli strains isolated from healthy and diarrheic sheep and goats. A total of 234 fecal samples were obtained from 157 sheep (99 healthy and 58 diarrheic) and 77 goats (32 healthy and 45 diarrheic) for the isolation and identification of E. coli. Plasmid DNA was extracted using the alkaline lysis method. Phenotypic antibiotic susceptibility profiles were determined, using the disc-diffusion method, against the three classes of antimicrobials to which resistance is mediated by plasmids (cephalosporins, fluoroquinolones, and aminoglycosides). The frequency of plasmid-mediated resistance genes was investigated by PCR. A total of 159 E. coli strains harbored plasmids. The isolates' antibiograms showed different patterns of resistance in both healthy and diarrheic animals. A total of 82 (51.5%) E. coli strains were multidrug-resistant. The rmtB gene was detected in all aminoglycoside-resistant E. coli, and the ESBL-producing E. coli possessed different CTX-M genes. Similarly, fluoroquinolone-resistant E. coli possessed different qnr genes. Analysis of the gyrB gene sequence of fluoroquinolone-resistant E. coli revealed multiple point mutations. In conclusion, the current study revealed a high prevalence of E. coli with high resistance to antimicrobials, in addition to a wide distribution of their resistance determinants. These findings highlight the importance of sheep and goats as reservoirs for the dissemination of MDR E. coli and the horizontal transfer of resistance genes.
Affiliation(s)
- I.I. Shabana
  - Faculty of Veterinary Medicine, Department of Bacteriology, Immunology and Mycology, Suez Canal University, Egypt
- A.T. Al-Enazi
  - Biology Department, Faculty of Science, Taibah University, Al-madinah Al-munawarah, Saudi Arabia
|
26
|
Chan AWY, Naphtali J, Schellhorn HE. High-throughput DNA sequencing technologies for water and wastewater analysis. Sci Prog 2019; 102:351-376. [PMID: 31818206 PMCID: PMC10424514 DOI: 10.1177/0036850419881855] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 12/12/2022]
Abstract
Conventional microbiological water monitoring uses culture-dependent techniques to screen indicator microbial species such as Escherichia coli and fecal coliforms. With high-throughput, second-generation sequencing technologies becoming less expensive, water quality monitoring programs can now leverage the massively parallel nature of second-generation sequencing technologies for batch sample processing to simultaneously obtain compositional and functional information of culturable and as yet uncultured microbial organisms. This review provides an introduction to the technical capabilities and considerations necessary for the use of second-generation sequencing technologies, specifically 16S rDNA amplicon and whole-metagenome sequencing, to investigate the composition and functional potential of microbiomes found in water and wastewater systems.
Affiliation(s)
- James Naphtali
  - Department of Biology, McMaster University, Hamilton, ON, Canada
|
27
|
Ondov BD, Starrett GJ, Sappington A, Kostic A, Koren S, Buck CB, Phillippy AM. Mash Screen: high-throughput sequence containment estimation for genome discovery. Genome Biol 2019; 20:232. [PMID: 31690338 PMCID: PMC6833257 DOI: 10.1186/s13059-019-1841-x] [Citation(s) in RCA: 117] [Impact Index Per Article: 23.4] [Received: 02/27/2019] [Accepted: 09/27/2019] [Indexed: 11/17/2022] Open
Abstract
The MinHash algorithm has proven effective for rapidly estimating the resemblance of two genomes or metagenomes. However, this method cannot reliably estimate the containment of a genome within a metagenome. Here, we describe an online algorithm capable of measuring the containment of genomes and proteomes within either assembled or unassembled sequencing read sets. We describe several use cases, including contamination screening and retrospective analysis of metagenomes for novel genome discovery. Using this tool, we provide containment estimates for every NCBI RefSeq genome within every SRA metagenome and demonstrate the identification of a novel polyomavirus species from a public metagenome.
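The distinction this abstract draws between resemblance and containment can be illustrated with a small sketch. This is not the Mash Screen implementation: the function names, the BLAKE2b hash, the k-mer size, and the sketch size are all assumptions made for illustration, and the "mixture" is a toy random sequence with the genome appended. It shows why a genome fully present in a much larger mixture has low resemblance (Jaccard index) but a containment of 1, and how a bottom-s sketch of the genome alone can estimate that containment.

```python
import hashlib
import random


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def hash64(kmer):
    """Stable 64-bit hash of a k-mer (the hash choice is an assumption)."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def bottom_sketch(kmer_set, size):
    """Bottom-s MinHash sketch: the `size` smallest hash values."""
    return sorted(hash64(x) for x in kmer_set)[:size]


def jaccard(a, b):
    """Exact resemblance |A ∩ B| / |A ∪ B|."""
    return len(a & b) / len(a | b)


def containment(a, b):
    """Exact containment of A in B: |A ∩ B| / |A|."""
    return len(a & b) / len(a)


def screen(genome_kmers, mixture_kmers, size):
    """Estimate containment as the fraction of the genome's sketch
    hashes that are also observed among the mixture's k-mer hashes."""
    sketch = bottom_sketch(genome_kmers, size)
    mixture_hashes = {hash64(x) for x in mixture_kmers}
    return sum(v in mixture_hashes for v in sketch) / len(sketch)


# Toy data: a 200-base "genome" embedded in a much larger "mixture".
rng = random.Random(0)
genome = "".join(rng.choices("ACGT", k=200))
background = "".join(rng.choices("ACGT", k=2000))
mixture = background + genome

K = 11  # illustrative k-mer size
g, m = kmers(genome, K), kmers(mixture, K)
true_containment = containment(g, m)  # genome is fully present
resemblance = jaccard(g, m)           # diluted by the background
estimate = screen(g, m, 50)           # sketch-based containment

print(true_containment, round(resemblance, 3), estimate)
```

In this toy setting only the genome is sketched, while the mixture is scanned once for matching hashes, which is the general shape of a streaming containment screen.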
Affiliation(s)
- Brian D. Ondov
  - Genome Informatics section, National Human Genome Research Institute, Bethesda, MD USA
  - Department of Computer Science, University of Maryland College Park, College Park, MD USA
- Gabriel J. Starrett
  - Tumor Virus Molecular Biology section, National Cancer Institute, Bethesda, MD USA
- Anna Sappington
  - Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA USA
- Aleksandra Kostic
  - Department of Computer Science, Princeton University, Princeton, NJ USA
- Sergey Koren
  - Genome Informatics section, National Human Genome Research Institute, Bethesda, MD USA
- Christopher B. Buck
  - Tumor Virus Molecular Biology section, National Cancer Institute, Bethesda, MD USA
- Adam M. Phillippy
  - Genome Informatics section, National Human Genome Research Institute, Bethesda, MD USA
|
28
|
Sanders JG, Nurk S, Salido RA, Minich J, Xu ZZ, Zhu Q, Martino C, Fedarko M, Arthur TD, Chen F, Boland BS, Humphrey GC, Brennan C, Sanders K, Gaffney J, Jepsen K, Khosroheidari M, Green C, Liyanage M, Dang JW, Phelan VV, Quinn RA, Bankevich A, Chang JT, Rana TM, Conrad DJ, Sandborn WJ, Smarr L, Dorrestein PC, Pevzner PA, Knight R. Optimizing sequencing protocols for leaderboard metagenomics by combining long and short reads. Genome Biol 2019; 20:226. [PMID: 31672156 PMCID: PMC6822431 DOI: 10.1186/s13059-019-1834-9] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Received: 03/27/2019] [Accepted: 09/23/2019] [Indexed: 01/05/2023] Open
Abstract
As metagenomic studies move to increasing numbers of samples, communities like the human gut may benefit more from the assembly of abundant microbes in many samples, rather than the exhaustive assembly of fewer samples. We term this approach leaderboard metagenome sequencing. To explore protocol optimization for leaderboard metagenomics in real samples, we introduce a benchmark of library prep and sequencing using internal references generated by synthetic long-read technology, allowing us to evaluate high-throughput library preparation methods against gold-standard reference genomes derived from the samples themselves. We introduce a low-cost protocol for high-throughput library preparation and sequencing.
Affiliation(s)
- Jon G Sanders
  - Department of Pediatrics, University of California San Diego School of Medicine, La Jolla, CA, 92093, USA
- Sergey Nurk
  - Center for Algorithmic Biotechnology, Institute for Translational Biomedicine, St. Petersburg State University, St. Petersburg, Russia
- Rodolfo A Salido
  - Department of Pediatrics, University of California San Diego School of Medicine, La Jolla, CA, 92093, USA
- Jeremiah Minich
  - Department of Pediatrics, University of California San Diego School of Medicine, La Jolla, CA, 92093, USA
- Zhenjiang Z Xu
  - Department of Pediatrics, University of California San Diego School of Medicine, La Jolla, CA, 92093, USA
- Qiyun Zhu
  - Department of Pediatrics, University of California San Diego School of Medicine, La Jolla, CA, 92093, USA
- Cameron Martino
  - Department of Pediatrics, University of California San Diego School of Medicine, La Jolla, CA, 92093, USA
  - Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA, USA
- Marcus Fedarko
  - Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- Timothy D Arthur
  - Department of Pediatrics, University of California San Diego School of Medicine, La Jolla, CA, 92093, USA
- Brigid S Boland
  - Division of Gastroenterology, Department of Medicine, University of California San Diego, La Jolla, CA, USA
  - Inflammatory Bowel Disease Center, University of California San Diego, La Jolla, CA, USA
- Greg C Humphrey
  - Department of Pediatrics, University of California San Diego School of Medicine, La Jolla, CA, 92093, USA
- Caitriona Brennan
  - Department of Pediatrics, University of California San Diego School of Medicine, La Jolla, CA, 92093, USA
- Karenina Sanders
  - Department of Pediatrics, University of California San Diego School of Medicine, La Jolla, CA, 92093, USA
- James Gaffney
  - Department of Pediatrics, University of California San Diego School of Medicine, La Jolla, CA, 92093, USA
- Kristen Jepsen
  - Institute for Genomic Medicine, University of California San Diego, La Jolla, CA, USA
- Mahdieh Khosroheidari
  - Institute for Genomic Medicine, University of California San Diego, La Jolla, CA, USA
- Cliff Green
  - Institute for Genomic Medicine, University of California San Diego, La Jolla, CA, USA
- Marlon Liyanage
  - Department of Pediatrics, University of California San Diego School of Medicine, La Jolla, CA, 92093, USA
- Jason W Dang
  - Department of Pediatrics, University of California San Diego School of Medicine, La Jolla, CA, 92093, USA
- Vanessa V Phelan
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of Colorado, Aurora, CO, USA
- Robert A Quinn
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI, USA
- Anton Bankevich
  - Center for Algorithmic Biotechnology, Institute for Translational Biomedicine, St. Petersburg State University, St. Petersburg, Russia
- John T Chang
  - Division of Gastroenterology, Department of Medicine, University of California San Diego, La Jolla, CA, USA
  - Inflammatory Bowel Disease Center, University of California San Diego, La Jolla, CA, USA
- Tariq M Rana
  - Department of Pediatrics, University of California San Diego School of Medicine, La Jolla, CA, 92093, USA
- Douglas J Conrad
  - Division of Pulmonary, Critical Care and Sleep Medicine, Department of Medicine, University of California San Diego, 9500 Gilman Drive, La Jolla, CA, 92093, USA
- William J Sandborn
  - Division of Gastroenterology, Department of Medicine, University of California San Diego, La Jolla, CA, USA
  - Inflammatory Bowel Disease Center, University of California San Diego, La Jolla, CA, USA
  - Center for Microbiome Innovation, Jacobs School of Engineering, University of California San Diego, La Jolla, CA, USA
- Larry Smarr
  - Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
  - California Institute for Telecommunications and Information Technology, University of California San Diego, La Jolla, CA, USA
- Pieter C Dorrestein
  - Department of Pediatrics, University of California San Diego School of Medicine, La Jolla, CA, 92093, USA
  - Institute for Genomic Medicine, University of California San Diego, La Jolla, CA, USA
  - Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
- Pavel A Pevzner
  - Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
  - Center for Microbiome Innovation, Jacobs School of Engineering, University of California San Diego, La Jolla, CA, USA
- Rob Knight
  - Department of Pediatrics, University of California San Diego School of Medicine, La Jolla, CA, 92093, USA
  - Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
  - Center for Microbiome Innovation, Jacobs School of Engineering, University of California San Diego, La Jolla, CA, USA
  - California Institute for Telecommunications and Information Technology, University of California San Diego, La Jolla, CA, USA
  - Department of Bioengineering, University of California San Diego, La Jolla, CA, USA
|
29
|
Paul AJ, Lawrence D, Song M, Lim SH, Pan C, Ahn TH. Using Apache Spark on genome assembly for scalable overlap-graph reduction. Hum Genomics 2019; 13:48. [PMID: 31639049 PMCID: PMC6805285 DOI: 10.1186/s40246-019-0227-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 05/06/2023] Open
Abstract
Background: De novo genome assembly builds the genome of a specimen from overlaps between genomic fragments, without relying on a reference sequence. Sequence fragments (called reads) are assembled into contigs and scaffolds using these overlaps. The quality of a de novo assembly depends on the length and continuity of the assembly. To enable faster and more accurate assembly of species, sequencing techniques such as high-throughput next-generation sequencing and long-read third-generation sequencing have been proposed. However, these techniques require large amounts of computer memory when very large overlap graphs are resolved, and they are challenging to parallelize. Results: To address these limitations, we propose an algorithmic approach called Scalable Overlap-graph Reduction Algorithms (SORA). SORA is an algorithm package that performs string graph reduction using Apache Spark. SORA's implementations are designed to execute de novo genome assembly on either a single machine or a distributed computing platform. SORA efficiently compacts the number of edges on enormous graphing paths by adapting scalable features of the graph processing libraries provided by Apache Spark, GraphX and GraphFrames. Conclusions: We shared the algorithms and the experimental results at our project website, https://github.com/BioHPC/SORA. We evaluated SORA with human genome samples. First, it processed a nearly one-billion-edge graph on a distributed cloud cluster. Second, it processed mid-to-small size graphs on a single workstation within a short time frame. Overall, SORA achieved linear scaling in simulations with increasing numbers of computing instances.
Affiliation(s)
- Alexander J Paul
  - Bioinformatics and Computational Biology Program, Saint Louis University, St. Louis, MO, USA
- Dylan Lawrence
  - Computational and Systems Biology Program, Washington University in St. Louis, St. Louis, MO, USA
- Myoungkyu Song
  - Department of Computer Science, University of Nebraska at Omaha, Omaha, NE, USA
- Seung-Hwan Lim
  - National Center for Computational Sciences, Oak Ridge National Laboratory, Oak Ridge, TN, USA
- Chongle Pan
  - School of Computer Science, University of Oklahoma, Norman, OK, USA
- Tae-Hyuk Ahn
  - Bioinformatics and Computational Biology Program, Saint Louis University, St. Louis, MO, USA
  - Department of Computer Science, Saint Louis University, St. Louis, MO, USA
|
30
|
Guo J, Quensen JF, Sun Y, Wang Q, Brown CT, Cole JR, Tiedje JM. Review, Evaluation, and Directions for Gene-Targeted Assembly for Ecological Analyses of Metagenomes. Front Genet 2019; 10:957. [PMID: 31749830 PMCID: PMC6843070 DOI: 10.3389/fgene.2019.00957] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 04/23/2019] [Accepted: 09/09/2019] [Indexed: 12/28/2022] Open
Abstract
Shotgun metagenomics has greatly advanced our understanding of microbial communities over the last decade. Metagenomic analyses often include assembly and genome binning, computationally daunting tasks, especially for big data from complex environments such as soil and sediments. In many studies, however, only a subset of genes and pathways involved in specific functions are of interest; thus, it is not necessary to attempt global assembly. In addition, methods that target genes can be computationally more efficient and produce more accurate assemblies by leveraging rich databases, especially for genes of broad interest such as those involved in biogeochemical cycles, biodegradation, and antibiotic resistance or those used as phylogenetic markers. Here, we review six gene-targeted assemblers with unique algorithms for extracting and/or assembling targeted genes: Xander, MegaGTA, SAT-Assembler, HMM-GRASPx, GenSeed-HMM, and MEGAN. We tested these tools using two datasets with known genomes (a synthetic community of artificial reads derived from the genomes of 17 bacteria, and shotgun sequence data from a mock community with 48 bacterial and 16 archaeal genomes) and a large soil shotgun metagenomic dataset. We compared assemblies of a universal single-copy gene (rplB) and two N cycle genes (nifH and nirK). We measured their computational efficiency, sensitivity, specificity, and chimera rate and found that Xander and MegaGTA, which both use a probabilistic graph structure to model the genes, have the best overall performance with all three datasets, although MEGAN, a reference-matching assembler, had better sensitivity with synthetic and mock community members chosen from its reference collection. Also, Xander and MegaGTA are the only tools that include post-assembly scripts tuned for common molecular ecology and diversity analyses. Additionally, we provide a mathematical model for estimating the probability of assembling targeted genes in a metagenome, which can be used to estimate the required sequencing depth.
Affiliation(s)
- Jiarong Guo
  - Center for Microbial Ecology, Michigan State University, East Lansing, MI, United States
- John F. Quensen
  - Center for Microbial Ecology, Michigan State University, East Lansing, MI, United States
- Yanni Sun
  - Department of Electronical Engineering, City University of Hong Kong, Kowloon, Hong Kong
- Qiong Wang
  - Center for Microbial Ecology, Michigan State University, East Lansing, MI, United States
- C. Titus Brown
  - Department of Population Health and Reproduction, University of California, Davis, Davis, CA, United States
- James R. Cole
  - Center for Microbial Ecology, Michigan State University, East Lansing, MI, United States
- James M. Tiedje
  - Center for Microbial Ecology, Michigan State University, East Lansing, MI, United States
|
31
|
Liu J, Lian Q, Chen Y, Qi J. Amino acid based de Bruijn graph algorithm for identifying complete coding genes from metagenomic and metatranscriptomic short reads. Nucleic Acids Res 2019; 47:e30. [PMID: 30657979 PMCID: PMC6412133 DOI: 10.1093/nar/gkz017] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 11/27/2018] [Revised: 12/19/2018] [Accepted: 01/08/2019] [Indexed: 11/12/2022] Open
Abstract
Metagenomic studies, greatly promoted by the fast development of next-generation sequencing (NGS) technologies, uncover complex structures of microbial communities and their interactions with the environment. As the majority of microbes lack genome sequence information, it is essential to assemble prokaryotic genomes ab initio in order to retrieve complete coding genes from various metabolic pathways. The complex nature of microbial composition and the burden of handling a vast amount of metagenomic data bring great challenges to the development of effective and efficient bioinformatic tools. Here we present a protein assembler (MetaPA) that is based on de Bruijn graph searching in oligopeptide space and can be applied to both metagenomic and metatranscriptomic sequencing data. When public homologous protein sequences are used to guide the assembly procedure, MetaPA assembles 85% of total proteins as complete sequences with a high precision of 83% on real high-throughput sequencing datasets. Application of MetaPA to metatranscriptomic data successfully identifies the majority of actively transcribed genes validated in related studies. The results suggest that MetaPA has good potential in both metagenomic and metatranscriptomic studies to characterize the composition and abundance of microbiota.
Affiliation(s)
- Jiemeng Liu
  - State key Laboratory of Genetic Engineering, Institute of Plant Biology, School of Life Sciences, Fudan University, Shanghai 200433, China
  - The T-Life Research Center, Fudan University, Shanghai 200433, China
- Qichao Lian
  - State key Laboratory of Genetic Engineering, Institute of Plant Biology, School of Life Sciences, Fudan University, Shanghai 200433, China
- Yamao Chen
  - State key Laboratory of Genetic Engineering, Institute of Plant Biology, School of Life Sciences, Fudan University, Shanghai 200433, China
- Ji Qi
  - State key Laboratory of Genetic Engineering, Institute of Plant Biology, School of Life Sciences, Fudan University, Shanghai 200433, China
|
32
|
Marcelino VR, Irinyi L, Eden JS, Meyer W, Holmes EC, Sorrell TC. Metatranscriptomics as a tool to identify fungal species and subspecies in mixed communities - a proof of concept under laboratory conditions. IMA Fungus 2019; 10:12. [PMID: 32355612 PMCID: PMC7184889 DOI: 10.1186/s43008-019-0012-8] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 04/01/2019] [Accepted: 06/19/2019] [Indexed: 12/21/2022] Open
Abstract
High-throughput sequencing (HTS) enables the generation of large amounts of genome sequence data at a reasonable cost. Organisms in mixed microbial communities can now be sequenced and identified in a culture-independent way, usually using amplicon sequencing of a DNA barcode. Bulk RNA-seq (metatranscriptomics) has several advantages over DNA-based amplicon sequencing: it is less susceptible to amplification biases, it captures only living organisms, and it enables a larger set of genes to be used for taxonomic identification. Using a model mock community comprising 17 fungal isolates, we evaluated whether metatranscriptomics can accurately identify fungal species and subspecies in mixed communities. Overall, 72.9% of the RNA transcripts were classified, from which the vast majority (99.5%) were correctly identified at the species level. Of the 15 species sequenced, 13 were retrieved and identified correctly. We also detected strain-level variation within the Cryptococcus species complexes: 99.3% of transcripts assigned to Cryptococcus were classified as one of the four strains used in the mock community. Laboratory contaminants and/or misclassifications were diverse, but represented only 0.44% of the transcripts. Hence, these results show that it is possible to obtain accurate species- and strain-level fungal identification from metatranscriptome data as long as taxa identified at low abundance are discarded to avoid false-positives derived from contamination or misclassifications. This study highlights both the advantages and current challenges in the application of metatranscriptomics in clinical mycology and ecological studies.
Affiliation(s)
- Vanesa R Marcelino
  - Marie Bashir Institute for Infectious Diseases and Biosecurity and Faculty of Medicine and Health, Sydney Medical School, Westmead Clinical School, The University of Sydney, Sydney, NSW 2006 Australia
  - Molecular Mycology Research Laboratory, Centre for Infectious Diseases and Microbiology, Westmead Institute for Medical Research, Westmead, NSW 2145 Australia
  - School of Life & Environmental Sciences, Charles Perkins Centre, The University of Sydney, Sydney, NSW 2006 Australia
- Laszlo Irinyi
  - Marie Bashir Institute for Infectious Diseases and Biosecurity and Faculty of Medicine and Health, Sydney Medical School, Westmead Clinical School, The University of Sydney, Sydney, NSW 2006 Australia
  - Molecular Mycology Research Laboratory, Centre for Infectious Diseases and Microbiology, Westmead Institute for Medical Research, Westmead, NSW 2145 Australia
- John-Sebastian Eden
  - Marie Bashir Institute for Infectious Diseases and Biosecurity and Faculty of Medicine and Health, Sydney Medical School, Westmead Clinical School, The University of Sydney, Sydney, NSW 2006 Australia
  - Molecular Mycology Research Laboratory, Centre for Infectious Diseases and Microbiology, Westmead Institute for Medical Research, Westmead, NSW 2145 Australia
- Wieland Meyer
  - Marie Bashir Institute for Infectious Diseases and Biosecurity and Faculty of Medicine and Health, Sydney Medical School, Westmead Clinical School, The University of Sydney, Sydney, NSW 2006 Australia
  - Molecular Mycology Research Laboratory, Centre for Infectious Diseases and Microbiology, Westmead Institute for Medical Research, Westmead, NSW 2145 Australia
  - Westmead Hospital (Research and Education Network), Westmead, NSW 2145 Australia
- Edward C Holmes
  - Marie Bashir Institute for Infectious Diseases and Biosecurity and Faculty of Medicine and Health, Sydney Medical School, Westmead Clinical School, The University of Sydney, Sydney, NSW 2006 Australia
  - School of Life & Environmental Sciences, Charles Perkins Centre, The University of Sydney, Sydney, NSW 2006 Australia
- Tania C Sorrell
  - Marie Bashir Institute for Infectious Diseases and Biosecurity and Faculty of Medicine and Health, Sydney Medical School, Westmead Clinical School, The University of Sydney, Sydney, NSW 2006 Australia
  - Molecular Mycology Research Laboratory, Centre for Infectious Diseases and Microbiology, Westmead Institute for Medical Research, Westmead, NSW 2145 Australia
|
33
|
Cooke I, Mead O, Whalen C, Boote C, Moya A, Ying H, Robbins S, Strugnell JM, Darling A, Miller D, Voolstra CR, Adamska M. Molecular techniques and their limitations shape our view of the holobiont. ZOOLOGY 2019; 137:125695. [PMID: 31759226 DOI: 10.1016/j.zool.2019.125695] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 03/29/2019] [Revised: 07/08/2019] [Accepted: 07/12/2019] [Indexed: 11/26/2022]
Abstract
It is now recognised that the biology of almost any organism cannot be fully understood without recognising the existence and potential functional importance of associated microbes. Arguably, the emergence of this holistic viewpoint may never have occurred without the development of a crucial molecular technique, 16S rDNA amplicon sequencing, which allowed microbial communities to be easily profiled across a broad range of contexts. A diverse array of molecular techniques are now used to profile microbial communities, infer their evolutionary histories, visualise them in host tissues, and measure their molecular activity. In this review, we examine each of these categories of measurement and inference with a focus on the questions they make tractable, and the degree to which their capabilities and limitations shape our view of the holobiont.
Affiliation(s)
- Ira Cooke
  - Department of Molecular and Cell Biology, James Cook University, Townsville, QLD, 4811, Australia
  - Centre for Tropical Bioinformatics and Molecular Biology, James Cook University, Townsville, QLD, 4811, Australia
- Oliver Mead
  - ARC Centre of Excellence for Coral Reef Studies, Australian National University, Canberra, ACT, 2601, Australia
  - Research School of Biology, Australian National University, Canberra, ACT, 2601, Australia
- Casey Whalen
  - Department of Molecular and Cell Biology, James Cook University, Townsville, QLD, 4811, Australia
  - Centre for Tropical Bioinformatics and Molecular Biology, James Cook University, Townsville, QLD, 4811, Australia
  - ARC Centre of Excellence for Coral Reef Studies, James Cook University, Townsville, QLD, 4811, Australia
- Chloë Boote
  - Department of Molecular and Cell Biology, James Cook University, Townsville, QLD, 4811, Australia
  - Centre for Tropical Bioinformatics and Molecular Biology, James Cook University, Townsville, QLD, 4811, Australia
  - ARC Centre of Excellence for Coral Reef Studies, James Cook University, Townsville, QLD, 4811, Australia
- Aurelie Moya
  - Department of Molecular and Cell Biology, James Cook University, Townsville, QLD, 4811, Australia
  - Centre for Tropical Bioinformatics and Molecular Biology, James Cook University, Townsville, QLD, 4811, Australia
  - ARC Centre of Excellence for Coral Reef Studies, James Cook University, Townsville, QLD, 4811, Australia
- Hua Ying
  - Research School of Biology, Australian National University, Canberra, ACT, 2601, Australia
- Steven Robbins
  - Australian Center for Ecogenomics, University of Queensland, St. Lucia, QLD, 4072, Australia
- Jan M Strugnell
  - Centre for Tropical Bioinformatics and Molecular Biology, James Cook University, Townsville, QLD, 4811, Australia
  - Centre of Sustainable Tropical Fisheries and Aquaculture, James Cook University, Townsville, 4810, QLD, Australia
  - Department of Ecology, Environment and Evolution, School of Life Sciences, La Trobe University, Melbourne, 3083, Australia
- Aaron Darling
  - The ithree institute, University of Technology Sydney, Ultimo, NSW, 2007, Australia
- David Miller
  - Department of Molecular and Cell Biology, James Cook University, Townsville, QLD, 4811, Australia
  - Centre for Tropical Bioinformatics and Molecular Biology, James Cook University, Townsville, QLD, 4811, Australia
  - ARC Centre of Excellence for Coral Reef Studies, James Cook University, Townsville, QLD, 4811, Australia
- Maja Adamska
  - ARC Centre of Excellence for Coral Reef Studies, Australian National University, Canberra, ACT, 2601, Australia
  - Research School of Biology, Australian National University, Canberra, ACT, 2601, Australia
|
34
|
Abstract
The sourmash software package uses MinHash-based sketching to create "signatures", compressed representations of DNA, RNA, and protein sequences, that can be stored, searched, explored, and taxonomically annotated. sourmash signatures can be used to estimate sequence similarity between very large data sets quickly and in low memory, and can be used to search large databases of genomes for matches to query genomes and metagenomes. sourmash is implemented in C++, Rust, and Python, and is freely available under the BSD license at http://github.com/dib-lab/sourmash.
Affiliation(s)
- N. Tessa Pierce
  - Department of Population Health and Reproduction, University of California, Davis, Davis, California, 95616, USA
- Luiz Irber
  - Department of Population Health and Reproduction, University of California, Davis, Davis, California, 95616, USA
- Taylor Reiter
  - Department of Population Health and Reproduction, University of California, Davis, Davis, California, 95616, USA
- Phillip Brooks
  - Department of Population Health and Reproduction, University of California, Davis, Davis, California, 95616, USA
- C. Titus Brown
  - Department of Population Health and Reproduction, University of California, Davis, Davis, California, 95616, USA
|
35
|
Riiser ES, Haverkamp THA, Varadharajan S, Borgan Ø, Jakobsen KS, Jentoft S, Star B. Switching on the light: using metagenomic shotgun sequencing to characterize the intestinal microbiome of Atlantic cod. Environ Microbiol 2019; 21:2576-2594. [PMID: 31091345 DOI: 10.1111/1462-2920.14652] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 02/10/2019] [Revised: 05/07/2019] [Accepted: 05/09/2019] [Indexed: 12/29/2022]
Abstract
Atlantic cod (Gadus morhua) is an ecologically important species with a widespread distribution in the North Atlantic Ocean, yet little is known about the diversity of its intestinal microbiome in its natural habitat. No geographical differentiation in this microbiome was observed based on 16S rRNA amplicon analyses, yet such a finding may result from an inherent lack of power of this method to resolve fine-scaled biological complexity. Here, we use metagenomic shotgun sequencing to investigate the intestinal microbiome of 19 adult Atlantic cod individuals from two coastal populations in Norway, located 470 km apart. Resolving the species community to unprecedented resolution, we identify two abundant species, Photobacterium iliopiscarium and Photobacterium kishitanii, which comprise over 50% of the classified reads. Interestingly, the intestinal P. kishitanii strains have functionally intact lux genes, and its high abundance suggests that fish intestines form an important part of its ecological niche. These observations support a hypothesis that bioluminescence plays an ecological role in the marine food web. Despite our improved taxonomical resolution, we identify no geographical differences in bacterial community structure, indicating that the intestinal microbiome of these coastal cod is colonized by a limited number of closely related bacterial species with a broad geographical distribution.
Affiliation(s)
- Even Sannes Riiser
  - Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, PO Box 1066, Blindern, N-0316 Oslo, Norway
- Thomas H A Haverkamp
  - Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, PO Box 1066, Blindern, N-0316 Oslo, Norway
- Srinidhi Varadharajan
  - Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, PO Box 1066, Blindern, N-0316 Oslo, Norway
- Ørnulf Borgan
  - Department of Mathematics, University of Oslo, PO Box 1053, Blindern, N-0316 Oslo, Norway
- Kjetill S Jakobsen
  - Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, PO Box 1066, Blindern, N-0316 Oslo, Norway
- Sissel Jentoft
  - Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, PO Box 1066, Blindern, N-0316 Oslo, Norway
- Bastiaan Star
  - Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, PO Box 1066, Blindern, N-0316 Oslo, Norway
|
36
|
Antipov D, Raiko M, Lapidus A, Pevzner PA. Plasmid detection and assembly in genomic and metagenomic data sets. Genome Res 2019; 29:961-968. [PMID: 31048319 PMCID: PMC6581055 DOI: 10.1101/gr.241299.118] [Citation(s) in RCA: 79] [Impact Index Per Article: 15.8] [Received: 06/30/2018] [Accepted: 04/24/2019] [Indexed: 12/22/2022]
Abstract
Although plasmids are important for bacterial survival and adaptation, plasmid detection and assembly from genomic, let alone metagenomic, samples remain challenging. The recently developed plasmidSPAdes assembler addressed some of these challenges in the case of isolate genomes but stopped short of detecting plasmids in metagenomic assemblies, an untapped source of yet to be discovered plasmids. We present the metaplasmidSPAdes tool for plasmid assembly in metagenomic data sets that reduced the false positive rate of plasmid detection compared with the state-of-the-art approaches. We assembled plasmids in diverse data sets and have shown that thousands of plasmids remained below the radar in already completed genomic and metagenomic studies. Our analysis revealed the extreme variability of plasmids and has led to the discovery of many novel plasmids (including many plasmids carrying antibiotic-resistance genes) without significant similarities to currently known ones.
Affiliation(s)
- Dmitry Antipov
  - Center for Algorithmic Biotechnology, Institute for Translational Biomedicine, St. Petersburg State University, St. Petersburg 199004, Russia
- Mikhail Raiko
  - Center for Algorithmic Biotechnology, Institute for Translational Biomedicine, St. Petersburg State University, St. Petersburg 199004, Russia
- Alla Lapidus
  - Center for Algorithmic Biotechnology, Institute for Translational Biomedicine, St. Petersburg State University, St. Petersburg 199004, Russia
- Pavel A Pevzner
  - Center for Algorithmic Biotechnology, Institute for Translational Biomedicine, St. Petersburg State University, St. Petersburg 199004, Russia
  - Department of Computer Science and Engineering, University of California, San Diego, California 92093-0404, USA
37
Takhampunya R, Korkusol A, Pongpichit C, Yodin K, Rungrojn A, Chanarat N, Promsathaporn S, Monkanna T, Thaloengsok S, Tippayachai B, Kumfao N, Richards AL, Davidson SA. Metagenomic Approach to Characterizing Disease Epidemiology in a Disease-Endemic Environment in Northern Thailand. Front Microbiol 2019; 10:319. [PMID: 30863381 PMCID: PMC6399164 DOI: 10.3389/fmicb.2019.00319] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 11/25/2018] [Accepted: 02/06/2019] [Indexed: 02/01/2023] Open
Abstract
In this study, we used a metagenomic approach to analyze bacterial communities from diverse populations (humans, animals, and vectors) to investigate the role of these microorganisms as causative agents of disease in human and animal populations. Wild rodents and ectoparasites were collected from 2014 to 2018 in Nan province, Thailand, where scrub typhus is highly endemic. Samples from undifferentiated febrile illness (UFI) patients were obtained from a local hospital. A total of 200 UFI patient samples were obtained, and 309 rodents and 420 pools of ectoparasites were collected from rodents (n = 285) and domestic animals (n = 135). The bacterial 16S rRNA gene was amplified and sequenced on an Illumina platform. Real-time PCR and Sanger sequencing were used to confirm the next-generation sequencing (NGS) results and to characterize pathogen species. Several pathogens were detected by NGS in all populations studied, and the most common pathogens identified included Bartonella spp., Rickettsia spp., Leptospira spp., and Orientia tsutsugamushi. Interestingly, Anaplasma spp. was detected in patient, rodent and tick populations, although it was not previously known to cause human disease in this region. Candidatus Neoehrlichia, Neorickettsia spp., Borrelia spp., and Ehrlichia spp. were detected in rodents and their associated ectoparasites. The same O. tsutsugamushi genotypes were shared among UFI patients, rodents, and chiggers in a single district, indicating that the chiggers found on rodents were also likely responsible for transmission to people. Serological testing using immunofluorescence assays in UFI samples showed high prevalence (IgM/IgG) of Rickettsia and Orientia pathogens, most notably among samples collected during September–November. Additionally, a higher number of seropositive samples belonged to patients in the working age population (20–60 years old). The results presented in this study demonstrate that the increased risk of human infection or exposure to chiggers and their associated pathogen (O. tsutsugamushi) resulted in part from two important factors: working age and the seasons of rice cultivation and harvesting. Evidence of pathogen exposure was also found, as UFI patients were seropositive (IgG) for bartonellosis as well as for anaplasmosis. Using a metagenomic approach, this study demonstrated the circulation and transmission of several pathogens in the environment, some of which are known causative agents of illness in human populations.
Affiliation(s)
- Ratree Takhampunya
  - Department of Entomology, US Army Medical Directorate of the Armed Forces Research Institute of Medical Sciences (USAMD-AFRIMS), Bangkok, Thailand
- Achareeya Korkusol
  - Department of Entomology, US Army Medical Directorate of the Armed Forces Research Institute of Medical Sciences (USAMD-AFRIMS), Bangkok, Thailand
- Artharee Rungrojn
  - Department of Entomology, US Army Medical Directorate of the Armed Forces Research Institute of Medical Sciences (USAMD-AFRIMS), Bangkok, Thailand
- Nitima Chanarat
  - Department of Entomology, US Army Medical Directorate of the Armed Forces Research Institute of Medical Sciences (USAMD-AFRIMS), Bangkok, Thailand
- Sommai Promsathaporn
  - Department of Entomology, US Army Medical Directorate of the Armed Forces Research Institute of Medical Sciences (USAMD-AFRIMS), Bangkok, Thailand
- Taweesak Monkanna
  - Department of Entomology, US Army Medical Directorate of the Armed Forces Research Institute of Medical Sciences (USAMD-AFRIMS), Bangkok, Thailand
- Sasikanya Thaloengsok
  - Department of Entomology, US Army Medical Directorate of the Armed Forces Research Institute of Medical Sciences (USAMD-AFRIMS), Bangkok, Thailand
- Bousaraporn Tippayachai
  - Department of Entomology, US Army Medical Directorate of the Armed Forces Research Institute of Medical Sciences (USAMD-AFRIMS), Bangkok, Thailand
- Allen L Richards
  - Viral and Rickettsial Diseases Department, Naval Medical Research Center, Silver Spring, MD, United States
- Silas A Davidson
  - Department of Entomology, US Army Medical Directorate of the Armed Forces Research Institute of Medical Sciences (USAMD-AFRIMS), Bangkok, Thailand
38
Pitsch G, Bruni EP, Forster D, Qu Z, Sonntag B, Stoeck T, Posch T. Seasonality of Planktonic Freshwater Ciliates: Are Analyses Based on V9 Regions of the 18S rRNA Gene Correlated With Morphospecies Counts? Front Microbiol 2019; 10:248. [PMID: 30837972 PMCID: PMC6389714 DOI: 10.3389/fmicb.2019.00248] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Received: 11/09/2018] [Accepted: 01/30/2019] [Indexed: 12/23/2022] Open
Abstract
Ciliates represent central nodes in freshwater planktonic food webs, and many species show pronounced seasonality, with short-lived maxima of a few dominant taxa while many others are rare or ephemeral. These observations are primarily based on morphospecies counting methods, which, however, have limitations concerning the number and volume of samples that can be processed. For high sampling frequencies at large scales, high throughput sequencing (HTS) of freshwater ciliates seems to be a promising tool. However, several studies reported large discrepancies between species abundance determinations by molecular compared to morphological means. Therefore, we compared ciliate DNA metabarcodes (V9 regions of the 18S rRNA gene) with morphospecies counts for a 3-year study (Lake Zurich, Switzerland; biweekly sampling, n = 74). In addition, we isolated, cultivated and sequenced the 18S rRNA gene of twelve selected ciliate species that served as seeds for HTS analyses. This workflow allowed for a detailed comparison of V9 data with microscopic analyses by quantitative protargol staining (QPS). The dynamics of V9 read abundances over the seasonal cycle corresponded well with morphospecies population patterns. Annual successions of rare and ephemeral species were more adequately characterized by V9 reads than by QPS. However, numbers of species-specific sequence reads only partly reflected rank orders seen by counts. In contrast, biomass-based assemblage compositions showed higher similarity to V9 read numbers, probably indicating a relation between cell sizes and numbers / sizes of macronuclei (or 18S rRNA operons). Full-length 18S rRNA sequences of ciliates assigned to certain morphospecies are urgently needed for barcoding approaches, as planktonic taxa are still poorly represented in public databases and the interpretation of HTS data depends on profound reference sequences. Through linking operational taxonomic units (OTUs) with known morphospecies, we can use the deep knowledge about the autecology of these species.
Affiliation(s)
- Gianna Pitsch
  - Limnological Station, Department of Plant and Microbial Biology, University of Zurich, Kilchberg, Switzerland
- Estelle Patricia Bruni
  - Limnological Station, Department of Plant and Microbial Biology, University of Zurich, Kilchberg, Switzerland
- Dominik Forster
  - Ecology Group, Technical University of Kaiserslautern, Kaiserslautern, Germany
- Zhishuai Qu
  - Ecology Group, Technical University of Kaiserslautern, Kaiserslautern, Germany
- Bettina Sonntag
  - Research Department for Limnology, Mondsee, University of Innsbruck, Mondsee, Austria
- Thorsten Stoeck
  - Ecology Group, Technical University of Kaiserslautern, Kaiserslautern, Germany
- Thomas Posch
  - Limnological Station, Department of Plant and Microbial Biology, University of Zurich, Kilchberg, Switzerland
39
Tedersoo L, Drenkhan R, Anslan S, Morales-Rodriguez C, Cleary M. High-throughput identification and diagnostics of pathogens and pests: Overview and practical recommendations. Mol Ecol Resour 2019; 19:47-76. [PMID: 30358140 PMCID: PMC7379260 DOI: 10.1111/1755-0998.12959] [Citation(s) in RCA: 61] [Impact Index Per Article: 12.2] [Received: 03/19/2018] [Revised: 08/01/2018] [Accepted: 08/28/2018] [Indexed: 12/26/2022]
Abstract
High-throughput identification technologies provide efficient tools for understanding the ecology and functioning of microorganisms. Yet, these methods have been only rarely used for monitoring and testing ecological hypotheses in plant pathogens and pests in spite of their immense importance in agriculture, forestry and plant community dynamics. The main objectives of this manuscript are the following: (a) to provide a comprehensive overview about the state-of-the-art high-throughput quantification and molecular identification methods used to address population dynamics, community ecology and host associations of microorganisms, with a specific focus on antagonists such as pathogens, viruses and pests; (b) to compile available information and provide recommendations about specific protocols and workable primers for bacteria, fungi, oomycetes and insect pests; and (c) to provide examples of novel methods used in other microbiological disciplines that are of great potential use for testing specific biological hypotheses related to pathology. Finally, we evaluate the overall perspectives of the state-of-the-art and still evolving methods for diagnostics and population- and community-level ecological research of pathogens and pests.
Affiliation(s)
- Leho Tedersoo
  - Natural History Museum and Institute of Ecology and Earth Sciences, University of Tartu, Tartu, Estonia
- Rein Drenkhan
  - Institute of Forestry and Rural Engineering, Estonian University of Life Sciences, Tartu, Estonia
- Sten Anslan
  - Natural History Museum and Institute of Ecology and Earth Sciences, University of Tartu, Tartu, Estonia
- Michelle Cleary
  - Southern Swedish Forest Research Centre, Swedish University of Agricultural Sciences, Alnarp, Sweden
40
Pericard P, Dufresne Y, Couderc L, Blanquart S, Touzet H. MATAM: reconstruction of phylogenetic marker genes from short sequencing reads in metagenomes. Bioinformatics 2018; 34:585-591. [PMID: 29040406 DOI: 10.1093/bioinformatics/btx644] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Received: 05/23/2017] [Accepted: 10/10/2017] [Indexed: 01/18/2023] Open
Abstract
Motivation: Advances in the sequencing of uncultured environmental samples, dubbed metagenomics, raise a growing need for accurate taxonomic assignment. Accurate identification of organisms present within a community is essential to understanding even the most elementary ecosystems. However, current high-throughput sequencing technologies generate short reads that only partially cover full-length marker genes, and this poses difficult bioinformatic challenges for taxonomy identification at high resolution. Results: We designed MATAM, software dedicated to the fast and accurate targeted assembly of short reads sequenced from a genomic marker of interest. The method implements a stepwise process based on construction and analysis of a read overlap graph. It is applied to the assembly of 16S rRNA markers and is validated on simulated, synthetic and genuine metagenomes. We show that MATAM outperforms other available methods in terms of low error rates and recovered fractions and is suitable to provide improved assemblies for precise taxonomic assignments. Availability and implementation: https://github.com/bonsai-team/matam. Contact: pierre.pericard@gmail.com or helene.touzet@univ-lille1.fr. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Pierre Pericard
  - CRIStAL (UMR CNRS 9189, Université Lille 1)
  - Inria Lille Nord-Europe
- Yoann Dufresne
  - CRIStAL (UMR CNRS 9189, Université Lille 1)
  - Inria Lille Nord-Europe
- Loïc Couderc
  - CRIStAL (UMR CNRS 9189, Université Lille 1)
  - Bilille, 59650 Villeneuve d'Ascq, France
- Samuel Blanquart
  - CRIStAL (UMR CNRS 9189, Université Lille 1)
  - Inria Lille Nord-Europe
- Hélène Touzet
  - CRIStAL (UMR CNRS 9189, Université Lille 1)
  - Inria Lille Nord-Europe
41
Almeida A, Mitchell AL, Tarkowska A, Finn RD. Benchmarking taxonomic assignments based on 16S rRNA gene profiling of the microbiota from commonly sampled environments. Gigascience 2018; 7:4995265. [PMID: 29762668 PMCID: PMC5967554 DOI: 10.1093/gigascience/giy054] [Citation(s) in RCA: 62] [Impact Index Per Article: 10.3] [Received: 02/05/2018] [Accepted: 05/04/2018] [Indexed: 12/30/2022] Open
Abstract
Background: Taxonomic profiling of ribosomal RNA (rRNA) sequences has been the accepted norm for inferring the composition of complex microbial ecosystems. Quantitative Insights Into Microbial Ecology (QIIME) and mothur have been the most widely used taxonomic analysis tools for this purpose, with MAPseq and QIIME 2 being two recently released alternatives. However, no independent and direct comparison between these four main tools has been performed. Here, we compared the default classifiers of MAPseq, mothur, QIIME, and QIIME 2 using synthetic simulated datasets comprised of some of the most abundant genera found in the human gut, ocean, and soil environments. We evaluate their accuracy when paired with both different reference databases and variable sub-regions of the 16S rRNA gene. Findings: We show that QIIME 2 provided the best recall and F-scores at genus and family levels, together with the lowest distance estimates between the observed and simulated samples. However, MAPseq showed the highest precision, with miscall rates consistently <2%. Notably, QIIME 2 was the most computationally expensive tool, with CPU time and memory usage almost 2 and 30 times higher than MAPseq, respectively. Using the SILVA database generally yielded a higher recall than using Greengenes, while assignment results of different 16S rRNA variable sub-regions varied up to 40% between samples analysed with the same pipeline. Conclusions: Our results support the use of either QIIME 2 or MAPseq for optimal 16S rRNA gene profiling, and we suggest that the choice between the two should be based on the level of recall, precision, and/or computational performance required.
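The recall, precision and F-score figures reported in this abstract can be made concrete with a short sketch. It assumes genus-level presence/absence scoring over sets of names; the function name `classification_scores` and the example genera are illustrative and are not taken from the benchmark, whose exact scoring procedure may differ.

```python
def classification_scores(expected, observed):
    """Return (precision, recall, F-score) for a set of predicted taxa.

    expected: taxa truly present in the simulated sample
    observed: taxa reported by the classifier
    Both are treated as sets, so abundances are ignored (an assumption).
    """
    expected, observed = set(expected), set(observed)
    true_pos = len(expected & observed)    # correctly reported taxa
    false_pos = len(observed - expected)   # reported but absent (miscalls)
    false_neg = len(expected - observed)   # present but not reported

    precision = true_pos / (true_pos + false_pos) if observed else 0.0
    recall = true_pos / (true_pos + false_neg) if expected else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_score


# Hypothetical genus lists, chosen only to exercise the function.
truth = ["Bacteroides", "Faecalibacterium", "Prevotella", "Roseburia"]
calls = ["Bacteroides", "Prevotella", "Escherichia"]
precision, recall, f_score = classification_scores(truth, calls)
print(f"precision={precision:.3f} recall={recall:.3f} F={f_score:.3f}")
```

In this toy case the classifier finds two of four true genera and makes one miscall, which illustrates the trade-off the abstract describes: a tool can score well on precision while missing taxa, or recover more taxa at the cost of more miscalls.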
Affiliation(s)
- Alexandre Almeida
  - EMBL-EBI European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
  - Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, UK
- Alex L Mitchell
  - EMBL-EBI European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Aleksandra Tarkowska
  - EMBL-EBI European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Robert D Finn
  - EMBL-EBI European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
42
Parmar S, Li Q, Wu Y, Li X, Yan J, Sharma VK, Wei Y, Li H. Endophytic fungal community of Dysphania ambrosioides from two heavy metal-contaminated sites: evaluated by culture-dependent and culture-independent approaches. Microb Biotechnol 2018; 11:1170-1183. [PMID: 30256529 PMCID: PMC6196397 DOI: 10.1111/1751-7915.13308] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 05/07/2018] [Revised: 07/30/2018] [Accepted: 08/01/2018] [Indexed: 01/26/2023] Open
Abstract
Endophytic fungal communities of Dysphania ambrosioides, a hyperaccumulator growing at two Pb-Zn-contaminated sites, were investigated through culture-dependent and culture-independent approaches. A total of 237 culturable endophytic fungi (EF) were isolated from 368 tissue (shoot and root) segments, and the colonization rate (CR) ranged from 9.64% to 65.98%. The isolates were assigned to 43 taxa based on morphological characteristics and rDNA ITS sequence analysis. Among them, 13 taxa (30.23%) were common to plant tissues from both sites; however, the dominant EF differed. In the culture-independent study, 1989 OTUs were obtained through Illumina MiSeq sequencing, and the dominant EF were almost the same in plant tissues from both sites. However, some culturable EF were not observed in the total endophytic communities. We suggest that combining culture-dependent and culture-independent methods gives a more precise estimate of the endophytic fungal community than using either of them alone. The tissue had more influence on the culturable fungal community structure, whereas the location had more influence on the total fungal community structure (including culturable and unculturable). Both culture-dependent and culture-independent studies illustrated that endophytic fungal communities of D. ambrosioides varied across the sites, which suggested that the heavy metal (HM) concentration of the soil may have some influence on endophytic fungal diversity.
Affiliation(s)
- Shobhika Parmar
  - Medical School of Kunming University of Science and Technology, Kunming 650500, China
- Qiaohong Li
  - The First People's Hospital of Yunnan Province, Kunming 650032, China
  - The Affiliated Hospital of Kunming University of Science and Technology, Kunming 650500, China
- Ying Wu
  - The First People's Hospital of Yunnan Province, Kunming 650032, China
  - The Affiliated Hospital of Kunming University of Science and Technology, Kunming 650500, China
- Xinya Li
  - Medical School of Kunming University of Science and Technology, Kunming 650500, China
- Jinping Yan
  - Medical School of Kunming University of Science and Technology, Kunming 650500, China
- Vijay K. Sharma
  - Medical School of Kunming University of Science and Technology, Kunming 650500, China
- Yunlin Wei
  - Medical School of Kunming University of Science and Technology, Kunming 650500, China
- Haiyan Li
  - Medical School of Kunming University of Science and Technology, Kunming 650500, China
43
Bonk F, Popp D, Harms H, Centler F. PCR-based quantification of taxa-specific abundances in microbial communities: Quantifying and avoiding common pitfalls. J Microbiol Methods 2018; 153:139-147. [DOI: 10.1016/j.mimet.2018.09.015] [Citation(s) in RCA: 58] [Impact Index Per Article: 9.7] [Received: 08/02/2018] [Revised: 09/24/2018] [Accepted: 09/25/2018] [Indexed: 11/25/2022]
44
Analysis of sequencing strategies and tools for taxonomic annotation: Defining standards for progressive metagenomics. Sci Rep 2018; 8:12034. [PMID: 30104688 PMCID: PMC6089906 DOI: 10.1038/s41598-018-30515-5] [Citation(s) in RCA: 55] [Impact Index Per Article: 9.2] [Received: 01/22/2018] [Accepted: 07/24/2018] [Indexed: 12/30/2022] Open
Abstract
Metagenomics research has recently thrived owing to improvements in DNA sequencing technologies, driving the emergence of new analysis tools and the growth of taxonomic databases. However, there is no all-purpose strategy that can guarantee the best result for a given project, and there are several combinations of software, parameters and databases that can be tested. Therefore, we performed an impartial comparison, using statistical measures of classification for eight bioinformatic tools and four taxonomic databases, defining a benchmark framework to evaluate each tool in a standardized context. Using in silico simulated data for 16S rRNA amplicons and whole metagenome shotgun data, we compared the results from different software and database combinations to detect biases related to algorithms or database annotation. Using our benchmark framework, researchers can define cut-off values to evaluate the expected error rate and coverage for their results, regardless of the score used by each software. A quick guide to select the best tool, all datasets and scripts to reproduce our results and benchmark any new method are available at https://github.com/Ales-ibt/Metagenomic-benchmark. Finally, we stress the importance of gold standards, database curation and manual inspection of taxonomic profiling results for a better and more accurate description of microbial diversity.
45
Joint Analysis of Long and Short Reads Enables Accurate Estimates of Microbiome Complexity. Cell Syst 2018; 7:192-200.e3. [PMID: 30056005 DOI: 10.1016/j.cels.2018.06.009] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 03/01/2018] [Revised: 05/05/2018] [Accepted: 06/15/2018] [Indexed: 01/09/2023]
Abstract
Reduced microbiome diversity has been linked to several diseases. However, estimating the diversity of bacterial communities (the number and the total length of distinct genomes within a metagenome) remains an open problem in microbial ecology. Here, we describe an algorithm for estimating the microbial diversity in a metagenomic sample based on a joint analysis of short and long reads. Unlike previous approaches, the algorithm does not make any assumptions on the distribution of the frequencies of genomes within a metagenome (as in parametric methods) and does not require a large database that covers the total diversity (as in non-parametric methods). We estimate that genomes comprising a human gut metagenome have total length varying from 1.3 to 3.5 billion nucleotides, with genomes responsible for 50% of total abundance having total length varying from only 25 to 61 million nucleotides. In contrast, genomes comprising an aquifer sediment metagenome have more than two orders of magnitude larger total length (≈840 billion nucleotides).
46
Riiser ES, Haverkamp THA, Borgan Ø, Jakobsen KS, Jentoft S, Star B. A Single Vibrionales 16S rRNA Oligotype Dominates the Intestinal Microbiome in Two Geographically Separated Atlantic cod Populations. Front Microbiol 2018; 9:1561. [PMID: 30057577 PMCID: PMC6053498 DOI: 10.3389/fmicb.2018.01561] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 02/27/2018] [Accepted: 06/25/2018] [Indexed: 11/13/2022] Open
Abstract
Atlantic cod (Gadus morhua) provides an interesting species for the study of host-microbe interactions because it lacks the MHC II complex that is involved in the presentation of extracellular pathogens. Nonetheless, little is known about the diversity of its microbiome in natural populations. Here, we use high-throughput sequencing of the 16S rRNA V4 region, amplified with the primer design of the Earth Microbiome Project (EMP), to investigate the microbial composition in gut content and mucosa of 22 adult individuals from two coastal populations in Norway, located 470 km apart. We identify a core microbiome of 23 OTUs (97% sequence similarity) in all individuals that comprises 93% of the total number of reads. The most abundant orders are classified as Vibrionales, Fusobacteriales, Clostridiales, and Bacteroidales. While mucosal samples show significantly lower diversity than gut content samples, no differences in OTU community composition are observed between the two geographically separated populations. All specimens share a limited number of abundant OTUs. Moreover, the most abundant OTU consists of a single oligotype (order Vibrionales, genus Photobacterium) that represents nearly 50% of the reads in both locations. Our results suggest that these microbiomes comprise a limited number of species or that the EMP V4 primers do not yield sufficient resolution to confidently separate these communities. Our study contributes to a growing body of literature that shows limited spatial differentiation of the intestinal microbiomes in marine fish based on 16S rRNA sequencing, highlighting the need for multi-gene approaches to provide more insight into the diversity of these communities.
Affiliation(s)
- Even S Riiser
  - Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, Oslo, Norway
- Thomas H A Haverkamp
  - Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, Oslo, Norway
- Ørnulf Borgan
  - Department of Mathematics, University of Oslo, Oslo, Norway
- Kjetill S Jakobsen
  - Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, Oslo, Norway
- Sissel Jentoft
  - Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, Oslo, Norway
- Bastiaan Star
  - Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, Oslo, Norway
47
Ai D, Pan H, Huang R, Xia LC. CoreProbe: A Novel Algorithm for Estimating Relative Abundance Based on Metagenomic Reads. Genes (Basel) 2018; 9:E313. [PMID: 29925824 PMCID: PMC6027520 DOI: 10.3390/genes9060313] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 04/11/2018] [Revised: 06/12/2018] [Accepted: 06/13/2018] [Indexed: 11/16/2022] Open
Abstract
With the rapid development of high-throughput sequencing technology, the analysis of metagenomic sequencing data and the accurate and efficient estimation of relative microbial abundance have become important ways to explore the composition and function of microbial communities. In addition, the accuracy and efficiency of the relative microbial abundance estimation are closely related to the algorithm and the selection of the reference sequence for sequence alignment. We introduced the microbial core genome as the reference sequence for potential microbes in a metagenomic sample, constructed finite mixture and latent Dirichlet models, and used the Gibbs sampling algorithm to estimate the relative abundance of microorganisms. The simulation results showed that our approach can improve efficiency while maintaining high accuracy and is more suitable for high-throughput metagenomic data. The new approach was implemented in our CoreProbe package, which provides a pipeline for an accurate and efficient estimation of the relative abundance of microbes in a community. This tool is available free of charge from the CoreProbe website. The Docker image can be pulled with the following instruction: sudo docker pull panhongfei/coreprobe:1.0.
Affiliation(s)
- Dongmei Ai
  - School of Mathematics and Physics, University of Science and Technology Beijing, Beijing 100083, China
- Hongfei Pan
  - School of Mathematics and Physics, University of Science and Technology Beijing, Beijing 100083, China
- Li C Xia
  - Department of Medicine, Stanford University School of Medicine, 269 Campus Dr., Stanford, CA 94305, USA
48
Taxon Disappearance from Microbiome Analysis Reinforces the Value of Mock Communities as a Standard in Every Sequencing Run. mSystems 2018; 3:mSystems00023-18. [PMID: 29629423 PMCID: PMC5883066 DOI: 10.1128/msystems.00023-18] [Citation(s) in RCA: 44] [Impact Index Per Article: 7.3] [Received: 02/20/2018] [Accepted: 03/07/2018] [Indexed: 01/10/2023] Open
Abstract
Mock communities have been used in microbiome method development to help estimate biases introduced in PCR amplification and sequencing and to optimize pipeline outputs. Nevertheless, the strong value of routine mock community analysis beyond initial method development is rarely, if ever, considered. Here we report that our routine use of mock communities as internal standards allowed us to discover highly aberrant and strong biases in the relative proportions of multiple taxa in a single Illumina HiSeqPE250 run. In this run, an important archaeal taxon virtually disappeared from all samples, and other mock community taxa showed >2-fold higher or lower abundance, whereas a rerun of those identical amplicons (from the same reaction tubes) on a different date yielded "normal" results. Although the problem was obvious from the strange mock community results, we could easily have missed it had we not used the mock communities, because of the natural variation of microbiomes at our site. The "normal" results were validated over four MiSeqPE300 runs and three HiSeqPE250 runs, and run-to-run variation was usually low. While validating these "normal" results, we also discovered that some mock microbial taxa had relatively modest, but consistent, differences between sequencing platforms. We strongly advise the use of mock communities in every sequencing run to distinguish potentially serious aberrations from natural variations. The mock communities should have more than just a few members and ideally at least partly represent the samples being analyzed to detect problems that show up only in some taxa and also to help validate clustering. IMPORTANCE Despite the routine use of standards and blanks (a kind of "control") in virtually all chemical or physical assays and most biological studies, microbiome analysis has traditionally lacked such standards. Here we show that unexpected problems of unknown origin can occur in such sequencing runs and yield completely incorrect results that would not necessarily be detected without the use of standards. Assuming that the microbiome sequencing analysis works properly every time risks serious errors that can be detected by the use of mock communities.
|
49
|
Campanaro S, Treu L, Kougias PG, Zhu X, Angelidaki I. Taxonomy of anaerobic digestion microbiome reveals biases associated with the applied high throughput sequencing strategies. Sci Rep 2018; 8:1926. [PMID: 29386622 PMCID: PMC5792648 DOI: 10.1038/s41598-018-20414-0] [Citation(s) in RCA: 42] [Impact Index Per Article: 7.0] [Received: 10/06/2017] [Accepted: 01/17/2018] [Indexed: 11/20/2022] Open
Abstract
In the past few years, many studies have investigated the anaerobic digestion microbiome by means of 16S rRNA amplicon sequencing. Results from these studies were compared with each other without considering the procedures followed for amplicon preparation and data analysis. This oversight was mainly due to a lack of knowledge about the biases influencing specific steps of the microbiome investigation process. In the present study, the main technical aspects of 16S rRNA analysis were examined, with special attention to the approach used for high-throughput sequencing. More specifically, the microbial compositions of three laboratory-scale biogas reactors were analyzed before and after addition of sodium oleate by sequencing the microbiome with three different approaches: 16S rRNA amplicon sequencing, shotgun DNA, and shotgun RNA. This comparative analysis revealed that, in amplicon sequencing, the abundance of some taxa (Euryarchaeota and Spirochaetes) was biased by the inability of universal primers to hybridize to all templates. The reliability of the results was also influenced by the number of hypervariable regions under investigation. Finally, amplicon sequencing and shotgun DNA underestimated the Methanoculleus genus, probably due to the low 16S rRNA gene copy number encoded in this taxon.
Affiliation(s)
- Stefano Campanaro: Department of Biology, University of Padova, Via U. Bassi 58/b, 35121 Padova, Italy
- Laura Treu: Department of Environmental Engineering, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark
- Panagiotis G Kougias: Department of Environmental Engineering, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark
- Xinyu Zhu: Department of Environmental Engineering, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark
- Irini Angelidaki: Department of Environmental Engineering, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark
|
50
|
Carbon Amendments Alter Microbial Community Structure and Net Mercury Methylation Potential in Sediments. Appl Environ Microbiol 2018; 84:AEM.01049-17. [PMID: 29150503 PMCID: PMC5772229 DOI: 10.1128/aem.01049-17] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Received: 05/09/2017] [Accepted: 09/28/2017] [Indexed: 01/08/2023] Open
Abstract
Neurotoxic methylmercury (MeHg) is produced by anaerobic Bacteria and Archaea possessing the genes hgcAB, but it is unknown how organic substrate and electron acceptor availability impacts the distribution and abundance of these organisms. We evaluated the impact of organic substrate amendments on mercury (Hg) methylation rates, microbial community structure, and the distribution of hgcAB+ microbes with sediments. Sediment slurries were amended with short-chain fatty acids, alcohols, or a polysaccharide. Minimal increases in MeHg were observed following lactate, ethanol, and methanol amendments, while a significant decrease (∼70%) was observed with cellobiose incubations. Postincubation, microbial diversity was assessed via 16S rRNA amplicon sequencing. The presence of hgcAB+ organisms was assessed with a broad-range degenerate PCR primer set for both genes, while the presence of microbes in each of the three dominant clades of methylators (Deltaproteobacteria, Firmicutes, and methanogenic Archaea) was measured with clade-specific degenerate hgcA quantitative PCR (qPCR) primer sets. The predominant microorganisms in unamended sediments consisted of Proteobacteria, Firmicutes, Bacteroidetes, and Actinobacteria. Clade-specific qPCR identified hgcA+ Deltaproteobacteria and Archaea in all sites but failed to detect hgcA+ Firmicutes. Cellobiose shifted the communities in all samples to ∼90% non-hgcAB-containing Firmicutes (mainly Bacillus spp. and Clostridium spp.). These results suggest that either expression of hgcAB is downregulated or, more likely given the lack of 16S rRNA gene presence after cellobiose incubation, Hg-methylating organisms are largely outcompeted by cellobiose degraders or degradation products of cellobiose.
These results represent a step toward understanding and exploring simple methodologies for controlling MeHg production in the environment. IMPORTANCE Methylmercury (MeHg) is a neurotoxin produced by microorganisms that bioaccumulates in the food web and poses a serious health risk to humans. Currently, the impact that organic substrate or electron acceptor availability has on the mercury (Hg)-methylating microorganisms is unclear. To study this, we set up microcosm experiments exposed to different organic substrates and electron acceptors and assayed for Hg methylation rates, for microbial community structure, and for distribution of Hg methylators. The sediment and groundwater were collected from East Fork Poplar Creek in Oak Ridge, TN. Amendment with cellobiose (a lignocellulosic degradation by-product) led to a drastic decrease in the Hg methylation rate compared to that in an unamended control, with an associated shift in the microbial community to mostly nonmethylating Firmicutes. This, along with previous Hg-methylating microorganism identification methods, will be important for identifying strategies to control MeHg production and inform future remediation strategies.
|